Corpus Linguistics for Writing Development provides a practical introduction to using corpora in the study of first and second language learners’ written language over time and across different levels of proficiency. Focusing on development in the use of vocabulary, formulaic language, and grammar, this book:
• discusses how corpus research can contribute to our understanding of writing development and to pedagogical practice;
• reviews a range of corpus techniques for studying writing development from the perspectives of vocabulary, grammar, and formulaic language and interrogates the methodological bases of those techniques; and
• guides readers to perform practical analyses of learner writing using the R open-source programming language.
Aimed at the novice researcher, this book will be key reading for advanced undergraduate and postgraduate students in the fields of education, language, and linguistics. It will be of particular interest to those working on first or second language writing, language assessment, and learner corpus research.
Author(s): Philip Durrant
Series: Routledge Corpus Linguistics Series
Publisher: Routledge
Year: 2022
Language: English
Pages: 193
City: London
Cover
Half Title
Series Page
Title Page
Copyright Page
Table of Contents
Acknowledgements
Part One: Foundations
Chapter 1: Studying writing development with a corpus
1.1 Introduction
1.2 Using a corpus to study writing development
1.3 How does writing development relate to vocabulary, grammar, formulaic language?
1.4 Outline of the book
Note
References
Chapter 2: Learner corpus analysis in practice: Some basics
2.1 Introduction
2.2 Some housekeeping: getting your computer ready
2.3 Getting to know R and RStudio
2.3.1 Introduction: why learn R?
2.3.2 Entering commands: the Console and Scripts
2.3.3 Functions
2.3.4 Vectors
2.3.5 Getting help
2.4 Some fundamentals of corpus research: encoding, markup, annotation, and metadata
2.5 Corpora used in this book
2.6 Automatically annotating your corpus for part of speech and syntactic relationships
2.6.1 Introduction
2.6.2 Make sure you have the required software
2.6.3 Prepare the corpus for parsing
2.6.4 Make a list of the files you want to process
2.6.5 Run the CoreNLP pipeline
2.7 Conclusion
2.8 Taking it further
Notes
References
Part Two: Studying vocabulary in writing development
Chapter 3: Understanding vocabulary in learner writing
3.1 Introduction
3.2 Theorizing development in vocabulary
3.2.1 Introduction
3.2.2 Breadth, depth, and fluency
3.2.3 Aspects of word knowledge
3.3 Measures of vocabulary development
3.3.1 Introduction
3.3.2 Lexical diversity
3.3.3 Lexical sophistication
3.3.3.1 Word length
3.3.3.2 Word frequency
3.3.3.3 Register-based measures
3.3.3.4 Contextual distinctiveness
3.3.3.5 Semantic measures
3.3.3.6 Psycholinguistic measures
3.4 Complicating factors
3.4.1 Introduction
3.4.2 What is a ‘word’?
3.4.2.1 Defining words
3.4.2.2 Defining word tokens
3.4.2.3 Defining word types
3.4.3 Choosing a suitable reference corpus
3.4.4 Relationships between measures of diversity and sophistication
3.4.5 Vocabulary knowledge depth
3.5 Conclusion
3.6 Taking it further
Notes
References
Chapter 4: Vocabulary research in practice: Diversity and academic vocabulary
4.1 Introduction
4.2 Measuring vocabulary diversity
4.2.1 Getting the metadata and corpus filenames
4.2.2 Generating CTTR scores
4.2.3 Recording the results
4.2.4 Analyzing vocabulary diversity
4.3 Studying academic vocabulary
4.3.1 Preparing the list of academic vocabulary
4.3.2 Converting the parsed corpus to an easier-to-use format
4.3.3 Identifying AVL words in the learner corpus
4.3.4 Visualizing variation in measures
4.3.5 Investigating the patterns
4.4 Conclusion
Notes
References
Part Three: Studying grammar in writing development
Chapter 5: Understanding grammar in learner writing
5.1 Introduction
5.2 Studying development through grammar
5.2.1 Models of grammar
5.2.2 Selecting and interpreting grammatical features
5.3 Approaches to grammatical development
5.3.1 Varieties of grammatical approaches
5.3.2 Development in grammatical complexity
5.3.3 Multi-dimensional analysis
5.3.4 Usage-based models of development
5.4 Conclusion
5.5 Taking it further
Notes
References
Chapter 6: Grammar research in practice: Evaluating parser accuracy
6.1 Introduction
6.2 Reading a parsed corpus
6.3 Accuracy evaluation and fixtagging: an introduction
6.4 Accuracy evaluation and fixtagging: a worked example
6.4.1 Hand-annotating a sample of texts
6.4.2 Getting metadata and filenames
6.4.3 Identifying and counting adjectives
6.4.4 Identifying true positives, false positives, and false negatives
6.4.5 Calculating precision and recall
6.4.6 Identifying matches and differences in hand vs. computer parses
6.4.7 Identifying and fixing parsing errors
6.5 Tracing development in a grammatical feature
6.5.1 Counting a feature in texts
6.5.2 Visualizing variation across learner groups
6.6 Conclusion
Notes
References
Part Four: Studying formulaic language in writing development
Chapter 7: Understanding formulaic language in learner writing
7.1 Introduction
7.2 Defining formulaic language
7.3 How can we study formulaic language in a corpus?
7.3.1 A frequency-based approach to studying formulaic language
7.3.2 Lexical bundles
7.4 Collocations
7.5 Conclusion
7.6 Taking it further
References
Chapter 8: Formulaic language research in practice: Academic collocations
8.1 Introduction
8.2 Identifying collocations in a reference corpus
8.2.1 Editing the parsed corpus
8.2.2 Identifying lemmas and verb + noun combinations
8.2.3 Identifying collocations
8.3 Quantifying the use of academic collocations across learner groups
8.3.1 Preparing the learner corpus
8.3.2 Identifying academic collocations in the learner corpus
8.3.3 Understanding the use of academic collocations across levels
8.4 Conclusion
Note
References
Index
Index of R functions and concepts