This book combines insights from language assessment literacy and critical language testing through critical analyses of, and research on, challenges in language assessment around the world. It investigates problematic practices in language testing that are relevant to language test users such as language program directors, testing centers, and language teachers, as well as to teachers-in-training in Graduate Diploma and Master of Arts in Applied Linguistics programs. These issues span aspects of language testing such as test development, test administration, scoring, and the interpretation and use of test results.
Chapters in this volume offer insights into language testing policy, testing world languages, developing program-level language tests and tests of specific language skills, and language assessment literacy. In addition, the book identifies two needs in language testing that warrant further examination: the need for collaboration among language test developers, language test users, and language users, and the need to ground language tests in real-world language use.
Editor(s): Betty Lanteigne, Christine Coombe, James Dean Brown
Edition: 1
Publisher: Springer
Year: 2021
Language: English
Pages: 576
Foreword
Acknowledgments
Contents
Editors and Contributors
1 Introducing Challenges in Language Testing Around the World
1.1 Why There Is a Need for This Book
1.2 Audience for the Book
1.3 Structure of the Chapters
1.4 Themes Addressed Within the Volume
1.5 Entering a World of Challenges
References
Part I Learning from Language Test Interpretation Problems, Negative Effects, or Misuse
2 Problems Caused by Ignoring Descriptive Statistics in Language Testing
2.1 Introduction: Purpose and Testing Context
2.1.1 Descriptive and Item Analysis Statistics
2.1.2 Reliability and Validity Statistics
2.2 Testing Problems Encountered
2.2.1 Not Including Descriptive Statistics in Thinking About and Interpreting Other Testing Statistics
2.2.2 Forgetting that All Testing Statistics Are for Scores Based on Performances of a Certain Group of Examinees Under Certain Conditions
2.3 Insights Gained
2.4 Solution/Resolution of the Problem
2.5 Conclusion: Implications for Test Users
References
3 Disregarding Data Due Diligence Versus Checking and Communicating Parametric Statistical Testing Procedure Assumptions
3.1 Introduction: Purpose and Testing Context
3.2 Testing Problem Encountered
3.3 Solution of the Problem
3.4 Insights Gained
3.4.1 Descriptive Statistics: Arithmetic Mean
3.4.2 Descriptive Statistics: Standard Deviation
3.4.3 Descriptive Statistics: Pearson’s Product-Moment Correlation Coefficient
3.4.4 Inferential Statistics: One-Way Independent Analysis of Variance
3.4.5 Inferential Statistics: Independent T-Test
3.5 Conclusion: Implications for Test Users
References
4 Washback of the Reformed College English Test Band 4 (CET-4) in English Learning and Teaching in China, and Possible Solutions
4.1 Introduction: Purpose and the Testing Context of the CET-4
4.2 Testing Problems Encountered
4.3 Solution/Resolution of the Problems
4.4 Insights Gained
4.5 Conclusion: Implications for Test Users
Appendix 4.1: Summary of the Key Studies on the Washback of the Reformed CET-4
References
5 Fairness in College Entrance Exams in Japan and the Planned Use of External Tests in English
5.1 Introduction
5.2 Testing Problem Encountered
5.3 Unsolved Problems
5.4 Insights Gained
5.5 Conclusion: Implications for Test Users
References
6 (Mis)Use of High-Stakes Standardized Tests for Multiple Purposes in Canada? A Call for an Evidence-Based Approach to Language Testing and Realignment of Instruction
6.1 Introduction: Purpose and Testing Context
6.2 Testing Problems Encountered
6.2.1 Test Use
6.2.2 Test Fairness and Justice
6.2.3 Test Consequences
6.3 Solutions to the Problems
6.4 Insights Gained
6.5 Conclusion: Implications for Test Users
References
7 Testing in ESP: Approaches and Challenges in Aviation and Maritime English
7.1 Introduction: Purpose and Testing Context
7.2 Testing Problems Encountered
7.2.1 Validity, Reliability and Quality—How Global Are the Standards?
7.2.2 Test Takers and Test Developers—The Great Disconnect
7.2.3 A Non-Collaborative Approach to ESP Testing—More Than Just Language
7.3 Solution/Resolution of the Problem
7.3.1 A Collaborative Effort—Communication in a Very Specific Domain
7.4 Insights Gained
7.5 Conclusion: Implications for Test Users
References
8 A Conceptual Framework on the Power of Language Tests as Social Practice
8.1 Introduction: Purpose and Testing Context
8.2 Testing Problem Encountered
8.3 Review of the Literature
8.4 Methodology
8.5 Findings: Key Concepts Generating the Power of Tests
8.5.1 The Roles of Testers
8.5.2 The Meaning of Tests in Public
8.5.3 Feelings and Meaning a Test Evokes in Test Takers
8.5.4 The Functions of Tests
8.6 Insights Gained: A Framework on the Power of Tests as Social Practice
8.7 Conclusion: Implications for Test Users
References
9 The Washback Effect of the Vietnam Six-Levels of Foreign Language Proficiency Framework (KNLNNVN): The Case of the English Proficiency Graduation Benchmark in Vietnam
9.1 Introduction: Purpose and Testing Context
9.2 Testing Problem Encountered
9.3 Review of Literature
9.3.1 Defining Washback and Language Proficiency Framework
9.3.2 Previous Washback Studies
9.4 Methodology
9.4.1 Sample
9.4.2 Instruments
9.4.3 Analysis Procedures
9.5 Findings
9.5.1 Findings from Document Analysis
9.5.2 Curriculum and Methods of Assessment
9.5.3 Results from the Questionnaire
9.5.4 Findings from Observations
9.5.5 Findings from Interviews
9.6 Insight(s) Gained
9.7 Conclusion: Practical Implications for Test Users
Appendix 1
Appendix 2
Appendix 3
Materials: Percentages of Observation Time by VISIT 2 (%)
Appendix 4
References
10 Avoiding Scoring Malpractice: Supporting Reliable Scoring of Constructed-Response Items in High-Stakes Exams
10.1 Introduction: Purpose and Testing Context
10.2 Testing Problem Encountered
10.3 Review of Literature
10.4 Methodology
10.4.1 Participants
10.4.2 Research Instrument: Listening Test
10.4.3 Scoring Procedures
10.4.4 Methods of Data Analysis
10.5 Findings
10.5.1 Facility Values
10.5.2 Discrimination and Reliability
10.6 Insights Gained
10.7 Conclusion: Implications for Test Users
References
11 Score Changes with Repetition of Paper Version(s) of the TOEFL in an Arab Gulf State: A Natural Experiment
11.1 Introduction: Purpose and Testing Context
11.2 Testing Problem Encountered
11.3 Review of Literature
11.4 Methodology
11.4.1 Data Collection
11.4.2 The Test Takers
11.4.3 Data Analysis
11.5 Findings
11.5.1 Research Question 1. What Is the Incidence of 1-Time, 2-Time, …, n-Time Test Taking?
11.5.2 Research Question 2. What Are the Patterns of Change in TOEFL Performance for Repeating Examinees by Number of Times Tested?
11.5.3 Research Question 3. What Is the Relationship Between Consecutive Score Changes Greater Than the SEM, and Number of Attempts?
11.5.4 Research Question 4. What Is the Relationship Between Consecutive Score Changes Greater Than the SEM, and Time Interval Between Consecutive Attempts?
11.6 Insights Gained
11.7 Conclusion: Implications for Test Users
References
Part II Learning from Tests of World Languages
12 Whose English(es) Are We Assessing and by Whom?
12.1 Introduction: Purpose and Testing Context
12.2 Testing Problem Encountered
12.2.1 The Case of TOEIC’s Construct Representation
12.2.2 The Case of Native and Non-Native English-Speaking Teachers’ Assessment of University ESL Students’ Oral Presentation
12.3 Solution/Resolution of the Problem and Insight Gained
12.4 Conclusion
References
13 Challenges in Developing Standardized Tests for Arabic Reading Comprehension for Secondary Education in the Netherlands
13.1 Introduction: Purpose and Testing Context
13.1.1 Backgrounds of the Position of Arabic in the Dutch Educational Context
13.1.2 The Final Exams for Arabic in Secondary Education
13.2 Testing Problem Encountered: How to Operationalize Test Specifications for Arabic?
13.3 Solution/Resolution of the Problem
13.3.1 Reading Literacy Within the Context of Exams for Arabic
13.3.2 Selection of Arabic Texts
13.3.3 CEFR Levels of the Exams for Arabic
13.4 Insights Gained
13.5 Conclusion: Implications for Test Users
References
14 The Conflict and Consequences of Two Assessment Measures in Israel: Global PISA vs. the National MEITZAV
14.1 Introduction: Purpose and Testing Context
14.1.1 Purpose
14.1.2 The Testing Context
14.2 Testing Problem Encountered
14.2.1 The MEITZAV
14.2.2 The PISA
14.3 Resolutions of the Problem
14.4 Insights Gained
14.5 Conclusion: Implications for Test Users
References
15 How to Challenge Prejudice in Assessing the Productive Skills of Speakers of Closely Related Languages (the Case of Slovenia)
15.1 Introduction: Purpose and Testing Context
15.1.1 Purpose
15.1.2 Historical and Linguistic Context
15.1.3 Testing Context
15.2 Testing Problem Encountered
15.3 Review of Literature
15.4 Methodology
15.5 Findings
15.5.1 Rating Analysis
15.5.2 Attitudes Toward Rating Criteria
15.6 Insights Gained
15.7 Conclusion: Implications for Test Users
References
Part III Learning from Program-Level Language Tests
16 EFL Placement Testing in Japan
16.1 Introduction: Purpose and Context
16.1.1 Testing Context
16.2 Testing Problem Encountered
16.2.1 Filiopietism
16.3 Solution/Resolution of the Problem
16.4 Insights Gained
16.5 Conclusion: Implications for Test Users
References
17 TEFL Test Practices at a Ukrainian University: Summative Test Design Through Teacher Collaboration
17.1 Introduction: Purpose and Testing Context
17.2 Testing Problems Encountered
17.3 Resolution of the Problems
17.4 Insights Gained
17.5 Conclusion: Implications for Test Users
Appendix 17.1a
Appendix 17.1b
References
18 Designing a Multilingual Large-Scale Placement Test with a Formative Perspective: A Case Study at the University of Grenoble Alpes
18.1 Introduction: Purpose and Testing Context
18.1.1 SELF Conceptual Foundations
18.2 Testing Problems Encountered: Communicative Constructs and Standardized Language Tests
18.3 Solution of the Problem: The Testing Cycle for a Good Culture in Evaluation
18.3.1 SELF: An In-House Conception of a Multi-Task Platform
18.4 Insights Gained: Looking Back at Process and Choice
18.5 Conclusion: Implications for Test Users
References
19 The Relationship Between English Placement Assessments and an Institution: From Challenge to Innovation for an Intensive English Program in the USA
19.1 Introduction: Purpose and Testing Context
19.2 Testing Problem Encountered
19.2.1 Pathway Programming Challenge
19.2.2 Assessment Need
19.3 Solution to the Problem
19.3.1 Adoption of AFL
19.3.2 Assessment Descriptors
19.4 Insights Gained
19.5 Conclusion: Implications for Test Users
References
20 Placement Decisions in Private Language Schools in Iran
20.1 Introduction: Purpose and Testing Context
20.2 Testing Problem Encountered
20.3 Review of Literature
20.4 Methodology
20.4.1 Participants
20.4.2 Instrumentation and Data Collection
20.5 Findings
20.5.1 Test Considerations in Making Placement Decisions
20.5.2 Learners’ Characteristics in Placement Decisions
20.5.3 Institutional Considerations in Placing Students
20.5.4 Power Issues in Making Placement Decisions
20.5.5 User Considerations in Placement Decision Making
20.6 Insights Gained
20.7 Conclusion: Implications for Test Users
References
21 Perceptions of (Un)Successful PET Results at a Private University in Mexico
21.1 Introduction: Purpose and Testing Context
21.2 Testing Problem Encountered
21.3 Review of the Literature
21.3.1 Cambridge English: Preliminary
21.3.2 Validity
21.3.3 Reliability
21.3.4 Standard Error of Measurement (SEM)
21.4 Methodology
21.5 Findings
21.6 Insights Gained
21.7 Conclusion: Implications for Test Users
References
Part IV Learning from Tests of Language Skills
22 Completing the Triangle of Reading Fluency Assessment: Accuracy, Speed, and Prosody
22.1 Introduction: Purpose and Testing Context
22.2 Testing Problem Encountered
22.2.1 Mismatch Between the Definition and Assessment of Reading Fluency
22.2.2 Negative Consequences of Using WCPM as an Isolated Measure of Reading Fluency
22.3 Solution/Resolution of the Problem
22.3.1 Aligning Assessment with the Definition of Reading Fluency
22.4 Insights Gained
22.5 Conclusion: Implications for Test Users
References
23 (Re)Creating Listening Source Texts for a High-Stakes Standardized English Test at a Vietnamese University: Abandoning the Search in Vain
23.1 Introduction: Purpose and Testing Context
23.2 Testing Problem Encountered
23.3 Resolution of the Problem
23.4 Insight Gained: Listening Source Texts as a Hybrid Genre
23.5 Conclusion: Implications for Test Users
References
24 The Oral Standardized English Proficiency Test: Opportunities Provided and Challenges Overcome in an Egyptian Context
24.1 Introduction: Purpose and Testing Context
24.2 Testing Problem Encountered
24.3 Solution of the Problem
24.4 Insights Gained
24.5 Conclusion: Implications for Test Users
References
25 Opening the Black Box: Exploring Automated Speaking Evaluation
25.1 Introduction: Purpose and Testing Context
25.2 Testing Problem Encountered
25.3 Solution/Resolution of the Problem
25.4 Issues and Challenges
25.4.1 Performance of the Speech Recognizer
25.4.2 Task Types and Scoring Features
25.4.3 Test Impact
25.5 Insights Gained
25.6 Conclusion: Implications for Test Users
References
26 Developing a Meaningful Measure of L2 Reading Comprehension for Graduate Programs at a USA Research University: The Role of Primary Stakeholders’ Understanding of the Construct
26.1 Introduction: Purpose and Testing Context
26.2 Testing Problem Encountered
26.3 Solution/Resolution of the Problem
26.3.1 ARCA Design and Implementation
26.3.2 Training Stakeholders in Test Construct
26.3.3 Methodology and Process
26.4 Insights Gained
26.4.1 Factors that Affected Stakeholder Buy-in
26.4.2 Opinions on the Clarity of the Training
26.4.3 Stakeholders’ Recollections of Departmental Discussions
26.4.4 Remaining Issues
26.5 Conclusion: Implications for Test Users
26.5.1 Gaining Buy-in from Stakeholders When Implementing Changes in Testing Practices
References
27 Challenging the Role of Rubrics: Perspectives from a Private University in Lebanon
27.1 Introduction: Purpose and Testing Context
27.2 Testing Problem Encountered
27.3 Review of Literature
27.4 Methodology
27.4.1 Design
27.4.2 Participants
27.4.3 Instruments and Procedures
27.5 Findings
27.6 Insights Gained
27.7 Conclusion: Implications for Test Users
References
28 A Mixed-Methods Approach to Study the Effects of Rater Training on the Scoring Validity of Local University High-Stakes Writing Tests in Spain
28.1 Introduction: Purpose and Testing Context
28.2 Testing Problem Encountered
28.3 Review of Literature
28.4 Methodology
28.4.1 Design
28.4.2 Participants
28.4.3 Instruments and Procedures
28.4.4 Data Collection
28.4.5 Data Analysis
28.5 Findings
28.5.1 How Do CertAcles Rater Training Modules for Writing Tasks Affect the Raters’ Recollection of the Rating Process?
28.5.2 To What Degree Do Rater Training Modules for the CertAcles Writing Paper Affect Inter-Rater Reliability, Rater Severity, and Consistency?
28.5.3 To What Degree Do Rater Training Modules for the CertAcles Writing Paper Affect How Raters Apply the CertAcles Rating Scale?
28.6 Insights Gained
28.6.1 Insights Gained from a Reliability Standpoint
28.6.2 Insights Gained from a Validity Standpoint
28.7 Conclusion: Implications for Test Users
References
Part V Learning from Tests, Teachers, and Language Assessment Literacy
29 A Critical Evaluation of the Language Assessment Literacy of Turkish EFL Teachers: Suggestions for Policy Directions
29.1 Introduction: Purpose and Testing Context
29.2 Testing Problem Encountered
29.3 Solution of the Problem
29.4 Insights Gained
29.5 Conclusion: Implications for Test Users
References
30 Some Practical Consequences of Quality Issues in CEFR Translations: The Case of Arabic
30.1 Introduction: Purpose and Testing Context
30.2 Testing Problem Encountered
30.2.1 Central Terminology
30.2.2 Level Designations: Waystage
30.2.3 Style
30.3 Resolution of the Problem
30.4 Insights Gained
30.5 Conclusion: Implications for Test Users
References
31 Assessment Literacy and Assessment Practices of Teachers of English in a South-Asian Context: Issues and Possible Washback
31.1 Introduction: Purpose and Testing Context
31.2 Testing Problem Encountered
31.3 Review of the Literature
31.4 Methodology
31.5 Findings
31.5.1 Problems Faced by Teachers
31.5.2 Practices in Setting and Scoring Tests
31.5.3 Assessment Literacy of Teachers as Revealed by the Tests Constructed by the Selected Sample
31.6 Insights Gained
31.7 Conclusion: Implications for Test Users
Appendix 31.1
Appendix 31.2
Appendix 31.3
References
32 English Language Testing Practices at the Secondary Level: A Case Study from Bangladesh
32.1 Introduction: Purpose and Testing Context
32.2 Testing Problem Encountered
32.3 Review of the Literature
32.3.1 The Guiding Principles of Good Tests
32.3.2 Summative and Formative Assessment
32.3.3 Washback and Test Impact
32.4 Methodology
32.4.1 Data Collection
32.4.2 Data Analysis
32.5 Findings
32.5.1 Formal Assessment
32.5.2 School-Based Assessment of Speaking and Listening
32.5.3 Findings from the Questionnaire, Interview, FGD, and Class Observation Data
32.6 Insight(s) Gained
32.7 Conclusion: Implications for Test Users
References
33 A New Model for Assessing Classroom-Based English Language Proficiency in the UAE
33.1 Introduction: Purpose and Testing Context
33.2 Testing Problem Encountered
33.3 Review of Literature
33.3.1 Language Testing
33.3.2 Validity
33.3.3 Language Proficiency
33.3.4 Lexical Diversity
33.3.5 Classroom Interaction
33.4 Methodology
33.4.1 Data Collection Procedures
33.4.2 Data Collection Instruments
33.4.3 Data Analysis Tools
33.5 Findings
33.5.1 The Quantitative Analysis
33.5.2 The Qualitative Analysis
33.6 Insights Gained
33.6.1 Lack of Correspondence
33.6.2 Validity Issues
33.7 Conclusion: Implications for Test Users
References
34 Assessing Teacher Discourse in a Pre-Service Spoken English Proficiency Test in Malta
34.1 Introduction: Purpose and Testing Context
34.2 Testing Problem Encountered
34.3 Solution/Resolution of the Problem
34.4 Insights Gained
34.5 Conclusion: Implications for Test Users
References
35 High-Stakes Test Preparation in Iran: The Interplay of Pedagogy, Test Content, and Context
35.1 Introduction: Purpose and Testing Context
35.2 Testing Problem Encountered
35.3 Review of Literature
35.4 Methodology
35.4.1 Participants
35.4.2 Instruments and Procedure
35.4.3 Data Analysis
35.5 Findings
35.5.1 Test Center: Context and Administration
35.5.2 Teachers and Features of TP Instruction
35.5.3 Students’ Perceptions
35.6 Insights Gained
35.7 Conclusion: Implications for Test Users
Appendix 1
Administrator Interview
Appendix 2
Teacher Interview
Appendix 3
Student Questionnaire (English Version)
Appendix 4
Appendix 5
Student Group Interview
Appendix 6
Appendix 7
References
36 Development of a Profile-Based Writing Scale: How Collaboration with Teachers Enhanced Assessment Practice in a Post-Admission ESL Writing Program at a USA University
36.1 Introduction: Purpose and Testing Context
36.2 Testing Problems Encountered
36.2.1 Restricted Range of Proficiency
36.2.2 Lack of Diagnostic Functions of Writing Placement Tests
36.3 Review of the Literature
36.3.1 Different Approaches to Scale Development
36.3.2 Collaboration with Teachers in Test Development: Challenges and Benefits
36.4 Methodology
36.4.1 Stage 1: Analysis of Existing Materials and Instructor Interviews
36.4.2 Stage 2: Developing the New Scale
36.4.3 Stage 3: Refining the Scale and Training for Its Use
36.5 Findings
36.5.1 Stage 1: Lexico-Grammar vs. Argumentation: Mismatch Between Curricular Emphasis and Actual Writing Performance and Needs
36.5.2 Stage 2: Content vs. Structural Features: Conflicting Ratings Between Teachers and Testers
36.5.3 Stage 3: Solution: Emergence of a Profile-Based Rating Scale
36.6 Insights Gained
36.6.1 The Usefulness of a Profile-Based Scale: Combining Placement and Diagnostic Assessment
36.6.2 Impact of Tester-Teacher Collaboration: Scale Descriptors as a Lingua Franca for Writing Assessment
36.7 Conclusion: Implications for Teachers and Testers
References
Part VI Closing Thoughts
37 Reflecting on Challenges in Language Testing Around the World
37.1 Learning from Challenges
37.2 Connections Throughout the Volume
37.3 Insights Gained and Implications for Test Users
37.4 Future Directions—Suggestions, Recommendations
References