This book demonstrates how data quality issues affect all surveys and proposes methods that can be utilised to deal with the observable components of survey error in a statistically sound manner. It begins by profiling the post-Apartheid period in South Africa's history, when the sampling frame and survey methodology for household surveys were undergoing periodic changes due to the changing geopolitical landscape in the country. It shows how different components of error, including coverage error, sampling error, nonresponse error, measurement error, processing error and adjustment error, had disproportionate magnitudes in different survey years. The parameters of interest concern the earnings distribution, but the discussion is generalisable to any question in a random sample survey of households or firms. The book then investigates questionnaire design and item nonresponse by building a response propensity model for the employee income question in two South African labour market surveys: the October Household Survey (OHS, 1997-1999) and the Labour Force Survey (LFS, 2000-2003). These years isolate a period of changing questionnaire design for the income question. Finally, the book is concerned with how to analyse employee income data that contain a mixture of continuous data, bounded response data and nonresponse. A variable with this mixture of data types is called coarse data. Because the income question consists of two parts -- an initial, exact income question and a bounded income follow-up question -- the resulting statistical distribution of employee income is both continuous and discrete. The book shows researchers how to deal appropriately with coarse income data using multiple imputation.
The take-home message from this book is that researchers have a responsibility to treat data quality concerns in a statistically sound manner, rather than making adjustments to public-use data in arbitrary ways, often underpinned by indefensible assumptions about an implicit unobservable loss function in the data. The demonstration of how this can be done provides a replicable concept map with applicable methods that can be utilised in any sample survey.
Author(s): Reza Che Daniels
Edition: 1
Publisher: Springer
Year: 2022
Language: English
Commentary: TruePDF
Pages: 128
Tags: Statistical Theory And Methods; Survey Methodology; Data Analysis And Big Data; Methodology Of Data Collection And Processing; African Economics; African History
Preface
Acknowledgements
Contents
About the Author
List of Figures
List of Tables
1 Introduction
1.1 The Income Construct in Household Surveys
1.2 Objectives and Chapter Typology
References
2 A Framework for Investigating Microdata Quality, with Application to South African Labour Market Household Surveys
2.1 Introduction
2.2 Framing the Discourse on Data Quality
2.2.1 Data Quality Elements in the Data Production Process
2.2.2 The Total Survey Error (TSE) Framework
2.3 The Interaction Between TSE and Data Quality
2.3.1 Validity of the Construct of Interest
2.3.2 Measurement Error
2.3.3 Processing Error
2.3.4 Coverage Error
2.3.5 Sampling Error
2.3.6 Nonresponse Error
2.3.7 Adjustment Error
2.4 Data Quality and Survey Errors in Statistics South Africa Household Surveys
2.4.1 Representation of the Population of Interest
2.4.2 Measurement of the Construct of Interest
2.5 Discussion
2.6 Conclusion
References
3 Questionnaire Design and Response Propensities for Labour Income Microdata
3.1 Introduction
3.2 Questionnaire Design and the Income Question
3.2.1 The Response Process and the Cognitive Burden of Answering Income Questions
3.2.2 Different Types of Income Questions
3.2.3 Analysing Response Groups in the Income Question
3.2.4 Questionnaire Design Changes in SA Labour Market Household Surveys
3.3 Methodology
3.3.1 Response Propensity Models for the Employee Income Question
3.3.2 Questionnaire Design Changes and the Resulting Structure of Income Data in Publicly Released Datasets
3.3.3 Estimation, Specification and Testing
3.4 Results
3.4.1 A Descriptive Analysis of Employee Income Response Type
3.4.2 Sequential Response Propensity Models
3.4.3 Diagnostics of the Sequential Response Models
3.5 Conclusion
References
4 Univariate Multiple Imputation for Coarse Employee Income Data
4.1 Introduction
4.2 Preliminaries
4.2.1 Coarse Income Data
4.2.2 Multiple Imputation
4.3 Setup of the Problem
4.3.1 Data Preparation
4.3.2 The Imputation Algorithm
4.3.3 Estimation and Inference from Multiply Imputed Data
4.4 Results: Univariate Multiple Imputations for Coarse Income
4.4.1 Quantiles and Moments Across Four Imputation Models
4.4.2 The Distribution of Multiply Imputed Bounded Income Values
4.4.3 The Distribution of Multiply Imputed Missing Income Values
4.4.4 The Distribution of Multiply Imputed Refusals and Don't Know Income Values
4.4.5 Unspecified Responses as a Source of Error
4.4.6 Stability of Parameter Estimates as the Number of Multiple Imputations Increase
4.5 Conclusion
References
5 Conclusion: How Data Quality Affects Our Understanding of the Earnings Distribution