Data Quality and Record Linkage Techniques

This book helps practitioners gain a deeper understanding, at an applied level, of the issues involved in improving data quality through editing, imputation, and record linkage. The first part of the book deals with methods and models, focusing on the Fellegi-Holt edit-imputation model, the Little-Rubin multiple-imputation scheme, and the Fellegi-Sunter record linkage model. Brief examples are included to show how these techniques work.

In the second part of the book, the authors present real-world case studies in which one or more of these techniques are used. The case studies cover a wide variety of application areas, including mortgage guarantee insurance, medical and biomedical studies, highway safety, and social insurance, as well as the construction of list frames and administrative lists.

Readers will find this book a mixture of practical advice, mathematical rigor, management insight, and philosophy. The long list of references at the end of the book enables readers to delve more deeply into the subjects discussed. The authors also discuss the software that has been developed to apply the techniques described in the text.

Author(s): Thomas N. Herzog, Fritz J. Scheuren, William E. Winkler
Edition: 1
Publisher: Springer
Year: 2007

Language: English
Pages: 241
City: New York; London

Data Quality and Record Linkage Techniques
Preface
Contents
About the Authors
Part 1 Data Quality: What It Is, Why It Is Important, and How to Achieve It
When Are Data of High Quality?
Why Care About Data Quality?
How Do You Obtain High-Quality Data?
Where Are We Now?
Data Quality as a Competitive Advantage
Data Quality Problems and their Consequences
How Many People Really Live to 100 and Beyond?
Completeness and Accuracy of a Billing Database: Why It Is Important to the Bottom Line
Where Are We Now?
Desirable Properties of Databases/Lists
Examples of Merging Two or More Lists and the Issues that May Arise
Metrics Used when Merging Lists
Where Are We Now?
Data Elements
Requirements Document
A Dictionary of Tests
Deterministic Tests
Exploratory Data Analysis Techniques
Practical Tips
Where Are We Now?
Part 2 Specialized Tools for Database Improvement
Conditional Independence
Statistical Paradigms
Capture-Recapture Procedures and Applications
Introduction
Early Editing Efforts
Fellegi-Holt Model for Editing
Practical Tips
Imputation
Constructing a Unified Edit/Imputation Model
Implicit Edits: A Key Construct of Editing Software
Editing Software
Is Automatic Editing Taking Up Too Much Time and Money?
Tips on Automatic Editing and Imputation
Where Are We Now?
Introduction
Deterministic Record Linkage
Probabilistic Record Linkage: A Frequentist Perspective
Probabilistic Record Linkage: A Bayesian Perspective
Where Are We Now?
Basic Estimation of Parameters Under Simple Agreement/Disagreement Patterns
Parameter Estimates Obtained via Frequency-Based Matching
Parameter Estimates Obtained Using Data from Current Files
Parameter Estimates Obtained via the EM Algorithm
Advantages and Disadvantages of Using the EM Algorithm
General Parameter Estimation Using the EM Algorithm
Where Are We Now?
Standardization and Parsing
Obtaining and Understanding Computer Files
Standardization of Terms
Parsing of Fields
Where Are We Now?
Soundex System of Names
NYSIIS Phonetic Decoder
Where Are We Now?
Blocking
Independence of Blocking Strategies
Blocking Variables
Using Blocking Strategies to Identify Duplicate List Entries
Using Blocking Strategies to Match Records Between Two Sample Surveys
Where Are We Now?
Jaro String Comparator Metric for Typographical Error
Winkler String Comparator Metric for Typographical Error
Adjusting the Weights for the Winkler Comparator Metric
Where Are We Now?
Part 3 Record Linkage Case Studies
Introduction
Duplicate Mortgage Records
Mortgage Records with an Incorrect Termination Status
Estimating the Number of Duplicate Mortgage Records
Biomedical and Genetic Research Studies
Who Goes to a Chiropractor?
National Master Patient Index
Provider Access to Immunization Register Securely (PAiRS) System
Studies Required by the Intermodal Surface Transportation Efficiency Act of 1991
Crash Outcome Data Evaluation System
Constructing List Frames and Administrative Lists
National Address Register of Residences in Canada
USDA List Frame of Farms in the United States
List Frame Development for the US Census of Agriculture
Post-enumeration Studies of US Decennial Census
Hidden Multiple Issuance of Social Security Numbers
How Social Security Stops Benefit Payments after Death
CPS-IRS-SSA Exact Match File
Record Linkage and Terrorism
Part 4 Other Topics
Confidentiality: Maximizing Access to Micro-data while Protecting Privacy
Importance of High Quality of Data in the Original File
Checking Re-identifiability
Elementary Masking Methods and Statistical Agencies
Protecting Confidentiality of Medical Data
More-Advanced Masking Methods: Synthetic Datasets
Where Are We Now?
Government
Checklist for Evaluating Record Linkage Software
Summary Chapter
Scope
Structure
Bibliography
Index