Data Science for Public Policy (Springer Series in the Data Sciences)

This textbook presents the essential tools and core concepts of data science to public officials, policy analysts, economists, and others, with the aim of furthering their application in the public sector. An expansion of the quantitative economics frameworks taught in policy and business schools, the book emphasizes the process of asking relevant questions to inform public policy. Its techniques and approaches center on data-driven practices. They begin with the basic programming paradigms that occupy the majority of an analyst’s time and advance to practical applications of statistical learning and machine learning. To support these applications, the text draws on two divergent, competing perspectives, incorporating techniques from both causal inference and prediction. The book also includes open-source data and live code, written in R and presented in notebook form, that readers can use and modify to practice working with data.

Author(s): Jeffrey C. Chen, Edward A. Rubin, Gary J. Cornwall
Publisher: Springer
Year: 2021

Language: English
Pages: 377

Preface
Contents
1 An Introduction
1.1 Why we wrote this book
1.2 What we assume
1.3 How this book is structured
2 The Case for Programming
2.1 Doing visual analytics since the 1780s
2.2 How does programming work?
2.3 Setting up R and RStudio
2.3.1 Installing R
2.3.2 Installing RStudio
2.3.3 DIY: Running your first code snippet
2.4 Making the case for open-source software
3 Elements of Programming
3.1 Data are everywhere
3.2 Data types
3.2.1 numeric
3.2.2 character
3.2.3 logical
3.2.4 factor
3.2.5 date
3.2.6 The class function
3.3 Objects in R
3.4 R's object classes
3.4.1 vector
3.4.2 matrix
3.4.3 data.frame
3.4.4 list
3.4.5 The class function, v2
3.4.6 More classes
3.5 Packages
3.5.1 Base R and the need to extend functionality
3.5.2 Installing packages
3.5.3 Loading packages
3.5.4 Package management and pacman
3.6 Data input/output
3.6.1 Directories
3.6.2 Load functions
3.6.3 Datasets
3.7 Finding help
3.7.1 Help function
3.7.2 Google and online communities
3.8 Beyond this chapter
3.8.1 Best practices
3.8.2 Further study
3.9 DIY: Loading solar energy data from the web
4 Transforming Data
4.1 Importing and assembling data
4.1.1 Loading files
4.2 Manipulating values
4.2.1 Text manipulation functions
4.2.2 Regular Expressions (RegEx)
4.2.3 DIY: Working with PII
4.2.4 Working with dates
4.3 The structure of data
4.3.1 Matrix or data frame?
4.3.2 Array indexes
4.3.3 Subsetting
4.3.4 Sorting and re-ordering
4.3.5 Aggregating data
4.3.6 Reshaping data
4.4 Control structures
4.4.1 If statement
4.4.2 For-loops
4.4.3 While
4.5 Functions
4.6 Beyond this chapter
4.6.1 Best practices
4.6.2 Further study
5 Record Linkage
5.1 Edward Kennedy, Bill de Blasio, and Bayerische Motoren Werke
5.2 How does record linkage work?
5.3 Pre-processing the data
5.4 De-duplication
5.5 Deterministic record linkage
5.6 Comparison functions
5.6.1 Edit distances
5.6.2 Phonetic algorithms
5.6.3 New tricks, same heuristics
5.7 Probabilistic record linkage
5.8 Data privacy
5.9 DIY: Matching people in the UK-UN sanction lists
5.10 Beyond this chapter
5.10.1 Best practices
5.10.2 Further study
6 Exploratory Data Analysis
6.1 Visually detecting patterns
6.2 The gist of EDA
6.3 Visualizing distributions
6.3.1 Skewed variables
6.4 Exploring missing values
6.4.1 Encodings
6.4.2 Missing value functions
6.4.3 Exploring missingness
6.4.4 Treating missingness
6.5 Analyzing time series
6.6 Finding visual correlations
6.6.1 Visual analysis on high-dimensional datasets
6.7 Beyond this chapter
7 Regression Analysis
7.1 Measuring and predicting the preferences of society
7.2 Simple linear regression
7.2.1 Mean squared error
7.2.2 Ordinary least squares
7.2.3 DIY: A simple hedonic model
7.3 Checking for linearity
7.4 Multiple regression
7.4.1 Non-linearities
7.4.2 Discrete variables
7.4.3 Discontinuities
7.4.4 Measures of model fitness
7.4.5 DIY: Choosing between models
7.4.6 DIY: Housing prices over time
7.5 Beyond this chapter
8 Framing Classification
8.1 Playing with fire
8.1.1 FireCast
8.1.2 What's a classifier?
8.2 The basics of classifiers
8.2.1 The anatomy of a classifier
8.2.2 Finding signal in classification contexts
8.2.3 Measuring accuracy
8.3 Logistic regression
8.3.1 The social science workhorse
8.3.2 Telling the story from coefficients
8.3.3 How are coefficients learned?
8.3.4 In practice
8.3.5 DIY: Expanding health care coverage
8.4 Regularized regression
8.4.1 From regularization to interpretation
8.4.2 DIY: Re-visiting health care coverage
8.5 Beyond this chapter
9 Three Quantitative Perspectives
9.1 Descriptive analysis
9.2 Causal inference
9.2.1 Potential outcomes framework
9.2.2 Regression discontinuity
9.2.3 Difference-in-differences
9.3 Prediction
9.3.1 Understanding accuracy
9.3.2 Model validation
9.4 Beyond this chapter
10 Prediction
10.1 The role of algorithms
10.2 Data science pipelines
10.3 K-Nearest Neighbors (k-NN)
10.3.1 Under the hood
10.3.2 DIY: Predicting the extent of storm damage
10.4 Tree-based learning
10.4.1 Classification and Regression Trees (CART)
10.4.2 Random forests
10.4.3 In practice
10.4.4 DIY: Wage prediction with CART and random forests
10.5 An introduction to other algorithms
10.5.1 Gradient boosting
10.5.2 Neural networks
10.6 Beyond this chapter
11 Cluster Analysis
11.1 Things closer together are more related
11.2 Foundational concepts
11.3 k-means
11.3.1 Under the hood
11.3.2 In practice
11.3.3 DIY: Clustering for economic development
11.4 Hierarchical clustering
11.4.1 Under the hood
11.4.2 In practice
11.4.3 DIY: Clustering time series
11.5 Beyond this chapter
12 Spatial Data
12.1 Anticipating climate impacts
12.2 Classes of spatial data
12.3 Rasters
12.3.1 Raster files
12.3.2 Rasters and math
12.3.3 DIY: Working with raster math
12.4 Vectors
12.4.1 Vector files
12.4.2 Converting points to spatial objects
12.4.3 Coordinate Reference Systems
12.4.4 DIY: Converting coordinates into point vectors
12.4.5 Reading shapefiles
12.4.6 Spatial joins
12.4.7 DIY: Analyzing spatial relationships
12.5 Beyond this chapter
13 Natural Language
13.1 Transforming text into data
13.1.1 Processing textual data
13.1.2 TF-IDF
13.1.3 Document similarities
13.1.4 DIY: Basic text processing
13.2 Sentiment Analysis
13.2.1 Sentiment lexicons
13.2.2 Calculating sentiment scores
13.2.3 DIY: Scoring text for sentiment
13.3 Topic modeling
13.3.1 A conceptual base
13.3.2 How do topic models work?
13.3.3 DIY: Finding topics in presidential speeches
13.4 Beyond this chapter
13.4.1 Best practices
13.4.2 Further study
14 The Ethics of Data Science
14.1 An emerging debate
14.2 Bias
14.2.1 Sampling bias
14.2.2 Measurement bias
14.2.3 Prejudicial bias
14.3 Fairness
14.3.1 Score-based fairness
14.3.2 Accuracy-based fairness
14.3.3 Other considerations
14.4 Transparency and Interpretability
14.4.1 Interpretability
14.4.2 Explainability
14.5 Privacy
14.5.1 An evolving landscape
14.5.2 Privacy strategies
14.6 Beyond this chapter
15 Developing Data Products
15.1 Meeting people where they are
15.2 Designing for impact
15.2.1 Identify a user need
15.2.2 Size up the situation
15.2.3 Build a lean "V1"
15.2.4 Test and evaluate its impact, then iterate
15.3 Communicating data science projects
15.3.1 Presentations
15.3.2 Written reports
15.4 Reporting dashboards
15.5 Prediction products
15.5.1 Prioritization and targeting lists
15.5.2 Scoring engines
15.6 Continuing to hone your craft
15.7 Where to next?
16 Building Data Teams
16.1 Establishing a baseline
16.2 Operating models
16.2.1 Center of excellence
16.2.2 Hack teams
16.2.3 Consultancy
16.2.4 Matrix organizations
16.3 Identifying roles
16.3.1 The manager
16.3.2 Analytics roles
16.3.3 Data product roles
16.3.4 Titles in the civil service system
16.4 The hiring process
16.4.1 Job postings and application review
16.4.2 Interviews
16.5 Final thoughts
Appendix A: Planning a Data Product
Key Questions
Appendix B: Interview Questions
Getting to know the candidate
Business acumen
Project experience
Whiteboard questions
Statistics
Causal inference
Estimation versus prediction
Machine learning
Model evaluation
Communication and visualization
Programming
Take-home questions
References
Index