This book develops survey data analysis tools in Python, to create and analyze cross-tab tables and data visuals, weight data, perform hypothesis tests, and handle special survey questions such as Check-all-that-Apply. In addition, the basics of Bayesian data analysis and its Python implementation are presented. Since surveys are widely used as the primary method to collect data, and ultimately information, on attitudes, interests, and opinions of customers and constituents, these tools are vital for private or public sector policy decisions. As a compact volume, this book uses case studies to illustrate methods of analysis essential for those who work with survey data in either sector. It focuses on two overarching objectives:
1. Demonstrate how to extract actionable, insightful, and useful information from survey data; and
2. Introduce Python and Pandas for analyzing survey data.
Author(s): Walter R. Paczkowski
Edition: 1
Publisher: Springer
Year: 2022
Language: English
Pages: 365
Tags: Statistics in Business, Management, Economics, Finance, Insurance; Data and Information Visualization; Marketing; Statistics and Computing; Data Science
Preface
Why Surveys?
Why Python?
Preliminaries for Getting Started
The Book's Structure
Contents
List of Figures
List of Tables
1 Introduction to Modern Survey Analytics
Contents
1.1 Information and Survey Data
1.2 Demystifying Surveys
1.2.1 Survey Objectives
1.2.2 Target Audience and Sample Size
1.2.2.1 Key Parameters to Estimate
1.2.2.2 Sample Design to Use
1.2.2.3 Population Size
1.2.2.4 Alpha
1.2.2.5 Margin of Error
1.2.2.6 Additional Information
1.2.3 Screener and Questionnaire Design
1.2.4 Fielding the Study
1.2.5 Data Analysis
1.2.6 Report Writing and Presentation
1.3 Sample Representativeness
1.3.1 Digression on Indicator Variables
1.3.2 Calculating the Population Parameters
1.4 Estimating Population Parameters
1.5 Case Studies
1.5.1 Consumer Study: Yogurt Consumption
1.5.2 Public Sector Study: VA Benefits Survey
1.5.3 Public Opinion Study: Toronto Casino Opinion Survey
1.5.4 Public Opinion Study: San Francisco Airport Customer Satisfaction Survey
1.6 Why Use Python for Survey Data Analysis?
1.7 Why Use Jupyter for Survey Data Analysis?
2 First Step: Working with Survey Data
Contents
2.1 Best Practices: First Steps to Analysis
2.1.1 Installing and Importing Python Packages
2.1.2 Organizing Routinely Used Packages, Functions, and Formats
2.1.3 Defining Data Paths and File Names
2.1.4 Defining Your Functions and Formatting Statements
2.1.5 Documenting Your Data with a Dictionary
2.2 Importing Your Data with Pandas
2.3 Handling Missing Values
2.3.1 Identifying Missing Values
2.3.2 Reporting Missing Values
2.3.3 Reasons for Missing Values
2.3.4 Dealing with Missing Values
2.3.4.1 Use the fillna( ) Method
2.3.4.2 Use the Interpolation( ) Method
2.3.4.3 An Even More Sophisticated Method
2.4 Handling Special Types of Survey Data
2.4.1 CATA Questions
2.4.1.1 Multiple Responses
2.4.1.2 Multiple Responses by ID
2.4.1.3 Multiple Responses Delimited
2.4.1.4 Indicator Variable
2.4.1.5 Frequencies
2.4.2 Categorical Questions
2.5 Creating New Variables, Binning, and Rescaling
2.5.1 Creating Summary Variables
2.5.2 Rescaling
2.5.3 Other Forms of Preprocessing
2.6 Knowing the Structure of the Data Using Simple Statistics
2.6.1 Descriptive Statistics and DataFrame Checks
2.6.2 Obtaining Value Counts
2.6.3 Styling Your DataFrame Display
2.7 Weight Calculations
2.7.1 Complex Weight Calculation: Raking
2.7.2 Types of Weights
2.8 Querying Data
3 Shallow Survey Analysis
Contents
3.1 Frequency Summaries
3.1.1 Ordinal-Based Summaries
3.1.2 Nominal-Based Summaries
3.2 Basic Descriptive Statistics
3.3 Cross-Tabulations
3.4 Data Visualization
3.4.1 Visuals Best Practice
3.4.2 Data Visualization Background
3.4.3 Pie Charts
3.4.4 Bar Charts
3.4.5 Other Charts and Graphs
3.4.5.1 Histograms and Boxplots for Distributions
3.4.5.2 Mosaic Charts
3.4.5.3 Heatmaps
3.5 Weighted Summaries: Crosstabs and Descriptive Statistics
4 Beginning Deep Survey Analysis
Contents
4.1 Hypothesis Testing
4.1.1 Hypothesis Testing Background
4.1.2 Examples of Hypotheses
4.1.3 A Formal Framework for Statistical Tests
4.1.4 A Less Formal Framework for Statistical Tests
4.1.5 Types of Tests to Use
4.2 Quantitative Data: Tests of Means
4.2.1 Test of One Mean
4.2.2 Test of Two Means for Two Populations
4.2.2.1 Standard Errors: Independent Populations
4.2.2.2 Standard Errors: Dependent Populations
4.2.3 Test of More Than Two Means
4.3 Categorical Data: Tests of Proportions
4.3.1 Single Proportions
4.3.2 Comparing Proportions: Two Independent Populations
4.3.3 Comparing Proportions: Paired Populations
4.3.4 Comparing Multiple Proportions
4.4 Advanced Tabulations
4.5 Advanced Visualization
4.5.1 Extended Visualizations
4.5.2 Geographic Maps
4.5.3 Dynamic Graphs
Appendix
Refresher on Expected Values
Expected Value and Standard Error of the Mean
Deviations from the Mean
Some Relationships Among Probability Distributions
Normal Distribution
Chi-Square Distribution
Student's t-Distribution
F-Distribution
Equivalence of the F and t Tests for Two Populations
Code for Fig. 4.37
5 Advanced Deep Survey Analysis: The Regression Family
Contents
5.1 The Regression Family and Link Functions
5.2 The Identity Link: Introduction to OLS Regression
5.2.1 OLS Regression Background
5.2.2 The Classical Assumptions
5.2.3 Example of Application
5.2.4 Steps for Estimating an OLS Regression
5.2.5 Predicting with the OLS Model
5.3 The Logit Link: Introduction to Logistic Regression
5.3.1 Logistic Regression Background
5.3.2 Example of Application
5.3.3 Steps for Estimating a Logistic Regression
5.3.4 Predicting with the Logistic Regression Model
5.4 The Poisson Link: Introduction to Poisson Regression
5.4.1 Poisson Regression Background
5.4.2 Example of Application
5.4.3 Steps for Estimating a Poisson Regression
5.4.4 Predicting with the Poisson Regression Model
Appendix
Identity Link Function for OLS
The Sum of Squares Decomposition and the ANOVA Table
ANOVA Conjecture
Odds-Ratio Algebra
Elasticities from Logs
Other OLS Output
6 Sample of Specialized Survey Analyses
Contents
6.1 Conjoint Analysis
6.1.1 Case Study
6.1.2 Analysis Steps
6.1.3 Creating the Design Matrix
6.1.4 Fielding the Conjoint Study
6.1.5 Estimating a Conjoint Model
6.1.6 Attribute Importance Analysis
6.2 Net Promoter Score
6.3 Correspondence Analysis
6.4 Text Analysis
7 Complex Surveys
Contents
7.1 Complex Sample Survey Estimation Effects
7.2 Sample Size Calculation
7.3 Parameter Estimation
7.4 Tabulation
7.4.1 Tabulation
7.4.2 Cross-Tabulation
7.5 Hypothesis Testing
7.5.1 One-Sample Test: Hypothesized Mean
7.5.2 Two-Sample Test: Independence Case
7.5.3 Two-Sample Test: Paired Case
8 Bayesian Survey Analysis: Introduction
Contents
8.1 Frequentist vs Bayesian Statistical Approaches
8.2 Digression on Bayes' Rule
8.2.1 Bayes' Rule Derivation
8.2.2 Bayes' Rule Reexpressions
8.2.3 The Prior Distribution
8.2.4 The Likelihood Function
8.2.5 The Marginal Probability Function
8.2.6 The Posterior Distribution
8.2.7 Hyperparameters of the Distributions
8.3 Computational Method: MCMC
8.3.1 Digression on Markov Chain Monte Carlo Simulation
8.3.2 Sampling from a Markov Chain Monte Carlo Simulation
8.4 Python Package pyMC3: Overview
8.5 Case Study
8.5.1 Basic Data Analysis
8.6 Benchmark OLS Regression Estimation
8.7 Using pyMC3
8.7.1 pyMC3 Bayesian Regression Setup
8.7.2 Bayesian Estimation Results
8.7.2.1 The MAP Estimate
8.7.2.2 The Visualization Output
8.8 Extensions to Other Analyses
8.8.1 Sample Mean Analysis
8.8.2 Sample Proportion Analysis
8.8.3 Contingency Table Analysis
8.8.4 Logit Model for Contingency Table
8.8.5 Poisson Model for Count Data
8.9 Appendix
8.9.1 Beta Distribution
8.9.2 Half-Normal Distribution
8.9.3 Bernoulli Distribution
9 Bayesian Survey Analysis: Multilevel Extension
Contents
9.1 Multilevel Modeling: An Introduction
9.1.1 Omitted Variable Bias
9.1.2 Simple Handling of Data Structure
9.1.3 Nested Market Structures
9.2 Multilevel Modeling: Some Observations
9.2.1 Aggregation and Disaggregation Issues
9.2.2 Two Fallacies
9.2.3 Terminology
9.2.4 Ubiquity of Hierarchical Structures
9.3 Data Visualization of Multilevel Data
9.3.1 Basic Data Visualization and Regression Analysis
9.4 Case Study Modeling
9.4.1 Pooled Regression Model
9.4.2 Unpooled (Dummy Variable) Regression Model
9.4.3 Multilevel Regression Model
9.5 Multilevel Modeling Using pyMC3: Introduction
9.5.1 Multilevel Model Notation
9.5.2 Multilevel Model Formulation
9.5.3 Example Multilevel Estimation Set-up
9.5.4 Example Multilevel Estimation Analyses
9.6 Multilevel Modeling with Level Explanatory Variables
9.7 Extensions of Multilevel Models
9.7.1 Logistic Regression Model
9.7.2 Poisson Model
9.7.3 Panel Data
Appendix
Multilevel Models: A High Level View
References
Index