Causal Inference in Python: Applying Causal Inference in the Tech Industry

How many buyers will an additional dollar of online marketing bring in? Which customers will only buy when given a discount coupon? How do you establish an optimal pricing strategy? The best way to determine how the levers at our disposal affect the business metrics we want to drive is through causal inference.

In this book, author Matheus Facure, senior data scientist at Nubank, explains the largely untapped potential of causal inference for estimating impacts and effects. Managers, data scientists, and business analysts will learn classical causal inference methods like randomized control trials (A/B tests), linear regression, propensity score, synthetic controls, and difference-in-differences. Each method is accompanied by an application in the industry to serve as a grounding example.

With this book, you will:
• Learn how to use basic concepts of causal inference
• Frame a business problem as a causal inference problem
• Understand how bias gets in the way of causal inference
• Learn how causal effects can differ from person to person
• Use repeated observations of the same customers across time to adjust for biases
• Understand how causal effects differ across geographic locations
• Examine noncompliance bias and effect dilution

Author(s): Matheus Facure
Edition: 1
Publisher: O'Reilly Media
Year: 2023

Language: English
Commentary: Publisher's PDF
Pages: 406
City: Sebastopol, CA
Tags: Data Analysis; Python; Marketing; Statistics; Linear Regression; Hypothesis Testing; A/B Testing; Graphviz; Causal Diagrams; Meta-Learning; Causal Inference; Propensity Score; Panel Data

Cover
Copyright
Table of Contents
Preface
Prerequisites
Outline
Conventions Used in This Book
Using Code Examples
O’Reilly Online Learning
How to Contact Us
Acknowledgments
Part I. Fundamentals
Chapter 1. Introduction to Causal Inference
What Is Causal Inference?
Why We Do Causal Inference
Machine Learning and Causal Inference
Association and Causation
The Treatment and the Outcome
The Fundamental Problem of Causal Inference
Causal Models
Interventions
Individual Treatment Effect
Potential Outcomes
Consistency and Stable Unit Treatment Values
Causal Quantities of Interest
Causal Quantities: An Example
Bias
The Bias Equation
A Visual Guide to Bias
Identifying the Treatment Effect
The Independence Assumption
Identification with Randomization
Key Ideas
Chapter 2. Randomized Experiments and Stats Review
Brute-Force Independence with Randomization
An A/B Testing Example
The Ideal Experiment
The Most Dangerous Equation
The Standard Error of Our Estimates
Confidence Intervals
Hypothesis Testing
Null Hypothesis
Test Statistic
p-values
Power
Sample Size Calculation
Key Ideas
Chapter 3. Graphical Causal Models
Thinking About Causality
Visualizing Causal Relationships
Are Consultants Worth It?
Crash Course in Graphical Models
Chains
Forks
Immorality or Collider
The Flow of Association Cheat Sheet
Querying a Graph in Python
Identification Revisited
CIA and the Adjustment Formula
Positivity Assumption
An Identification Example with Data
Confounding Bias
Surrogate Confounding
Randomization Revisited
Selection Bias
Conditioning on a Collider
Adjusting for Selection Bias
Conditioning on a Mediator
Key Ideas
Part II. Adjusting for Bias
Chapter 4. The Unreasonable Effectiveness of Linear Regression
All You Need Is Linear Regression
Why We Need Models
Regression in A/B Tests
Adjusting with Regression
Regression Theory
Single Variable Linear Regression
Multivariate Linear Regression
Frisch-Waugh-Lovell Theorem and Orthogonalization
Debiasing Step
Denoising Step
Standard Error of the Regression Estimator
Final Outcome Model
FWL Summary
Regression as an Outcome Model
Positivity and Extrapolation
Nonlinearities in Linear Regression
Linearizing the Treatment
Nonlinear FWL and Debiasing
Regression for Dummies
Conditionally Random Experiments
Dummy Variables
Saturated Regression Model
Regression as Variance Weighted Average
De-Meaning and Fixed Effects
Omitted Variable Bias: Confounding Through the Lens of Regression
Neutral Controls
Noise Inducing Control
Feature Selection: A Bias-Variance Trade-Off
Key Ideas
Chapter 5. Propensity Score
The Impact of Management Training
Adjusting with Regression
Propensity Score
Propensity Score Estimation
Propensity Score and Orthogonalization
Propensity Score Matching
Inverse Propensity Weighting
Variance of IPW
Stabilized Propensity Weights
Pseudo-Populations
Selection Bias
Bias-Variance Trade-Off
Positivity
Design- Versus Model-Based Identification
Doubly Robust Estimation
Treatment Is Easy to Model
Outcome Is Easy to Model
Generalized Propensity Score for Continuous Treatment
Key Ideas
Part III. Effect Heterogeneity and Personalization
Chapter 6. Effect Heterogeneity
From ATE to CATE
Why Prediction Is Not the Answer
CATE with Regression
Evaluating CATE Predictions
Effect by Model Quantile
Cumulative Effect
Cumulative Gain
Target Transformation
When Prediction Models Are Good for Effect Ordering
Marginal Decreasing Returns
Binary Outcomes
CATE for Decision Making
Key Ideas
Chapter 7. Metalearners
Metalearners for Discrete Treatments
T-Learner
X-Learner
Metalearners for Continuous Treatments
S-Learner
Double/Debiased Machine Learning
Key Ideas
Part IV. Panel Data
Chapter 8. Difference-in-Differences
Panel Data
Canonical Difference-in-Differences
Diff-in-Diff with Outcome Growth
Diff-in-Diff with OLS
Diff-in-Diff with Fixed Effects
Multiple Time Periods
Inference
Identification Assumptions
Parallel Trends
No Anticipation Assumption and SUTVA
Strict Exogeneity
No Time Varying Confounders
No Feedback
No Carryover and No Lagged Dependent Variable
Effect Dynamics over Time
Diff-in-Diff with Covariates
Doubly Robust Diff-in-Diff
Propensity Score Model
Delta Outcome Model
All Together Now
Staggered Adoption
Heterogeneous Effect over Time
Covariates
Key Ideas
Chapter 9. Synthetic Control
Online Marketing Dataset
Matrix Representation
Synthetic Control as Horizontal Regression
Canonical Synthetic Control
Synthetic Control with Covariants
Debiasing Synthetic Control
Inference
Synthetic Difference-in-Differences
DID Refresher
Synthetic Controls Revisited
Estimating Time Weights
Synthetic Control and DID
Key Ideas
Part V. Alternative Experimental Designs
Chapter 10. Geo and Switchback Experiments
Geo-Experiments
Synthetic Control Design
Trying a Random Set of Treated Units
Random Search
Switchback Experiment
Potential Outcomes of Sequences
Estimating the Order of Carryover Effect
Design-Based Estimation
Optimal Switchback Design
Robust Variance
Key Ideas
Chapter 11. Noncompliance and Instruments
Noncompliance
Extending Potential Outcomes
Instrument Identification Assumptions
First Stage
Reduced Form
Two-Stage Least Squares
Standard Error
Additional Controls and Instruments
2SLS by Hand
Matrix Implementation
Discontinuity Design
Discontinuity Design Assumptions
Intention to Treat Effect
The IV Estimate
Bunching
Key Ideas
Chapter 12. Next Steps
Causal Discovery
Sequential Decision Making
Causal Reinforcement Learning
Causal Forecasting
Domain Adaptation
Closing Thoughts
Index
About the Author
Colophon