"Turn yourself into a Data Head. You'll become a more valuable employee and make your organization more successful."
Thomas H. Davenport, Research Fellow, Author of Competing on Analytics, Big Data @ Work, and The AI Advantage
You’ve heard the hype around data. Now get the facts.
In Becoming a Data Head: How to Think, Speak, and Understand Data Science, Statistics, and Machine Learning, award-winning data scientists Alex Gutman and Jordan Goldmeier pull back the curtain on data science and give you the language and tools necessary to talk and think critically about it.
You’ll learn how to:
• Think statistically and understand the role variation plays in your life and decision making
• Speak intelligently and ask the right questions about the statistics and results you encounter in the workplace
• Understand what’s really going on with machine learning, text analytics, deep learning, and artificial intelligence
• Avoid common pitfalls when working with and interpreting data
Becoming a Data Head is a complete guide for data science in the workplace, covering everything from the personalities you’ll work with to the math behind the algorithms. The authors have spent years in the data trenches and sought to create a fun, approachable, and eminently readable book. Anyone can become a Data Head: an active participant in data science, statistics, and machine learning. Whether you’re a business professional, engineer, executive, or aspiring data scientist, this book is for you.
Author(s): Alex J. Gutman, Jordan Goldmeier
Edition: 1
Publisher: Wiley
Year: 2021
Language: English
Commentary: Vector PDF
Pages: 272
City: Indianapolis, IN
Tags: Machine Learning; Neural Networks; Deep Learning; Unsupervised Learning; Decision Trees; Data Science; Popular Science; Classification; Clustering; Principal Component Analysis; Statistics; Linear Regression; Logistic Regression; Ensemble Learning; Topic Modeling; Text Analysis; Text Classification; Probability Theory; Dimensionality Reduction; Elementary
Cover
Title Page
Copyright Page
About the Authors
About the Technical Editors
Acknowledgments
Contents
Introduction
The Data Science Industrial Complex
Why We Care
Data in the Workplace
You Can Understand the Big Picture
Who This Book Is Written For
Why We Wrote This Book
What You’ll Learn
How This Book Is Organized
One Last Thing Before We Begin
Part I Thinking Like a Data Head
Chapter 1 What Is the Problem?
Questions a Data Head Should Ask
Why Is This Problem Important?
Who Does This Problem Affect?
What If We Don’t Have the Right Data?
When Is the Project Over?
What If We Don’t Like the Results?
Understanding Why Data Projects Fail
Customer Perception
Discussion
Working on Problems That Matter
Chapter Summary
Chapter 2 What Is Data?
Data vs. Information
An Example Dataset
Data Types
How Data Is Collected and Structured
Observational vs. Experimental Data
Structured vs. Unstructured Data
Basic Summary Statistics
Chapter Summary
Chapter 3 Prepare to Think Statistically
Ask Questions
There Is Variation in All Things
Scenario: Customer Perception (The Sequel)
Case Study: Kidney-Cancer Rates
Probabilities and Statistics
Probability vs. Intuition
Discovery with Statistics
Chapter Summary
Part II Speaking Like a Data Head
Chapter 4 Argue with the Data
What Would You Do?
Missing Data Disaster
Tell Me the Data Origin Story
Who Collected the Data?
How Was the Data Collected?
Is the Data Representative?
Is There Sampling Bias?
What Did You Do with Outliers?
What Data Am I Not Seeing?
How Did You Deal with Missing Values?
Can the Data Measure What You Want It to Measure?
Argue with Data of All Sizes
Chapter Summary
Chapter 5 Explore the Data
Exploratory Data Analysis and You
Embracing the Exploratory Mindset
Questions to Guide You
The Setup
Can the Data Answer the Question?
Set Expectations and Use Common Sense
Do the Values Make Intuitive Sense?
Watch Out: Outliers and Missing Values
Did You Discover Any Relationships?
Understanding Correlation
Watch Out: Misinterpreting Correlation
Watch Out: Correlation Does Not Imply Causation
Did You Find New Opportunities in the Data?
Chapter Summary
Chapter 6 Examine the Probabilities
Take a Guess
The Rules of the Game
Notation
Conditional Probability and Independent Events
The Probability of Multiple Events
Probability Thought Exercise
Next Steps
Be Careful Assuming Independence
Don’t Fall for the Gambler’s Fallacy
All Probabilities Are Conditional
Don’t Swap Dependencies
Bayes’ Theorem
Ensure the Probabilities Have Meaning
Calibration
Rare Events Can, and Do, Happen
Chapter Summary
Chapter 7 Challenge the Statistics
Quick Lessons on Inference
Give Yourself Some Wiggle Room
More Data, More Evidence
Challenge the Status Quo
Evidence to the Contrary
Balance Decision Errors
The Process of Statistical Inference
The Questions You Should Ask to Challenge the Statistics
What Is the Context for These Statistics?
What Is the Sample Size?
What Are You Testing?
What Is the Null Hypothesis?
What Is the Significance Level?
How Many Tests Are You Doing?
Can I See the Confidence Intervals?
Is This Practically Significant?
Are You Assuming Causality?
Chapter Summary
Part III Understanding the Data Scientist’s Toolbox
Chapter 8 Search for Hidden Groups
Unsupervised Learning
Dimensionality Reduction
Creating Composite Features
Principal Component Analysis
Principal Components in Athletic Ability
PCA Summary
Potential Traps
Clustering
k-Means Clustering
Clustering Retail Locations
Potential Traps
Chapter Summary
Chapter 9 Understand the Regression Model
Supervised Learning
Linear Regression: What It Does
Least Squares Regression: Not Just a Clever Name
Linear Regression: What It Gives You
Extending to Many Features
Linear Regression: What Confusion It Causes
Omitted Variables
Multicollinearity
Data Leakage
Extrapolation Failures
Many Relationships Aren’t Linear
Are You Explaining or Predicting?
Regression Performance
Other Regression Models
Chapter Summary
Chapter 10 Understand the Classification Model
Introduction to Classification
What You’ll Learn
Classification Problem Setup
Logistic Regression
Logistic Regression: So What?
Decision Trees
Ensemble Methods
Random Forests
Gradient Boosted Trees
Interpretability of Ensemble Models
Watch Out for Pitfalls
Misapplication of the Problem
Data Leakage
Not Splitting Your Data
Choosing the Right Decision Threshold
Misunderstanding Accuracy
Confusion Matrices
Chapter Summary
Chapter 11 Understand Text Analytics
Expectations of Text Analytics
How Text Becomes Numbers
A Big Bag of Words
N-Grams
Word Embeddings
Topic Modeling
Text Classification
Naïve Bayes
Sentiment Analysis
Practical Considerations When Working with Text
Big Tech Has the Upper Hand
Chapter Summary
Chapter 12 Conceptualize Deep Learning
Neural Networks
How Are Neural Networks Like the Brain?
A Simple Neural Network
How a Neural Network Learns
A Slightly More Complex Neural Network
Applications of Deep Learning
The Benefits of Deep Learning
How Computers “See” Images
Convolutional Neural Networks
Deep Learning on Language and Sequences
Deep Learning in Practice
Do You Have Data?
Is Your Data Structured?
What Will the Network Look Like?
Artificial Intelligence and You
Big Tech Has the Upper Hand
Ethics in Deep Learning
Chapter Summary
Part IV Ensuring Success
Chapter 13 Watch Out for Pitfalls
Biases and Weird Phenomena in Data
Survivorship Bias
Regression to the Mean
Simpson’s Paradox
Confirmation Bias
Effort Bias (aka the “Sunk Cost Fallacy”)
Algorithmic Bias
Uncategorized Bias
The Big List of Pitfalls
Statistical and Machine Learning Pitfalls
Project Pitfalls
Chapter Summary
Chapter 14 Know the People and Personalities
Seven Scenes of Communication Breakdowns
The Postmortem
Storytime
The Telephone Game
Into the Weeds
The Reality Check
The Takeover
The Blowhard
Data Personalities
Data Enthusiasts
Data Cynics
Data Heads
Chapter Summary
Chapter 15 What’s Next?
Index