The second edition of a comprehensive introduction to machine learning approaches used in predictive data analytics, covering both theory and practice.
Machine learning is often used to build predictive models by extracting patterns from large datasets. These models are used in predictive data analytics applications including price prediction, risk assessment, customer behavior prediction, and document classification. This introductory textbook offers a detailed and focused treatment of the most important machine learning approaches used in predictive data analytics, covering both theoretical concepts and practical applications. Technical and mathematical material is augmented with explanatory worked examples, and case studies illustrate the application of these models in the broader business context. This second edition covers recent developments in machine learning, with a new chapter on deep learning and two new chapters that go beyond predictive analytics to cover unsupervised learning and reinforcement learning.
The book is accessible, offering nontechnical explanations of the ideas underpinning each approach before introducing mathematical models and algorithms. It is focused and deep, providing students with detailed knowledge of core concepts and giving them a solid basis for exploring the field on their own. Both early chapters and later case studies illustrate how the process of learning predictive models fits into the broader business context. The two case studies describe specific data analytics projects through each phase of development, from formulating the business problem to implementing the analytics solution. The book can be used as a textbook at the introductory level or as a reference for professionals.
Author(s): John D. Kelleher, Brian Mac Namee, Aoife D'Arcy
Edition: 2
Publisher: The MIT Press
Year: 2020
Language: English
Commentary: Vector PDF
Pages: 856
City: Cambridge, MA
Tags: Machine Learning; Data Analysis; Deep Learning; Unsupervised Learning; Reinforcement Learning; Decision Trees; Analytics; Data Mining; Supervised Learning; Bayesian Inference; Classification; Decision Making; Statistics; Linear Regression; Forecasting; Churn Rate; Data Exploration; Predictive Models
Contents
Preface
Notation
List of Figures
List of Tables
I. INTRODUCTION TO MACHINE LEARNING AND DATA ANALYTICS
1. Machine Learning for Predictive Data Analytics
1.1 What Is Predictive Data Analytics?
1.2 What Is Machine Learning?
1.3 How Does Machine Learning Work?
1.4 Inductive Bias Versus Sample Bias
1.5 What Can Go Wrong with Machine Learning?
1.6 The Predictive Data Analytics Project Lifecycle: CRISP-DM
1.7 Predictive Data Analytics Tools
1.8 The Road Ahead
1.9 Exercises
2. Data to Insights to Decisions
2.1 Converting Business Problems into Analytics Solutions
2.1.1 Case Study: Motor Insurance Fraud
2.2 Assessing Feasibility
2.2.1 Case Study: Motor Insurance Fraud
2.3 Designing the Analytics Base Table
2.3.1 Case Study: Motor Insurance Fraud
2.4 Designing and Implementing Features
2.4.1 Different Types of Data
2.4.2 Different Types of Features
2.4.3 Handling Time
2.4.4 Legal Issues
2.4.5 Implementing Features
2.4.6 Case Study: Motor Insurance Fraud
2.5 Summary
2.6 Further Reading
2.7 Exercises
3. Data Exploration
3.1 The Data Quality Report
3.1.1 Case Study: Motor Insurance Fraud
3.2 Getting to Know the Data
3.2.1 The Normal Distribution
3.2.2 Case Study: Motor Insurance Fraud
3.3 Identifying Data Quality Issues
3.3.1 Missing Values
3.3.2 Irregular Cardinality
3.3.3 Outliers
3.3.4 Case Study: Motor Insurance Fraud
3.4 Handling Data Quality Issues
3.4.1 Handling Missing Values
3.4.2 Handling Outliers
3.4.3 Case Study: Motor Insurance Fraud
3.5 Advanced Data Exploration
3.5.1 Visualizing Relationships between Features
3.5.2 Measuring Covariance and Correlation
3.6 Data Preparation
3.6.1 Normalization
3.6.2 Binning
3.6.3 Sampling
3.7 Summary
3.8 Further Reading
3.9 Exercises
II. PREDICTIVE DATA ANALYTICS
4. Information-Based Learning
4.1 Big Idea
4.2 Fundamentals
4.2.1 Decision Trees
4.2.2 Shannon’s Entropy Model
4.2.3 Information Gain
4.3 Standard Approach: The ID3 Algorithm
4.3.1 A Worked Example: Predicting Vegetation Distributions
4.4 Extensions and Variations
4.4.1 Alternative Feature Selection and Impurity Metrics
4.4.2 Handling Continuous Descriptive Features
4.4.3 Predicting Continuous Targets
4.4.4 Tree Pruning
4.4.5 Model Ensembles
4.5 Summary
4.6 Further Reading
4.7 Exercises
5. Similarity-Based Learning
5.1 Big Idea
5.2 Fundamentals
5.2.1 Feature Space
5.2.2 Measuring Similarity Using Distance Metrics
5.3 Standard Approach: The Nearest Neighbor Algorithm
5.3.1 A Worked Example
5.4 Extensions and Variations
5.4.1 Handling Noisy Data
5.4.2 Efficient Memory Search
5.4.3 Data Normalization
5.4.4 Predicting Continuous Targets
5.4.5 Other Measures of Similarity
5.4.6 Feature Selection
5.5 Summary
5.6 Further Reading
5.7 Epilogue
5.8 Exercises
6. Probability-Based Learning
6.1 Big Idea
6.2 Fundamentals
6.2.1 Bayes’ Theorem
6.2.2 Bayesian Prediction
6.2.3 Conditional Independence and Factorization
6.3 Standard Approach: The Naive Bayes Model
6.3.1 A Worked Example
6.4 Extensions and Variations
6.4.1 Smoothing
6.4.2 Continuous Features: Probability Density Functions
6.4.3 Continuous Features: Binning
6.4.4 Bayesian Networks
6.5 Summary
6.6 Further Reading
6.7 Exercises
7. Error-Based Learning
7.1 Big Idea
7.2 Fundamentals
7.2.1 Simple Linear Regression
7.2.2 Measuring Error
7.2.3 Error Surfaces
7.3 Standard Approach: Multivariable Linear Regression with Gradient Descent
7.3.1 Multivariable Linear Regression
7.3.2 Gradient Descent
7.3.3 Choosing Learning Rates and Initial Weights
7.3.4 A Worked Example
7.4 Extensions and Variations
7.4.1 Interpreting Multivariable Linear Regression Models
7.4.2 Setting the Learning Rate Using Weight Decay
7.4.3 Handling Categorical Descriptive Features
7.4.4 Handling Categorical Target Features: Logistic Regression
7.4.5 Modeling Non-Linear Relationships
7.4.6 Multinomial Logistic Regression
7.4.7 Support Vector Machines
7.5 Summary
7.6 Further Reading
7.7 Exercises
8. Deep Learning
8.1 Big Idea
8.2 Fundamentals
8.2.1 Artificial Neurons
8.2.2 Artificial Neural Networks
8.2.3 Neural Networks as Matrix Operations
8.2.4 Why Are Non-Linear Activation Functions Necessary?
8.2.5 Why Is Network Depth Important?
8.3 Standard Approach: Backpropagation and Gradient Descent
8.3.1 Backpropagation: The General Structure of the Algorithm
8.3.2 Backpropagation: Backpropagating the Error Gradients
8.3.3 Backpropagation: Updating the Weights in a Network
8.3.4 Backpropagation: The Algorithm
8.3.5 A Worked Example: Using Backpropagation to Train a Feedforward Network for a Regression Task
8.4 Extensions and Variations
8.4.1 Vanishing Gradients and ReLUs
8.4.2 Weight Initialization and Unstable Gradients
8.4.3 Handling Categorical Target Features: Softmax Output Layers and Cross-Entropy Loss Functions
8.4.4 Early Stopping and Dropout: Preventing Overfitting
8.4.5 Convolutional Neural Networks
8.4.6 Sequential Models: Recurrent Neural Networks and Long Short-Term Memory Networks
8.5 Summary
8.6 Further Reading
8.7 Exercises
9. Evaluation
9.1 Big Idea
9.2 Fundamentals
9.3 Standard Approach: Misclassification Rate on a Hold-Out Test Set
9.4 Extensions and Variations
9.4.1 Designing Evaluation Experiments
9.4.2 Performance Measures: Categorical Targets
9.4.3 Performance Measures: Prediction Scores
9.4.4 Performance Measures: Multinomial Targets
9.4.5 Performance Measures: Continuous Targets
9.4.6 Evaluating Models after Deployment
9.5 Summary
9.6 Further Reading
9.7 Exercises
III. BEYOND PREDICTION
10. Beyond Prediction: Unsupervised Learning
10.1 Big Idea
10.2 Fundamentals
10.3 Standard Approach: The k-Means Clustering Algorithm
10.3.1 A Worked Example
10.4 Extensions and Variations
10.4.1 Choosing Initial Cluster Centroids
10.4.2 Evaluating Clustering
10.4.3 Choosing the Number of Clusters
10.4.4 Understanding Clustering Results
10.4.5 Agglomerative Hierarchical Clustering
10.4.6 Representation Learning with Auto-Encoders
10.5 Summary
10.6 Further Reading
10.7 Exercises
11. Beyond Prediction: Reinforcement Learning
11.1 Big Idea
11.2 Fundamentals
11.2.1 Intelligent Agents
11.2.2 Fundamentals of Reinforcement Learning
11.2.3 Markov Decision Processes
11.2.4 The Bellman Equations
11.2.5 Temporal-Difference Learning
11.3 Standard Approach: Q-Learning, Off-Policy Temporal-Difference Learning
11.3.1 A Worked Example
11.4 Extensions and Variations
11.4.1 SARSA, On-Policy Temporal-Difference Learning
11.4.2 Deep Q Networks
11.5 Summary
11.6 Further Reading
11.7 Exercises
IV. CASE STUDIES AND CONCLUSIONS
12. Case Study: Customer Churn
12.1 Business Understanding
12.2 Data Understanding
12.3 Data Preparation
12.4 Modeling
12.5 Evaluation
12.6 Deployment
13. Case Study: Galaxy Classification
13.1 Business Understanding
13.1.1 Situational Fluency
13.2 Data Understanding
13.3 Data Preparation
13.4 Modeling
13.4.1 Baseline Models
13.4.2 Feature Selection
13.4.3 The 5-Level Model
13.5 Evaluation
13.6 Deployment
14. The Art of Machine Learning for Predictive Data Analytics
14.1 Different Perspectives on Prediction Models
14.2 Choosing a Machine Learning Approach
14.2.1 Matching Machine Learning Approaches to Projects
14.2.2 Matching Machine Learning Approaches to Data
14.3 Beyond Prediction
14.4 Your Next Steps
V. APPENDICES
A. Descriptive Statistics and Data Visualization for Machine Learning
A.1 Descriptive Statistics for Continuous Features
A.1.1 Central Tendency
A.1.2 Variation
A.2 Descriptive Statistics for Categorical Features
A.3 Populations and Samples
A.4 Data Visualization
A.4.1 Bar Plots
A.4.2 Histograms
A.4.3 Box Plots
B. Introduction to Probability for Machine Learning
B.1 Probability Basics
B.2 Probability Distributions and Summing Out
B.3 Some Useful Probability Rules
B.4 Summary
C. Differentiation Techniques for Machine Learning
C.1 Derivatives of Continuous Functions
C.2 The Chain Rule
C.3 Partial Derivatives
D. Introduction to Linear Algebra
D.1 Basic Types
D.2 Transpose
D.3 Multiplication
D.4 Summary
Bibliography
Index