MACHINE LEARNING FOR BUSINESS ANALYTICS
An up-to-date introduction to a market-leading platform for data analysis and machine learning
Machine Learning for Business Analytics: Concepts, Techniques, and Applications with JMP Pro®, 2nd ed. offers an accessible and engaging introduction to machine learning. It provides concrete examples and case studies to educate new users and deepen existing users’ understanding of their data and their business. Fully updated to incorporate new topics and instructional material, this remains the only comprehensive introduction to this crucial set of analytical tools specifically tailored to the needs of businesses.
Readers of Machine Learning for Business Analytics: Concepts, Techniques, and Applications with JMP Pro®, 2nd ed. will also find:
Updated material that improves the book’s usefulness as a reference for professionals beyond the classroom
Four new chapters, covering topics including Text Mining and Responsible Data Science
An updated companion website with data sets and other instructor resources: www.jmp.com/dataminingbook
A guide to JMP Pro®’s new features and enhanced functionality
Machine Learning for Business Analytics: Concepts, Techniques, and Applications with JMP Pro®, 2nd ed. is ideal for students and instructors of business analytics and data mining classes, as well as data science practitioners and professionals in data-driven industries.
Author(s): Peter C. Bruce, Mia L. Stephens, Galit Shmueli
Edition: 2
Publisher: Wiley
Year: 2023
Language: English
Pages: 611
Cover
Title Page
Copyright
Contents
Foreword
Preface
Acknowledgments
PART I PRELIMINARIES
Chapter 1 Introduction
1.1 What Is Business Analytics?
1.2 What Is Machine Learning?
1.3 Machine Learning, AI, and Related Terms
Statistical Modeling vs. Machine Learning
1.4 Big Data
1.5 Data Science
1.6 Why Are There So Many Different Methods?
1.7 Terminology and Notation
1.8 Road Maps to This Book
Order of Topics
Chapter 2 Overview of the Machine Learning Process
2.1 Introduction
2.2 Core Ideas in Machine Learning
Classification
Prediction
Association Rules and Recommendation Systems
Predictive Analytics
Data Reduction and Dimension Reduction
Data Exploration and Visualization
Supervised and Unsupervised Learning
2.3 The Steps in a Machine Learning Project
2.4 Preliminary Steps
Organization of Data
Sampling from a Database
Oversampling Rare Events in Classification Tasks
Preprocessing and Cleaning the Data
2.5 Predictive Power and Overfitting
Overfitting
Creation and Use of Data Partitions
2.6 Building a Predictive Model with JMP Pro
Predicting Home Values in a Boston Neighborhood
Modeling Process
2.7 Using JMP Pro for Machine Learning
2.8 Automating Machine Learning Solutions
Predicting Power Generator Failure
Uber’s Michelangelo
2.9 Ethical Practice in Machine Learning
Machine Learning Software: The State of the Market by Herb Edelstein
Problems
PART II DATA EXPLORATION AND DIMENSION REDUCTION
Chapter 3 Data Visualization
3.1 Introduction
3.2 Data Examples
Example 1: Boston Housing Data
Example 2: Ridership on Amtrak Trains
3.3 Basic Charts: Bar Charts, Line Graphs, and Scatter Plots
Distribution Plots: Boxplots and Histograms
Heatmaps
3.4 Multidimensional Visualization
Adding Variables: Color, Hue, Size, Shape, Multiple Panels, Animation
Manipulations: Rescaling, Aggregation and Hierarchies, Zooming, Filtering
Reference: Trend Line and Labels
Scaling Up: Large Datasets
Multivariate Plot: Parallel Coordinates Plot
Interactive Visualization
3.5 Specialized Visualizations
Visualizing Networked Data
Visualizing Hierarchical Data: More on Treemaps
Visualizing Geographical Data: Maps
3.6 Summary: Major Visualizations and Operations, According to Machine Learning Goal
Prediction
Classification
Time Series Forecasting
Unsupervised Learning
Problems
Chapter 4 Dimension Reduction
4.1 Introduction
4.2 Curse of Dimensionality
4.3 Practical Considerations
Example 1: House Prices in Boston
4.4 Data Summaries
Summary Statistics
Tabulating Data
4.5 Correlation Analysis
4.6 Reducing the Number of Categories in Categorical Variables
4.7 Converting a Categorical Variable to a Continuous Variable
4.8 Principal Component Analysis
Example 2: Breakfast Cereals
Principal Components
Standardizing the Data
Using Principal Components for Classification and Prediction
4.9 Dimension Reduction Using Regression Models
4.10 Dimension Reduction Using Classification and Regression Trees
Problems
PART III PERFORMANCE EVALUATION
Chapter 5 Evaluating Predictive Performance
5.1 Introduction
5.2 Evaluating Predictive Performance
Naive Benchmark: The Average
Prediction Accuracy Measures
Comparing Training and Validation Performance
5.3 Judging Classifier Performance
Benchmark: The Naive Rule
Class Separation
The Classification (Confusion) Matrix
Using the Validation Data
Accuracy Measures
Propensities and Threshold for Classification
Performance in Unequal Importance of Classes
Asymmetric Misclassification Costs
Generalization to More Than Two Classes
5.4 Judging Ranking Performance
Lift Curves for Binary Data
Beyond Two Classes
Lift Curves Incorporating Costs and Benefits
5.5 Oversampling
Creating an Oversampled Training Set
Evaluating Model Performance Using a Nonoversampled Validation Set
Evaluating Model Performance If Only Oversampled Validation Set Exists
Problems
PART IV PREDICTION AND CLASSIFICATION METHODS
Chapter 6 Multiple Linear Regression
6.1 Introduction
6.2 Explanatory vs. Predictive Modeling
6.3 Estimating the Regression Equation and Prediction
Example: Predicting the Price of Used Toyota Corolla Automobiles
6.4 Variable Selection in Linear Regression
Reducing the Number of Predictors
How to Reduce the Number of Predictors
Manual Variable Selection
Automated Variable Selection
Regularization (Shrinkage Models)
Problems
Chapter 7 k-Nearest Neighbors (k-NN)
7.1 The k-NN Classifier (Categorical Outcome)
Determining Neighbors
Classification Rule
Example: Riding Mowers
Choosing Parameter k
Setting the Threshold Value
Weighted k-NN
k-NN with More Than Two Classes
Working with Categorical Predictors
7.2 k-NN for a Numerical Response
7.3 Advantages and Shortcomings of k-NN Algorithms
Problems
Chapter 8 The Naive Bayes Classifier
8.1 Introduction
Threshold Probability Method
Conditional Probability
Example 1: Predicting Fraudulent Financial Reporting
8.2 Applying the Full (Exact) Bayesian Classifier
Using the “Assign to the Most Probable Class” Method
Using the Threshold Probability Method
Practical Difficulty with the Complete (Exact) Bayes Procedure
8.3 Solution: Naive Bayes
The Naive Bayes Assumption of Conditional Independence
Using the Threshold Probability Method
Example 2: Predicting Fraudulent Financial Reports
Example 3: Predicting Delayed Flights
Evaluating the Performance of Naive Bayes Output from JMP
Working with Continuous Predictors
8.4 Advantages and Shortcomings of the Naive Bayes Classifier
Problems
Chapter 9 Classification and Regression Trees
9.1 Introduction
Tree Structure
Decision Rules
Classifying a New Record
9.2 Classification Trees
Recursive Partitioning
Example 1: Riding Mowers
Categorical Predictors
Standardization
9.3 Growing a Tree for Riding Mowers Example
Choice of First Split
Choice of Second Split
Final Tree
Using a Tree to Classify New Records
9.4 Evaluating the Performance of a Classification Tree
Example 2: Acceptance of Personal Loan
9.5 Avoiding Overfitting
Stopping Tree Growth: CHAID
Growing a Full Tree and Pruning It Back
How JMP Pro Limits Tree Size
9.6 Classification Rules from Trees
9.7 Classification Trees for More Than Two Classes
9.8 Regression Trees
Prediction
Evaluating Performance
9.9 Advantages and Weaknesses of a Single Tree
9.10 Improving Prediction: Random Forests and Boosted Trees
Random Forests
Boosted Trees
Problems
Chapter 10 Logistic Regression
10.1 Introduction
10.2 The Logistic Regression Model
10.3 Example: Acceptance of Personal Loan
Model with a Single Predictor
Estimating the Logistic Model from Data: Multiple Predictors
Interpreting Results in Terms of Odds (for a Profiling Goal)
10.4 Evaluating Classification Performance
10.5 Variable Selection
10.6 Logistic Regression for Multi-class Classification
Logistic Regression for Nominal Classes
Logistic Regression for Ordinal Classes
Example: Accident Data
10.7 Example of Complete Analysis: Predicting Delayed Flights
Data Preprocessing
Model Fitting, Estimation, and Interpretation: A Simple Model
Model Fitting, Estimation, and Interpretation: The Full Model
Model Performance
Problems
Chapter 11 Neural Nets
11.1 Introduction
11.2 Concept and Structure of a Neural Network
11.3 Fitting a Network to Data
Example 1: Tiny Dataset
Computing Output of Nodes
Preprocessing the Data
Training the Model
Using the Output for Prediction and Classification
Example 2: Classifying Accident Severity
Avoiding Overfitting
11.4 User Input in JMP Pro
11.5 Exploring the Relationship Between Predictors and Outcome
11.6 Deep Learning
Convolutional Neural Networks (CNNs)
Local Feature Map
A Hierarchy of Features
The Learning Process
Unsupervised Learning
Conclusion
11.7 Advantages and Weaknesses of Neural Networks
Problems
Chapter 12 Discriminant Analysis
12.1 Introduction
Example 1: Riding Mowers
Example 2: Personal Loan Acceptance
12.2 Distance of an Observation from a Class
12.3 From Distances to Propensities and Classifications
12.4 Classification Performance of Discriminant Analysis
12.5 Prior Probabilities
12.6 Classifying More Than Two Classes
Example 3: Medical Dispatch to Accident Scenes
12.7 Advantages and Weaknesses
Problems
Chapter 13 Generating, Comparing, and Combining Multiple Models
13.1 Ensembles
Why Ensembles Can Improve Predictive Power
Simple Averaging or Voting
Bagging
Boosting
Stacking
Advantages and Weaknesses of Ensembles
13.2 Automated Machine Learning (AutoML)
AutoML: Explore and Clean Data
AutoML: Determine Machine Learning Task
AutoML: Choose Features and Machine Learning Methods
AutoML: Evaluate Model Performance
AutoML: Model Deployment
Advantages and Weaknesses of Automated Machine Learning
13.3 Summary
Problems
PART V INTERVENTION AND USER FEEDBACK
Chapter 14 Interventions: Experiments, Uplift Models, and Reinforcement Learning
14.1 Introduction
14.2 A/B Testing
Example: Testing a New Feature in a Photo Sharing App
The Statistical Test for Comparing Two Groups (t-Test)
Multiple Treatment Groups: A/B/n Tests
Multiple A/B Tests and the Danger of Multiple Testing
14.3 Uplift (Persuasion) Modeling
Getting the Data
A Simple Model
Modeling Individual Uplift
Creating Uplift Models in JMP Pro
Using the Results of an Uplift Model
14.4 Reinforcement Learning
Explore-Exploit: Multi-armed Bandits
Markov Decision Process (MDP)
14.5 Summary
Problems
PART VI MINING RELATIONSHIPS AMONG RECORDS
Chapter 15 Association Rules and Collaborative Filtering
15.1 Association Rules
Discovering Association Rules in Transaction Databases
Example 1: Synthetic Data on Purchases of Phone Faceplates
Data Format
Generating Candidate Rules
The Apriori Algorithm
Selecting Strong Rules
The Process of Rule Selection
Interpreting the Results
Rules and Chance
Example 2: Rules for Similar Book Purchases
15.2 Collaborative Filtering
Data Type and Format
Example 3: Netflix Prize Contest
User-Based Collaborative Filtering: “People Like You”
Item-Based Collaborative Filtering
Evaluating Performance
Advantages and Weaknesses of Collaborative Filtering
Collaborative Filtering vs. Association Rules
15.3 Summary
Problems
Chapter 16 Cluster Analysis
16.1 Introduction
Example: Public Utilities
16.2 Measuring Distance Between Two Records
Euclidean Distance
Standardizing Numerical Measurements
Other Distance Measures for Numerical Data
Distance Measures for Categorical Data
Distance Measures for Mixed Data
16.3 Measuring Distance Between Two Clusters
Minimum Distance
Maximum Distance
Average Distance
Centroid Distance
16.4 Hierarchical (Agglomerative) Clustering
Single Linkage
Complete Linkage
Average Linkage
Centroid Linkage
Ward’s Method
Dendrograms: Displaying Clustering Process and Results
Validating Clusters
Two-Way Clustering
Limitations of Hierarchical Clustering
16.5 Nonhierarchical Clustering: The k-Means Algorithm
Choosing the Number of Clusters (k)
Problems
PART VII FORECASTING TIME SERIES
Chapter 17 Handling Time Series
17.1 Introduction
17.2 Descriptive vs. Predictive Modeling
17.3 Popular Forecasting Methods in Business
Combining Methods
17.4 Time Series Components
Example: Ridership on Amtrak Trains
17.5 Data Partitioning and Performance Evaluation
Benchmark Performance: Naive Forecasts
Generating Future Forecasts
Problems
Chapter 18 Regression-Based Forecasting
18.1 A Model with Trend
Linear Trend
Exponential Trend
Polynomial Trend
18.2 A Model with Seasonality
Additive vs. Multiplicative Seasonality
18.3 A Model with Trend and Seasonality
18.4 Autocorrelation and ARIMA Models
Computing Autocorrelation
Improving Forecasts by Integrating Autocorrelation Information
Fitting AR Models to Residuals
Evaluating Predictability
Problems
Chapter 19 Smoothing and Deep Learning Methods for Forecasting
19.1 Introduction
19.2 Moving Average
Centered Moving Average for Visualization
Trailing Moving Average for Forecasting
Choosing Window Width (w)
19.3 Simple Exponential Smoothing
Choosing Smoothing Parameter α
Relation Between Moving Average and Simple Exponential Smoothing
19.4 Advanced Exponential Smoothing
Series With a Trend
Series With a Trend and Seasonality
19.5 Deep Learning for Forecasting
Problems
PART VIII DATA ANALYTICS
Chapter 20 Text Mining
20.1 Introduction
20.2 The Tabular Representation of Text: Document–Term Matrix and “Bag-of-Words”
20.3 Bag-of-Words vs. Meaning Extraction at Document Level
20.4 Preprocessing the Text
Tokenization
Text Reduction
Presence/Absence vs. Frequency (Occurrences)
Term Frequency-Inverse Document Frequency (TF-IDF)
From Terms to Topics: Latent Semantic Analysis and Topic Analysis
Extracting Meaning
From Terms to High Dimensional Word Vectors: Word2Vec
20.5 Implementing Machine Learning Methods
20.6 Example: Online Discussions on Autos and Electronics
Importing the Records
Text Preprocessing in JMP
Using Latent Semantic Analysis and Topic Analysis
Fitting a Predictive Model
Prediction
20.7 Example: Sentiment Analysis of Movie Reviews
Data Preparation
Latent Semantic Analysis and Fitting a Predictive Model
20.8 Summary
Problems
Chapter 21 Responsible Data Science
21.1 Introduction
Example: Predicting Recidivism
21.2 Unintentional Harm
21.3 Legal Considerations
The General Data Protection Regulation (GDPR)
Protected Groups
21.4 Principles of Responsible Data Science
Non-maleficence
Fairness
Transparency
Accountability
Data Privacy and Security
21.5 A Responsible Data Science Framework
Justification
Assembly
Data Preparation
Modeling
Auditing
21.6 Documentation Tools
Impact Statements
Model Cards
Datasheets
Audit Reports
21.7 Example: Applying the RDS Framework to the COMPAS Example
Unanticipated Uses
Ethical Concerns
Protected Groups
Data Issues
Fitting the Model
Auditing the Model
Bias Mitigation
21.8 Summary
Problems
PART IX CASES
Chapter 22 Cases
22.1 Charles Book Club
The Book Industry
Database Marketing at Charles
Machine Learning Techniques
Assignment
22.2 German Credit
Background
Data
Assignment
22.3 Tayko Software Cataloger
Background
The Mailing Experiment
Data
Assignment
22.4 Political Persuasion
Background
Predictive Analytics Arrives in US Politics
Political Targeting
Uplift
Data
Assignment
22.5 Taxi Cancellations
Business Situation
Assignment
22.6 Segmenting Consumers of Bath Soap
Business Situation
Key Problems
Data
Measuring Brand Loyalty
Assignment
22.7 Catalog Cross-Selling
Background
Assignment
22.8 Direct-Mail Fundraising
Background
Data
Assignment
22.9 Time Series Case: Forecasting Public Transportation Demand
Background
Problem Description
Available Data
Assignment Goal
Assignment
Tips and Suggested Steps
22.10 Loan Approval
Background
Regulatory Requirements
Getting Started
Assignment
References
Data Files Used in the Book
Index
EULA