Data Mining and Predictive Analytics for Business Decisions

Recent advances in data science have given data analysts many more tools and techniques for extracting information from data sets. This book helps data analysts move up from simple tools such as Excel for descriptive analytics to answering more sophisticated questions using machine learning. Most of the exercises use R and Python, but rather than focus on coding algorithms, the book employs interactive interfaces to these tools to perform the analysis. Following the CRISP-DM data mining standard, the early chapters cover the preparatory steps in data mining: translating business information needs into framed analytical questions, and preparing the data. The Jamovi and JASP interfaces are used with R, and the Orange3 data mining interface with Python. Where appropriate, Voyant and other open-source programs are used for text analytics. The techniques covered range from basic descriptive statistics, such as summarization and tabulation, to more sophisticated predictive techniques, such as linear and logistic regression, clustering, classification, and text analytics. Companion files include the case study files, solution spreadsheets, data sets, and charts from the book.

FEATURES
• Covers basic descriptive statistics, such as summarization and tabulation, through more sophisticated predictive techniques, such as linear and logistic regression, clustering, classification, and text analytics
• Uses R, Python, the Jamovi and JASP interfaces, and the Orange3 data mining interface
• Includes companion files with the case study files from the book, solution spreadsheets, data sets, etc.

Author(s): Andres Fortino
Publisher: Mercury Learning and Information
Year: 2023

Language: English
Commentary: True EPUB
Pages: 272

Acknowledgments
Chapter 1: Data Mining and Business
Data Mining Algorithms and Activities
Data is the New Oil
Data-Driven Decision-Making
Business Analytics and Business Intelligence
Algorithmic Technologies Associated with Data Mining
Data Mining and Data Warehousing
Case Study 1.1: Business Applications of Data Mining
Case A – Classification
Case B – Regression
Case C – Anomaly Detection
Case D – Time Series
Case E – Clustering
Reference
Chapter 2: The Data Mining Process
Data Mining as a Process
Exploration
Analysis
Interpretation
Exploitation
Selecting a Data Mining Process
The CRISP-DM Process Model
Business Understanding
Data Understanding
Data Preparation
Modeling
Evaluation
Deployment
Selecting Data Analytics Languages
The Choices for Languages
References
Chapter 3: Framing Analytical Questions
How Does CRISP-DM Define the Business and Data Understanding Step?
The World of the Business Data Analyst
How Does Data Analysis Relate to Business Decision-Making?
How Do We Frame Analytical Questions?
What Are the Characteristics of Well-framed Analytical Questions?
Exercise 3.1 – Framed Questions About the Titanic Disaster
Case Study 3.1 – The San Francisco Airport Survey
Case Study 3.2 – Small Business Administration Loans
References
Chapter 4: Data Preparation
How Does CRISP-DM Define Data Preparation?
Steps in Preparing the Data Set for Analysis
Data Sources and Formats
What is Data Shaping?
The Flat-File Format
Application of Tools for Data Acquisition and Preparation
Exercise 4.1 – Shaping the Data File
Exercise 4.2 – Cleaning the Data File
Ensuring the Right Variables are Included
Using SQL to Extract the Right Data Set from Data Warehouses
Case Study 4.1: Cleaning and Shaping the SFO Survey Data Set
Case Study 4.2: Shaping the SBA Loans Data Set
Case Study 4.3: Additional SQL Queries
Reference
Chapter 5: Descriptive Analysis
Getting a Sense of the Data Set
Describe the Data Set
Explore the Data Set
Verify the Quality of the Data Set
Analysis Techniques to Describe the Variables
Exercise 5.1 – Descriptive Statistics
Distributions of Numeric Variables
Correlation
Exercise 5.2 – Descriptive Analysis of the Titanic Disaster Data
Case Study 5.1: Describing the SFO Survey Data Set
Solution Using R
Solution Using Python
Case Study 5.2: Describing the SBA Loans Data Set
Solution Using R
Solution Using Python
Reference
Chapter 6: Modeling
What is a Model?
How Does CRISP-DM Define Modeling?
Selecting the Modeling Technique
Modeling Assumptions
Generate Test Design
Design of Model Testing
Build the Model
Parameter Setting
Models
Model Assessment
Where Do Models Reside in a Computer?
The Data Mining Engine
The Model
Data Sources and Outputs
Traditional Data Sources
Static Data Sources
Real-Time Data Sources
Analytic Outputs
Model Building
Step 1: Framing Questions
Step 2: Selecting the Machine
Step 3: Selecting Known Data
Step 4: Training the Machine
Step 5: Testing the Model
Step 6: Deploying the Model
Step 7: Collecting New Data
Step 8: Updating the Model
Step 9: Learning – Repeat Steps 7 and 8
Step 10: Recommending Answers to the User
Reference
Chapter 7: Predictive Analytics with Regression Models
What is Supervised Learning?
Regression to the Mean
Linear Regression
Simple Linear Regression
The R-squared Coefficient
The Use of the p-value of the Coefficients
Strength of the Correlation Between Two Variables
Exercise 7.1 – Using SLR Analysis to Understand Franchise Advertising
Multivariate Linear Regression
Preparing to Build the Multivariate Model
Exercise 7.2 – Using Multivariate Linear Regression to Model Franchise Sales
Logistic Regression
What is Logistic Regression?
Exercise 7.3 – PassClass Case Study
Multivariate Logistic Regression
Exercise 7.4 – MLR Used to Analyze the Results of a Database Marketing Initiative
Where is Logistic Regression Used?
Comparing Linear and Logistic Regressions for Binary Outcomes
Case Study 7.1: Linear Regression Using the SFO Survey Data Set
Solution in R
Solution in Python
Case Study 7.2: Linear Regression Using the SBA Loans Data Set
Solution in R
Solution in Python
Case Study 7.3: Logistic Regression Using the SFO Survey Data Set
Solution in R
Solution in Python
Case Study 7.4: Logistic Regression Using the SBA Loans Data Set
Solution in R
Solution in Python
Chapter 8: Classification
Classification with Decision Trees
Building a Decision Tree
Exercise 8.1 – The Iris Data Set
The Problem with Decision Trees
Classification with Random Forest
Using a Random Forest Model
Exercise 8.2 – The Iris Data Set
Classification with Naïve Bayes
Exercise 8.3 – The HIKING Data Set
Computing the Conditional Probabilities
Case Study 8.1: Classification with the SFO Survey Data Set
Solution in R
Solution in Python
Case Study 8.2: Classification with the SBA Loans Data Set
Solution in R
Solution in Python
Case Study 8.3: Classification with the Florence Nightingale Data Set
Solution in Python
Reference
Chapter 9: Clustering
What is Unsupervised Machine Learning?
What is Clustering Analysis?
Applying Clustering to Old Faithful Eruptions
Examples of Applications of Clustering Analysis
A Simple Clustering Example Using Regression
Hierarchical Clustering
Applying Hierarchical Clustering to Old Faithful Eruptions
Exercise 9.1 – Hierarchical Clustering and the Iris Data Set
K-Means Clustering
How Does the K-Means Algorithm Compute Cluster Centroids?
Applying K-Means Clustering to Old Faithful Eruptions
Exercise 9.2 – K-Means Clustering and the Iris Data Set
Hierarchical vs. K-Means Clustering
Case Study 9.1: Clustering with the SFO Survey Data Set
Solution in R
Solution in Python
Case Study 9.2: Clustering with the SBA Loans Data Set
Solution in R
Solution in Python
Chapter 10: Time Series Forecasting
What is a Time Series?
Time Series Analysis
Types of Time Series Analysis
What is Forecasting?
Exercise 10.1 – Analysis of the US and China GDP Data Set
Case Studies
Case Study 10.1: Time Series Analysis of the SFO Survey Data Set
Solution in Excel
Case Study 10.2: Time Series Analysis of the SBA Loans Data Set
Solution in R
Solution in Python
Case Study 10.3: Time Series Analysis of a Nest Data Set
Solution in Python
Reference
Chapter 11: Feature Selection
Using the Covariance Matrix
Factor Analysis
When to Use Factor Analysis
First Step in FA – Correlation
FA for Exploratory Analysis
Selecting the Number of Factors – The Scree Plot
Example 11.1: Restaurant Feedback
Factor Interpretation
Summary Activities to Perform a Factor Analysis
Case Study 11.1: Variable Reduction with the SFO Survey Data Set
Solution in R
Solution in Python
Case Study 11.2: Hunting Diamonds
Solution in R
Solution in Python
Chapter 12: Anomaly Detection
What is an Anomaly?
What is an Outlier?
The Case Studies for the Exercises in Anomaly Detection
Anomaly Detection by Standardization – A Single Numerical Variable
Exercise 12.1 – Outliers in the Airline Delays Data Set – Z-Score
Anomaly Detection by Quartiles – Tukey Fences – With a Single Variable
Comparing Z-scores and Tukey Fences
Exercise 12.2 – Outliers in the Airline Delays Data Set – Tukey Fences
Anomaly Detection by Category – A Single Variable
Exercise 12.3 – Outliers in the Airline Delays Data Set – Categorical
Anomaly Detection by Clustering – Multiple Variables
Exercise 12.4 – Outliers in the Airline Delays Data Set – Clustering
Anomaly Detection Using Linear Regression by Residuals – Multiple Variables
Exercise 12.5 – Outliers in the Airline Delays Data Set – Residuals
Case Study 12.1: Outliers in the SFO Survey Data Set
Solution in R
Solution in Python
Case Study 12.2: Outliers in the SBA Loans Data Set
Solution in R
Solution in Python
References
Chapter 13: Text Data Mining
What is Text Data Mining?
What are Some Examples of Text-Based Analytical Questions?
Tools for Text Data Mining
Sources and Formats of Text Data
Term Frequency Analysis
How Does It Apply to Text Business Data Analysis?
Exercise 13.1 – Case Study Using a Training Survey Data Set
Word Frequency Analysis Using R
Keyword Analysis
Exercise 13.2 – Case Study Using Data Set D: Résumé and Job Description
Keyword Word Analysis in Voyant
Term Frequency Analysis in R
Visualizing Text Data
Exercise 13.3 – Case Study Using the Training Survey Data Set
Visualizing the Text Using Excel
Visualizing the Text Using Voyant
Visualizing the Text Using R
Text Similarity Scoring
What is Text Similarity Scoring?
Exercise 13.4 – Case Study Using the Occupation Description Data Set
Analysis Using an Online Text Similarity Scoring Tool
Similarity Scoring Analysis Using R
Exercise 13.5 – Résumé and Job Descriptions Similarity Scoring Using R
Case Study 13.1 – Term Frequency Analysis of Product Reviews
Term Frequency Analysis Using Voyant
Term Frequency Analysis Using R
References
Chapter 14: Working with Large Data Sets
Using Sampling to Work with Large Data Files
Exercise 14.1 – Big Data Analysis
Case Study 14.1: Using the BankComplaints Big Data File