Python Machine Learning Case Studies: Five Case Studies for the Data Scientist

Embrace machine learning approaches and Python to enable automatic rendering of rich insights. The book uses a hands-on, case study-based approach to crack real-world applications to which machine learning concepts can be applied. These smarter machines will enable your business processes to achieve efficiencies with minimal time and resources. Python Machine Learning Case Studies takes you through the steps to improve business processes and determine the pivotal points that frame strategies. You’ll see machine learning techniques that you can use to support your products and services. Moreover, you’ll learn the pros and cons of each of the machine learning concepts presented. By taking a step-by-step approach to coding in Python, you’ll be able to understand the rationale behind model selection and decisions within the machine learning process. The book is equipped with practical examples along with code snippets to ensure that you understand the data science approach to solving real-world problems.

What You Will Learn
Gain insights into machine learning concepts
Work on real-world applications of machine learning
Get a hands-on overview of Python from a machine learning point of view

Who This Book Is For
Data scientists, data analysts, artificial intelligence engineers, big data enthusiasts, computer scientists, computer science students, and capital market analysts.

Author(s): Danish Haroon
Publisher: Apress
Year: 2017

Language: English
Pages: 270

Contents at a Glance
Contents
About the Author
About the Technical Reviewer
Acknowledgments
Introduction
Chapter 1: Statistics and Probability
Case Study: Cycle Sharing Scheme—Determining Brand Persona
Performing Exploratory Data Analysis
Feature Exploration
Types of variables
Continuous/Quantitative Variables
True Zero Point
Interval Variables
Ratio Variables
Discrete Variables
Ordinal Variables
Nominal Variables
Dichotomous Variables
Lurking Variable
Demographic Variable
Dependent and Independent Variables
Univariate Analysis
Multivariate Analysis
Time Series Components
Seasonal Pattern
Cyclic Pattern
Trend
Measuring Center of Measure
Mean
Arithmetic Mean
Geometric Mean
Median
Mode
Variance
Standard Deviation
Changes in Measure of Center Statistics due to Presence of Constants
The Normal Distribution
Skewness
Outliers
Correlation
Pearson R Correlation
Kendall Rank Correlation
Spearman Rank Correlation
Hypothesis Testing: Comparing Two Groups
t-Statistics
t-Distributions and Sample Size
Central Limit Theorem
Case Study Findings
Applications of Statistics and Probability
Actuarial Science
Biostatistics
Astrostatistics
Business Analytics
Econometrics
Machine Learning
Statistical Signal Processing
Elections
Chapter 2: Regression
Case Study: Removing Inconsistencies in Concrete Compressive Strength
Concepts of Regression
Interpolation and Extrapolation
Linear Regression
Least Squares Regression Line of y on x
Multiple Regression
Stepwise Regression
Polynomial Regression
Assumptions of Regressions
Number of Cases
Missing Data
Outliers
Multicollinearity and Singularity
Features’ Exploration
Correlation
Overfitting and Underfitting
Regression Metrics of Evaluation
Explained Variance Score
Mean Absolute Error
Mean Squared Error
R²
Residual
Residual Plot
Residual Sum of Squares
Types of Regression
Linear Regression
Grid Search
Ridge Regression
Lasso Regression
ElasticNet
Gradient Boosting Regression
Support Vector Machines
Applications of Regression
Predicting Sales
Predicting Value of Bond
Rate of Inflation
Insurance Companies
Call Center
Agriculture
Predicting Salary
Real Estate Industry
Chapter 3: Time Series
Case Study: Predicting Daily Adjusted Closing Rate of Yahoo
Feature Exploration
Time Series Modeling
Evaluating the Stationary Nature of a Time Series Object
Properties of a Time Series Which Is Stationary in Nature
Tests to Determine If a Time Series Is Stationary
Exploratory Data Analysis
Dickey-Fuller Test
Methods of Making a Time Series Object Stationary
Applying Transformations
Log Transformation
Square Root Transformation
Estimating Trend and Removing It from the Original Series
Moving Average Smoothing
Exponentially Weighted Moving Average
Differencing
Decomposition
Tests to Determine If a Time Series Has Autocorrelation
Autocorrelation Function
Partial Autocorrelation Function
Measuring Autocorrelation
Durbin-Watson Statistic
Modeling a Time Series
Tests to Validate Forecasted Series
Mean Forecast Error
Mean Absolute Error
Residual Sum of Squares
Root Mean Squared Error
Deciding Upon the Parameters for Modeling
Auto-Regressive Integrated Moving Averages
Auto-Regressive Moving Averages
Auto-Regressive
Moving Average
Combined Model
Scaling Back the Forecast
Applications of Time Series Analysis
Sales Forecasting
Weather Forecasting
Unemployment Estimates
Disease Outbreak
Stock Market Prediction
Chapter 4: Clustering
Case Study: Determination of Short Tail Keywords for Marketing
Features’ Exploration
Supervised vs. Unsupervised Learning
Supervised Learning
Unsupervised Learning
Clustering
Data Transformation for Modeling
Metrics of Evaluating Clustering Models
Clustering Models
k-Means Clustering
Elbow Method
Variance Explained
Bayesian Information Criterion Score
Silhouette Score
Applying k-Means Clustering for Optimal Number of Clusters
Principal Component Analysis
Gaussian Mixture Model
Bayesian Gaussian Mixture Model
Applications of Clustering
Identifying Diseases
Document Clustering in Search Engines
Demographic-Based Customer Segmentation
Chapter 5: Classification
Case Study: Ohio Clinic—Meeting Supply and Demand
Features’ Exploration
Performing Data Wrangling
Performing Exploratory Data Analysis
Features’ Generation
Classification
Model Evaluation Techniques
Confusion Matrix
Binary Classification: Receiver Operating Characteristic
Ensuring Cross-Validation by Splitting the Dataset
Decision Tree Classification
Kernel Approximation
SGD Classifier
Ensemble Methods
Bagging
Boosting
Random Forest Classification
Gradient Boosting
Applications of Classification
Image Classification
Music Classification
E-mail Spam Filtering
Insurance
Appendix A: Chart types and when to use them
Pie chart
Bar graph
Histogram
Stem and Leaf plot
Box plot
Index