Data Science Projects with Python: A case study approach to gaining valuable insights from real data with machine learning, 2nd Edition

Computers\Programming

Gain hands-on experience in Python programming with industry-standard machine learning tools using pandas, scikit-learn, and XGBoost

Key Features

Think critically about data by exploring and cleaning it
Choose an appropriate machine learning model and train it on your data
Communicate data-driven insights with confidence and clarity

Book Description

If data is the new oil, then machine learning is the drill. As companies gain access to ever-increasing quantities of raw data, the ability to deliver state-of-the-art predictive models that support business decision-making becomes more and more valuable.

In this book, you'll work on an end-to-end project based around a realistic data set and split up into bite-sized practical exercises. This creates a case-study approach that simulates the working conditions you'll experience in real-world data science projects.

You'll learn how to use key Python packages, including pandas, Matplotlib, and scikit-learn, and master the process of data exploration and data processing, before moving on to fitting, evaluating, and tuning algorithms such as regularized logistic regression and random forest.

Now in its second edition, this book will take you through the process of exploring data and delivering machine learning models. Updated to the latest version of Python, this new edition for 2021 includes brand new content on XGBoost, SHAP values, and how to evaluate and monitor machine learning models.

By the end of this data science book, you'll have the skills, understanding, and confidence to build your own machine learning models and gain insights from real data.

What You Will Learn

Load, explore, and process data using the pandas Python package
Use Matplotlib to create compelling data visualizations
Implement predictive machine learning models with scikit-learn
Use lasso and ridge regression to reduce model overfitting
Evaluate random forest and logistic regression model performance
Create state-of-the-art models with XGBoost
Learn to use SHAP values to explain model predictions
Deliver business insights by presenting clear, convincing conclusions

Who This Book Is For

Data Science Projects with Python - Second Edition is for anyone who wants to get started with data science and machine learning. If you're keen to advance your career by using data analysis and predictive modeling to generate business insights, then this book is the perfect place to begin. To grasp the concepts quickly, you should have basic programming experience in Python or a similar language and a general interest in statistics.

Author(s): Stephen Klosterman
Publisher: Packt Publishing
Year: 2021

Language: English
Commentary: True PDF
Pages: 420

Cover
Front Matter
Copyright
Table of Contents
Preface
Chapter 1: Data Exploration and Cleaning
Introduction
Python and the Anaconda Package Management System
Indexing and the Slice Operator
Exercise 1.01: Examining Anaconda and Getting Familiar with Python
Different Types of Data Science Problems
Loading the Case Study Data with Jupyter and pandas
Exercise 1.02: Loading the Case Study Data in a Jupyter Notebook
Getting Familiar with Data and Performing Data Cleaning
The Business Problem
Data Exploration Steps
Exercise 1.03: Verifying Basic Data Integrity
Boolean Masks
Exercise 1.04: Continuing Verification of Data Integrity
Exercise 1.05: Exploring and Cleaning the Data
Data Quality Assurance and Exploration
Exercise 1.06: Exploring the Credit Limit and Demographic Features
Deep Dive: Categorical Features
Exercise 1.07: Implementing OHE for a Categorical Feature
Exploring the Financial History Features in the Dataset
Activity 1.01: Exploring the Remaining Financial Features in the Dataset
Summary
Chapter 2: Introduction to Scikit-Learn and Model Evaluation
Introduction
Exploring the Response Variable and Concluding the Initial Exploration
Introduction to Scikit-Learn
Generating Synthetic Data
Data for Linear Regression
Exercise 2.01: Linear Regression in Scikit-Learn
Model Performance Metrics for Binary Classification
Splitting the Data: Training and Test Sets
Classification Accuracy
True Positive Rate, False Positive Rate, and Confusion Matrix
Exercise 2.02: Calculating the True and False Positive and Negative Rates and Confusion Matrix in Python
Discovering Predicted Probabilities: How Does Logistic Regression Make Predictions?
Exercise 2.03: Obtaining Predicted Probabilities from a Trained Logistic Regression Model
The Receiver Operating Characteristic (ROC) Curve
Precision
Activity 2.01: Performing Logistic Regression with a New Feature and Creating a Precision-Recall Curve
Summary
Chapter 3: Details of Logistic Regression and Feature Exploration
Introduction
Examining the Relationships Between Features and the Response Variable
Pearson Correlation
Mathematics of Linear Correlation
F-test
Exercise 3.01: F-test and Univariate Feature Selection
Finer Points of the F-test: Equivalence to the t-test for Two Classes and Cautions
Hypotheses and Next Steps
Exercise 3.02: Visualizing the Relationship Between the Features and Response Variable
Univariate Feature Selection: What it Does and Doesn't Do
Understanding Logistic Regression and the Sigmoid Function Using Function Syntax in Python
Exercise 3.03: Plotting the Sigmoid Function
Scope of Functions
Why Is Logistic Regression Considered a Linear Model?
Exercise 3.04: Examining the Appropriateness of Features for Logistic Regression
From Logistic Regression Coefficients to Predictions Using Sigmoid
Exercise 3.05: Linear Decision Boundary of Logistic Regression
Activity 3.01: Fitting a Logistic Regression Model and Directly Using the Coefficients
Summary
Chapter 4: The Bias-Variance Trade-Off
Introduction
Estimating the Coefficients and Intercepts of Logistic Regression
Gradient Descent to Find Optimal Parameter Values
Exercise 4.01: Using Gradient Descent to Minimize a Cost Function
Assumptions of Logistic Regression
The Motivation for Regularization: The Bias-Variance Trade-Off
Exercise 4.02: Generating and Modeling Synthetic Classification Data
Lasso (L1) and Ridge (L2) Regularization
Cross-Validation: Choosing the Regularization Parameter
Exercise 4.03: Reducing Overfitting on the Synthetic Data Classification Problem
Options for Logistic Regression in Scikit-Learn
Scaling Data, Pipelines, and Interaction Features in Scikit-Learn
Activity 4.01: Cross-Validation and Feature Engineering with the Case Study Data
Summary
Chapter 5: Decision Trees and Random Forests
Introduction
Decision Trees
The Terminology of Decision Trees and Connections to Machine Learning
Exercise 5.01: A Decision Tree in Scikit-Learn
Training Decision Trees: Node Impurity
Features Used for the First Splits: Connections to Univariate Feature Selection and Interactions
Training Decision Trees: A Greedy Algorithm
Training Decision Trees: Different Stopping Criteria and Other Options
Using Decision Trees: Advantages and Predicted Probabilities
A More Convenient Approach to Cross-Validation
Exercise 5.02: Finding Optimal Hyperparameters for a Decision Tree
Random Forests: Ensembles of Decision Trees
Random Forest: Predictions and Interpretability
Exercise 5.03: Fitting a Random Forest
Checkerboard Graph
Activity 5.01: Cross-Validation Grid Search with Random Forest
Summary
Chapter 6: Gradient Boosting, XGBoost, and SHAP Values
Introduction
Gradient Boosting and XGBoost
What Is Boosting?
Gradient Boosting and XGBoost
XGBoost Hyperparameters
Early Stopping
Tuning the Learning Rate
Other Important Hyperparameters in XGBoost
Exercise 6.01: Randomized Grid Search for Tuning XGBoost Hyperparameters
Another Way of Growing Trees: XGBoost's grow_policy
Explaining Model Predictions with SHAP Values
Exercise 6.02: Plotting SHAP Interactions, Feature Importance, and Reconstructing Predicted Probabilities from SHAP Values
Missing Data
Saving Python Variables to a File
Activity 6.01: Modeling the Case Study Data with XGBoost and Explaining the Model with SHAP
Summary
Chapter 7: Test Set Analysis, Financial Insights, and Delivery to the Client
Introduction
Review of Modeling Results
Feature Engineering
Ensembling Multiple Models
Different Modeling Techniques
Balancing Classes
Model Performance on the Test Set
Distribution of Predicted Probability and Decile Chart
Exercise 7.01: Equal-Interval Chart
Calibration of Predicted Probabilities
Financial Analysis
Financial Conversation with the Client
Exercise 7.02: Characterizing Costs and Savings
Activity 7.01: Deriving Financial Insights
Final Thoughts on Delivering a Predictive Model to the Client
Model Monitoring
Ethics in Predictive Modeling
Summary
Appendix
Index
