Applied Supervised Learning with Python: Use Scikit-Learn to Build Predictive Models from Real-world Datasets and Prepare Yourself for the Future of Machine Learning

Explore the exciting world of machine learning, the fastest-growing technology in the world.

Key Features
Understand various machine learning concepts with real-world examples
Implement a supervised machine learning pipeline from data ingestion to validation
Gain insights into how you can use machine learning in everyday life

Book Description
Machine learning, the ability of a machine to give the right answers based on input data, has revolutionized the way we do business. Applied Supervised Learning with Python provides a rich understanding of how you can apply machine learning techniques in your data science projects using Python. You'll explore Jupyter Notebooks, a technology commonly used in academic and commercial circles because it supports running code inline. With the help of fun examples, you'll gain experience working with the Python machine learning toolkit, from performing basic data cleaning and processing to working with a range of regression and classification algorithms. Once you've grasped the basics, you'll learn how to build and train your own models using advanced techniques such as decision trees, ensemble modeling, validation, and error metrics. You'll also learn data visualization techniques using powerful Python libraries such as Matplotlib and Seaborn. This book also covers ensemble modeling and random forest classifiers, along with other methods for combining results from multiple models, and concludes by delving into cross-validation to test your algorithm and check how well the model works on unseen data. By the end of this book, you'll be equipped not only to work with machine learning algorithms, but also to create some of your own!

What you will learn
Understand the concept of supervised learning and its applications
Implement common supervised learning algorithms using Python machine learning libraries
Validate models using the k-fold cross-validation technique
Build your models with decision trees to get results effortlessly
Use ensemble modeling techniques to improve the performance of your model
Apply a variety of metrics to compare machine learning models

Who this book is for
Applied Supervised Learning with Python is for you if you want to gain a solid understanding of machine learning using Python. It'll help if you have some experience in any functional or object-oriented language and a basic understanding of Python libraries and data structures, such as arrays and dictionaries.

Author(s): Benjamin Johnston; Ishita Mathur
Publisher: Packt Publishing
Year: 2019

Language: English
Pages: 404

Cover
Front Matter
Copyright
Table of Contents
Preface
Chapter 1: Python Machine Learning Toolkit
Introduction
Supervised Machine Learning
When to Use Supervised Learning
Why Python?
Jupyter Notebooks
Exercise 1: Launching a Jupyter Notebook
Exercise 2: Hello World
Exercise 3: Order of Execution in a Jupyter Notebook
Exercise 4: Advantages of Jupyter Notebooks
Python Packages and Modules
pandas
Loading Data in pandas
Exercise 5: Loading and Summarizing the Titanic Dataset
Exercise 6: Indexing and Selecting Data
Exercise 7: Advanced Indexing and Selection
pandas Methods
Exercise 8: Splitting, Applying, and Combining Data Sources
Lambda Functions
Exercise 9: Lambda Functions
Data Quality Considerations
Managing Missing Data
Class Imbalance
Low Sample Size
Activity 1: pandas Functions
Summary
Chapter 2: Exploratory Data Analysis and Visualization
Introduction
Exploratory Data Analysis (EDA)
Exercise 10: Importing Libraries for Data Exploration
Summary Statistics and Central Values
Standard Deviation
Percentiles
Exercise 11: Summary Statistics of Our Dataset
Missing Values
Finding Missing Values
Exercise 12: Visualizing Missing Values
Imputation Strategies for Missing Values
Exercise 13: Imputation Using pandas
Exercise 14: Imputation Using scikit-learn
Exercise 15: Imputation Using Inferred Values
Activity 2: Summary Statistics and Missing Values
Distribution of Values
Target Variable
Exercise 16: Plotting a Bar Chart
Categorical Data
Exercise 17: Datatypes for Categorical Variables
Exercise 18: Calculating Category Value Counts
Exercise 19: Plotting a Pie Chart
Continuous Data
Exercise 20: Plotting a Histogram
Exercise 21: Skew and Kurtosis
Activity 3: Visually Representing the Distribution of Values
Relationships within the Data
Relationship between Two Continuous Variables
Exercise 22: Plotting a Scatter Plot
Exercise 23: Correlation Heatmap
Exercise 24: Pairplot
Relationship between a Continuous and a Categorical Variable
Exercise 25: Bar Chart
Exercise 26: Box Plot
Relationship between Two Categorical Variables
Exercise 27: Stacked Bar Chart
Activity 4: Relationships Within the Data
Summary
Chapter 3: Regression Analysis
Introduction
Regression and Classification Problems
Data, Models, Training, and Evaluation
Linear Regression
Exercise 28: Plotting Data with a Moving Average
Activity 5: Plotting Data with a Moving Average
Least Squares Method
The scikit-learn Model API
Exercise 29: Fitting a Linear Model Using the Least Squares Method
Activity 6: Linear Regression Using the Least Squares Method
Linear Regression with Dummy Variables
Exercise 30: Introducing Dummy Variables
Activity 7: Dummy Variables
Parabolic Model with Linear Regression
Exercise 31: Parabolic Models with Linear Regression
Activity 8: Other Model Types with Linear Regression
Generic Model Training
Gradient Descent
Exercise 32: Linear Regression with Gradient Descent
Exercise 33: Optimizing Gradient Descent
Activity 9: Gradient Descent
Multiple Linear Regression
Exercise 34: Multiple Linear Regression
Autoregression Models
Exercise 35: Creating an Autoregression Model
Activity 10: Autoregressors
Summary
Chapter 4: Classification
Introduction
Linear Regression as a Classifier
Exercise 36: Linear Regression as a Classifier
Logistic Regression
Exercise 37: Logistic Regression as a Classifier – Two-Class Classifier
Exercise 38: Logistic Regression – Multiclass Classifier
Activity 11: Linear Regression Classifier – Two-Class Classifier
Activity 12: Iris Classification Using Logistic Regression
Classification Using K-Nearest Neighbors
Exercise 39: K-NN Classification
Exercise 40: Visualizing K-NN Boundaries
Activity 13: K-NN Multiclass Classifier
Classification Using Decision Trees
Exercise 41: ID3 Classification
Exercise 42: Iris Classification Using a CART Decision Tree
Summary
Chapter 5: Ensemble Modeling
Introduction
Exercise 43: Importing Modules and Preparing the Dataset
Overfitting and Underfitting
Underfitting
Overfitting
Overcoming the Problem of Underfitting and Overfitting
Bagging
Bootstrapping
Bootstrap Aggregation
Exercise 44: Using the Bagging Classifier
Random Forest
Exercise 45: Building the Ensemble Model Using Random Forest
Boosting
Adaptive Boosting
Exercise 46: Adaptive Boosting
Gradient Boosting
Exercise 47: GradientBoostingClassifier
Stacking
Exercise 48: Building a Stacked Model
Activity 14: Stacking with Standalone and Ensemble Algorithms
Summary
Chapter 6: Model Evaluation
Introduction
Exercise 49: Importing the Modules and Preparing Our Dataset
Evaluation Metrics
Regression
Exercise 50: Regression Metrics
Classification
Exercise 51: Classification Metrics
Splitting the Dataset
Hold-out Data
K-Fold Cross-Validation
Sampling
Exercise 52: K-Fold Cross-Validation with Stratified Sampling
Performance Improvement Tactics
Variation in Train and Test Error
Hyperparameter Tuning
Exercise 53: Hyperparameter Tuning with Random Search
Feature Importance
Exercise 54: Feature Importance Using Random Forest
Activity 15: Final Test Project
Summary
Appendix
Index