Author(s): Yuxi (Hayden) Liu
Edition: 2
Language: English
Tags: AI, artificial intelligence
Title Page
Copyright and Credits
Python Machine Learning By Example Second Edition
About Packt
Why subscribe?
Packt.com
Dedication
Foreword
Contributors
About the author
About the reviewer
Packt is searching for authors like you
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Download the color images
Conventions used
Get in touch
Reviews
Section 1: Fundamentals of Machine Learning
Getting Started with Machine Learning and Python
Defining machine learning and why we need it
A very high-level overview of machine learning technology
Types of machine learning tasks
A brief history of the development of machine learning algorithms
Core of machine learning – generalizing with data
Overfitting, underfitting, and the bias-variance trade-off
Avoiding overfitting with cross-validation
Avoiding overfitting with regularization
Avoiding overfitting with feature selection and dimensionality reduction
Preprocessing, exploration, and feature engineering
Missing values
Label encoding
One-hot encoding
Scaling
Polynomial features
Power transform
Binning
Combining models
Voting and averaging
Bagging
Boosting
Stacking
Installing software and setting up
Setting up Python and environments
Installing the various packages
NumPy
SciPy
Pandas
Scikit-learn
TensorFlow
Summary
Exercises
Section 2: Practical Python Machine Learning By Example
Exploring the 20 Newsgroups Dataset with Text Analysis Techniques
How computers understand language – NLP
Picking up NLP basics while touring popular NLP libraries
Corpus
Tokenization
PoS tagging
Named-entity recognition
Stemming and lemmatization
Semantics and topic modeling
Getting the newsgroups data
Exploring the newsgroups data
Thinking about features for text data
Counting the occurrence of each word token
Text preprocessing
Dropping stop words
Stemming and lemmatizing words
Visualizing the newsgroups data with t-SNE
What is dimensionality reduction?
t-SNE for dimensionality reduction
Summary
Exercises
Mining the 20 Newsgroups Dataset with Clustering and Topic Modeling Algorithms
Learning without guidance – unsupervised learning
Clustering newsgroups data using k-means
How does k-means clustering work?
Implementing k-means from scratch
Implementing k-means with scikit-learn
Choosing the value of k
Clustering newsgroups data using k-means
Discovering underlying topics in newsgroups
Topic modeling using NMF
Topic modeling using LDA
Summary
Exercises
Detecting Spam Email with Naïve Bayes
Getting started with classification
Types of classification
Applications of text classification
Exploring Naïve Bayes
Learning Bayes' theorem by examples
The mechanics of Naïve Bayes
Implementing Naïve Bayes from scratch
Implementing Naïve Bayes with scikit-learn
Classification performance evaluation
Model tuning and cross-validation
Summary
Exercise
Classifying Newsgroup Topics with Support Vector Machines
Finding the separating boundary with support vector machines
Understanding how SVM works through different use cases
Case 1 – identifying a separating hyperplane
Case 2 – determining the optimal hyperplane
Case 3 – handling outliers
Implementing SVM
Case 4 – dealing with more than two classes
The kernels of SVM
Case 5 – solving linearly non-separable problems
Choosing between linear and RBF kernels
Classifying newsgroup topics with SVMs
Another example – fetal state classification on cardiotocography
A further example – breast cancer classification using SVM with TensorFlow
Summary
Exercise
Predicting Online Ad Click-Through with Tree-Based Algorithms
Brief overview of advertising click-through prediction
Getting started with two types of data – numerical and categorical
Exploring decision tree from root to leaves
Constructing a decision tree
The metrics for measuring a split
Implementing a decision tree from scratch
Predicting ad click-through with decision tree
Ensembling decision trees – random forest
Implementing random forest using TensorFlow
Summary
Exercise
Predicting Online Ad Click-Through with Logistic Regression
Converting categorical features to numerical – one-hot encoding and ordinal encoding
Classifying data with logistic regression
Getting started with the logistic function
Jumping from the logistic function to logistic regression
Training a logistic regression model
Training a logistic regression model using gradient descent
Predicting ad click-through with logistic regression using gradient descent
Training a logistic regression model using stochastic gradient descent
Training a logistic regression model with regularization
Training on large datasets with online learning
Handling multiclass classification
Implementing logistic regression using TensorFlow
Feature selection using random forest
Summary
Exercises
Scaling Up Prediction to Terabyte Click Logs
Learning the essentials of Apache Spark
Breaking down Spark
Installing Spark
Launching and deploying Spark programs
Programming in PySpark
Learning on massive click logs with Spark
Loading click logs
Splitting and caching the data
One-hot encoding categorical features
Training and testing a logistic regression model
Feature engineering on categorical variables with Spark
Hashing categorical features
Combining multiple variables – feature interaction
Summary
Exercises
Stock Price Prediction with Regression Algorithms
Brief overview of the stock market and stock prices
What is regression?
Mining stock price data
Getting started with feature engineering
Acquiring data and generating features
Estimating with linear regression
How does linear regression work?
Implementing linear regression
Estimating with decision tree regression
Transitioning from classification trees to regression trees
Implementing decision tree regression
Implementing regression forest
Estimating with support vector regression
Implementing SVR
Estimating with neural networks
Demystifying neural networks
Implementing neural networks
Evaluating regression performance
Predicting stock price with four regression algorithms
Summary
Exercise
Section 3: Python Machine Learning Best Practices
Machine Learning Best Practices
Machine learning solution workflow
Best practices in the data preparation stage
Best practice 1 – completely understanding the project goal
Best practice 2 – collecting all fields that are relevant
Best practice 3 – maintaining the consistency of field values
Best practice 4 – dealing with missing data
Best practice 5 – storing large-scale data
Best practices in the training sets generation stage
Best practice 6 – identifying categorical features with numerical values
Best practice 7 – deciding on whether or not to encode categorical features
Best practice 8 – deciding on whether or not to select features, and if so, how to do so
Best practice 9 – deciding on whether or not to reduce dimensionality, and if so, how to do so
Best practice 10 – deciding on whether or not to rescale features
Best practice 11 – performing feature engineering with domain expertise
Best practice 12 – performing feature engineering without domain expertise
Best practice 13 – documenting how each feature is generated
Best practice 14 – extracting features from text data
Best practices in the model training, evaluation, and selection stage
Best practice 15 – choosing the right algorithm(s) to start with
Naïve Bayes
Logistic regression
SVM
Random forest (or decision tree)
Neural networks
Best practice 16 – reducing overfitting
Best practice 17 – diagnosing overfitting and underfitting
Best practice 18 – modeling on large-scale datasets
Best practices in the deployment and monitoring stage
Best practice 19 – saving, loading, and reusing models
Best practice 20 – monitoring model performance
Best practice 21 – updating models regularly
Summary
Exercises
Other Books You May Enjoy
Leave a review - let other readers know what you think