Discover valuable machine learning techniques you can understand and apply using just high-school math.
In Grokking Machine Learning you will learn:
• Supervised algorithms for classifying and splitting data
• Methods for cleaning and simplifying data
• Machine learning packages and tools
• Neural networks and ensemble methods for complex datasets
Grokking Machine Learning teaches you how to apply ML to your projects using only standard Python code and high school-level math. No specialist knowledge is required to tackle the hands-on exercises using Python and readily available machine learning tools. Packed with easy-to-follow Python-based exercises and mini-projects, this book sets you on the path to becoming a machine learning expert.
About the technology
Discover powerful machine learning techniques you can understand and apply using only high school math! Put simply, machine learning is a set of techniques for data analysis based on algorithms that deliver better results as you give them more data. ML powers many cutting-edge technologies, such as recommendation systems, facial recognition software, smart speakers, and even self-driving cars. This unique book introduces the core concepts of machine learning, using relatable examples, engaging exercises, and crisp illustrations.
About the book
Grokking Machine Learning presents machine learning algorithms and techniques in a way that anyone can understand. This book skips the confusing academic jargon and offers clear explanations that require only basic algebra. As you go, you’ll build interesting projects with Python, including models for spam detection and image recognition. You’ll also pick up practical skills for cleaning and preparing data.
What's inside
• Supervised algorithms for classifying and splitting data
• Methods for cleaning and simplifying data
• Machine learning packages and tools
• Neural networks and ensemble methods for complex datasets
About the reader
For readers who know basic Python. No machine learning knowledge necessary.
About the author
Luis G. Serrano is a research scientist in quantum artificial intelligence. Previously, he was a Machine Learning Engineer at Google and Lead Artificial Intelligence Educator at Apple.
Author(s): Luis Serrano
Edition: 1
Publisher: Manning Publications
Year: 2021
Language: English
Commentary: Vector PDF
Pages: 512
City: Shelter Island, NY
Tags: Machine Learning; Neural Networks; Unsupervised Learning; Reinforcement Learning; Decision Trees; Supervised Learning; Classification; Kernel Methods; Support Vector Machines; Regularization; Linear Regression; Logistic Regression; Ensemble Learning; Perceptron; Overfitting; Underfitting; Elementary; Naïve Bayes; XGBoost
contents
foreword
preface
acknowledgments
about this book
How this book is organized: A roadmap
About the code
liveBook discussion forum
about the author
1 What is machine learning? It is common sense, except done by a computer
I am super happy to join you in your learning journey!
Machine learning is everywhere
Do I need a heavy math and coding background to understand machine learning?
Formulas and code are fun when seen as a language
OK, so what exactly is machine learning?
What is artificial intelligence?
What is machine learning?
And now that we’re at it, what is deep learning?
How do we get machines to make decisions with data? The remember-formulate-predict framework
How do humans think?
Some machine learning lingo: Models and algorithms
Some examples of models that humans use
Some examples of models that machines use
Summary
2 Types of machine learning
What is the difference between labeled and unlabeled data?
What is data?
And what are features?
Labels?
Predictions
Labeled and unlabeled data
Supervised learning: The branch of machine learning that works with labeled data
Regression models predict numbers
Classification models predict a state
Unsupervised learning: The branch of machine learning that works with unlabeled data
Clustering algorithms split a dataset into similar groups
Dimensionality reduction simplifies data without losing too much information
Other ways of simplifying our data: Matrix factorization and singular value decomposition
Generative machine learning
What is reinforcement learning?
Summary
Exercises
3 Drawing a line close to our points: Linear regression
The problem: We need to predict the price of a house
The solution: Building a regression model for housing prices
The remember step: Looking at the prices of existing houses
The formulate step: Formulating a rule that estimates the price of the house
The predict step: What do we do when a new house comes on the market?
What if we have more variables? Multivariate linear regression
Some questions that arise and some quick answers
How to get the computer to draw this line: The linear regression algorithm
Crash course on slope and y-intercept
A simple trick to move a line closer to a set of points, one point at a time
The square trick: A much more clever way of moving our line closer to one of the points
The absolute trick: Another useful trick to move the line closer to the points
The linear regression algorithm: Repeating the absolute or square trick many times to move the line
Loading our data and plotting it
Using the linear regression algorithm in our dataset
Using the model to make predictions
The general linear regression algorithm (optional)
How do we measure our results? The error function
The absolute error: A metric that tells us how good our model is by adding distances
The square error: A metric that tells us how good our model is by adding squares of distances
Mean absolute and (root) mean square errors are more common in real life
Gradient descent: How to decrease an error function by slowly descending from a mountain
Plotting the error function and knowing when to stop running the algorithm
Do we train using one point at a time or many? Stochastic and batch gradient descent
Real-life application: Using Turi Create to predict housing prices in India
What if the data is not in a line? Polynomial regression
A special kind of curved function: Polynomials
Nonlinear data? No problem: Let’s try to fit a polynomial curve to it
Parameters and hyperparameters
Applications of regression
Recommendation systems
Video and music recommendations
Product recommendations
Health care
Summary
Exercises
4 Optimizing the training process: Underfitting, overfitting, testing, and regularization
An example of underfitting and overfitting using polynomial regression
How do we get the computer to pick the right model? By testing
How do we pick the testing set, and how big should it be?
Can we use our testing data for training the model? No.
Where did we break the golden rule, and how do we fix it? The validation set
A numerical way to decide how complex our model should be: The model complexity graph
Another alternative to avoiding overfitting: Regularization
Another example of overfitting: Movie recommendations
Measuring how complex a model is: L1 and L2 norm
Modifying the error function to solve our problem: Lasso regression and ridge regression
Regulating the amount of performance and complexity in our model: The regularization parameter
Effects of L1 and L2 regularization on the coefficients of the model
An intuitive way to see regularization
Polynomial regression, testing, and regularization with Turi Create
Summary
Exercises
5 Using lines to split our points: The perceptron algorithm
The problem: We are on an alien planet, and we don’t know their language!
A slightly more complicated planet
Does our classifier need to be correct all the time? No
A more general classifier and a slightly different way to define lines
The step function and activation functions: A condensed way to get predictions
What happens if I have more than two words? General definition of the perceptron classifier
The bias, the y-intercept, and the inherent mood of a quiet alien
How do we determine whether a classifier is good or bad? The error function
How to compare classifiers? The error function
How to find a good classifier? The perceptron algorithm
The perceptron trick: A way to slightly improve the perceptron
Repeating the perceptron trick many times: The perceptron algorithm
Gradient descent
Stochastic and batch gradient descent
Coding the perceptron algorithm
Coding the perceptron trick
Coding the perceptron algorithm
Coding the perceptron algorithm using Turi Create
Applications of the perceptron algorithm
Spam email filters
Recommendation systems
Health care
Computer vision
Summary
Exercises
6 A continuous approach to splitting points: Logistic classifiers
Logistic classifiers: A continuous version of perceptron classifiers
A probability approach to classification: The sigmoid function
The dataset and the predictions
The error functions: Absolute, square, and log loss
Comparing classifiers using the log loss
How to find a good logistic classifier? The logistic regression algorithm
The logistic trick: A way to slightly improve the continuous perceptron
Repeating the logistic trick many times: The logistic regression algorithm
Stochastic, mini-batch, and batch gradient descent
Coding the logistic regression algorithm
Coding the logistic regression algorithm by hand
Real-life application: Classifying IMDB reviews with Turi Create
Classifying into multiple classes: The softmax function
Summary
Exercises
7 How do you measure classification models? Accuracy and its friends
Accuracy: How often is my model correct?
Two examples of models: Coronavirus and spam email
A super effective yet super useless model
How to fix the accuracy problem? Defining different types of errors and how to measure them
False positives and false negatives: Which one is worse?
Storing the correctly and incorrectly classified points in a table: The confusion matrix
Recall: Among the positive examples, how many did we correctly classify?
Precision: Among the examples we classified as positive, how many did we correctly classify?
Combining recall and precision as a way to optimize both: The F-score
Recall, precision, or F-scores: Which one should we use?
A useful tool to evaluate our model: The receiver operating characteristic (ROC) curve
Sensitivity and specificity: Two new ways to evaluate our model
The receiver operating characteristic (ROC) curve: A way to optimize sensitivity and specificity in a model
A metric that tells us how good our model is: The AUC (area under the curve)
How to make decisions using the ROC curve
Recall is sensitivity, but precision and specificity are different
Summary
Exercises
8 Using probability to its maximum: The naive Bayes model
Sick or healthy? A story with Bayes’ theorem as the hero
Prelude to Bayes’ theorem: The prior, the event, and the posterior
Use case: Spam-detection model
Finding the prior: The probability that any email is spam
Finding the posterior: The probability that an email is spam, knowing that it contains a particular word
What the math just happened? Turning ratios into probabilities
What about two words? The naive Bayes algorithm
What about more than two words?
Building a spam-detection model with real data
Data preprocessing
Finding the priors
Finding the posteriors with Bayes’ theorem
Implementing the naive Bayes algorithm
Further work
Summary
Exercises
9 Splitting data by asking questions: Decision trees
The problem: We need to recommend apps to users according to what they are likely to download
The solution: Building an app-recommendation system
First step to build the model: Asking the best question
Second step to build the model: Iterating
Last step: When to stop building the tree and other hyperparameters
The decision tree algorithm: How to build a decision tree and make predictions with it
Beyond questions like yes/no
Splitting the data using non-binary categorical features, such as dog/cat/bird
Splitting the data using continuous features, such as age
The graphical boundary of decision trees
Using Scikit-Learn to build a decision tree
Real-life application: Modeling student admissions with Scikit-Learn
Setting hyperparameters in Scikit-Learn
Decision trees for regression
Applications
Decision trees are widely used in health care
Decision trees are useful in recommendation systems
Summary
Exercises
10 Combining building blocks to gain more power: Neural networks
Neural networks with an example: A more complicated alien planet
Solution: If one line is not enough, use two lines to classify your dataset
Why two lines? Is happiness not linear?
Combining the outputs of perceptrons into another perceptron
A graphical representation of perceptrons
A graphical representation of neural networks
The boundary of a neural network
The general architecture of a fully connected neural network
Training neural networks
Error function: A way to measure how the neural network is performing
Backpropagation: The key step in training the neural network
Potential problems: From overfitting to vanishing gradients
Techniques for training neural networks: Regularization and dropout
Different activation functions: Hyperbolic tangent (tanh) and the rectified linear unit (ReLU)
Neural networks with more than one output: The softmax function
Hyperparameters
Coding neural networks in Keras
A graphical example in two dimensions
Training a neural network for image recognition
Neural networks for regression
Other architectures for more complex datasets
How neural networks see: Convolutional neural networks (CNN)
How neural networks talk: Recurrent neural networks (RNN), gated recurrent units (GRU), and long short-term memory networks (LSTM)
How neural networks paint paintings: Generative adversarial networks (GAN)
Summary
Exercises
11 Finding boundaries with style: Support vector machines and the kernel method
Using a new error function to build better classifiers
Classification error function: Trying to classify the points correctly
Distance error function: Trying to separate our two lines as far apart as possible
Adding the two error functions to obtain the error function
Do we want our SVM to focus more on classification or distance? The C parameter can help us
Coding support vector machines in Scikit-Learn
Coding a simple SVM
The C parameter
Training SVMs with nonlinear boundaries: The kernel method
Using polynomial equations to our benefit: The polynomial kernel
Using bumps in higher dimensions to our benefit: The radial basis function (RBF) kernel
Training an SVM with the RBF kernel
Coding the kernel method
Summary
Exercises
12 Combining models to maximize results: Ensemble learning
With a little help from our friends
Bagging: Joining some weak learners randomly to build a strong learner
Fitting a random forest manually
Training a random forest in Scikit-Learn
AdaBoost: Joining weak learners in a clever way to build a strong learner
A big picture of AdaBoost: Building the weak learners
Combining the weak learners into a strong learner
Coding AdaBoost in Scikit-Learn
Gradient boosting: Using decision trees to build strong learners
XGBoost: An extreme way to do gradient boosting
XGBoost similarity score: A new and effective way to measure similarity in a set
Building the weak learners
Tree pruning: A way to reduce overfitting by simplifying the weak learners
Making the predictions
Training an XGBoost model in Python
Applications of ensemble methods
Summary
Exercises
13 Putting it all in practice: A real-life example of data engineering and machine learning
The Titanic dataset
The features of our dataset
Using Pandas to load the dataset
Using Pandas to study our dataset
Cleaning up our dataset: Missing values and how to deal with them
Dropping columns with missing data
How to not lose the entire column: Filling in missing data
Feature engineering: Transforming the features in our dataset before training the models
Turning categorical data into numerical data: One-hot encoding
Turning numerical data into categorical data (and why would we want to do this?): Binning
Feature selection: Getting rid of unnecessary features
Training our models
Splitting the data into features and labels, and training and validation
Training several models on our dataset
Which model is better? Evaluating the models
Testing the model
Tuning the hyperparameters to find the best model: Grid search
Using K-fold cross-validation to reuse our data as training and validation
Summary
Exercises
Appendix A: Solutions to the exercises
Chapter 2: Types of machine learning
Exercise 2.1
Exercise 2.2
Exercise 2.3
Chapter 3: Drawing a line close to our points: Linear regression
Exercise 3.1
Exercise 3.2
Exercise 3.3
Exercise 3.4
Chapter 4: Optimizing the training process: Underfitting, overfitting, testing, and regularization
Exercise 4.1
Exercise 4.2
Chapter 5: Using lines to split our points: The perceptron algorithm
Exercise 5.1
Exercise 5.2
Exercise 5.3
Chapter 6: A continuous approach to splitting points: Logistic classifiers
Exercise 6.1
Exercise 6.2
Exercise 6.3
Chapter 7: How do you measure classification models? Accuracy and its friends
Exercise 7.1
Exercise 7.2
Exercise 7.3
Exercise 7.4
Chapter 8: Using probability to its maximum: The naive Bayes model
Exercise 8.1
Exercise 8.2
Exercise 8.3
Chapter 9: Splitting data by asking questions: Decision trees
Exercise 9.1
Exercise 9.2
Exercise 9.3
Chapter 10: Combining building blocks to gain more power: Neural networks
Exercise 10.1
Exercise 10.2
Exercise 10.3
Chapter 11: Finding boundaries with style: Support vector machines and the kernel method
Exercise 11.1
Exercise 11.2
Chapter 12: Combining models to maximize results: Ensemble learning
Exercise 12.1
Exercise 12.2
Chapter 13: Putting it all in practice: A real-life example of data engineering and machine learning
Exercise 13.1
Appendix B: The math behind gradient descent: Coming down a mountain using derivatives and slopes
Using gradient descent to decrease functions
Using gradient descent to train models
Using gradient descent to train linear regression models
Using gradient descent to train classification models
Using gradient descent to train neural networks
Using gradient descent for regularization
Getting stuck on local minima: How it happens, and how we solve it
Appendix C: References
General references
Courses
Blogs and YouTube channels
Books
Chapter 1
Videos
Chapter 2
Videos
Books
Courses
Chapter 3
Code
Datasets
Videos
Chapter 4
Code
Videos
Chapter 5
Code
Videos
Chapter 6
Code
Datasets
Videos
Chapter 7
Videos
Chapter 8
Code
Datasets
Videos
Chapter 9
Code
Datasets
Videos
Blog post
Chapter 10
Code
Datasets
Videos
Books
Courses
Blog posts
Tools
Chapter 11
Code
Videos
Blog posts
Chapter 12
Code
Videos
Articles and blog posts
Chapter 13
Code
Datasets
Graphics and image icons
index
A
B
C
D
E
F
G
H
I
J
K
L
M
N
O
P
R
S
T
U
V
W
X