Python: Real World Machine Learning

Learn to solve challenging data science problems by building powerful machine learning models using Python.

About This Book
Understand which algorithms to use in a given context with the help of this exciting recipe-based guide
This practical tutorial tackles real-world computing problems through a rigorous and effective approach
Build state-of-the-art models and develop personalized recommendations to perform machine learning at scale

Who This Book Is For
This Learning Path is for Python programmers who are looking to use machine learning algorithms to create real-world applications. It is ideal for Python professionals who want to work with large and complex datasets, and for Python developers, analysts, or data scientists who want to add to their existing skills by picking up some of the most powerful recent trends in data science. Experience with Python, Jupyter Notebooks, and command-line execution, together with enough mathematical knowledge to understand the concepts, is expected. Basic knowledge of machine learning is also expected.

What You Will Learn
Use predictive modeling and apply it to real-world problems
Understand how to perform market segmentation using unsupervised learning
Apply your new-found skills to solve real problems, through clearly explained code for every technique and test
Compete with top data scientists by gaining a practical and theoretical understanding of cutting-edge deep learning algorithms
Increase predictive accuracy with deep learning and scalable data-handling techniques
Work with modern state-of-the-art large-scale machine learning techniques
Learn to use Python code to implement a range of machine learning algorithms and techniques

In Detail
Machine learning is increasingly pervasive in the modern data-driven world. It is used extensively across many fields, such as search engines, robotics, and self-driving cars, and it is transforming the way we understand and interact with the world around us.

In the first module, Python Machine Learning Cookbook, you will learn how to perform various machine learning tasks using a wide variety of machine learning algorithms to solve real-world problems, and use Python to implement these algorithms.

The second module, Advanced Machine Learning with Python, is designed to take you on a guided tour of the most relevant and powerful machine learning techniques, and you will acquire a broad set of skills in feature selection and feature engineering.

The third module, Large Scale Machine Learning with Python, dives into scalable machine learning and the three forms of scalability. It covers the most effective machine learning techniques on a MapReduce framework in Hadoop and Spark in Python.

This Learning Path will teach you Python machine learning for the real world. The techniques covered here are at the forefront of commercial practice.

This Learning Path combines some of the best that Packt has to offer in one complete, curated package. It includes content from the following Packt products:
Python Machine Learning Cookbook by Prateek Joshi
Advanced Machine Learning with Python by John Hearty
Large Scale Machine Learning with Python by Bastiaan Sjardin, Alberto Boschetti, Luca Massaron

Style and approach
This course is a smooth learning path that will teach you how to get started with Python machine learning for the real world and develop solutions to real-world problems. Through this comprehensive course, you'll learn to create the most effective machine learning techniques from scratch and more!
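
To give a concrete taste of the recipe style the Learning Path describes, here is a minimal sketch of market segmentation with unsupervised learning using scikit-learn's KMeans. The customer features, values, and cluster count below are illustrative placeholders, not material taken from the book.

# Minimal illustrative sketch: customer/market segmentation with k-means.
# All values below are synthetic placeholders, not data from the book.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical customer features: [annual_spend, visits_per_month]
customers = np.array([
    [1200.0, 2], [1500.0, 3], [300.0, 12],
    [250.0, 10], [4000.0, 1], [3800.0, 2],
])

# Scale the features so both columns contribute comparably to the distances
scaled = StandardScaler().fit_transform(customers)

# Group the customers into three segments and inspect the assignments
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(scaled)
print(labels)

The recipes in Part I follow a similar flow: load or generate data, preprocess it, fit an estimator, then evaluate or visualize the result.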

Author(s): Prateek Joshi; John Hearty; Bastiaan Sjardin; Alberto Boschetti; Luca Massaron
Year: 2017

Language: English
Tags: learning path; artificial intelligence; AI

Python: Real World Machine Learning
Table of Contents
Credits
Preface
What this learning path covers
What you need for this learning path
Who this learning path is for
Reader feedback
Customer support
Downloading the example code
Errata
Piracy
Questions
Part I. Module 1
Chapter 1. The Realm of Supervised Learning
Introduction
Preprocessing data using different techniques
Getting ready
How to do it…
Mean removal
Scaling
Normalization
Binarization
One Hot Encoding
Label encoding
How to do it…
Building a linear regressor
Getting ready
How to do it…
Computing regression accuracy
Getting ready
How to do it…
Achieving model persistence
How to do it…
Building a ridge regressor
Getting ready
How to do it…
Building a polynomial regressor
Getting ready
How to do it…
Estimating housing prices
Getting ready
How to do it…
Computing the relative importance of features
How to do it…
Estimating bicycle demand distribution
Getting ready
How to do it…
There's more…
Chapter 2. Constructing a Classifier
Introduction
Building a simple classifier
How to do it…
There's more…
Building a logistic regression classifier
How to do it…
Building a Naive Bayes classifier
How to do it…
Splitting the dataset for training and testing
How to do it…
Evaluating the accuracy using cross-validation
Getting ready…
How to do it…
Visualizing the confusion matrix
How to do it…
Extracting the performance report
How to do it…
Evaluating cars based on their characteristics
Getting ready
How to do it…
Extracting validation curves
How to do it…
Extracting learning curves
How to do it…
Estimating the income bracket
How to do it…
Chapter 3. Predictive Modeling
Introduction
Building a linear classifier using Support Vector Machines (SVMs)
Getting ready
How to do it…
Building a nonlinear classifier using SVMs
How to do it…
Tackling class imbalance
How to do it…
Extracting confidence measurements
How to do it…
Finding optimal hyperparameters
How to do it…
Building an event predictor
Getting ready
How to do it…
Estimating traffic
Getting ready
How to do it…
Chapter 4. Clustering with Unsupervised Learning
Introduction
Clustering data using the k-means algorithm
How to do it…
Compressing an image using vector quantization
How to do it…
Building a Mean Shift clustering model
How to do it…
Grouping data using agglomerative clustering
How to do it…
Evaluating the performance of clustering algorithms
How to do it…
Automatically estimating the number of clusters using the DBSCAN algorithm
How to do it…
Finding patterns in stock market data
How to do it…
Building a customer segmentation model
How to do it…
Chapter 5. Building Recommendation Engines
Introduction
Building function compositions for data processing
How to do it…
Building machine learning pipelines
How to do it…
How it works…
Finding the nearest neighbors
How to do it…
Constructing a k-nearest neighbors classifier
How to do it…
How it works…
Constructing a k-nearest neighbors regressor
How to do it…
How it works…
Computing the Euclidean distance score
How to do it…
Computing the Pearson correlation score
How to do it…
Finding similar users in the dataset
How to do it…
Generating movie recommendations
How to do it…
Chapter 6. Analyzing Text Data
Introduction
Preprocessing data using tokenization
How to do it…
Stemming text data
How to do it…
How it works…
Converting text to its base form using lemmatization
How to do it…
Dividing text using chunking
How to do it…
Building a bag-of-words model
How to do it…
How it works…
Building a text classifier
How to do it…
How it works…
Identifying the gender
How to do it…
Analyzing the sentiment of a sentence
How to do it…
How it works…
Identifying patterns in text using topic modeling
How to do it…
How it works…
Chapter 7. Speech Recognition
Introduction
Reading and plotting audio data
How to do it…
Transforming audio signals into the frequency domain
How to do it…
Generating audio signals with custom parameters
How to do it…
Synthesizing music
How to do it…
Extracting frequency domain features
How to do it…
Building Hidden Markov Models
How to do it…
Building a speech recognizer
How to do it…
Chapter 8. Dissecting Time Series and Sequential Data
Introduction
Transforming data into the time series format
How to do it…
Slicing time series data
How to do it…
Operating on time series data
How to do it…
Extracting statistics from time series data
How to do it…
Building Hidden Markov Models for sequential data
Getting ready
How to do it…
Building Conditional Random Fields for sequential text data
Getting ready
How to do it…
Analyzing stock market data using Hidden Markov Models
How to do it…
Chapter 9. Image Content Analysis
Introduction
Operating on images using OpenCV-Python
How to do it…
Detecting edges
How to do it…
Histogram equalization
How to do it…
Detecting corners
How to do it…
Detecting SIFT feature points
How to do it…
Building a Star feature detector
How to do it…
Creating features using visual codebook and vector quantization
How to do it…
Training an image classifier using Extremely Random Forests
How to do it…
Building an object recognizer
How to do it…
Chapter 10. Biometric Face Recognition
Introduction
Capturing and processing video from a webcam
How to do it…
Building a face detector using Haar cascades
How to do it…
Building eye and nose detectors
How to do it…
Performing Principal Components Analysis
How to do it…
Performing Kernel Principal Components Analysis
How to do it…
Performing blind source separation
How to do it…
Building a face recognizer using Local Binary Patterns Histogram
How to do it…
Chapter 11. Deep Neural Networks
Introduction
Building a perceptron
How to do it…
Building a single layer neural network
How to do it…
Building a deep neural network
How to do it…
Creating a vector quantizer
How to do it…
Building a recurrent neural network for sequential data analysis
How to do it…
Visualizing the characters in an optical character recognition database
How to do it…
Building an optical character recognizer using neural networks
How to do it…
Chapter 12. Visualizing Data
Introduction
Plotting 3D scatter plots
How to do it…
Plotting bubble plots
How to do it…
Animating bubble plots
How to do it…
Drawing pie charts
How to do it…
Plotting date-formatted time series data
How to do it…
Plotting histograms
How to do it…
Visualizing heat maps
How to do it…
Animating dynamic signals
How to do it…
Part II. Module 2
Chapter 1. Unsupervised Machine Learning
Principal component analysis
Note
PCA – a primer
Note
Employing PCA
Introducing k-means clustering
Clustering – a primer
Kick-starting clustering analysis
Note
Note
Tuning your clustering configurations
Note
Self-organizing maps
SOM – a primer
Employing SOM
Note
Further reading
Summary
Chapter 2. Deep Belief Networks
Neural networks – a primer
The composition of a neural network
Network topologies
Restricted Boltzmann Machine
Introducing the RBM
Note
Topology
Training
Note
Applications of the RBM
Further applications of the RBM
Deep belief networks
Training a DBN
Applying the DBN
Note
Validating the DBN
Further reading
Summary
Chapter 3. Stacked Denoising Autoencoders
Autoencoders
Introducing the autoencoder
Topology
Training
Denoising autoencoders
Note
Applying a dA
Stacked Denoising Autoencoders
Applying the SdA
Note
Note
Assessing SdA performance
Further reading
Summary
Chapter 4. Convolutional Neural Networks
Introducing the CNN
Understanding the convnet topology
Understanding convolution layers
Note
Understanding pooling layers
Training a convnet
Putting it all together
Applying a CNN
Further Reading
Summary
Chapter 5. Semi-Supervised Learning
Introduction
Understanding semi-supervised learning
Semi-supervised algorithms in action
Self-training
Implementing self-training
Note
Finessing your self-training implementation
Note
Note
Improving the selection process
Contrastive Pessimistic Likelihood Estimation
Note
Note
Further reading
Summary
Chapter 6. Text Feature Engineering
Introduction
Text feature engineering
Cleaning text data
Text cleaning with BeautifulSoup
Managing punctuation and tokenizing
Note
Tagging and categorising words
Tagging with NLTK
Sequential tagging
Note
Backoff tagging
Creating features from text data
Stemming
Bagging and random forests
Note
Testing our prepared data
Note
Note
Further reading
Summary
Chapter 7. Feature Engineering Part II
Introduction
Creating a feature set
Engineering features for ML applications
Using rescaling techniques to improve the learnability of features
Creating effective derived variables
Reinterpreting non-numeric features
Note
Using feature selection techniques
Performing feature selection
Correlation
LASSO
Recursive Feature Elimination
Genetic models
Feature engineering in practice
Acquiring data via RESTful APIs
Testing the performance of our model
Twitter
Translink Twitter
Note
Consumer comments
The Bing Traffic API
Deriving and selecting variables using feature engineering techniques
The weather API
Note
Note
Further reading
Summary
Chapter 8. Ensemble Methods
Introducing ensembles
Understanding averaging ensembles
Using bagging algorithms
Note
Using random forests
Applying boosting methods
Using XGBoost
Using stacking ensembles
Applying ensembles in practice
Using models in dynamic applications
Understanding model robustness
Identifying modeling risk factors
Strategies for managing model robustness
Note
Further reading
Summary
Chapter 9. Additional Python Machine Learning Tools
Alternative development tools
Introduction to Lasagne
Getting to know Lasagne
Introduction to TensorFlow
Getting to know TensorFlow
Using TensorFlow to iteratively improve our models
Knowing when to use these libraries
Further reading
Summary
Appendix A. Chapter Code Requirements
Part III. Module 3
Chapter 1. First Steps to Scalability
Explaining scalability in detail
Making large scale examples
Introducing Python
Tip
Scale up with Python
Scale out with Python
Python for large scale machine learning
Choosing between Python 2 and Python 3
Tip
Package upgrades
Scientific distributions
Tip
Introducing Jupyter/IPython
Tip
Note
Python packages
NumPy
Tip
SciPy
Pandas
Tip
Scikit-learn
Tip
The matplotlib package
Gensim
H2O
XGBoost
Theano
Tip
TensorFlow
The sknn library
Theanets
Keras
Other useful packages to install on your system
Summary
Chapter 2. Scalable Learning in Scikit-learn
Out-of-core learning
Subsampling as a viable option
Optimizing one instance at a time
Building an out-of-core learning system
Streaming data from sources
Datasets to try the real thing yourself
Tip
The first example – streaming the bike-sharing dataset
Using pandas I/O tools
Working with databases
Tip
Paying attention to the ordering of instances
Tip
Stochastic learning
Batch gradient descent
Stochastic gradient descent
The Scikit-learn SGD implementation
Defining SGD learning parameters
Tip
Feature management with data streams
Describing the target
The hashing trick
Other basic transformations
Testing and validation in a stream
Trying SGD in action
Summary
Chapter 3. Fast SVM Implementations
Datasets to experiment with on your own
The bike-sharing dataset
The covertype dataset
Support Vector Machines
Hinge loss and its variants
Understanding the Scikit-learn SVM implementation
Tip
Tip
Pursuing nonlinear SVMs by subsampling
Achieving SVM at scale with SGD
Tip
Tip
Feature selection by regularization
Tip
Including non-linearity in SGD
Tip
Trying explicit high-dimensional mappings
Hyperparameter tuning
Tip
Other alternatives for SVM fast learning
Nonlinear and faster with Vowpal Wabbit
Installing VW
Tip
Understanding the VW data format
Tip
Python integration
A few examples using reductions for SVM and neural nets
Faster bike-sharing
The covertype dataset crunched by VW
Tip
Summary
Chapter 4. Neural Networks and Deep Learning
The neural network architecture
Note
Note
What and how neural networks learn
Choosing the right architecture
The input layer
The hidden layer
Tip
The output layer
Neural networks in action
Parallelization for sknn
Neural networks and regularization
Neural networks and hyperparameter optimization
Note
Neural networks and decision boundaries
Note
Deep learning at scale with H2O
Large scale deep learning with H2O
Gridsearch on H2O
Deep learning and unsupervised pretraining
Deep learning with theanets
Autoencoders and unsupervised learning
Autoencoders
Summary
Chapter 5. Deep Learning with TensorFlow
TensorFlow installation
TensorFlow operations
GPU computing
Linear regression with SGD
A neural network from scratch in TensorFlow
Machine learning on TensorFlow with SkFlow
Deep learning with large files – incremental learning
Keras and TensorFlow installation
Convolutional Neural Networks in TensorFlow through Keras
Note
Note
The convolution layer
The pooling layer
The fully connected layer
CNNs with an incremental approach
GPU Computing
Summary
Chapter 6. Classification and Regression Trees at Scale
Bootstrap aggregation
Random forest and extremely randomized forest
Fast parameter optimization with randomized search
Extremely randomized trees and large datasets
Note
CART and boosting
Gradient Boosting Machines
max_depth
learning_rate
Subsample
Faster GBM with warm_start
Note
Speeding up GBM with warm_start
Training and storing GBM models
XGBoost
XGBoost regression
XGBoost and variable importance
XGBoost streaming large datasets
XGBoost model persistence
Out-of-core CART with H2O
Random forest and gridsearch on H2O
Stochastic gradient boosting and gridsearch on H2O
Summary
Chapter 7. Unsupervised Learning at Scale
Unsupervised methods
Feature decomposition – PCA
Randomized PCA
Incremental PCA
Sparse PCA
PCA with H2O
Clustering – K-means
Initialization methods
K-means assumptions
Selection of the best K
Scaling K-means – mini-batch
Note
K-means with H2O
LDA
Note
Note
Scaling LDA – memory, CPUs, and machines
Summary
Chapter 8. Distributed Environments – Hadoop and Spark
From a standalone machine to a bunch of nodes
Why do we need a distributed framework?
Tip
Note
Setting up the VM
Note
VirtualBox
Note
Vagrant
Note
Using the VM
Note
Note
Note
The Hadoop ecosystem
Architecture
HDFS
Note
MapReduce
Note
YARN
Spark
pySpark
Summary
Chapter 9. Practical Machine Learning with Spark
Setting up the VM for this chapter
Note
Sharing variables across cluster nodes
Broadcast read-only variables
Note
Accumulators write-only variables
Broadcast and accumulators together – an example
Data preprocessing in Spark
JSON files and Spark DataFrames
Dealing with missing data
Grouping and creating tables in-memory
Writing the preprocessed DataFrame or RDD to disk
Working with Spark DataFrames
Note
Machine learning with Spark
Note
Spark on the KDD99 dataset
Note
Reading the dataset
Feature engineering
Note
Training a learner
Evaluating a learner's performance
The power of the ML pipeline
Manual tuning
Cross-validation
Final cleanup
Summary
Appendix A. Introduction to GPUs and Theano
GPU computing
Theano – parallel computing on the GPU
Installing Theano
Appendix A. Bibliography
Index