Learn to use scikit-learn operations and functions for machine learning and deep learning applications.

About This Book
* Handle a variety of machine learning tasks effortlessly by leveraging the power of scikit-learn
* Perform supervised and unsupervised learning with ease, and evaluate the performance of your model
* Practical, easy-to-understand recipes aimed at helping you choose the right machine learning algorithm

Who This Book Is For
Data analysts who are already familiar with Python but less so with scikit-learn, and who want quick solutions to common machine learning problems, will find this book very useful. If you are a Python programmer who wants to dive into the world of machine learning in a practical manner, this book will help you too.

What You Will Learn
* Build predictive models in minutes by using scikit-learn
* Understand the differences and relationships between classification and regression, two types of supervised learning
* Use distance metrics to predict in clustering, a type of unsupervised learning
* Find points with similar characteristics with nearest neighbors
* Use automation and cross-validation to find the best model and focus on it for a data product
* Choose the best algorithm among many, or use them together in an ensemble
* Create your own estimator with the simple syntax of sklearn
* Explore the feed-forward neural networks available in scikit-learn

In Detail
Python is quickly becoming the go-to language for analysts and data scientists due to its simplicity and flexibility, and within the Python data space, scikit-learn is the unequivocal choice for machine learning. This book includes walkthroughs and solutions to the common as well as the not-so-common problems in machine learning, and shows how scikit-learn can be leveraged to perform various machine learning tasks effectively.

The second edition begins by taking you through recipes on evaluating the statistical properties of data and generating synthetic data for machine learning modelling. As you progress through the chapters, you will come across recipes that teach you to implement techniques such as data pre-processing, linear regression, logistic regression, K-NN, Naive Bayes, classification, decision trees, ensembles, and much more. Furthermore, you'll learn to optimize your models with multi-class classification, cross-validation, and model evaluation, and dive deeper into implementing deep learning with scikit-learn. Along with covering the enhanced features for model selection, the API, and new features such as classifiers, regressors, and estimators, the book also contains recipes on evaluating and fine-tuning the performance of your model.

By the end of this book, you will have explored a plethora of features offered by scikit-learn for Python to solve any machine learning problem you come across.

Style and Approach
This book consists of practical recipes on scikit-learn that target novices as well as intermediate users. It goes deep into the technical issues and covers additional protocols and many more real-life examples, so that you can apply them in everyday scenarios.
Author(s): Julian Avila; Trent Hauck
Edition: 2
Year: 2017
Language: English
Pages: 374
Cover
Copyright
Credits
About the Authors
About the Reviewer
www.PacktPub.com
Customer Feedback
Table of Contents
Preface
Chapter 1: High-Performance Machine Learning – NumPy
Introduction
NumPy basics
How to do it...
The shape and dimension of NumPy arrays
NumPy broadcasting
Initializing NumPy arrays and dtypes
Indexing
Boolean arrays
Arithmetic operations
NaN values
How it works...
Loading the iris dataset
Getting ready
How to do it...
How it works...
Viewing the iris dataset
How to do it...
How it works...
There's more...
Viewing the iris dataset with Pandas
How to do it...
How it works...
Plotting with NumPy and matplotlib
Getting ready
How to do it...
A minimal machine learning recipe – SVM classification
Getting ready
How to do it...
How it works...
There's more...
Introducing cross-validation
Getting ready
How to do it...
How it works...
There's more...
Putting it all together
How to do it...
There's more...
Machine learning overview – classification versus regression
The purpose of scikit-learn
Supervised versus unsupervised
Getting ready
How to do it...
Quick SVC – a classifier and regressor
Making a scorer
How it works...
There's more...
Linear versus nonlinear
Black box versus not
Interpretability
A pipeline
Chapter 2: Pre-Model Workflow and Pre-Processing
Introduction
Creating sample data for toy analysis
Getting ready
How to do it...
Creating a regression dataset
Creating an unbalanced classification dataset
Creating a dataset for clustering
How it works...
Scaling data to the standard normal distribution
Getting ready
How to do it...
How it works...
Creating binary features through thresholding
Getting ready
How to do it...
There's more...
Sparse matrices
The fit method
Working with categorical variables
Getting ready
How to do it...
How it works...
There's more...
DictVectorizer class
Imputing missing values through various strategies
Getting ready
How to do it...
How it works...
There's more...
A linear model in the presence of outliers
Getting ready
How to do it...
How it works...
Putting it all together with pipelines
Getting ready
How to do it...
How it works...
There's more...
Using Gaussian processes for regression
Getting ready
How to do it…
Cross-validation with the noise parameter
There's more...
Using SGD for regression
Getting ready
How to do it…
How it works…
Chapter 3: Dimensionality Reduction
Introduction
Reducing dimensionality with PCA
Getting ready
How to do it...
How it works...
There's more...
Using factor analysis for decomposition
Getting ready
How to do it...
How it works...
Using kernel PCA for nonlinear dimensionality reduction
Getting ready
How to do it...
How it works...
Using truncated SVD to reduce dimensionality
Getting ready
How to do it...
How it works...
There's more...
Sign flipping
Sparse matrices
Using decomposition to classify with DictionaryLearning
Getting ready
How to do it...
How it works...
Doing dimensionality reduction with manifolds – t-SNE
Getting ready
How to do it...
How it works...
Testing methods to reduce dimensionality with pipelines
Getting ready
How to do it...
How it works...
Chapter 4: Linear Models with scikit-learn
Introduction
Fitting a line through data
Getting ready
How to do it...
How it works...
There's more...
Fitting a line through data with machine learning
Getting ready
How to do it...
Evaluating the linear regression model
Getting ready
How to do it...
How it works...
There's more...
Using ridge regression to overcome linear regression's shortfalls
Getting ready
How to do it...
Optimizing the ridge regression parameter
Getting ready
How to do it...
How it works...
There's more...
Bayesian ridge regression
Using sparsity to regularize models
Getting ready
How to do it...
How it works...
LASSO cross-validation – LassoCV
LASSO for feature selection
Taking a more fundamental approach to regularization with LARS
Getting ready
How to do it...
How it works...
There's more...
References
Chapter 5: Linear Models – Logistic Regression
Introduction
Using linear methods for classification – logistic regression
Loading data from the UCI repository
How to do it...
Viewing the Pima Indians diabetes dataset with pandas
How to do it...
Looking at the UCI Pima Indians dataset web page
How to do it...
View the citation policy
Read about missing values and context
Machine learning with logistic regression
Getting ready
Define X, y – the feature and target arrays
How to do it...
Provide training and testing sets
Train the logistic regression
Score the logistic regression
Examining logistic regression errors with a confusion matrix
Getting ready
How to do it...
Reading the confusion matrix
General confusion matrix in context
Varying the classification threshold in logistic regression
Getting ready
How to do it...
Receiver operating characteristic – ROC analysis
Getting ready
Sensitivity
A visual perspective
How to do it...
Calculating TPR in scikit-learn
Plotting sensitivity
There's more...
The confusion matrix in a non-medical context
Plotting an ROC curve without context
How to do it...
Perfect classifier
Imperfect classifier
AUC – the area under the ROC curve
Putting it all together – UCI breast cancer dataset
How to do it...
Outline for future projects
Chapter 6: Building Models with Distance Metrics
Introduction
Using k-means to cluster data
Getting ready
How to do it…
How it works...
Optimizing the number of centroids
Getting ready
How to do it...
How it works...
Assessing cluster correctness
Getting ready
How to do it...
There's more...
Using MiniBatch k-means to handle more data
Getting ready
How to do it...
How it works...
Quantizing an image with k-means clustering
Getting ready
How to do it…
How it works…
Finding the closest object in the feature space
Getting ready
How to do it...
How it works...
There's more...
Probabilistic clustering with Gaussian mixture models
Getting ready
How to do it...
How it works...
Using k-means for outlier detection
Getting ready
How to do it...
How it works...
Using KNN for regression
Getting ready
How to do it…
How it works…
Chapter 7: Cross-Validation and Post-Model Workflow
Introduction
Selecting a model with cross-validation
Getting ready
How to do it...
How it works...
K-fold cross validation
Getting ready
How to do it...
There's more...
Balanced cross-validation
Getting ready
How to do it...
There's more...
Cross-validation with ShuffleSplit
Getting ready
How to do it...
Time series cross-validation
Getting ready
How to do it...
There's more...
Grid search with scikit-learn
Getting ready
How to do it...
How it works...
Randomized search with scikit-learn
Getting ready
How to do it...
Classification metrics
Getting ready
How to do it...
There's more...
Regression metrics
Getting ready
How to do it...
Clustering metrics
Getting ready
How to do it...
Using dummy estimators to compare results
Getting ready
How to do it...
How it works...
Feature selection
Getting ready
How to do it...
How it works...
Feature selection on L1 norms
Getting ready
How to do it...
There's more...
Persisting models with joblib or pickle
Getting ready
How to do it...
Opening the saved model
There's more...
Chapter 8: Support Vector Machines
Introduction
Classifying data with a linear SVM
Getting ready
Load the data
Visualize the two classes
How to do it...
How it works...
There's more...
Optimizing an SVM
Getting ready
How to do it...
Construct a pipeline
Construct a parameter grid for a pipeline
Provide a cross-validation scheme
Perform a grid search
There's more...
Randomized grid search alternative
Visualize the nonlinear RBF decision boundary
More meaning behind C and gamma
Multiclass classification with SVM
Getting ready
How to do it...
OneVsRestClassifier
Visualize it
How it works...
Support vector regression
Getting ready
How to do it...
Chapter 9: Tree Algorithms and Ensembles
Introduction
Doing basic classifications with decision trees
Getting ready
How to do it...
Visualizing a decision tree with pydot
How to do it...
How it works...
There's more...
Tuning a decision tree
Getting ready
How to do it...
There's more...
Using decision trees for regression
Getting ready
How to do it...
There's more...
Reducing overfitting with cross-validation
How to do it...
There's more...
Implementing random forest regression
Getting ready
How to do it...
Bagging regression with nearest neighbors
Getting ready
How to do it...
Tuning gradient boosting trees
Getting ready
How to do it...
There's more...
Finding the best parameters of a gradient boosting classifier
Tuning an AdaBoost regressor
How to do it...
There's more...
Writing a stacking aggregator with scikit-learn
How to do it...
Chapter 10: Text and Multiclass Classification with scikit-learn
Using LDA for classification
Getting ready
How to do it...
How it works...
Working with QDA – a nonlinear LDA
Getting ready
How to do it...
How it works...
Using SGD for classification
Getting ready
How to do it...
There's more...
Classifying documents with Naive Bayes
Getting ready
How to do it...
How it works...
There's more...
Label propagation with semi-supervised learning
Getting ready
How to do it...
How it works...
Chapter 11: Neural Networks
Introduction
Perceptron classifier
Getting ready
How to do it...
How it works...
There's more...
Neural network – multilayer perceptron
Getting ready
How to do it...
How it works...
Philosophical thoughts on neural networks
Stacking with a neural network
Getting ready
How to do it...
First base model – neural network
Second base model – gradient boost ensemble
Third base model – bagging regressor of gradient boost ensembles
Some functions of the stacker
Meta-learner – extra trees regressor
There's more...
Chapter 12: Create a Simple Estimator
Introduction
Create a simple estimator
Getting ready
How to do it...
How it works...
There's more...
Trying the new GEE classifier on the Pima diabetes dataset
Saving your trained estimator
Index