Hands-On Unsupervised Learning Using Python: How to Build Applied Machine Learning Solutions from Unlabeled Data

Many industry experts consider unsupervised learning the next frontier in artificial intelligence, one that may hold the key to the holy grail of AI research: so-called artificial general intelligence. Since the majority of the world's data is unlabeled, conventional supervised learning cannot be applied; this is where unsupervised learning comes in. Unsupervised learning can be applied to unlabeled datasets to discover meaningful patterns buried deep in the data, patterns that may be nearly impossible for humans to uncover.

Author Ankur Patel provides practical knowledge on how to apply unsupervised learning using two simple, production-ready Python frameworks: scikit-learn and TensorFlow (with Keras). With the hands-on examples and code provided, you will identify difficult-to-find patterns in data and gain deeper business insight, detect anomalies, perform automatic feature engineering and selection, and generate synthetic datasets. All you need to get started is programming experience and some machine learning background. This book shows you how to:

• Compare the strengths and weaknesses of the different machine learning approaches: supervised, unsupervised, and reinforcement learning
• Set up and manage a machine learning project end to end, from data acquisition to building a model and implementing a solution in production
• Use dimensionality reduction algorithms to uncover the most relevant information in data and build an anomaly detection system to catch credit card fraud (see the sketch after this list)
• Apply clustering algorithms to segment users, such as loan borrowers, into distinct and homogeneous groups
• Use autoencoders to perform automatic feature engineering and selection
• Combine supervised and unsupervised learning algorithms to develop semisupervised solutions
• Build movie recommender systems using restricted Boltzmann machines
• Generate synthetic images using deep belief networks and generative adversarial networks
• Perform clustering on time series data such as electrocardiograms
• Explore the successes of unsupervised learning to date and its promising future
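For a flavor of what the hands-on chapters look like, here is a minimal sketch of the PCA reconstruction-error approach to anomaly detection that Chapter 4 applies to credit card fraud. The synthetic data, the component count, and all variable names below are illustrative assumptions, not code from the book.

```python
# A minimal sketch (not the book's code) of PCA-based anomaly scoring.
# Assumption: 30 numeric features standing in for the scaled fraud dataset.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(2019)
X = rng.normal(size=(1000, 30))  # placeholder data, 1,000 observations

# Scale features, project onto fewer principal components, then reconstruct.
X_scaled = StandardScaler().fit_transform(X)
pca = PCA(n_components=27, random_state=2019)  # 27 is an arbitrary choice here
X_reduced = pca.fit_transform(X_scaled)
X_reconstructed = pca.inverse_transform(X_reduced)

# Anomaly score: sum of squared reconstruction errors per observation.
# Observations PCA reconstructs poorly are flagged as potential anomalies.
anomaly_scores = np.sum((X_scaled - X_reconstructed) ** 2, axis=1)
print(anomaly_scores[:5])
```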

Author(s): Ankur A. Patel
Edition: 1
Publisher: O'Reilly Media
Year: 2019

Language: English
Commentary: Publisher's PDF
Pages: 359
City: Sebastopol, CA
Tags: Machine Learning;Neural Networks;Deep Learning;Unsupervised Learning;Reinforcement Learning;Anomaly Detection;Python;Recommender Systems;Deep Belief Networks;Boltzmann Machines;Autoencoders;Generative Adversarial Networks;Principal Component Analysis;Keras;TensorFlow;Logistic Regression;Feature Extraction;Git;Fraud Detection;Anaconda;Dimensionality Reduction;Time Series Analysis;Activation Functions;Version Control Systems;Gradient Boosting;Semi-Supervised Learning;Cluster Analysis

Copyright
Table of Contents
Preface
A Brief History of Machine Learning
AI Is Back, but Why Now?
The Emergence of Applied AI
Major Milestones in Applied AI over the Past 20 Years
From Narrow AI to AGI
Objective and Approach
Prerequisites
Roadmap
Conventions Used in This Book
Using Code Examples
O’Reilly Online Learning
How to Contact Us
Acknowledgments
Part I. Fundamentals of Unsupervised Learning
Chapter 1. Unsupervised Learning in the Machine Learning Ecosystem
Basic Machine Learning Terminology
Rules-Based vs. Machine Learning
Supervised vs. Unsupervised
The Strengths and Weaknesses of Supervised Learning
The Strengths and Weaknesses of Unsupervised Learning
Using Unsupervised Learning to Improve Machine Learning Solutions
A Closer Look at Supervised Algorithms
Linear Methods
Neighborhood-Based Methods
Tree-Based Methods
Support Vector Machines
Neural Networks
A Closer Look at Unsupervised Algorithms
Dimensionality Reduction
Clustering
Feature Extraction
Unsupervised Deep Learning
Sequential Data Problems Using Unsupervised Learning
Reinforcement Learning Using Unsupervised Learning
Semisupervised Learning
Successful Applications of Unsupervised Learning
Anomaly Detection
Conclusion
Chapter 2. End-to-End Machine Learning Project
Environment Setup
Version Control: Git
Clone the Hands-On Unsupervised Learning Git Repository
Scientific Libraries: Anaconda Distribution of Python
Neural Networks: TensorFlow and Keras
Gradient Boosting, Version One: XGBoost
Gradient Boosting, Version Two: LightGBM
Clustering Algorithms
Interactive Computing Environment: Jupyter Notebook
Overview of the Data
Data Preparation
Data Acquisition
Data Exploration
Generate Feature Matrix and Labels Array
Feature Engineering and Feature Selection
Data Visualization
Model Preparation
Split into Training and Test Sets
Select Cost Function
Create k-Fold Cross-Validation Sets
Machine Learning Models (Part I)
Model #1: Logistic Regression
Evaluation Metrics
Confusion Matrix
Precision-Recall Curve
Receiver Operating Characteristic
Machine Learning Models (Part II)
Model #2: Random Forests
Model #3: Gradient Boosting Machine (XGBoost)
Model #4: Gradient Boosting Machine (LightGBM)
Evaluation of the Four Models Using the Test Set
Ensembles
Stacking
Final Model Selection
Production Pipeline
Conclusion
Part II. Unsupervised Learning Using Scikit-Learn
Chapter 3. Dimensionality Reduction
The Motivation for Dimensionality Reduction
The MNIST Digits Database
Dimensionality Reduction Algorithms
Linear Projection vs. Manifold Learning
Principal Component Analysis
PCA, the Concept
PCA in Practice
Incremental PCA
Sparse PCA
Kernel PCA
Singular Value Decomposition
Random Projection
Gaussian Random Projection
Sparse Random Projection
Isomap
Multidimensional Scaling
Locally Linear Embedding
t-Distributed Stochastic Neighbor Embedding
Other Dimensionality Reduction Methods
Dictionary Learning
Independent Component Analysis
Conclusion
Chapter 4. Anomaly Detection
Credit Card Fraud Detection
Prepare the Data
Define Anomaly Score Function
Define Evaluation Metrics
Define Plotting Function
Normal PCA Anomaly Detection
PCA Components Equal Number of Original Dimensions
Search for the Optimal Number of Principal Components
Sparse PCA Anomaly Detection
Kernel PCA Anomaly Detection
Gaussian Random Projection Anomaly Detection
Sparse Random Projection Anomaly Detection
Nonlinear Anomaly Detection
Dictionary Learning Anomaly Detection
ICA Anomaly Detection
Fraud Detection on the Test Set
Normal PCA Anomaly Detection on the Test Set
ICA Anomaly Detection on the Test Set
Dictionary Learning Anomaly Detection on the Test Set
Conclusion
Chapter 5. Clustering
MNIST Digits Dataset
Data Preparation
Clustering Algorithms
k-Means
k-Means Inertia
Evaluating the Clustering Results
k-Means Accuracy
k-Means and the Number of Principal Components
k-Means on the Original Dataset
Hierarchical Clustering
Agglomerative Hierarchical Clustering
The Dendrogram
Evaluating the Clustering Results
DBSCAN
DBSCAN Algorithm
Applying DBSCAN to Our Dataset
HDBSCAN
Conclusion
Chapter 6. Group Segmentation
Lending Club Data
Data Preparation
Transform String Format to Numerical Format
Impute Missing Values
Engineer Features
Select Final Set of Features and Perform Scaling
Designate Labels for Evaluation
Goodness of the Clusters
k-Means Application
Hierarchical Clustering Application
HDBSCAN Application
Conclusion
Part III. Unsupervised Learning Using TensorFlow and Keras
Chapter 7. Autoencoders
Neural Networks
TensorFlow
Keras
Autoencoder: The Encoder and the Decoder
Undercomplete Autoencoders
Overcomplete Autoencoders
Dense vs. Sparse Autoencoders
Denoising Autoencoder
Variational Autoencoder
Conclusion
Chapter 8. Hands-On Autoencoder
Data Preparation
The Components of an Autoencoder
Activation Functions
Our First Autoencoder
Loss Function
Optimizer
Training the Model
Evaluating on the Test Set
Two-Layer Undercomplete Autoencoder with Linear Activation Function
Increasing the Number of Nodes
Adding More Hidden Layers
Nonlinear Autoencoder
Overcomplete Autoencoder with Linear Activation
Overcomplete Autoencoder with Linear Activation and Dropout
Sparse Overcomplete Autoencoder with Linear Activation
Sparse Overcomplete Autoencoder with Linear Activation and Dropout
Working with Noisy Datasets
Denoising Autoencoder
Two-Layer Denoising Undercomplete Autoencoder with Linear Activation
Two-Layer Denoising Overcomplete Autoencoder with Linear Activation
Two-Layer Denoising Overcomplete Autoencoder with ReLU Activation
Conclusion
Chapter 9. Semisupervised Learning
Data Preparation
Supervised Model
Unsupervised Model
Semisupervised Model
The Power of Supervised and Unsupervised
Conclusion
Part IV. Deep Unsupervised Learning Using TensorFlow and Keras
Chapter 10. Recommender Systems Using Restricted Boltzmann Machines
Boltzmann Machines
Restricted Boltzmann Machines
Recommender Systems
Collaborative Filtering
The Netflix Prize
MovieLens Dataset
Data Preparation
Define the Cost Function: Mean Squared Error
Perform Baseline Experiments
Matrix Factorization
One Latent Factor
Three Latent Factors
Five Latent Factors
Collaborative Filtering Using RBMs
RBM Neural Network Architecture
Build the Components of the RBM Class
Train RBM Recommender System
Conclusion
Chapter 11. Feature Detection Using Deep Belief Networks
Deep Belief Networks in Detail
MNIST Image Classification
Restricted Boltzmann Machines
Build the Components of the RBM Class
Generate Images Using the RBM Model
View the Intermediate Feature Detectors
Train the Three RBMs for the DBN
Examine Feature Detectors
View Generated Images
The Full DBN
How Training of a DBN Works
Train the DBN
How Unsupervised Learning Helps Supervised Learning
Generate Images to Build a Better Image Classifier
Image Classifier Using LightGBM
Supervised Only
Unsupervised and Supervised Solution
Conclusion
Chapter 12. Generative Adversarial Networks
GANs, the Concept
The Power of GANs
Deep Convolutional GANs
Convolutional Neural Networks
DCGANs Revisited
Generator of the DCGAN
Discriminator of the DCGAN
Discriminator and Adversarial Models
DCGAN for the MNIST Dataset
MNIST DCGAN in Action
Synthetic Image Generation
Conclusion
Chapter 13. Time Series Clustering
ECG Data
Approach to Time Series Clustering
k-Shape
Time Series Clustering Using k-Shape on ECGFiveDays
Data Preparation
Training and Evaluation
Time Series Clustering Using k-Shape on ECG5000
Data Preparation
Training and Evaluation
Time Series Clustering Using k-Means on ECG5000
Time Series Clustering Using Hierarchical DBSCAN on ECG5000
Comparing the Time Series Clustering Algorithms
Full Run with k-Shape
Full Run with k-Means
Full Run with HDBSCAN
Comparing All Three Time Series Clustering Approaches
Conclusion
Chapter 14. Conclusion
Supervised Learning
Unsupervised Learning
Scikit-Learn
TensorFlow and Keras
Reinforcement Learning
Most Promising Areas of Unsupervised Learning Today
The Future of Unsupervised Learning
Final Words
Index
About the Author
Colophon