Quickly build and deploy machine learning models without managing infrastructure, and improve productivity using Amazon SageMaker's capabilities such as Amazon SageMaker Studio, Autopilot, Experiments, Debugger, and Model Monitor
Key Features
• Build, train, and deploy machine learning models quickly using Amazon SageMaker
• Use machine learning algorithms and techniques to analyze, detect, and receive alerts about various business problems
• Improve productivity by training and fine-tuning machine learning models in production
Book Description
Amazon SageMaker enables you to quickly build, train, and deploy machine learning (ML) models at scale, without managing any infrastructure. It helps you focus on the ML problem at hand and deploy high-quality models by removing the heavy lifting typically involved in each step of the ML process. This book is a comprehensive guide for data scientists and ML developers who want to learn the ins and outs of Amazon SageMaker.
You'll understand how to use various modules of SageMaker as a single toolset to solve the challenges faced in ML. As you progress, you'll cover features such as AutoML, built-in algorithms and frameworks, and the option to write your own code and algorithms to build ML models. Later, the book will show you how to integrate Amazon SageMaker with popular deep learning libraries such as TensorFlow and PyTorch to extend the capabilities of existing models. You'll also learn to get models to production faster with minimum effort and at a lower cost. Finally, you'll explore how to use Amazon SageMaker Debugger to analyze, detect, and highlight problems so that you can understand the current model state and improve model accuracy.
By the end of this Amazon SageMaker book, you'll be able to use Amazon SageMaker on the full spectrum of ML workflows, from experimentation, training, and monitoring to scaling, deployment, and automation.
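As a taste of the workflow the book covers, here is a minimal sketch of the build-train-deploy cycle with the SageMaker Python SDK (v2); the training script name, S3 path, and instance types are hypothetical placeholders, not examples taken from the book:

    # Minimal sketch, assuming the SageMaker Python SDK (v2) and an execution
    # role available in the environment. The script name, S3 path, and
    # instance types below are hypothetical placeholders.
    import sagemaker
    from sagemaker.sklearn import SKLearn

    role = sagemaker.get_execution_role()

    # Train a scikit-learn script on fully managed infrastructure
    estimator = SKLearn(
        entry_point="train.py",          # hypothetical training script
        framework_version="0.23-1",
        instance_type="ml.m5.large",
        role=role,
    )
    estimator.fit({"training": "s3://my-bucket/my-dataset/"})  # hypothetical S3 input

    # Deploy the trained model to a real-time HTTPS endpoint
    predictor = estimator.deploy(initial_instance_count=1, instance_type="ml.m5.large")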
What you will learn
• Create and automate end-to-end machine learning workflows on Amazon Web Services (AWS)
• Become well-versed in data annotation and preparation techniques
• Use AutoML features to build and train machine learning models with Autopilot
• Create models using built-in algorithms, frameworks, and your own code
• Train computer vision and NLP models using real-world examples
• Explore training techniques for scaling, model optimization, model debugging, and cost optimization
• Automate deployment tasks in a variety of configurations using the SageMaker SDK and several automation tools
Who this book is for
This book is for software engineers, machine learning developers, data scientists, and AWS users who are new to using Amazon SageMaker and want to build high-quality machine learning models without worrying about infrastructure. Knowledge of AWS basics is required to grasp the concepts covered in this book more effectively. Some understanding of machine learning concepts and the Python programming language will also be beneficial.
Author(s): Julien Simon
Edition: 1
Publisher: Packt Publishing
Year: 2020
Language: English
Commentary: Vector PDF
Pages: 490
City: Birmingham, UK
Tags: Amazon Web Services; Machine Learning; Natural Language Processing; Unsupervised Learning; Computer Vision; Debugging; Supervised Learning; Python; Apache Spark; Keras; TensorFlow; Pipelines; Scalability; R; Deployment; scikit-learn; Raspberry Pi; AutoML; PyTorch; Image Classification; Automation; Batch Learning; ImageNet; AWS CloudFormation; Cost Optimization; Object Detection; Semantic Segmentation; Amazon SageMaker
Cover
Title Page
Copyright and Credits
About Packt
Foreword
Contributors
Table of Contents
Preface
Section 1: Introduction to Amazon SageMaker
Chapter 1: Introduction to Amazon SageMaker
Technical requirements
Exploring the capabilities of Amazon SageMaker
The main capabilities of Amazon SageMaker
The Amazon SageMaker API
Demonstrating the strengths of Amazon SageMaker
Solving Alice's problems
Solving Bob's problems
Setting up Amazon SageMaker on your local machine
Installing the SageMaker SDK with virtualenv
Installing the SageMaker SDK with Anaconda
A word about AWS permissions
Setting up an Amazon SageMaker notebook instance
Setting up Amazon SageMaker Studio
Onboarding to Amazon SageMaker Studio
Onboarding with the quick start procedure
Summary
Chapter 2: Handling Data Preparation Techniques
Technical requirements
Discovering Amazon SageMaker Ground Truth
Using workforces
Creating a private workforce
Uploading data for labeling
Creating a labeling job
Labeling images
Labeling text
Exploring Amazon SageMaker Processing
Discovering the Amazon SageMaker Processing API
Processing a dataset with scikit-learn
Processing a dataset with your own code
Processing data with other AWS services
Amazon Elastic MapReduce
AWS Glue
Amazon Athena
Summary
Section 2: Building and Training Models
Chapter 3: AutoML with Amazon SageMaker Autopilot
Technical requirements
Discovering Amazon SageMaker Autopilot
Analyzing data
Feature engineering
Model tuning
Using SageMaker Autopilot in SageMaker Studio
Launching a job
Monitoring a job
Comparing jobs
Deploying and invoking a model
Using the SageMaker Autopilot SDK
Launching a job
Monitoring a job
Cleaning up
Diving deep on SageMaker Autopilot
The job artifacts
The Data Exploration notebook
The Candidate Generation notebook
Summary
Chapter 4: Training Machine Learning Models
Technical requirements
Discovering the built-in algorithms in Amazon SageMaker
Supervised learning
Unsupervised learning
A word about scalability
Training and deploying models with built-in algorithms
Understanding the end-to-end workflow
Using alternative workflows
Using fully managed infrastructure
Using the SageMaker SDK with built-in algorithms
Preparing data
Configuring a training job
Launching a training job
Deploying a model
Cleaning up
Working with more built-in algorithms
Classification with XGBoost
Recommendation with Factorization Machines
Using Principal Component Analysis
Detecting anomalies with Random Cut Forest
Summary
Chapter 5: Training Computer Vision Models
Technical requirements
Discovering the CV built-in algorithms in Amazon SageMaker
Discovering the image classification algorithm
Discovering the object detection algorithm
Discovering the semantic segmentation algorithm
Training with CV algorithms
Preparing image datasets
Working with image files
Working with RecordIO files
Working with SageMaker Ground Truth files
Using the built-in CV algorithms
Training an image classification model
Fine-tuning an image classification model
Training an object detection model
Training a semantic segmentation model
Summary
Chapter 6: Training Natural Language Processing Models
Technical requirements
Discovering the NLP built-in algorithms in Amazon SageMaker
Discovering the BlazingText algorithm
Discovering the LDA algorithm
Discovering the NTM algorithm
Discovering the seq2seq algorithm
Training with NLP algorithms
Preparing natural language datasets
Preparing data for classification with BlazingText
Preparing data for classification with BlazingText, version 2
Preparing data for word vectors with BlazingText
Preparing data for topic modeling with LDA and NTM
Using datasets labeled with SageMaker Ground Truth
Using the built-in algorithms for NLP
Classifying text with BlazingText
Computing word vectors with BlazingText
Using BlazingText models with FastText
Modeling topics with LDA
Modeling topics with NTM
Summary
Chapter 7: Extending Machine Learning Services Using Built-In Frameworks
Technical requirements
Discovering the built-in frameworks in Amazon SageMaker
Running a first example
Working with framework containers
Training and deploying locally
Training with script mode
Understanding model deployment
Managing dependencies
Putting it all together
Running your framework code on Amazon SageMaker
Using the built-in frameworks
Working with TensorFlow and Keras
Working with PyTorch
Working with Apache Spark
Summary
Chapter 8: Using Your Algorithms and Code
Technical requirements
Understanding how SageMaker invokes your code
Using the SageMaker training toolkit with scikit-learn
Building a fully custom container for scikit-learn
Training with a fully custom container
Deploying a fully custom container
Building a fully custom container for R
Coding with R and Plumber
Building a custom container
Training and deploying a custom container on SageMaker
Training and deploying with XGBoost and MLflow
Installing MLflow
Training a model with MLflow
Building a SageMaker container with MLflow
Training and deploying with XGBoost and Sagify
Installing Sagify
Coding our model with Sagify
Deploying a model locally with Sagify
Deploying a model on SageMaker with Sagify
Summary
Section 3: Diving Deeper on Training
Chapter 9: Scaling Your Training Jobs
Technical requirements
Understanding when and how to scale
Understanding what scaling means
Adapting training time to business requirements
Right-sizing training infrastructure
Deciding when to scale
Deciding how to scale
Scaling a BlazingText training job
Scaling a Semantic Segmentation training job
Solving training challenges
Streaming datasets with pipe mode
Using pipe mode with built-in algorithms
Using pipe mode with other algorithms
Training factorization machines with pipe mode
Training Object Detection with pipe mode
Using other storage services
Working with SageMaker and Amazon EFS
Working with SageMaker and Amazon FSx for Lustre
Distributing training jobs
Distributing training for built-in algorithms
Distributing training for built-in frameworks
Distributing training for custom containers
Distributing training for Object Detection
Training an Image Classification model on ImageNet
Preparing the ImageNet dataset
Defining our training job
Training on ImageNet
Examining results
Summary
Chapter 10: Advanced Training Techniques
Technical requirements
Optimizing training costs with Managed Spot Training
Comparing costs
Understanding spot instances
Understanding Managed Spot Training
Using Managed Spot Training with Object Detection
Using Managed Spot Training and checkpointing with Keras
Optimizing hyperparameters with Automatic Model Tuning
Understanding Automatic Model Tuning
Using Automatic Model Tuning with Object Detection
Using Automatic Model Tuning with Keras
Using Automatic Model Tuning for architecture search
Tuning multiple algorithms
Exploring models with SageMaker Debugger
Debugging an XGBoost job
Inspecting an XGBoost job
Debugging and inspecting a Keras job
Summary
Section 4: Managing Models in Production
Chapter 11: Deploying Machine Learning Models
Technical requirements
Examining model artifacts
Examining artifacts for built-in algorithms
Examining artifacts for built-in computer vision algorithms
Examining artifacts for XGBoost
Managing real-time endpoints
Managing endpoints with the SageMaker SDK
Managing endpoints with the boto3 SDK
Deploying batch transformers
Deploying inference pipelines
Monitoring predictions with Amazon SageMaker Model Monitor
Capturing data
Creating a baseline
Setting up a monitoring schedule
Sending bad data
Examining violation reports
Deploying models to container services
Training on SageMaker and deploying on Amazon Fargate
Summary
Chapter 12: Automating Machine Learning Workflows
Technical requirements
Automating with AWS CloudFormation
Writing a template
Deploying a model to a real-time endpoint
Modifying a stack with a change set
Adding a second production variant to the endpoint
Implementing canary deployment
Implementing blue-green deployment
Automating with the AWS Cloud Development Kit
Installing CDK
Creating a CDK application
Writing a CDK application
Deploying a CDK application
Automating with AWS Step Functions
Setting up permissions
Implementing our first workflow
Adding parallel execution to a workflow
Adding a Lambda function to a workflow
Summary
Chapter 13: Optimizing Prediction Cost and Performance
Technical requirements
Autoscaling an endpoint
Deploying a multi-model endpoint
Understanding multi-model endpoints
Building a multi-model endpoint with scikit-learn
Deploying a model with Amazon Elastic Inference
Deploying a model with AWS
Compiling models with Amazon SageMaker Neo
Understanding Amazon Neo
Compiling and deploying an image classification model on SageMaker
Exploring models compiled with Neo
Deploying an image classification model on a Raspberry Pi
Deploying models on AWS Inferentia
Building a cost optimization checklist
Optimizing costs for data preparation
Optimizing costs for experimentation
Optimizing costs for model training
Optimizing costs for model deployment
Summary
Other Books You May Enjoy
Leave a review - let other readers know what you think
Index