Learning Ray: Flexible Distributed Python for Machine Learning (Final Release)


Get started with Ray, the open source distributed computing framework that simplifies the process of scaling compute-intensive Python workloads. With this practical book, Python programmers, data engineers, and data scientists will learn how to leverage Ray locally and spin up compute clusters. You'll be able to use Ray to structure and run machine learning programs at scale. Authors Max Pumperla, Edward Oakes, and Richard Liaw show you how to build machine learning applications with Ray. You'll understand how Ray fits into the current landscape of machine learning tools and discover how Ray continues to integrate ever more tightly with these tools. Distributed computation is hard, but by using Ray you'll find it easy to get started.

Distributed computing is a fascinating topic. Looking back at the early days of computing, I can't help but be impressed that so many companies now distribute their workloads across clusters of computers. It's not only impressive because we have figured out efficient ways to do so; it's also becoming a necessity. Individual computers keep getting faster and more powerful, and yet our need for large-scale computing keeps exceeding what single machines can do. Ray simplifies distributed computing for non-experts and makes it easy to take Python scripts and scale them across multiple nodes. Ray is great at scaling both data- and compute-heavy workloads, such as data transformations and model training, and targets machine learning (ML) workloads that need to scale. The addition of the Ray AI Runtime (AIR) with the release of Ray 2.0 in August 2022 increased Ray's support for complex ML workloads even further.
Learn how to build your first distributed applications with Ray Core
Conduct hyperparameter optimization with Ray Tune
Use the Ray RLlib library for reinforcement learning
Manage distributed training with the Ray Train library
Use Ray to perform data processing with Ray Datasets
Learn how to work with Ray Clusters and serve models with Ray Serve
Build end-to-end machine learning applications with Ray AIR

Who Should Read This Book: It's likely that you picked up this book because you're interested in some aspect of Ray. Maybe you're a distributed systems engineer who wants to know how Ray's engine works. You might be a software developer interested in picking up a new technology. Or you could be a data engineer who wants to evaluate how Ray compares to similar tools. You could also be a machine learning practitioner or data scientist who needs to find ways to scale experiments.

No matter your concrete role, the common denominator for getting the most out of this book is feeling comfortable programming in Python. This book's examples are written in Python, and intermediate knowledge of the language is a requirement. Explicit is better than implicit, as you know full well as a Pythonista. So, let us be explicit: knowing Python implies that you know how to use the command line on your system, how to get help when stuck, and how to set up a programming environment on your own.

If you've never worked with distributed systems before, that's OK. We cover all the basics you need to get started in this book. On top of that, you can run most code examples presented here on your laptop. Covering the basics means that we can't go into too much detail about distributed systems. This book is focused primarily on application developers using Ray for data science and ML. For the later chapters, you'll need some familiarity with ML, but we don't expect you to have worked in the field.
In particular, you should have a basic understanding of the ML paradigm and how it differs from traditional programming. You should also know the basics of using NumPy and Pandas, and you should at least feel comfortable reading examples that use the popular TensorFlow and PyTorch libraries. It's enough to follow the flow of the code at the API level; you don't need to know how to write your own models. We cover examples using both dominant deep learning libraries to illustrate how you can use Ray for ML workloads regardless of your preferred framework. We cover a lot of ground in advanced ML topics, but the main focus is on Ray as a technology and how to use it. The ML examples we discuss might be new to you and could require a second reading, but you can still focus on Ray's API and how to use it in practice.

Author(s): Max Pumperla, Edward Oakes, Richard Liaw
Edition: final
Publisher: O’Reilly Media, Inc.
Year: 2023

Language: English
Commentary: Final Release
Pages: 271

Foreword
Preface
Who Should Read This Book
Goals of This Book
Navigating This Book
How to Use the Code Examples
Conventions Used in This Book
Using Code Examples
O’Reilly Online Learning
How to Contact Us
Acknowledgments
1. An Overview of Ray
What Is Ray?
What Led to Ray?
Ray’s Design Principles
Simplicity and abstraction
Flexibility and heterogeneity
Speed and scalability
Three Layers: Core, Libraries, and Ecosystem
A Distributed Computing Framework
A Suite of Data Science Libraries
Ray AIR and the Data Science Workflow
Data Processing with Ray Datasets
Model Training
Reinforcement learning with Ray RLlib
Distributed training with Ray Train
Hyperparameter Tuning
Model Serving
A Growing Ecosystem
Summary
2. Getting Started with Ray Core
An Introduction to Ray Core
A First Example Using the Ray API
Functions and remote Ray tasks
Using the object store with put and get
Using Ray’s wait function for nonblocking calls
Handling task dependencies
From classes to actors
An Overview of the Ray Core API
Understanding Ray System Components
Scheduling and Executing Work on a Node
The Head Node
Distributed Scheduling and Execution
A Simple MapReduce Example with Ray
Mapping and Shuffling Document Data
Reducing Word Counts
Summary
3. Building Your First Distributed Application
Introducing Reinforcement Learning
Setting Up a Simple Maze Problem
Building a Simulation
Training a Reinforcement Learning Model
Building a Distributed Ray App
Recapping RL Terminology
Summary
4. Reinforcement Learning with Ray RLlib
An Overview of RLlib
Getting Started with RLlib
Building a Gym Environment
Running the RLlib CLI
Using the RLlib Python API
Training RLlib algorithms
Saving, loading, and evaluating RLlib models
Computing actions
Accessing policy and model states
Configuring RLlib Experiments
Resource Configuration
Rollout Worker Configuration
Environment Configuration
Working with RLlib Environments
An Overview of RLlib Environments
Working with Multiple Agents
Working with Policy Servers and Clients
Defining a server
Defining a client
Advanced Concepts
Building an Advanced Environment
Applying Curriculum Learning
Working with Offline Data
Other Advanced Topics
Summary
5. Hyperparameter Optimization with Ray Tune
Tuning Hyperparameters
Building a Random Search Example with Ray
Why Is HPO Hard?
An Introduction to Tune
How Does Tune Work?
Search algorithms
Schedulers
Configuring and Running Tune
Specifying resources
Callbacks and metrics
Checkpoints, stopping, and resuming
Custom and conditional search spaces
Machine Learning with Tune
Using RLlib with Tune
Tuning Keras Models
Summary
6. Data Processing with Ray
Ray Datasets
Ray Datasets Basics
Creating a Ray Dataset
Reading from and writing to storage
Built-in transformations
Blocks and repartitioning
Schemas and data formats
Computing Over Ray Datasets
Dataset Pipelines
Example: Training Copies of a Classifier in Parallel
External Library Integrations
Building an ML Pipeline
Summary
7. Distributed Training with Ray Train
The Basics of Distributed Model Training
Introduction to Ray Train by Example
Predicting Big Tips in NYC Taxi Rides
Loading, Preprocessing, and Featurization
Defining a Deep Learning Model
Distributed Training with Ray Train
Distributed Batch Inference
More on Trainers in Ray Train
Migrating to Ray Train with Minimal Code Changes
Scaling Out Trainers
Preprocessing with Ray Train
Integrating Trainers with Ray Tune
Using Callbacks to Monitor Training
Summary
8. Online Inference with Ray Serve
Key Characteristics of Online Inference
ML Models Are Compute Intensive
ML Models Aren’t Useful in Isolation
An Introduction to Ray Serve
Architectural Overview
Defining a Basic HTTP Endpoint
Scaling and Resource Allocation
Request Batching
Multimodel Inference Graphs
Core feature: binding multiple deployments
Pattern 1: Pipelining
Pattern 2: Broadcasting
Pattern 3: Conditional logic
End-to-End Example: Building an NLP-Powered API
Fetching Content and Preprocessing
NLP Models
HTTP Handling and Driver Logic
Putting It All Together
Summary
9. Ray Clusters
Manually Creating a Ray Cluster
Deployment on Kubernetes
Setting Up Your First KubeRay Cluster
Interacting with the KubeRay Cluster
Running Ray programs with kubectl
Using the Ray Job Submission server
Ray Client
Exposing KubeRay
Configuring KubeRay
Configuring Logging for KubeRay
Using the Ray Cluster Launcher
Configuring Your Ray Cluster
Using the Cluster Launcher CLI
Interacting with a Ray Cluster
Working with Cloud Clusters
AWS
Using Other Cloud Providers
Autoscaling
Summary
10. Getting Started with the Ray AI Runtime
Why Use AIR?
Key AIR Concepts by Example
Ray Datasets and Preprocessors
Trainers
Tuners and Checkpoints
Batch Predictors
Deployments
Workloads That Are Suited for AIR
AIR Workload Execution
Stateless execution
Stateful execution
Composite workload execution
Online serving execution
AIR Memory Management
AIR Failure Model
Autoscaling AIR Workloads
Summary
11. Ray’s Ecosystem and Beyond
A Growing Ecosystem
Data Loading and Processing
Model Training
Model Serving
Building Custom Integrations
An Overview of Ray’s Integrations
Ray and Other Systems
Distributed Python Frameworks
Ray AIR and the Broader ML Ecosystem
How to Integrate AIR into Your ML Platform
Where to Go from Here?
Summary
Index