Implement reinforcement learning techniques and algorithms with the help of real-world examples and recipes
Key Features
• Use PyTorch 1.x to design and build self-learning artificial intelligence (AI) models
• Implement RL algorithms to solve control and optimization challenges faced by data scientists today
• Apply modern RL libraries to simulate a controlled environment for your projects
Book Description
Reinforcement learning (RL) is a branch of machine learning that has gained popularity in recent years. It allows you to train AI models that learn from their own actions and optimize their behavior. PyTorch has also emerged as a preferred tool for training RL models because of its efficiency and ease of use.
With this book, you'll explore the important RL concepts and the implementation of algorithms in PyTorch 1.x. The recipes in the book, along with real-world examples, will help you master various RL techniques, such as dynamic programming, Monte Carlo simulations, temporal difference learning, and Q-learning. You'll also gain insights into industry-specific applications of these techniques. Later chapters will guide you through solving the multi-armed bandit problem with bandit algorithms and the CartPole problem with function approximation. You'll also learn how to use Deep Q-Networks to play Atari games, along with how to effectively implement policy gradients. Finally, you'll discover how RL techniques are applied to Blackjack, Gridworld environments, internet advertising, and the Flappy Bird game.
By the end of this book, you'll have developed the skills you need to implement popular RL algorithms and use RL techniques to solve real-world problems.
What you will learn
• Use Q-learning and the state–action–reward–state–action (SARSA) algorithm to solve various Gridworld problems
• Develop a multi-armed bandit algorithm to optimize display advertising
• Scale up learning and control processes using Deep Q-Networks
• Simulate Markov Decision Processes, OpenAI Gym environments, and other common control problems
• Select and build RL models, evaluate their performance, and optimize and deploy them
• Use policy gradient methods to solve RL problems with continuous action spaces
Who this book is for
Machine learning engineers, data scientists, and AI researchers looking for quick solutions to different reinforcement learning problems will find this book useful. Prior knowledge of machine learning concepts is required; experience with PyTorch is useful but not necessary.
Author(s): Yuxi (Hayden) Liu
Edition: 1
Publisher: Packt Publishing
Year: 2019
Language: English
Commentary: Vector PDF
Pages: 340
City: Birmingham, UK
Tags: Machine Learning; Reinforcement Learning; Python; OpenAI Gym; Dynamic Programming; PyTorch; Temporal Difference Learning; Q-Learning; Markov Decision Process; Monte Carlo Simulations; Deep Q-Networks; Policy Gradient Methods
Cover
Title Page
Copyright and Credits
About Packt
Contributors
Table of Contents
Preface
Chapter 1: Getting Started with Reinforcement Learning and PyTorch
Setting up the working environment
How to do it...
How it works...
There's more...
See also
Installing OpenAI Gym
How to do it...
How it works...
There's more...
See also
Simulating Atari environments
How to do it...
How it works...
There's more...
See also
Simulating the CartPole environment
How to do it...
How it works...
There's more...
Reviewing the fundamentals of PyTorch
How to do it...
There's more...
See also
Implementing and evaluating a random search policy
How to do it...
How it works...
There's more...
Developing the hill-climbing algorithm
How to do it...
How it works...
There's more...
See also
Developing a policy gradient algorithm
How to do it...
How it works...
There's more...
See also
Chapter 2: Markov Decision Processes and Dynamic Programming
Technical requirements
Creating a Markov chain
How to do it...
How it works...
There's more...
See also
Creating an MDP
How to do it...
How it works...
There's more...
See also
Performing policy evaluation
How to do it...
How it works...
There's more...
Simulating the FrozenLake environment
Getting ready
How to do it...
How it works...
There's more...
Solving an MDP with a value iteration algorithm
How to do it...
How it works...
There's more...
Solving an MDP with a policy iteration algorithm
How to do it...
How it works...
There's more...
See also
Solving the coin-flipping gamble problem
How to do it...
How it works...
There's more...
Chapter 3: Monte Carlo Methods for Making Numerical Estimations
Calculating Pi using the Monte Carlo method
How to do it...
How it works...
There's more...
See also
Performing Monte Carlo policy evaluation
How to do it...
How it works...
There's more...
Playing Blackjack with Monte Carlo prediction
How to do it...
How it works...
There's more...
See also
Performing on-policy Monte Carlo control
How to do it...
How it works...
There's more...
Developing MC control with epsilon-greedy policy
How to do it...
How it works...
Performing off-policy Monte Carlo control
How to do it...
How it works...
There's more...
See also
Developing MC control with weighted importance sampling
How to do it...
How it works...
There's more...
See also
Chapter 4: Temporal Difference and Q-Learning
Setting up the Cliff Walking environment playground
Getting ready
How to do it...
How it works...
Developing the Q-learning algorithm
How to do it...
How it works...
There's more...
Setting up the Windy Gridworld environment playground
How to do it...
How it works...
Developing the SARSA algorithm
How to do it...
How it works...
There's more...
Solving the Taxi problem with Q-learning
Getting ready
How to do it...
How it works...
Solving the Taxi problem with SARSA
How to do it...
How it works...
There's more...
Developing the Double Q-learning algorithm
How to do it...
How it works...
See also
Chapter 5: Solving Multi-armed Bandit Problems
Creating a multi-armed bandit environment
How to do it...
How it works...
Solving multi-armed bandit problems with the epsilon-greedy policy
How to do it...
How it works...
There's more...
Solving multi-armed bandit problems with the softmax exploration
How to do it...
How it works...
Solving multi-armed bandit problems with the upper confidence bound algorithm
How to do it...
How it works...
There's more...
See also
Solving internet advertising problems with a multi-armed bandit
How to do it...
How it works...
Solving multi-armed bandit problems with the Thompson sampling algorithm
How to do it...
How it works...
See also
Solving internet advertising problems with contextual bandits
How to do it...
How it works...
Chapter 6: Scaling Up Learning with Function Approximation
Setting up the Mountain Car environment playground
Getting ready
How to do it...
How it works...
Estimating Q-functions with gradient descent approximation
How to do it...
How it works...
See also
Developing Q-learning with linear function approximation
How to do it...
How it works...
Developing SARSA with linear function approximation
How to do it...
How it works...
Incorporating batching using experience replay
How to do it...
How it works...
Developing Q-learning with neural network function approximation
How to do it...
How it works...
See also
Solving the CartPole problem with function approximation
How to do it...
How it works...
Chapter 7: Deep Q-Networks in Action
Developing deep Q-networks
How to do it...
How it works...
See also
Improving DQNs with experience replay
How to do it...
How it works...
Developing double deep Q-Networks
How to do it...
How it works...
Tuning double DQN hyperparameters for CartPole
How to do it...
How it works...
Developing Dueling deep Q-Networks
How to do it...
How it works...
Applying Deep Q-Networks to Atari games
How to do it...
How it works...
Using convolutional neural networks for Atari games
How to do it...
How it works...
See also
Chapter 8: Implementing Policy Gradients and Policy Optimization
Implementing the REINFORCE algorithm
How to do it...
How it works...
See also
Developing the REINFORCE algorithm with baseline
How to do it...
How it works...
Implementing the actor-critic algorithm
How to do it...
How it works...
Solving Cliff Walking with the actor-critic algorithm
How to do it...
How it works...
Setting up the continuous Mountain Car environment
How to do it...
How it works...
Solving the continuous Mountain Car environment with the advantage actor-critic network
How to do it...
How it works...
There's more...
See also
Playing CartPole through the cross-entropy method
How to do it...
How it works...
Chapter 9: Capstone Project – Playing Flappy Bird with DQN
Setting up the game environment
Getting ready
How to do it...
How it works...
Building a Deep Q-Network to play Flappy Bird
How to do it...
How it works...
Training and tuning the network
How to do it...
How it works...
Deploying the model and playing the game
How to do it...
How it works...
Other Books You May Enjoy
Index