Your Python code may run correctly, but you need it to run faster. Updated for Python 3, this expanded edition shows you how to locate performance bottlenecks and significantly speed up your code in high-data-volume programs. By exploring the fundamental theory behind design choices, High Performance Python helps you gain a deeper understanding of Python’s implementation.
How do you take advantage of multicore architectures or clusters, or build a system that scales up and down without losing reliability? Experienced Python programmers will learn concrete solutions to many issues, along with war stories from companies that use high-performance Python for social media analytics, productionized machine learning, and more.
• Get a better grasp of NumPy, Cython, and profilers
• Learn how Python abstracts the underlying computer architecture
• Use profiling to find bottlenecks in CPU time and memory usage
• Write efficient programs by choosing appropriate data structures
• Speed up matrix and vector computations
• Use tools to compile Python down to machine code
• Manage multiple I/O and computational operations concurrently
• Convert multiprocessing code to run on local or remote clusters
• Deploy code faster using tools like Docker
Author(s): Micha Gorelick, Ian Ozsvald
Edition: 2
Publisher: O'Reilly Media
Year: 2020
Language: English
Commentary: Vector PDF
Pages: 468
City: Sebastopol, CA
Tags: Machine Learning; Deep Learning; Data Structures; Python; Big Data; Asynchronous Programming; Clusters; Memory Management; Profiling; Best Practices; NumPy; pandas; Multiprocessing; Numba; Cython; High Performance Computing; PyPy; Queues; Bottlenecks; PySpy
Cover
Copyright
Table of Contents
Foreword
Preface
Who This Book Is For
Who This Book Is Not For
What You’ll Learn
Python 3
Changes from Python 2.7
License
How to Make an Attribution
Errata and Feedback
Conventions Used in This Book
Using Code Examples
O’Reilly Online Learning
How to Contact Us
Acknowledgments
Chapter 1. Understanding Performant Python
The Fundamental Computer System
Computing Units
Memory Units
Communications Layers
Putting the Fundamental Elements Together
Idealized Computing Versus the Python Virtual Machine
So Why Use Python?
How to Be a Highly Performant Programmer
Good Working Practices
Some Thoughts on Good Notebook Practice
Getting the Joy Back into Your Work
Chapter 2. Profiling to Find Bottlenecks
Profiling Efficiently
Introducing the Julia Set
Calculating the Full Julia Set
Simple Approaches to Timing—print and a Decorator
Simple Timing Using the Unix time Command
Using the cProfile Module
Visualizing cProfile Output with SnakeViz
Using line_profiler for Line-by-Line Measurements
Using memory_profiler to Diagnose Memory Usage
Introspecting an Existing Process with PySpy
Bytecode: Under the Hood
Using the dis Module to Examine CPython Bytecode
Different Approaches, Different Complexity
Unit Testing During Optimization to Maintain Correctness
No-op @profile Decorator
Strategies to Profile Your Code Successfully
Wrap-Up
Chapter 3. Lists and Tuples
A More Efficient Search
Lists Versus Tuples
Lists as Dynamic Arrays
Tuples as Static Arrays
Wrap-Up
Chapter 4. Dictionaries and Sets
How Do Dictionaries and Sets Work?
Inserting and Retrieving
Deletion
Resizing
Hash Functions and Entropy
Dictionaries and Namespaces
Wrap-Up
Chapter 5. Iterators and Generators
Iterators for Infinite Series
Lazy Generator Evaluation
Wrap-Up
Chapter 6. Matrix and Vector Computation
Introduction to the Problem
Aren’t Python Lists Good Enough?
Problems with Allocating Too Much
Memory Fragmentation
Understanding perf
Making Decisions with perf’s Output
Enter numpy
Applying numpy to the Diffusion Problem
Memory Allocations and In-Place Operations
Selective Optimizations: Finding What Needs to Be Fixed
numexpr: Making In-Place Operations Faster and Easier
A Cautionary Tale: Verify “Optimizations” (scipy)
Lessons from Matrix Optimizations
Pandas
Pandas’s Internal Model
Applying a Function to Many Rows of Data
Building DataFrames and Series from Partial Results Rather than Concatenating
There’s More Than One (and Possibly a Faster) Way to Do a Job
Advice for Effective Pandas Development
Wrap-Up
Chapter 7. Compiling to C
What Sort of Speed Gains Are Possible?
JIT Versus AOT Compilers
Why Does Type Information Help the Code Run Faster?
Using a C Compiler
Reviewing the Julia Set Example
Cython
Compiling a Pure Python Version Using Cython
pyximport
Cython Annotations to Analyze a Block of Code
Adding Some Type Annotations
Cython and numpy
Parallelizing the Solution with OpenMP on One Machine
Numba
Numba to Compile NumPy for Pandas
PyPy
Garbage Collection Differences
Running PyPy and Installing Modules
A Summary of Speed Improvements
When to Use Each Technology
Other Upcoming Projects
Graphics Processing Units (GPUs)
Dynamic Graphs: PyTorch
Basic GPU Profiling
Performance Considerations of GPUs
When to Use GPUs
Foreign Function Interfaces
ctypes
cffi
f2py
CPython Module
Wrap-Up
Chapter 8. Asynchronous I/O
Introduction to Asynchronous Programming
How Does async/await Work?
Serial Crawler
Gevent
tornado
aiohttp
Shared CPU–I/O Workload
Serial
Batched Results
Full Async
Wrap-Up
Chapter 9. The multiprocessing Module
An Overview of the multiprocessing Module
Estimating Pi Using the Monte Carlo Method
Estimating Pi Using Processes and Threads
Using Python Objects
Replacing multiprocessing with Joblib
Random Numbers in Parallel Systems
Using numpy
Finding Prime Numbers
Queues of Work
Verifying Primes Using Interprocess Communication
Serial Solution
Naive Pool Solution
A Less Naive Pool Solution
Using Manager.Value as a Flag
Using Redis as a Flag
Using RawValue as a Flag
Using mmap as a Flag
Using mmap as a Flag Redux
Sharing numpy Data with multiprocessing
Synchronizing File and Variable Access
File Locking
Locking a Value
Wrap-Up
Chapter 10. Clusters and Job Queues
Benefits of Clustering
Drawbacks of Clustering
$462 Million Wall Street Loss Through Poor Cluster Upgrade Strategy
Skype’s 24-Hour Global Outage
Common Cluster Designs
How to Start a Clustered Solution
Ways to Avoid Pain When Using Clusters
Two Clustering Solutions
Using IPython Parallel to Support Research
Parallel Pandas with Dask
NSQ for Robust Production Clustering
Queues
Pub/sub
Distributed Prime Calculation
Other Clustering Tools to Look At
Docker
Docker’s Performance
Advantages of Docker
Wrap-Up
Chapter 11. Using Less RAM
Objects for Primitives Are Expensive
The array Module Stores Many Primitive Objects Cheaply
Using Less RAM in NumPy with NumExpr
Understanding the RAM Used in a Collection
Bytes Versus Unicode
Efficiently Storing Lots of Text in RAM
Trying These Approaches on 11 Million Tokens
Modeling More Text with Scikit-Learn’s FeatureHasher
Introducing DictVectorizer and FeatureHasher
Comparing DictVectorizer and FeatureHasher on a Real Problem
SciPy’s Sparse Matrices
Tips for Using Less RAM
Probabilistic Data Structures
Very Approximate Counting with a 1-Byte Morris Counter
K-Minimum Values
Bloom Filters
LogLog Counter
Real-World Example
Chapter 12. Lessons from the Field
Streamlining Feature Engineering Pipelines with Feature-engine
Feature Engineering for Machine Learning
The Hard Task of Deploying Feature Engineering Pipelines
Leveraging the Power of Open Source Python Libraries
Feature-engine Smooths Building and Deployment of Feature Engineering Pipelines
Helping with the Adoption of a New Open Source Package
Developing, Maintaining, and Encouraging Contribution to Open Source Libraries
Highly Performant Data Science Teams
How Long Will It Take?
Discovery and Planning
Managing Expectations and Delivery
Numba
A Simple Example
Best Practices and Recommendations
Getting Help
Optimizing Versus Thinking
Adaptive Lab’s Social Media Analytics (2014)
Python at Adaptive Lab
SoMA’s Design
Our Development Methodology
Maintaining SoMA
Advice for Fellow Engineers
Making Deep Learning Fly with RadimRehurek.com (2014)
The Sweet Spot
Lessons in Optimizing
Conclusion
Large-Scale Productionized Machine Learning at Lyst.com (2014)
Cluster Design
Code Evolution in a Fast-Moving Start-Up
Building the Recommendation Engine
Reporting and Monitoring
Some Advice
Large-Scale Social Media Analysis at Smesh (2014)
Python’s Role at Smesh
The Platform
High Performance Real-Time String Matching
Reporting, Monitoring, Debugging, and Deployment
PyPy for Successful Web and Data Processing Systems (2014)
Prerequisites
The Database
The Web Application
OCR and Translation
Task Distribution and Workers
Conclusion
Task Queues at Lanyrd.com (2014)
Python’s Role at Lanyrd
Making the Task Queue Performant
Reporting, Monitoring, Debugging, and Deployment
Advice to a Fellow Developer
Index
About the Authors
Colophon