Python is a versatile language that has found applications in many industries. The clean syntax, rich standard library, and vast selection of third-party libraries make Python a wildly popular language.
Python High Performance is a practical guide that shows how to leverage the power of both native and third-party Python libraries to build robust applications.
The book explains how to use various profilers to find performance bottlenecks and apply the correct algorithm to fix them. Readers will learn how to use NumPy and Cython effectively to speed up numerical code. The book also explains the concepts of concurrent programming and how to implement robust and responsive applications using reactive programming. Readers will learn how to write code for parallel architectures using TensorFlow and Theano, and how to use a cluster of computers for large-scale computations with technologies such as Dask and PySpark.
By the end of the book, readers will have learned how to make their Python applications perform well and scale.
Author(s): Gabriele Lanaro
Edition: 2
Publisher: Packt Publishing
Year: 2017
Cover
Copyright
Credits
About the Author
About the Reviewer
www.PacktPub.com
Customer Feedback
Table of Contents
Preface
Chapter 1: Benchmarking and Profiling
Designing your application
Writing tests and benchmarks
Timing your benchmark
Better tests and benchmarks with pytest-benchmark
Finding bottlenecks with cProfile
Profile line by line with line_profiler
Optimizing our code
The dis module
Profiling memory usage with memory_profiler
Summary
Chapter 2: Pure Python Optimizations
Useful algorithms and data structures
Lists and deques
Dictionaries
Building an in-memory search index using a hash map
Sets
Heaps
Tries
Caching and memoization
Joblib
Comprehensions and generators
Summary
Chapter 3: Fast Array Operations with NumPy and Pandas
Getting started with NumPy
Creating arrays
Accessing arrays
Broadcasting
Mathematical operations
Calculating the norm
Rewriting the particle simulator in NumPy
Reaching optimal performance with numexpr
Pandas
Pandas fundamentals
Indexing Series and DataFrame objects
Database-style operations with Pandas
Mapping
Grouping, aggregations, and transforms
Joining
Summary
Chapter 4: C Performance with Cython
Compiling Cython extensions
Adding static types
Variables
Functions
Classes
Sharing declarations
Working with arrays
C arrays and pointers
NumPy arrays
Typed memoryviews
Particle simulator in Cython
Profiling Cython
Using Cython with Jupyter
Summary
Chapter 5: Exploring Compilers
Numba
First steps with Numba
Type specializations
Object mode versus native mode
Numba and NumPy
Universal functions with Numba
Generalized universal functions
JIT classes
Limitations in Numba
The PyPy project
Setting up PyPy
Running a particle simulator in PyPy
Other interesting projects
Summary
Chapter 6: Implementing Concurrency
Asynchronous programming
Waiting for I/O
Concurrency
Callbacks
Futures
Event loops
The asyncio framework
Coroutines
Converting blocking code into non-blocking code
Reactive programming
Observables
Useful operators
Hot and cold observables
Building a CPU monitor
Summary
Chapter 7: Parallel Processing
Introduction to parallel programming
Graphic processing units
Using multiple processes
The Process and Pool classes
The Executor interface
Monte Carlo approximation of pi
Synchronization and locks
Parallel Cython with OpenMP
Automatic parallelism
Getting started with Theano
Profiling Theano
TensorFlow
Running code on a GPU
Summary
Chapter 8: Distributed Processing
Introduction to distributed computing
An introduction to MapReduce
Dask
Directed Acyclic Graphs
Dask arrays
Dask Bag and DataFrame
Dask distributed
Manual cluster setup
Using PySpark
Setting up Spark and PySpark
Spark architecture
Resilient Distributed Datasets
Spark DataFrame
Scientific computing with mpi4py
Summary
Chapter 9: Designing for High Performance
Choosing a suitable strategy
Generic applications
Numerical code
Big data
Organizing your source code
Isolation, virtual environments, and containers
Using conda environments
Virtualization and Containers
Creating Docker images
Continuous integration
Summary
Index