Get to grips with various performance improvement techniques such as concurrency, lock-free programming, atomic operations, parallelism, and memory management
Key Features
- Understand the limitations of modern CPUs and their performance impact
- Find out how you can avoid writing inefficient code and get the best optimizations from the compiler
- Learn the tradeoffs and costs of writing high-performance programs
Book Description
The great free lunch of "performance taking care of itself" is over. Until recently, programs got faster by themselves as CPUs were upgraded, but that doesn't happen anymore. The clock frequency of new processors has almost peaked, and new architectures bring only small improvements to existing programs. Processors do get larger and more powerful, but most of this new power goes into a growing number of processing cores and other "extra" computing units. To write efficient software, you now have to know how to make good use of the available computing resources, and this book will teach you how to do that.
The book covers all the major aspects of writing efficient programs, such as using CPU resources and memory efficiently, avoiding unnecessary computations, measuring performance, and putting concurrency and multithreading to good use. You'll also learn about compiler optimizations and how to use the programming language (C++) more efficiently. Finally, you'll understand how design decisions impact performance.
By the end of this book, you'll not only have enough knowledge of processors and compilers to write efficient programs, but you'll also be able to understand which techniques to use and what to measure while improving performance. At its core, this book is about learning how to learn.
What you will learn
- Discover how to use the hardware computing resources in your programs effectively
- Understand the relationship between memory order and memory barriers
- Familiarize yourself with the performance implications of different data structures and organizations
- Assess the performance impact of concurrent memory accesses and learn how to minimize it
- Discover when to use and when not to use lock-free programming techniques
- Explore different ways to improve the effectiveness of compiler optimizations
- Design APIs for concurrent data structures and high-performance data structures to avoid inefficiencies
Who this book is for
This book is for experienced developers and programmers who work on performance-critical projects and want to learn different techniques to improve the performance of their code. Programmers in the algorithmic trading, gaming, bioinformatics, computational genomics, or computational fluid dynamics communities can apply the techniques in this book to their own domains.
Although this book uses the C++ language, the concepts it demonstrates transfer readily to other compiled languages such as C, Java, Rust, and Go.
Table of Contents
- Introduction to Performance and Concurrency
- Performance Measurements
- CPU Architecture, Resources, and Performance Implications
- Memory Architecture and Performance
- Threads, Memory, and Concurrency
- Concurrency and Performance
- Data Structures for Concurrency
- Concurrency in C++
- High-Performance C++
- Compiler Optimizations in C++
- Undefined Behavior and Performance
- Design for Performance
Author(s): Fedor G. Pikus
Edition: 1
Publisher: Packt Publishing
Year: 2021
Language: English
Pages: 452
Tags: Concurrency; Performance Measurements; CPU Architecture; Memory Architecture; Data Structures for Concurrency; High-Performance C++; C++
Cover
Title Page
Copyright and Credits
Contributors
Table of Contents
Preface
Section 1 – Performance Fundamentals
Chapter 1: Introduction to Performance and Concurrency
Why focus on performance?
Why performance matters
What is performance?
Performance as throughput
Performance as power consumption
Performance for real-time applications
Performance as dependent on context
Evaluating, estimating, and predicting performance
Learning about high performance
Summary
Questions
Chapter 2: Performance Measurements
Technical requirements
Performance measurements by example
Performance benchmarking
C++ chrono timers
High-resolution timers
Performance profiling
The perf profiler
Detailed profiling with perf
The Google Performance profiler
Profiling with call graphs
Optimization and inlining
Practical profiling
Micro-benchmarking
Basics of micro-benchmarking
Micro-benchmarking and compiler optimizations
Google Benchmark
Micro-benchmarks are lies
Summary
Questions
Chapter 3: CPU Architecture, Resources, and Performance
Technical requirements
The performance begins with the CPU
Probing performance with micro-benchmarks
Visualizing instruction-level parallelism
Data dependencies and pipelining
Pipelining and branches
Branch prediction
Profiling for branch mispredictions
Speculative execution
Optimization of complex conditions
Branchless computing
Loop unrolling
Branchless selection
Branchless computing examples
Summary
Questions
Chapter 4: Memory Architecture and Performance
Technical requirements
The performance begins with the CPU but does not end there
Measuring memory access speed
Memory architecture
Measuring memory and cache speeds
The speed of memory: the numbers
The speed of random memory access
The speed of sequential memory access
Memory performance optimizations in hardware
Optimizing memory performance
Memory-efficient data structures
Profiling memory performance
Optimizing algorithms for memory performance
The ghost in the machine
What is Spectre?
Spectre by example
Spectre, unleashed
Summary
Questions
Chapter 5: Threads, Memory, and Concurrency
Technical requirements
Understanding threads and concurrency
What is a thread?
Symmetric multi-threading
Threads and memory
Memory-bound programs and concurrency
Understanding the cost of memory synchronization
Why data sharing is expensive
Learning about concurrency and order
The need for order
Memory order and memory barriers
Memory order in C++
Memory model
Summary
Questions
Section 2 – Advanced Concurrency
Chapter 6: Concurrency and Performance
Technical requirements
What is needed to use concurrency effectively?
Locks, alternatives, and their performance
Lock-based, lock-free, and wait-free programs
Different locks for different problems
Lock-based versus lock-free, what is the real difference?
Building blocks for concurrent programming
The basics of concurrent data structures
Counters and accumulators
Publishing protocol
Smart pointers for concurrent programming
Summary
Questions
Chapter 7: Data Structures for Concurrency
Technical requirements
What is a thread-safe data structure?
The best kind of thread safety
The real thread safety
The thread-safe stack
Interface design for thread safety
Performance of mutex-guarded data structures
Performance requirements for different uses
Stack performance in detail
Performance estimates for synchronization schemes
Lock-free stack
The thread-safe queue
Lock-free queue
Non-sequentially consistent data structures
Memory management for concurrent data structures
The thread-safe list
Lock-free list
Summary
Questions
Chapter 8: Concurrency in C++
Technical requirements
Concurrency support in C++11
Concurrency support in C++17
Concurrency support in C++20
The foundations of coroutines
Coroutine C++ syntax
Coroutine examples
Summary
Questions
Section 3 – Designing and Coding High-Performance Programs
Chapter 9: High-Performance C++
Technical requirements
What is the efficiency of a programming language?
Unnecessary copying
Copying and argument passing
Copying as an implementation technique
Copying to store data
Copying of return values
Using pointers to avoid copying
How to avoid unnecessary copying
Inefficient memory management
Unnecessary memory allocations
Memory management in concurrent programs
Avoiding memory fragmentation
Optimization of conditional execution
Summary
Questions
Chapter 10: Compiler Optimizations in C++
Technical requirements
Compilers optimizing code
Basics of compiler optimizations
Function inlining
What does the compiler really know?
Lifting knowledge from runtime to compile time
Summary
Questions
Chapter 11: Undefined Behavior and Performance
Technical requirements
What is undefined behavior?
Why have undefined behavior?
Undefined behavior and C++ optimization
Using undefined behavior for efficient design
Summary
Questions
Chapter 12: Design for Performance
Technical requirements
Interaction between the design and performance
Design for performance
The minimum information principle
The maximum information principle
API design considerations
API design for concurrency
Copying and sending data
Design for optimal data access
Performance trade-offs
Interface design
Component design
Errors and undefined behavior
Making informed design decisions
Summary
Questions
Assessments
Chapter 1:
Chapter 2:
Chapter 3:
Chapter 4:
Chapter 5:
Chapter 6:
Chapter 7:
Chapter 8:
Chapter 9:
Chapter 10:
Chapter 11:
Chapter 12:
Other Books You May Enjoy
Index