The Art of Multiprocessor Programming, Second Edition, provides an authoritative guide to multicore programming. This updated edition introduces the higher-level software development skills required for multiprocessor programming, beyond those needed for efficient single-core programming, and includes comprehensive coverage of the new principles, algorithms, and tools necessary for effective multiprocessor programming. The book is an ideal resource for students and professionals alike, who will benefit from its thorough coverage of key multiprocessor programming issues.
Authors: Maurice Herlihy, Nir Shavit, Victor Luchangco, Michael Spear
Edition: 2
Publisher: Morgan Kaufmann
Year: 2020
Language: English
Pages: 576
City: Cambridge, MA
Tags: Java; Concurrency; Parallel Programming; Transactional Programming
Contents
Preface
Acknowledgments
Suggested ways to teach the art of multiprocessor programming
1 Introduction
1.1 Shared objects and synchronization
1.2 A fable
1.2.1 Properties of a mutual exclusion protocol
1.2.2 The moral
1.3 The producer-consumer problem
1.4 The readers-writers problem
1.5 The harsh realities of parallelization
1.6 Parallel programming
1.7 Chapter notes
1.8 Exercises
2 Mutual exclusion
2.1 Time and events
2.2 Critical sections
2.3 Two-thread solutions
2.3.1 The LockOne class
2.3.2 The LockTwo class
2.3.3 The Peterson lock
2.4 Notes on deadlock
2.5 The filter lock
2.6 Fairness
2.7 Lamport's Bakery algorithm
2.8 Bounded timestamps
2.9 Lower bounds on the number of locations
2.10 Chapter notes
2.11 Exercises
3 Concurrent objects
3.1 Concurrency and correctness
3.2 Sequential objects
3.3 Sequential consistency
3.3.1 Sequential consistency versus real-time order
3.3.2 Sequential consistency is nonblocking
3.3.3 Compositionality
3.4 Linearizability
3.4.1 Linearization points
3.4.2 Linearizability versus sequential consistency
3.5 Quiescent consistency
3.5.1 Properties of quiescent consistency
3.6 Formal definitions
3.6.1 Histories
3.6.2 Linearizability
3.6.3 Linearizability is compositional
3.6.4 Linearizability is nonblocking
3.7 Memory consistency models
3.8 Progress conditions
3.8.1 Wait-freedom
3.8.2 Lock-freedom
3.8.3 Obstruction-freedom
3.8.4 Blocking progress conditions
3.8.5 Characterizing progress conditions
3.9 Remarks
3.10 Chapter notes
3.11 Exercises
4 Foundations of shared memory
4.1 The space of registers
4.2 Register constructions
4.2.1 Safe MRSW registers
4.2.2 A regular Boolean MRSW register
4.2.3 A regular M-valued MRSW register
4.2.4 An atomic SRSW register
4.2.5 An atomic MRSW register
4.2.6 An atomic MRMW register
4.3 Atomic snapshots
4.3.1 An obstruction-free snapshot
4.3.2 A wait-free snapshot
4.3.3 Correctness arguments
4.4 Chapter notes
4.5 Exercises
5 The relative power of primitive synchronization operations
5.1 Consensus numbers
5.1.1 States and valence
5.2 Atomic registers
5.3 Consensus protocols
5.4 FIFO queues
5.5 Multiple assignment objects
5.6 Read-modify-write operations
5.7 Common2 RMW operations
5.8 The compareAndSet operation
5.9 Chapter notes
5.10 Exercises
6 Universality of consensus
6.1 Introduction
6.2 Universality
6.3 A lock-free universal construction
6.4 A wait-free universal construction
6.5 Chapter notes
6.6 Exercises
7 Spin locks and contention
7.1 Welcome to the real world
7.2 Volatile fields and atomic objects
7.3 Test-and-set locks
7.4 Exponential back-off
7.5 Queue locks
7.5.1 Array-based locks
7.5.2 The CLH queue lock
7.5.3 The MCS queue lock
7.6 A queue lock with timeouts
7.7 Hierarchical locks
7.7.1 A hierarchical back-off lock
7.7.2 Cohort locks
7.7.3 A cohort lock implementation
7.8 A composite lock
7.9 A fast path for threads running alone
7.10 One lock to rule them all
7.11 Chapter notes
7.12 Exercises
8 Monitors and blocking synchronization
8.1 Introduction
8.2 Monitor locks and conditions
8.2.1 Conditions
8.2.2 The lost-wakeup problem
8.3 Readers-writers locks
8.3.1 Simple readers-writers lock
8.3.2 Fair readers-writers lock
8.4 Our own reentrant lock
8.5 Semaphores
8.6 Chapter notes
8.7 Exercises
9 Linked lists: The role of locking
9.1 Introduction
9.2 List-based sets
9.3 Concurrent reasoning
9.4 Coarse-grained synchronization
9.5 Fine-grained synchronization
9.6 Optimistic synchronization
9.7 Lazy synchronization
9.8 Nonblocking synchronization
9.9 Discussion
9.10 Chapter notes
9.11 Exercises
10 Queues, memory management, and the ABA problem
10.1 Introduction
10.2 Queues
10.3 A bounded partial queue
10.4 An unbounded total queue
10.5 A lock-free unbounded queue
10.6 Memory reclamation and the ABA problem
10.6.1 A naïve synchronous queue
10.7 Dual data structures
10.8 Chapter notes
10.9 Exercises
11 Stacks and elimination
11.1 Introduction
11.2 An unbounded lock-free stack
11.3 Elimination
11.4 The elimination back-off stack
11.4.1 A lock-free exchanger
11.4.2 The elimination array
11.5 Chapter notes
11.6 Exercises
12 Counting, sorting, and distributed coordination
12.1 Introduction
12.2 Shared counting
12.3 Software combining
12.3.1 Overview
12.3.2 An extended example
12.3.3 Performance and robustness
12.4 Quiescently consistent pools and counters
12.5 Counting networks
12.5.1 Networks that count
12.5.2 The bitonic counting network
12.5.2.1 A software bitonic counting network
12.5.2.2 Proof of correctness
12.5.2.3 A periodic counting network
12.5.2.4 A software periodic counting network
12.5.3 Performance and pipelining
12.6 Diffracting trees
12.7 Parallel sorting
12.8 Sorting networks
12.8.1 Designing a sorting network
12.8.1.1 A bitonic sorting algorithm
12.9 Sample sorting
12.10 Distributed coordination
12.11 Chapter notes
12.12 Exercises
13 Concurrent hashing and natural parallelism
13.1 Introduction
13.2 Closed-address hash sets
13.2.1 A coarse-grained hash set
13.2.2 A striped hash set
13.2.3 A refinable hash set
13.3 A lock-free hash set
13.3.1 Recursive split-ordering
13.3.2 The BucketList class
13.3.3 The LockFreeHashSet class
13.4 An open-address hash set
13.4.1 Cuckoo hashing
13.4.2 Concurrent cuckoo hashing
13.4.3 Striped concurrent cuckoo hashing
13.4.4 A refinable concurrent cuckoo hash set
13.5 Chapter notes
13.6 Exercises
14 Skiplists and balanced search
14.1 Introduction
14.2 Sequential skiplists
14.3 A lock-based concurrent skiplist
14.3.1 A bird's-eye view
14.3.2 The algorithm
14.4 A lock-free concurrent skiplist
14.4.1 A bird's-eye view
14.4.2 The algorithm in detail
14.5 Concurrent skiplists
14.6 Chapter notes
14.7 Exercises
15 Priority queues
15.1 Introduction
15.1.1 Concurrent priority queues
15.2 An array-based bounded priority queue
15.3 A tree-based bounded priority queue
15.4 An unbounded heap-based priority queue
15.4.1 A sequential heap
15.4.2 A concurrent heap
15.5 A skiplist-based unbounded priority queue
15.6 Chapter notes
15.7 Exercises
16 Scheduling and work distribution
16.1 Introduction
16.2 Analyzing parallelism
16.3 Realistic multiprocessor scheduling
16.4 Work distribution
16.4.1 Work stealing
16.4.2 Yielding and multiprogramming
16.5 Work-stealing deques
16.5.1 A bounded work-stealing deque
16.5.2 An unbounded work-stealing deque
16.5.3 Work dealing
16.6 Chapter notes
16.7 Exercises
17 Data parallelism
17.1 MapReduce
17.1.1 The MapReduce framework
17.1.2 A MapReduce-based WordCount application
17.1.3 A MapReduce-based KMeans application
17.1.4 The MapReduce implementation
17.2 Stream computing
17.2.1 A stream-based WordCount application
17.2.2 A stream-based KMeans application
17.2.3 Making aggregate operations parallel
17.3 Chapter notes
17.4 Exercises
18 Barriers
18.1 Introduction
18.2 Barrier implementations
18.3 Sense reversing barrier
18.4 Combining tree barrier
18.5 Static tree barrier
18.6 Termination detection barriers
18.7 Chapter notes
18.8 Exercises
19 Optimism and manual memory management
19.1 Transitioning from Java to C++
19.2 Optimism and explicit reclamation
19.3 Protecting pending operations
19.4 An object for managing memory
19.5 Traversing a list
19.6 Hazard pointers
19.7 Epoch-based reclamation
19.8 Chapter notes
19.9 Exercises
20 Transactional programming
20.1 Challenges in concurrent programming
20.1.1 Problems with locking
20.1.2 Problems with explicit speculation
20.1.3 Problems with nonblocking algorithms
20.1.4 Problems with compositionality
20.1.5 Summary
20.2 Transactional programming
20.2.1 An example of transactional programming
20.3 Hardware support for transactional programming
20.3.1 Hardware speculation
20.3.2 Basic cache coherence
20.3.3 Transactional cache coherence
20.3.4 Limitations of hardware support
20.4 Transactional lock elision
20.4.1 Discussion
20.5 Transactional memory
20.5.1 Run-time scheduling
20.5.2 Explicit self-abort
20.6 Software transactions
20.6.1 Transactions with ownership records
20.6.2 Transactions with value-based validation
20.7 Combining hardware and software transactions
20.8 Transactional data structure design
20.9 Chapter notes
20.10 Exercises
A Software basics
A.1 Introduction
A.2 Java
A.2.1 Threads
A.2.2 Monitors
A.2.3 Yielding and sleeping
A.2.4 Thread-local objects
A.2.5 Randomization
A.3 The Java memory model
A.3.1 Locks and synchronized blocks
A.3.2 Volatile fields
A.3.3 Final fields
A.4 C++
A.4.1 Threads in C++
A.4.2 Locks in C++
A.4.3 Condition variables
A.4.4 Atomic variables
A.4.5 Thread-local storage
A.5 C#
A.5.1 Threads
A.5.2 Monitors
A.5.3 Thread-local objects
A.6 Appendix notes
B Hardware basics
B.1 Introduction (and a puzzle)
B.2 Processors and threads
B.3 Interconnect
B.4 Memory
B.5 Caches
B.5.1 Coherence
B.5.2 Spinning
B.6 Cache-conscious programming, or the puzzle solved
B.7 Multicore and multithreaded architectures
B.7.1 Relaxed memory consistency
B.8 Hardware synchronization instructions
B.9 Appendix notes
B.10 Exercises
Bibliography
Index