An Introduction to Parallel Programming

Author(s): Pacheco, Peter S.; Malensek, Matthew
Edition: 2nd
Publisher: Elsevier
Year: 2021

Language: English
Pages: 777

Title page
Table of Contents
Copyright
Dedication
Preface
Chapter 1: Why parallel computing
1.1. Why we need ever-increasing performance
1.2. Why we're building parallel systems
1.3. Why we need to write parallel programs
1.4. How do we write parallel programs?
1.5. What we'll be doing
1.6. Concurrent, parallel, distributed
1.7. The rest of the book
1.8. A word of warning
1.9. Typographical conventions
1.10. Summary
1.11. Exercises
Bibliography
Chapter 2: Parallel hardware and parallel software
2.1. Some background
2.2. Modifications to the von Neumann model
2.3. Parallel hardware
2.4. Parallel software
2.5. Input and output
2.6. Performance
2.7. Parallel program design
2.8. Writing and running parallel programs
2.9. Assumptions
2.10. Summary
2.11. Exercises
Bibliography
Chapter 3: Distributed memory programming with MPI
3.1. Getting started
3.2. The trapezoidal rule in MPI
3.3. Dealing with I/O
3.4. Collective communication
3.5. MPI derived datatypes
3.6. Performance evaluation of MPI programs
3.7. A parallel sorting algorithm
3.8. Summary
3.9. Exercises
3.10. Programming assignments
Bibliography
Chapter 4: Shared-memory programming with Pthreads
4.1. Processes, threads, and Pthreads
4.2. Hello, world
4.3. Matrix-vector multiplication
4.4. Critical sections
4.5. Busy-waiting
4.6. Mutexes
4.7. Producer–consumer synchronization and semaphores
4.8. Barriers and condition variables
4.9. Read-write locks
4.10. Caches, cache coherence, and false sharing
4.11. Thread-safety
4.12. Summary
4.13. Exercises
4.14. Programming assignments
Bibliography
Chapter 5: Shared-memory programming with OpenMP
5.1. Getting started
5.2. The trapezoidal rule
5.3. Scope of variables
5.4. The reduction clause
5.5. The parallel for directive
5.6. More about loops in OpenMP: sorting
5.7. Scheduling loops
5.8. Producers and consumers
5.9. Caches, cache coherence, and false sharing
5.10. Tasking
5.11. Thread-safety
5.12. Summary
5.13. Exercises
5.14. Programming assignments
Bibliography
Chapter 6: GPU programming with CUDA
6.1. GPUs and GPGPU
6.2. GPU architectures
6.3. Heterogeneous computing
6.4. CUDA hello
6.5. A closer look
6.6. Threads, blocks, and grids
6.7. Nvidia compute capabilities and device architectures
6.8. Vector addition
6.9. Returning results from CUDA kernels
6.10. CUDA trapezoidal rule I
6.11. CUDA trapezoidal rule II: improving performance
6.12. Implementation of trapezoidal rule with warpSize thread blocks
6.13. CUDA trapezoidal rule III: blocks with more than one warp
6.14. Bitonic sort
6.15. Summary
6.16. Exercises
6.17. Programming assignments
Bibliography
Chapter 7: Parallel program development
7.1. Two n-body solvers
7.2. Sample sort
7.3. A word of caution
7.4. Which API?
7.5. Summary
7.6. Exercises
7.7. Programming assignments
Bibliography
Chapter 8: Where to go from here
Bibliography
Bibliography
Index