Computer Architecture for Scientists: Principles and Performance


The dramatic increase in computer performance has been extraordinary, but not for all computations: it has key limits and structure. Software architects, developers, and even data scientists need to understand how to exploit the fundamental structure of computer performance to harness it for future applications. Ideal for upper-level undergraduates, Computer Architecture for Scientists covers four key pillars of computer performance and imparts a high-level basis for reasoning with and understanding these concepts:

Small is fast – how size scaling drives performance
Implicit parallelism – how a sequential program can be executed faster with parallelism
Dynamic locality – skirting physical limits by arranging data in a smaller space
Parallelism – increasing performance with teams of workers

These principles and models provide approachable high-level insights and quantitative modelling without distracting low-level detail. Finally, the text covers the GPUs and machine-learning accelerators that have become increasingly important for mainstream applications.
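To make the "dynamic locality" pillar concrete, here is a minimal C sketch (an illustration written for this listing, not an example taken from the book): it sums the same matrix with two loop orderings, and on typical cached hardware the contiguous, row-major order runs much faster, the kind of effect the book's memory chapter models quantitatively.

    /* A minimal, hypothetical sketch (not from the book) of dynamic locality:
     * both functions sum the same N x N matrix, but the row-major traversal
     * walks memory contiguously and typically runs far faster on cached
     * hardware than the strided, column-major traversal. */
    #include <stdio.h>
    #include <stdlib.h>

    #define N 4096

    static double sum_row_major(double (*a)[N]) {
        double s = 0.0;
        for (int i = 0; i < N; i++)        /* consecutive addresses: cache-friendly */
            for (int j = 0; j < N; j++)
                s += a[i][j];
        return s;
    }

    static double sum_col_major(double (*a)[N]) {
        double s = 0.0;
        for (int j = 0; j < N; j++)        /* stride of N doubles: poor locality */
            for (int i = 0; i < N; i++)
                s += a[i][j];
        return s;
    }

    int main(void) {
        double (*a)[N] = calloc(N, sizeof *a);   /* N x N matrix, zero-initialized */
        if (a == NULL) return 1;
        printf("row-major sum:    %f\n", sum_row_major(a));
        printf("column-major sum: %f\n", sum_col_major(a));
        free(a);
        return 0;
    }

On a typical machine the row-major version is often several times faster; the exact ratio depends on the cache hierarchy.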

Author(s): Andrew A. Chien
Year: 2023

Language: English
Pages: 266

Cover
Half-title
Title page
Copyright information
Dedication
Contents
Preface
1 Computing and the Transformation of Society
1.1 Computing Transforms Society and Economy
1.1.1 Home
1.1.2 Automobiles and Transportation
1.1.3 Commerce
1.2 Computing Transforms Science and Discovery
1.3 Extraordinary Characteristics of Computing
1.4 What is Computer Architecture?
1.5 Four Pillars of Computer Performance: Miniaturization, Hidden Parallelism, Dynamic Locality, and Explicit Parallelism
1.6 Expected Background
1.7 Organization of the Book
1.8 Summary
1.9 Problems
2 Instruction Sets, Software, and Instruction Execution
2.1 Computer Instruction Sets
2.2 Computer Systems Architecture
2.3 Instruction Set Architecture: RISC-V Example
2.3.1 Computation Instructions
2.3.2 Conditional Control and Procedure Linkage Instructions
2.3.3 Memory Instructions
2.4 Machine Instructions and Basic Software Structures
2.4.1 Implementing Basic Expressions
2.4.2 Implementing Data Structures: Structs and Objects
2.4.3 Implementing One- and Multi-dimensional Arrays
2.4.4 Implementing Conditional Iterative Constructs: Loops
2.4.5 Implementing Procedure Call and Return and the Stack
2.5 Basic Instruction Execution and Implementation
2.5.1 Sequential State and Instruction Execution
2.5.2 Hardware Implementation of Instruction Execution
2.6 Speeding Up Instruction Execution and Program Performance
2.7 Summary
2.8 Digging Deeper
2.9 Problems
3 Processors and Scaling: Small is Fast!
3.1 Miniaturization and Information Processing
3.2 What Is the Natural Size of a Computer?
3.2.1 Example: Bit Size and Speed
3.2.2 Shrinking Computers
3.3 Computer Size and Speed
3.3.1 Smaller Computers Are Faster
3.3.2 Example: Applying the Size and Clock Period Model
3.3.3 Size Scaling Computers from Room-Sized to a Single Chip
3.3.4 Size Scaling Single-Chip Computers: The Power Problem and Dennard’s Solution
3.3.5 The End of Dennard Scaling
3.4 Computer Size and Power Consumption
3.5 Size in Other Technologies
3.6 Tiny Computers Enable an Explosion of Applications
3.7 Summary
3.8 Digging Deeper
3.9 Problems
4 Sequential Abstraction, But Parallel Implementation
4.1 Sequential Computation Abstraction
4.1.1 Sequential Programs
4.1.2 Instruction-Level Parallelism: Pipelining and More
4.1.3 Data Dependence and the Illusion of Sequence
4.2 The Illusion of Sequence: Renaming and Out-of-Order Execution
4.2.1 Variable and Register Renaming
4.2.2 Implementing Register Renaming: The Reorder Buffer
4.2.3 Limits of Out-of-Order Execution
4.3 Illusion of Causality: Speculative Execution
4.3.1 Branch Prediction
4.3.2 Speculative Execution
4.3.3 Accurate Branch Predictors
4.3.4 Security Risks of Speculation: Spectre and Meltdown
4.4 Summary
4.5 Digging Deeper
4.6 Problems
5 Memories: Exploiting Dynamic Locality
5.1 Memory Technologies, Miniaturization, and Growing Capacity
5.2 Software and Applications Demand Memory Capacity
5.3 Memory System Challenges: The Memory Wall
5.4 Memory Latency
5.4.1 Warping Space–Time (Caches)
5.4.2 Dynamic Locality in Programs
5.4.3 Address Filters (Caches)
5.4.4 The Effectiveness of Filters (Caches)
5.4.5 Implementing Caches (Warping and Filtering)
5.4.6 Recursive Filtering (Multi-level Caches)
5.4.7 Modeling Average Memory Hierarchy Performance
5.5 Why Caches Work so Well and Programming for Locality
5.6 Measuring Application Dynamic Locality and Modeling Performance
5.6.1 Measuring Dynamic Locality: Reuse Distance
5.6.2 Reuse Distance and Dynamic Locality
5.6.3 Modeling an Application's Memory Performance Using Reuse Distance
5.6.4 Tuning a Program for Dynamic Locality
5.7 Access Rate and Parallel Memory Systems
5.8 Summary
5.9 Digging Deeper
5.10 Problems
6 The General Purpose Computer
6.1 A Commercial Processor: Intel Skylake
6.2 A Commercial Memory Hierarchy: Intel Skylake
6.2.1 Caches and Power
6.3 CPUs Are General Purpose Computers
6.4 Perspective: Mathematical Universality and Complexity
6.5 Summary
6.6 Digging Deeper
6.7 Problems
7 Beyond Sequential: Parallelism in MultiCore and the Cloud
7.1 The End of Dennard Scaling and the Shift to Parallelism
7.2 Parallel Single-Chip Computers: Multicore CPUs
7.2.1 Example: AMD Ryzen Multicore Chip and System
7.3 Programming Multicore Computers: OpenMP and pthreads
7.3.1 OpenMP: Pragma-Based Parallelism
7.3.2 pthreads: Explicit Thread-Parallelism
7.3.3 Challenging Parallelism in a Single Multicore CPU
7.3.4 Simpler Use of Multicore: Libraries and Servers
7.4 Million-Way Parallelism: Supercomputers and the Cloud
7.5 Efficient Parallelism: Computation Grain Size
7.6 Programming Cloud Computers: Coarse-Grained Parallelism
7.6.1 Three-Tier Web: Scalable Web Services
7.6.2 Scale-Out Map–Reduce (Hadoop and Spark)
7.6.3 Microservices: Modular Reliability and Evolution
7.6.4 Serverless (Function-as-a-Service)
7.7 Summary
7.8 Digging Deeper
7.9 Problems
8 Accelerators: Customized Architectures for Performance
8.1 The Emergence of Accelerators
8.1.1 Accelerator Hardware Opportunities
8.1.2 Programming and Software Challenges
8.2 Parallelism Accelerators
8.2.1 The Architecture of GPUs
8.2.2 Diverse GPUs and Performance
8.3 Machine Learning Accelerators
8.3.1 Google’s Tensor Processing Unit
8.3.2 Cerebras CS-2: A Wafer-Scale Machine Learning Accelerator
8.3.3 Small Machine Learning Accelerators (Edge)
8.4 Other Opportunities for Acceleration
8.5 Limitations and Drawbacks of Accelerated Computing
8.6 Summary
8.7 Digging Deeper
8.8 Problems
9 Computing Performance: Past, Present, and Future
9.1 Historical Computer Performance
9.2 Future Computer Performance: Opportunities for Performance Increase
9.2.1 Hardware Scaling and Opportunities
9.2.2 Resulting Programming and Software Challenges
9.3 New Computing Models
9.3.1 Higher-Level Architecture
9.3.2 Quantum Computing
9.3.3 Neuromorphic Computing
9.4 Summary
9.5 Digging Deeper
9.6 Problems
Appendix RISC-V Instruction Set Reference Card
References
Index