Learn the fundamentals of x86 single instruction, multiple data (SIMD) programming using C++ intrinsic functions and x86-64 assembly language. This book emphasizes x86 SIMD programming topics and technologies that are relevant to modern software development in applications that can exploit data-level parallelism, which is important for processing big data and large batches of data in data science and related fields.
Modern Parallel Programming with C++ and Assembly Language is an instructional text that explains x86 SIMD programming using both C++ and assembly language. The book’s content and organization are designed to help you quickly understand and exploit the SIMD capabilities of x86 processors. It also contains an abundance of source code that is structured to accelerate learning and comprehension of essential SIMD programming concepts and algorithms.
After reading this book, you will be able to code performance-optimized AVX, AVX2, and AVX-512 algorithms using either C++ intrinsic functions or x86-64 assembly language.
What You Will Learn
- Understand the essential details about x86 SIMD architectures and instruction sets including AVX, AVX2, and AVX-512.
- Master x86 SIMD data types, arithmetic instructions, and data management operations using both integer and floating-point operands.
- Code performance-enhancing functions and algorithms that fully exploit the SIMD capabilities of a modern x86 processor.
- Employ C++ intrinsic functions and x86-64 assembly language code to carry out arithmetic calculations using common programming constructs including arrays, matrices, and user-defined data structures.
- Harness the x86 SIMD instruction sets to significantly accelerate the performance of computationally intense algorithms in applications such as machine learning, image processing, computer graphics, statistics, and matrix arithmetic.
- Apply leading-edge coding strategies and techniques to exploit the x86 SIMD instruction sets for maximum performance.
Who This Book Is For
Intermediate to advanced programmers and developers. Readers of this book should have previous programming experience with modern C++ (i.e., ISO C++11 or later) and assembly language. Some familiarity with Microsoft’s Visual Studio or the GNU toolchain will be helpful. The target audience is experienced software developers, programmers, and possibly some hobbyists.
Author(s): Daniel Kusswurm
Edition: 1
Publisher: Apress
Year: 2022
Language: English
Pages: 653
Tags: Parallel Programming; x86; x86-64; SIMD; C++; Assembly; AVX; AVX2; AVX-512
Table of Contents
About the Author
About the Technical Reviewer
Acknowledgments
Introduction
Chapter 1: SIMD Fundamentals
What Is SIMD?
Historical Overview of x86 SIMD
SIMD Data Types
SIMD Arithmetic
SIMD Integer Arithmetic
Wraparound vs. Saturated Arithmetic
SIMD Floating-Point Arithmetic
SIMD Data Manipulation Operations
SIMD Programming
Summary
Chapter 2: AVX C++ Programming: Part 1
Integer Arithmetic
Integer Addition
Integer Subtraction
Integer Multiplication
Integer Bitwise Logical and Shift Operations
Bitwise Logical Operations
Shift Operations
C++ SIMD Intrinsic Function Naming Conventions
Image Processing Algorithms
Pixel Minimum and Maximum
Pixel Mean Intensity
Summary
Chapter 3: AVX C++ Programming: Part 2
Floating-Point Operations
Floating-Point Arithmetic
Floating-Point Compares
Floating-Point Conversions
Floating-Point Arrays
Mean and Standard Deviation
Distance Calculations
Floating-Point Matrices
Column Means
Summary
Chapter 4: AVX2 C++ Programming: Part 1
Integer Arithmetic
Addition and Subtraction
Unpacking and Packing
Size Promotions
Image Processing
Pixel Clipping
RGB to Grayscale
Thresholding
Pixel Conversions
Summary
Chapter 5: AVX2 C++ Programming: Part 2
Floating-Point Arrays
Least Squares
Floating-Point Matrices
Matrix Multiplication
Matrix (4 × 4) Multiplication
Matrix (4 × 4) Vector Multiplication
Matrix Inverse
Summary
Chapter 6: AVX2 C++ Programming: Part 3
Convolution Primer
Convolution Math: 1D
Convolution Math: 2D
1D Convolutions
2D Convolutions
Nonseparable Kernel
Separable Kernel
Summary
Chapter 7: AVX-512 C++ Programming: Part 1
AVX-512 Overview
Integer Arithmetic
Basic Arithmetic
Merge Masking and Zero Masking
Image Processing
RGB to Grayscale
Image Thresholding
Image Statistics
Summary
Chapter 8: AVX-512 C++ Programming: Part 2
Floating-Point Arithmetic
Basic Arithmetic
Compare Operations
Floating-Point Arrays
Floating-Point Matrices
Covariance Matrix
Matrix Multiplication
Matrix (4 × 4) Vector Multiplication
Convolutions
1D Convolutions
2D Convolutions
Summary
Chapter 9: Supplemental C++ SIMD Programming
Using CPUID
Short Vector Math Library
Rectangular to Polar Coordinates
Body Surface Area
Summary
Chapter 10: X86-64 Processor Architecture
Data Types
Fundamental Data Types
Numerical Data Types
SIMD Data Types
Strings
Internal Architecture
General-Purpose Registers
Instruction Pointer
RFLAGS Register
Floating-Point and SIMD Registers
MXCSR Register
Instruction Operands
Memory Addressing
Condition Codes
Summary
Chapter 11: Core Assembly Language Programming: Part 1
Integer Arithmetic
Addition and Subtraction
Multiplication
Division
Calling Convention: Part 1
Memory Addressing Modes
For-Loops
Condition Codes
Strings
Summary
Chapter 12: Core Assembly Language Programming: Part 2
Scalar Floating-Point Arithmetic
Single-Precision Arithmetic
Double-Precision Arithmetic
Compares
Conversions
Scalar Floating-Point Arrays
Calling Convention: Part 2
Stack Frames
Using Nonvolatile General-Purpose Registers
Using Nonvolatile SIMD Registers
Macros for Function Prologues and Epilogues
Summary
Chapter 13: AVX Assembly Language Programming: Part 1
Integer Arithmetic
Addition and Subtraction
Multiplication
Bitwise Logical Operations
Arithmetic and Logical Shifts
Image Processing Algorithms
Pixel Minimum and Maximum
Pixel Mean Intensity
Summary
Chapter 14: AVX Assembly Language Programming: Part 2
Floating-Point Operations
Floating-Point Arithmetic
Floating-Point Compares
Floating-Point Arrays
Mean and Standard Deviation
Distance Calculations
Floating-Point Matrices
Summary
Chapter 15: AVX2 Assembly Language Programming: Part 1
Integer Arithmetic
Basic Operations
Size Promotions
Image Processing
Pixel Clipping
RGB to Grayscale
Pixel Conversions
Summary
Chapter 16: AVX2 Assembly Language Programming: Part 2
Floating-Point Arrays
Floating-Point Matrices
Matrix Multiplication
Matrix (4 × 4) Multiplication
Matrix (4 × 4) Vector Multiplication
Signal Processing
Summary
Chapter 17: AVX-512 Assembly Language Programming: Part 1
Integer Arithmetic
Basic Operations
Masked Operations
Image Processing
Image Thresholding
Image Statistics
Summary
Chapter 18: AVX-512 Assembly Language Programming: Part 2
Floating-Point Arithmetic
Basic Arithmetic
Compare Operations
Floating-Point Matrices
Covariance Matrix
Matrix Multiplication
Matrix (4 × 4) Vector Multiplication
Signal Processing
Summary
Chapter 19: SIMD Usage and Optimization Guidelines
SIMD Usage Guidelines
C++ SIMD Intrinsic Functions or x86 Assembly Language
SIMD Software Development Guidelines
Identify Functions for SIMD Techniques
Select Default and Explicit SIMD Instruction Sets
Establish Benchmark Timing Objectives
Code Explicit SIMD Functions
Benchmark Code to Measure Performance
Optimize Explicit SIMD Code
Repeat Benchmarking and Optimization Steps
Optimization Guidelines and Techniques
General Techniques
Assembly Language Optimization Techniques
SIMD Code Complexity vs. Performance
Summary
Appendix A: Source Code and Development Tools
Source Code Download and Setup
Development Tools
Visual Studio and Windows
Running a Source Code Example
Creating a Visual Studio C++ Project
Create a C++ Project
Add an Assembly Language File
Set Project Properties
Edit the Source Code
Build and Run the Project
GCC and Linux
Additional Configuration
Build and Run
Make Utility
Appendix B: References and Resources
C++ SIMD Intrinsic Function Documentation
X86 Programming References
X86 Processor Information
Software Development Tools
Algorithm References
C++ References
Utilities, Tools, and Libraries
Index