This book is an instructional text that will teach you how to code x86-64 assembly language functions. It also explains how you can exploit the SIMD capabilities of an x86-64 processor using x86-64 assembly language and the AVX, AVX2, and AVX-512 instruction sets.
This updated edition’s content and organization are designed to help you quickly understand x86-64 assembly language programming and the unique computational capabilities of x86 processors. The source code is structured to accelerate learning and comprehension of essential x86-64 assembly language programming constructs and data structures. Modern X86 Assembly Language Programming, Third Edition includes source code for both Windows and Linux. The source code elucidates current x86-64 assembly language programming practices, run-time calling conventions, and the latest generation of software development tools.
What You Will Learn
- Understand important details of the x86-64 processor platform, including its core architecture, data types, registers, memory addressing modes, and the basic instruction set
- Use the x86-64 instruction set to create assembly language functions that are callable from C++
- Create assembly language code for both Windows and Linux using modern software development tools including MASM (Windows) and NASM (Linux)
- Employ x86-64 assembly language to efficiently manipulate common data types and programming constructs including integers, text strings, arrays, matrices, and user-defined structures
- Explore indispensable elements of x86 SIMD architectures, register sets, and data types
- Master x86 SIMD arithmetic and data operations using both integer and floating-point operands
- Harness the AVX, AVX2, and AVX-512 instruction sets to accelerate the performance of computationally intensive calculations in machine learning, image processing, signal processing, computer graphics, statistics, and matrix arithmetic applications
- Apply leading-edge coding strategies to optimally exploit the AVX, AVX2, and AVX-512 instruction sets for maximum possible performance
Who This Book Is For
Software developers who are creating programs for x86 platforms and want to learn how to code performance-enhanced algorithms using the core x86-64 instruction set; developers who need to learn how to write SIMD functions or accelerate the performance of existing code using the AVX, AVX2, and AVX-512 instruction sets; and computer science/engineering students or hobbyists who want to learn or better understand x86-64 assembly language programming and the AVX, AVX2, and AVX-512 instruction sets.
Author(s): Daniel Kusswurm
Edition: 3
Publisher: Apress
Year: 2023
Language: English
Commentary: Publisher PDF
Pages: 700
City: Berkeley, CA
Tags: X86 Assembly Programming; 32-bit; 64-bit; AVX; SIMD; AVX-512; Algorithm Analysis; Problem Complexity
Table of Contents
About the Author
About the Technical Reviewer
Acknowledgments
Introduction
Chapter 1: X86-64 Core Architecture
Historical Overview
Data Types
Fundamental Data Types
Numerical Data Types
SIMD Data Types
Miscellaneous Data Types
X86-64 Processor Architecture
General-Purpose Registers
Instruction Pointer
RFLAGS Register
Floating-Point and SIMD Registers
MXCSR Register
Instruction Operands
Memory Addressing
Condition Codes
Differences Between X86-64 and X86-32
Legacy Instruction Sets
Summary
Chapter 2: X86-64 Core Programming – Part 1
Source Code Overview
Assembler Basics
Integer Arithmetic
Integer Addition and Subtraction – 32-Bit
Bitwise Logical Operations
Shift Operations
Integer Addition and Subtraction – 64-Bit
Integer Multiplication and Division
Summary
Chapter 3: X86-64 Core Programming – Part 2
Simple Stack Arguments
Mixed-Type Integer Arithmetic
Memory Addressing Modes
Condition Codes
Assembly Language For-Loops
Summary
Chapter 4: X86-64 Core Programming – Part 3
Arrays
One-Dimensional Arrays
Multiple One-Dimensional Arrays
Two-Dimensional Arrays
Strings
Counting Characters
Array Compare
Array Copy and Fill
Array Reversal
Assembly Language Structures
Summary
Chapter 5: AVX Programming – Scalar Floating-Point
Floating-Point Programming Concepts
Scalar Floating-Point Registers
Single-Precision Floating-Point Arithmetic
Temperature Conversions
Cone Volume/Surface Area Calculation
Double-Precision Floating-Point Arithmetic
Floating-Point Comparisons and Conversions
Floating-Point Comparisons
Floating-Point Conversions
Floating-Point Arrays
Summary
Chapter 6: Run-Time Calling Conventions
Calling Convention Overview
Calling Convention Requirements for Visual C++
Stack Frames
Using Non-volatile General-Purpose Registers
Using Non-volatile XMM Registers
Calling External Functions
Calling Convention Requirements for GNU C++
Stack Arguments
Using Non-volatile General-Purpose Registers and Stack Frames
Calling External Functions
Summary
Chapter 7: Introduction to X86-AVX SIMD Programming
SIMD Programming Concepts
What Is SIMD?
SIMD Integer Arithmetic
Wraparound vs. Saturated Arithmetic
SIMD Floating-Point Arithmetic
SIMD Data Manipulation Operations
X86-AVX Overview
AVX/AVX2 SIMD Architecture Overview
SIMD Registers
SIMD Data Types
Instruction Syntax
Differences Between X86-SSE and X86-AVX
Summary
Chapter 8: AVX Programming – Packed Integers
Integer Arithmetic
Addition and Subtraction
Multiplication
Bitwise Logical Operations
Arithmetic and Logical Shifts
Integer Arrays
Pixel Minimum and Maximum
Pixel Mean
Summary
Chapter 9: AVX Programming – Packed Floating-Point
Packed Floating-Point Arithmetic
Elementary Operations
Packed Comparisons
Packed Conversions
Packed Floating-Point Arithmetic – Arrays
Mean and Standard Deviation
Distance Calculations
Packed Floating-Point Arithmetic – Matrices
Column Means
Summary
Chapter 10: AVX2 Programming – Packed Integers
Integer Arithmetic
Elementary Operations
Size Promotions
Image Processing
Pixel Clipping
Benchmarking
RGB to Grayscale
Pixel Conversions
Image Histogram
Summary
Chapter 11: AVX2 Programming – Packed Floating-Point – Part 1
Floating-Point Arrays
Least Squares
Floating-Point Matrices
Matrix Multiplication
Single-Precision
Double-Precision
Matrix (4 × 4) Multiplication
Single-Precision
Double-Precision
Matrix (4 × 4) Vector Multiplication
Single-Precision
Double-Precision
Covariance Matrix
Summary
Chapter 12: AVX2 Programming – Packed Floating-Point – Part 2
Matrix Inversion
Single-Precision
Double-Precision
Signal Processing – Convolutions
1D Convolution Arithmetic
1D Convolution Using Variable-Size Kernel
Single-Precision
Double-Precision
1D Convolution Using Fixed-Size Kernel
Single-Precision
Double-Precision
Summary
Chapter 13: AVX-512 Programming – Packed Integers
AVX-512 Overview
Execution Environment
Merge Masking and Zero Masking
Embedded Broadcasts
Instruction-Level Rounding
Integer Arithmetic
Elementary Operations
Masked Operations
Image Processing
Image Thresholding
Image Statistics
Image Histogram
Summary
Chapter 14: AVX-512 Programming – Packed Floating-Point – Part 1
Floating-Point Arithmetic
Elementary Operations
Packed Comparisons
Floating-Point Arrays
Floating-Point Matrices
Covariance Matrix
Matrix Multiplication
Single-Precision
Double-Precision
Matrix (4 × 4) Vector Multiplication
Single-Precision
Double-Precision
Summary
Chapter 15: AVX-512 Programming – Packed Floating-Point – Part 2
Signal Processing
1D Convolution Using Variable-Size Kernel
Single-Precision
Double-Precision
1D Convolution Using Fixed-Size Kernel
Single-Precision
Double-Precision
Summary
Chapter 16: Advanced Assembly Language Programming
CPUID Instruction
Processor Vendor Information
X86-AVX Detection
Non-temporal Memory Stores
Integer Non-temporal Memory Stores
Floating-Point Non-temporal Memory Stores
SIMD Text Processing
Summary
Chapter 17: Assembly Language Optimization and Development Guidelines
Assembly Language Optimization Guidelines
Basic Techniques
Floating-Point Arithmetic
Branch Instructions
Branch Prediction Unit
Data Alignment
SIMD Techniques
Assembly Language Development Guidelines
Identify Functions for x86-64 Assembly Language Coding
Select Target x86-AVX Instruction Set
Establish Benchmark Timing Objectives
Code x86-64 Assembly Language Functions
Benchmark x86-64 Assembly Language Code
Optimize x86-64 Assembly Language Code
Repeat Benchmarking and Optimization Steps
Summary
Appendix A: Source Code and Development Tools
Source Code Download and Setup
Source Code Development Tools
Windows Development Tools
Running a Source Code Example
Creating a Visual Studio C++ Project
Create a C++ Project
Add an Assembly Language File
Set Project Properties
Edit the Source Code
Build and Run the Project
Linux Development Tools
Additional Configuration
Build and Run
Make Utility
Appendix B: References and Resources
X86 Assembly Language Programming References
Algorithm References
C++ References
Software Development Tools
Miscellaneous Utilities, Tools, and Libraries
Index