This book is an instructional text that will teach you how to code x86-64 assembly language functions. It also explains how you can exploit the SIMD capabilities of an x86-64 processor using x86-64 assembly language and the AVX, AVX2, and AVX-512 instruction sets.
This updated edition’s content and organization are designed to help you quickly understand x86-64 assembly language programming and the unique computational capabilities of x86 processors. The source code is structured to accelerate learning and comprehension of essential x86-64 assembly language programming constructs and data structures. Modern X86 Assembly Language Programming, Third Edition includes source code for both Windows and Linux. The source code elucidates current x86-64 assembly language programming practices, run-time calling conventions, and the latest generation of software development tools.
While it is still theoretically possible to write large sections or an entire application program using assembly language, the demanding requirements of contemporary software development mean that such an approach is both impractical and ill-advised. Instead, this book emphasizes coding x86-64 assembly language functions that are callable from C++. The downloadable software package for this book includes source code that works on both Windows (Visual C++ and MASM) and Linux (GNU C++ and NASM).
Before proceeding, it warrants mentioning that this edition of the Modern X86 Assembly Language Programming book doesn’t cover x86-32 assembly language programming. It also doesn’t discuss legacy x86 technologies such as the x87 floating-point unit, MMX, and X86-SSE (Streaming SIMD Extensions). The first edition of this text remains relevant if you’re interested in learning about these topics. This book doesn’t explain x86 architectural features or privileged instructions that are used in operating systems and device drivers. However, if your goal is to develop x86-64 assembly language code for these use cases, you’ll need to thoroughly comprehend the material that’s presented in this book.
What You Will Learn:
Understand important details of the x86-64 processor platform, including its core architecture, data types, registers, memory addressing modes, and the basic instruction set
Use the x86-64 instruction set to create assembly language functions that are callable from C++
Create assembly language code for both Windows and Linux using modern software development tools including MASM (Windows) and NASM (Linux)
Employ x86-64 assembly language to efficiently manipulate common data types and programming constructs including integers, text strings, arrays, matrices, and user-defined structures
Explore indispensable elements of x86 SIMD architectures, register sets, and data types
Master x86 SIMD arithmetic and data operations using both integer and floating-point operands
Harness the AVX, AVX2, and AVX-512 instruction sets to accelerate the performance of computationally intensive calculations in machine learning, image processing, signal processing, computer graphics, statistics, and matrix arithmetic applications
Apply leading-edge coding strategies to optimally exploit the AVX, AVX2, and AVX-512 instruction sets for maximum possible performance
Who This Book Is For:
Software developers who are creating programs for x86 platforms and want to learn how to code performance-enhanced algorithms using the core x86-64 instruction set; developers who need to learn how to write SIMD functions or accelerate the performance of existing code using the AVX, AVX2, and AVX-512 instruction sets; and computer science/engineering students or hobbyists who want to learn or better understand x86-64 assembly language programming and the AVX, AVX2, and AVX-512 instruction sets.
Author(s): Daniel Kusswurm
Edition: 3
Publisher: Apress
Year: 2023
Language: English
Pages: 688
Table of Contents
About the Author
About the Technical Reviewer
Acknowledgments
Introduction
Chapter 1: X86-64 Core Architecture
Historical Overview
Data Types
Fundamental Data Types
Numerical Data Types
SIMD Data Types
Miscellaneous Data Types
X86-64 Processor Architecture
General-Purpose Registers
Instruction Pointer
RFLAGS Register
Floating-Point and SIMD Registers
MXCSR Register
Instruction Operands
Memory Addressing
Condition Codes
Differences Between X86-64 and X86-32
Legacy Instruction Sets
Summary
Chapter 2: X86-64 Core Programming – Part 1
Source Code Overview
Assembler Basics
Integer Arithmetic
Integer Addition and Subtraction – 32-Bit
Bitwise Logical Operations
Shift Operations
Integer Addition and Subtraction – 64-Bit
Integer Multiplication and Division
Summary
Chapter 3: X86-64 Core Programming – Part 2
Simple Stack Arguments
Mixed-Type Integer Arithmetic
Memory Addressing Modes
Condition Codes
Assembly Language For-Loops
Summary
Chapter 4: X86-64 Core Programming – Part 3
Arrays
One-Dimensional Arrays
Multiple One-Dimensional Arrays
Two-Dimensional Arrays
Strings
Counting Characters
Array Compare
Array Copy and Fill
Array Reversal
Assembly Language Structures
Summary
Chapter 5: AVX Programming – Scalar Floating-Point
Floating-Point Programming Concepts
Scalar Floating-Point Registers
Single-Precision Floating-Point Arithmetic
Temperature Conversions
Cone Volume/Surface Area Calculation
Double-Precision Floating-Point Arithmetic
Floating-Point Comparisons and Conversions
Floating-Point Comparisons
Floating-Point Conversions
Floating-Point Arrays
Summary
Chapter 6: Run-Time Calling Conventions
Calling Convention Overview
Calling Convention Requirements for Visual C++
Stack Frames
Using Non-volatile General-Purpose Registers
Using Non-volatile XMM Registers
Calling External Functions
Calling Convention Requirements for GNU C++
Stack Arguments
Using Non-volatile General-Purpose Registers and Stack Frames
Calling External Functions
Summary
Chapter 7: Introduction to X86-AVX SIMD Programming
SIMD Programming Concepts
What Is SIMD?
SIMD Integer Arithmetic
Wraparound vs. Saturated Arithmetic
SIMD Floating-Point Arithmetic
SIMD Data Manipulation Operations
X86-AVX Overview
AVX/AVX2 SIMD Architecture Overview
SIMD Registers
SIMD Data Types
Instruction Syntax
Differences Between X86-SSE and X86-AVX
Summary
Chapter 8: AVX Programming – Packed Integers
Integer Arithmetic
Addition and Subtraction
Multiplication
Bitwise Logical Operations
Arithmetic and Logical Shifts
Integer Arrays
Pixel Minimum and Maximum
Pixel Mean
Summary
Chapter 9: AVX Programming – Packed Floating-Point
Packed Floating-Point Arithmetic
Elementary Operations
Packed Comparisons
Packed Conversions
Packed Floating-Point Arithmetic – Arrays
Mean and Standard Deviation
Distance Calculations
Packed Floating-Point Arithmetic – Matrices
Column Means
Summary
Chapter 10: AVX2 Programming – Packed Integers
Integer Arithmetic
Elementary Operations
Size Promotions
Image Processing
Pixel Clipping
Benchmarking
RGB to Grayscale
Pixel Conversions
Image Histogram
Summary
Chapter 11: AVX2 Programming – Packed Floating-Point – Part 1
Floating-Point Arrays
Least Squares
Floating-Point Matrices
Matrix Multiplication
Single-Precision
Double-Precision
Matrix (4 × 4) Multiplication
Single-Precision
Double-Precision
Matrix (4 × 4) Vector Multiplication
Single-Precision
Double-Precision
Covariance Matrix
Summary
Chapter 12: AVX2 Programming – Packed Floating-Point – Part 2
Matrix Inversion
Single-Precision
Double-Precision
Signal Processing – Convolutions
1D Convolution Arithmetic
1D Convolution Using Variable-Size Kernel
Single-Precision
Double-Precision
1D Convolution Using Fixed-Size Kernel
Single-Precision
Double-Precision
Summary
Chapter 13: AVX-512 Programming – Packed Integers
AVX-512 Overview
Execution Environment
Merge Masking and Zero Masking
Embedded Broadcasts
Instruction-Level Rounding
Integer Arithmetic
Elementary Operations
Masked Operations
Image Processing
Image Thresholding
Image Statistics
Image Histogram
Summary
Chapter 14: AVX-512 Programming – Packed Floating-Point – Part 1
Floating-Point Arithmetic
Elementary Operations
Packed Comparisons
Floating-Point Arrays
Floating-Point Matrices
Covariance Matrix
Matrix Multiplication
Single-Precision
Double-Precision
Matrix (4 × 4) Vector Multiplication
Single-Precision
Double-Precision
Summary
Chapter 15: AVX-512 Programming – Packed Floating-Point – Part 2
Signal Processing
1D Convolution Using Variable-Size Kernel
Single-Precision
Double-Precision
1D Convolution Using Fixed-Size Kernel
Single-Precision
Double-Precision
Summary
Chapter 16: Advanced Assembly Language Programming
CPUID Instruction
Processor Vendor Information
X86-AVX Detection
Non-temporal Memory Stores
Integer Non-temporal Memory Stores
Floating-Point Non-temporal Memory Stores
SIMD Text Processing
Summary
Chapter 17: Assembly Language Optimization and Development Guidelines
Assembly Language Optimization Guidelines
Basic Techniques
Floating-Point Arithmetic
Branch Instructions
Branch Prediction Unit
Data Alignment
SIMD Techniques
Assembly Language Development Guidelines
Identify Functions for x86-64 Assembly Language Coding
Select Target x86-AVX Instruction Set
Establish Benchmark Timing Objectives
Code x86-64 Assembly Language Functions
Benchmark x86-64 Assembly Language Code
Optimize x86-64 Assembly Language Code
Repeat Benchmarking and Optimization Steps
Summary
Appendix A: Source Code and Development Tools
Source Code Download and Setup
Source Code Development Tools
Windows Development Tools
Running a Source Code Example
Creating a Visual Studio C++ Project
Create a C++ Project
Add an Assembly Language File
Set Project Properties
Edit the Source Code
Build and Run the Project
Linux Development Tools
Additional Configuration
Build and Run
Make Utility
Appendix B: References and Resources
X86 Assembly Language Programming References
Algorithm References
C++ References
Software Development Tools
Miscellaneous Utilities, Tools, and Libraries
Index