The Software Vectorization Handbook: Applying Intel Multimedia Extensions for Maximum Performance

The growing popularity of multimedia extensions to general-purpose microprocessors has renewed interest in vectorizing compilers. This book provides a detailed overview of compiler optimizations that convert sequential code into a form that exploits multimedia extensions. The primary focus is on the C programming language and multimedia extensions to the Intel® Architecture, although most conversion methods generalize readily to other imperative programming languages and multimedia instruction sets. The presented optimizations are available in the high-performance Intel C++/Fortran compilers, which support automatic vectorization for the Intel MMX™ technology and Streaming SIMD Extensions (SSE). The book is written for readers who want to improve software performance by means of multimedia extensions, such as compiler engineers and programmers of scientific, engineering, and multimedia applications.
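As a rough illustration of what "converting sequential code into a form that exploits multimedia extensions" can mean, the sketch below shows a scalar C loop next to a hand-written SSE version that processes four single-precision values per iteration. It is not taken from the book. The function names add_scalar and add_sse are hypothetical, and the sketch assumes that the three arrays do not overlap and that the compiler defines __SSE__ when SSE is available (as GCC and Clang do); otherwise it falls back to the scalar loop.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE__)
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps */
#endif

/* Sequential form: one single-precision addition per iteration. */
void add_scalar(float *c, const float *a, const float *b, size_t n)
{
    assert(a && b && c);
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}

/* Hand-vectorized form (hypothetical helper, assumes non-overlapping arrays):
 * four additions per iteration in a 128-bit SSE register, followed by a
 * scalar cleanup loop for the remaining 0..3 elements. */
void add_sse(float *c, const float *a, const float *b, size_t n)
{
    assert(a && b && c);
    size_t i = 0;
#if defined(__SSE__)
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);          /* unaligned load of a[i..i+3] */
        __m128 vb = _mm_loadu_ps(b + i);          /* unaligned load of b[i..i+3] */
        _mm_storeu_ps(c + i, _mm_add_ps(va, vb)); /* c[i..i+3] = a + b */
    }
#endif
    for (; i < n; i++)                            /* cleanup (or full scalar fallback) */
        c[i] = a[i] + b[i];
}
```

The unaligned load and store intrinsics are used here so that the sketch makes no assumption about array alignment. A vectorizing compiler would additionally have to establish that the transformation preserves the loop's data dependences, which the non-overlap assumption stands in for.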

Author(s): Aart J.C. Bik
Edition: 1st
Publisher: Intel Press
Year: 2006

Language: English
Pages: 236

Preface
1. Introduction
Architectural Acceleration Mechanisms
Pipelining and Replication
Speedup
Quick Tour of Parallel Architectures
Data Parallel Architectures
Instruction-Level Parallel Architectures
Process-Level Parallel Architectures
Multimedia Extensions
MMX™ Technology
Streaming-SIMD Extensions
Intra-Register Vectorization
2. Instruction Set Preliminaries
Instruction Set Summary
Instruction Format
Packed Data Elements
Data Movement Instructions
Arithmetic Instructions
Logical Instructions
Comparison Instructions
Conversion Instructions
Shift Instructions
Shuffle Instructions
Unpack Instructions
Cacheability Control and Prefetch Instructions
State Management Instructions
The Intel NetBurst® Microarchitecture
Execution Logic
Memory Hierarchy
3. Language Preliminaries
The C Programming Language
Data Types
Expressions
Statements
Loop and Idiom Recognition
Well-Behaved Loops
Idiom Recognition
4. Data Dependence Theory
Data Dependences
Data Dependence Definitions
Data Dependence Terminology
Data Dependence Graphs
Data Dependence Analysis
Data Dependence Problems
Data Dependence Solvers
Hierarchical Data Dependence Analysis
Improving Data Dependence Analysis
Compiler Hints for Data Dependences
Aliasing Analysis
Dynamic Data Dependence Analysis
5. Vectorization Essentials
Validity of Vectorization
Preserving Data Dependences
Preserving Integer Precision
Preserving Floating-Point Precision
Vector Code Generation
General Framework
Vector Data Type Selection
Unit-Stride Memory References
Rotating Read-Only Memory References
Non-Unit-Stride Memory References
Scalar Memory References
Operators
MIN, MAX, and ABS Operators
Type Conversions
Mathematical Functions
Conditional Statements
6. Alignment Optimizations
Intraprocedural Alignment Optimizations
Memory Allocation and Data Layout
Intraprocedural Alignment Analysis
Cache Line Split Optimizations
Interprocedural Alignment Optimizations
Interprocedural Alignment Analysis
Exploiting Interprocedural Alignment Information
Improving Alignment Optimizations
Compiler Hints for Alignment
Multi-Version Code
Dynamic Loop Peeling
7. Supplemental Optimizations
Idiom Recognition
Conversion Idioms
Arithmetic Idioms
Reduction Idioms
Saturation Idioms
Search Loops
Complex Data
Complex Numbers
Single-Precision Complex Data Types
Double-Precision Complex Data Types
Memory Hierarchy Optimizations
High-Level Optimizations
Vector Register Reuse
Low-Level Optimizations
8. Vectorization Beyond Loops
Loop Materialization
Rollable Statements and Expressions
Loop Materialization and Collapsing
Inexpensive Loop Materialization
Improved Loop Materialization
Performance Considerations
Low Trip-Count Loops
High Trip-Count Loops
9. Vectorization with the Intel Compilers
Vectorization Overview
Compiler Switches
Profile-Guided Optimization
Compiler Hints
Vectorization Guidelines
Design and Implementation Considerations
Focus of Optimization
Diagnostics-Guided Optimization
Final Remarks
Some More Experiments
Future Trends in Multimedia Extensions
References
Index