Using OpenCL: Programming Massively Parallel Computers


In 2011 many computer users were exploring the opportunities and benefits of the massive parallelism offered by heterogeneous computing. In 2000 the Khronos Group, a not-for-profit industry consortium, was founded to create standard open APIs for parallel computing, graphics and dynamic media. Among these is OpenCL, an open system for programming heterogeneous computers with components made by multiple manufacturers. This publication explains how heterogeneous computers work and how to program them using OpenCL. It also describes how to combine OpenCL with OpenGL for displaying graphical effects in real time.

Chapter 1 briefly describes two older, highly successful de facto standards for parallel programming: MPI and OpenMP. Collectively, the MPI, OpenMP, and OpenCL systems cover programming of all major parallel architectures: clusters, shared-memory computers, and the newest heterogeneous computers. Chapter 2, the technical core of the book, deals with OpenCL fundamentals: programming, hardware, and the interaction between them. Chapter 3 adds important information about advanced issues such as double- versus single-precision arithmetic, efficiency, memory use, and debugging. Chapters 2 and 3 contain several examples of code and one case study on genetic algorithms. These examples are related to linear algebra operations, which are very common in scientific, industrial, and business applications. Most of the book's examples can be found on the enclosed CD, which also contains basic projects for Visual Studio, MinGW, and GCC. This supplementary material will assist the reader in getting a quick start on OpenCL projects.

IOS Press is an international science, technical and medical publisher of high-quality books for academics, scientists, and professionals in all fields.
Some of the areas we publish in:
- Biomedicine
- Oncology
- Artificial intelligence
- Databases and information systems
- Maritime engineering
- Nanotechnology
- Geoengineering
- All aspects of physics
- E-governance
- E-commerce
- The knowledge economy
- Urban studies
- Arms control
- Understanding and responding to terrorism
- Medical informatics
- Computer Sciences

Author(s): J. Kowalik, T. Puzniakowski
Series: Advances in Parallel Computing 21
Edition: Har/Cdr (hardcover with CD-ROM)
Publisher: IOS Press
Year: 2012

Language: English
Pages: 312

Title Page......Page 1
Preface......Page 7
Contents......Page 9
MPI......Page 15
OpenMP......Page 18
Task Parallelism......Page 23
Example......Page 24
Origins of Using GPU in General Purpose Computing......Page 26
Short History of OpenCL......Page 27
Heterogeneous Computer Memories......Page 28
The Fourth Generation CUDA......Page 29
Host Code......Page 30
Phase c. Creating Command Queues and Kernel Execution......Page 31
Applications of Heterogeneous Computing......Page 32
Conjugate Gradient Method......Page 33
Jacobi Method......Page 35
Monte Carlo Methods......Page 36
Conclusions......Page 37
Algorithm Implementation and Timing Results......Page 38
Conclusions......Page 39
Massive Parallelism Idea......Page 41
OpenCL Execution Model......Page 43
Queues, Events and Context......Page 44
Data Parallelism in OpenCL......Page 45
How to Start Using OpenCL......Page 46
Libraries......Page 47
Platforms and Devices......Page 48
OpenCL Platform Properties......Page 50
Devices Provided by Platform......Page 51
OpenCL Platforms - C++......Page 54
OpenCL Context to Manage Devices......Page 55
CPU Device Type......Page 57
Different Device Types - Summary......Page 58
Context Initialization - by Device Type......Page 59
Context Initialization - Selecting Particular Device......Page 60
Getting Information about Context......Page 61
OpenCL Context to Manage Devices - C++......Page 62
Checking Error Codes......Page 64
Using Exceptions - Available in C++......Page 67
Using Custom Error Messages......Page 68
In-order Command Queue......Page 69
Out-of-order Command Queue......Page 71
Command Queue Control......Page 74
Profiling Using Events - C example......Page 75
Profiling Using Events - C++ example......Page 77
Work-Items and Work-Groups......Page 79
Information About Index Space from a Kernel......Page 80
NDRange Kernel Execution......Page 81
Using Work Offset......Page 84
Different Memory Regions - the Kernel Perspective......Page 85
Relaxed Memory Consistency......Page 87
Global and Constant Memory Allocation - Host Code......Page 89
Memory Transfers - the Host Code......Page 92
Programming and Calling Kernel......Page 93
Loading and Compilation of an OpenCL Program......Page 95
Kernel Invocation and Arguments......Page 102
Supported Scalar Data Types......Page 104
Vector Data Types and Common Functions......Page 106
Synchronization Functions......Page 108
Counting Parallel Sum......Page 110
Parallel Sum - Kernel......Page 111
Parallel Sum - Host Program......Page 114
Structure of the OpenCL Host Program......Page 117
Initialization......Page 118
Preparation of OpenCL Programs......Page 120
Using Binary OpenCL Programs......Page 121
Computation......Page 123
Release of Resources......Page 127
Structure of OpenCL Host Programs in C++......Page 128
Preparation of OpenCL Programs......Page 129
Using Binary OpenCL Programs......Page 130
Computation......Page 134
Release of Resources......Page 135
The SAXPY Example......Page 136
The Example SAXPY Application - C Language......Page 137
The Example SAXPY Application - C++ Language......Page 142
Step by Step Conversion of an Ordinary C Program to OpenCL......Page 145
OpenCL Initialization......Page 146
Data Allocation on the Device......Page 148
Sequential Function to OpenCL Kernel......Page 149
Loading and Executing a Kernel......Page 150
Matrix by Vector Multiplication Example......Page 153
The Program Calculating Matrix Times Vector......Page 154
Experiment......Page 156
Conclusions......Page 158
Different Classes of Extensions......Page 161
Detecting Available Extensions from API......Page 162
Using Runtime Extension Functions......Page 163
Using Extensions from OpenCL Program......Page 167
Printf......Page 169
Using GDB......Page 171
Floating Point Arithmetics......Page 176
Arithmetics Precision - Practical Approach......Page 179
Profiling OpenCL Application......Page 186
Using the Internal Profiler......Page 187
Using External Profiler......Page 194
Effective Use of Memories - Memory Access Patterns......Page 197
Matrix Multiplication - Optimization Issues......Page 203
OpenCL and OpenGL......Page 208
Extensions Used......Page 209
Header Files......Page 210
Common Actions......Page 211
OpenGL Initialization......Page 212
OpenCL Initialization......Page 215
Creating Buffer for OpenGL and OpenCL......Page 217
Kernel......Page 223
Generating Effect......Page 227
Running Kernel that Operates on Shared Buffer......Page 229
Results Display......Page 230
Message Handling......Page 232
Cleanup......Page 233
Terminology......Page 235
Genetic Algorithm......Page 236
Genetic Algorithm Implementation Overview......Page 239
OpenCL Program......Page 240
Most Important Elements of Host Code......Page 248
Experiment Results......Page 255
CUDA 4.0 Release and Compatibility......Page 259
CUDA Versions and Device Capability......Page 261
CUDA Runtime API Example......Page 263
CUDA Program Explained......Page 265
Blocks and Threads Indexing Formulas......Page 271
Runtime Error Handling......Page 274
CUDA Driver API Example......Page 276
Clusters and SMP......Page 283
Performance of OpenCL Programs......Page 284
Combining MPI with OpenCL......Page 291
OpenCL Kernel......Page 293
Initialization and Setup......Page 294
Executing Kernel......Page 296
Appendix D
Using Examples Attached to the Book......Page 299
Windows......Page 300
Bibliography and References......Page 303