Julia is a high-level, high-performance dynamic programming language for numerical computing. If you want to understand how to avoid bottlenecks and design your programs for the highest possible performance, then this book is for you. The book starts with how Julia uses type information to achieve its performance goals, and how to use multiple dispatch to help the compiler emit high-performance machine code. After that, you will learn how to analyze Julia programs and identify issues with time and memory consumption. You will learn to use Julia's type system accurately to write high-performance code, and see how the compiler uses that type information to create fast machine code. Moving ahead, you'll master design constraints and learn how to use the power of the GPU in your Julia code, compiling Julia code to run directly on the GPU. Then, you'll learn how tasks and asynchronous I/O help you create responsive programs, and how to use shared-memory multithreading in Julia. Toward the end, you will get a flavor of Julia's distributed computing capabilities and learn how to run Julia programs on a large distributed cluster. By the end of this book, you will have the ability to build large-scale, high-performance Julia applications, design systems with a focus on speed, and improve the performance of existing programs.
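As a small taste of the techniques the book covers, here is a minimal illustrative sketch (the function name half is a hypothetical example, not code from the book) of how multiple dispatch lets the compiler specialize a function per argument type, measured with the @time macro that Chapter 2 discusses:

    # Two methods of one function; Julia's multiple dispatch selects
    # the method from the argument type, and the JIT compiler emits
    # machine code specialized for that type.
    half(x::Integer) = x >> 1      # bit shift for integers
    half(x::Float64) = x / 2       # floating-point division for floats

    half(10)          # warm-up call: compiles the Integer method
    half(10.0)        # warm-up call: compiles the Float64 method
    @time half(10)    # reports time and allocations for the compiled call
    @time half(10.0)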
Author: Avik Sengupta
Edition: 2
Publisher: Packt
Year: 2019
Language: English
Pages: 218
Cover
Title Page
Copyright and Credits
Dedication
About Packt
Foreword
Contributors
Table of Contents
Preface
Chapter 1: Julia is Fast
Julia – fast and dynamic
Designed for speed
JIT and LLVM
Types, type inference, and code specialization
How fast can Julia be?
Summary
Chapter 2: Analyzing Performance
Timing Julia functions
The @time macro
Other time macros
The Julia profiler
Using the profiler
ProfileView
Using Juno for profiling
Using TimerOutputs
Analyzing memory allocation
Using the memory allocation tracker
Statistically accurate benchmarking
Using BenchmarkTools.jl
Summary
Chapter 3: Types, Type Inference, and Stability
The Julia type system
Using types
Multiple dispatch
Abstract types
Julia's type hierarchy
Composite and immutable types
Type parameters
Type inference
Type-stability
Definitions
Fixing type instability
The performance pitfalls
Identifying type stability
Loop variables
Kernel methods and function barriers
Types in storage locations
Arrays
Composite types
Parametric composite types
Summary
Chapter 4: Making Fast Function Calls
Using globals
The trouble with globals
Fixing performance issues with globals
Inlining
Default inlining
Controlling inlining
Disabling inlining
Constant propagation
Using macros for performance
The Julia compilation process
Using macros
Evaluating a polynomial
Horner's method
The Horner macro
Generated functions
Using generated functions
Using generated functions for performance
Using keyword arguments
Summary
Chapter 5: Fast Numbers
Numbers in Julia, their layout, and storage
Integers
Integer overflow
BigInt
The floating point
Floating point accuracy
Unsigned integers
Trading performance for accuracy
The @fastmath macro
The K-B-N summation
Subnormal numbers
Subnormal numbers to zero
Summary
Chapter 6: Using Arrays
Array internals in Julia
Array representation and storage
Column-wise storage
Adjoints
Array initialization
Bounds checking
Removing the cost of bounds checking
Configuring bound checks at startup
Allocations and in-place operations
Preallocating function output
sizehint!
Mutating functions
Broadcasting
Array views
SIMD parallelization (AVX2, AVX512)
SIMD.jl
Specialized array types
Static arrays
Structs of arrays
Yeppp!
Writing generic library functions with arrays
Summary
Chapter 7: Accelerating Code with the GPU
Technical requirements
Getting started with GPUs
CUDA and Julia
CuArrays
Monte Carlo simulation on the GPU
Writing your own kernels
Measuring GPU performance
Performance tips
Scalar iteration
Combining kernels
Processing more data
Deep learning on the GPU
ArrayFire
Summary
Chapter 8: Concurrent Programming with Tasks
Tasks
Using tasks
The task life cycle
task_local_storage
Communicating between tasks
Task iteration
High-performance I/O
Port sharing for high-performance web serving
Summary
Chapter 9: Threads
Threads
Measuring CPU cores
Hwloc
Starting threads
The @threads macro
Prefix sum
Thread safety and synchronization primitives
Multithreaded Monte Carlo simulation
Atomics
Synchronization primitives
Threads and GC
Threaded libraries
Over-subscription
The future of threading
Summary
Chapter 10: Distributed Computing with Julia
Creating Julia clusters
Starting a cluster
Cluster managers
SSHManager
SLURM
Communication between Julia processes
Programming parallel tasks
The @everywhere macro
The @spawn macro
The @spawnat macro
Parallel for loops
Parallel map
Distributed Monte Carlo
Distributed arrays
Conway's Game of Life
Shared arrays
Parallel prefix sum with shared arrays
Summary
Licences
Other Books You May Enjoy
Index