Author(s): Matloff, Norman S.
Series: Chapman & Hall/CRC The R Series (CRC Press)
Edition: 2
Publisher: CRC Press
Year: 2016
Language: English
Pages: 310
Tags: Library; Computer literature; R
Contents:
Introduction to Parallel Processing in R
  Recurring Theme: The Principle of Pretty Good Parallelism
  A Note on Machines
  Recurring Theme: Hedging One's Bets
  Extended Example: Mutual Web Outlinks
"Why Is My Program So Slow?": Obstacles to Speed
  Obstacles to Speed
  Performance and Hardware Structures
  Memory Basics
  Network Basics
  Latency and Bandwidth
  Thread Scheduling
  How Many Processes/Threads?
  Example: Mutual Outlink Problem
  "Big O" Notation
  Data Serialization
  "Embarrassingly Parallel" Applications
Principles of Parallel Loop Scheduling
  General Notions of Loop Scheduling
  Chunking in Snow
  A Note on Code Complexity
  Example: All Possible Regressions
  The partools Package
  Example: All Possible Regressions, Improved Version
  Introducing Another Tool: multicore
  Issues with Chunk Size
  Example: Parallel Distance Computation
  The foreach Package
  Stride
  Another Scheduling Approach: Random Task Permutation
  Debugging snow and multicore Code
The Shared Memory Paradigm: A Gentle Introduction through R
  So, What Is Actually Shared?
  Clarity and Conciseness of Shared-Memory Programming
  High-Level Introduction to Shared-Memory Programming: Rdsm Package
  Example: Matrix Multiplication
  Shared Memory Can Bring a Performance Advantage
  Locks and Barriers
  Example: Finding the Maximal Burst in a Time Series
  Example: Transformation of an Adjacency Matrix
  Example: k-Means Clustering
The Shared Memory Paradigm: C Level
  OpenMP
  Example: Finding the Maximal Burst in a Time Series
  OpenMP Loop Scheduling Options
  Example: Transformation of an Adjacency Matrix
  Example: Transforming an Adjacency Matrix, R-Callable Code
  Speedup in C
  Run Time vs. Development Time
  Further Cache/Virtual Memory Issues
  Reduction Operations in OpenMP
  Debugging
  Intel Thread Building Blocks (TBB)
  Lockfree Synchronization
The Shared Memory Paradigm: GPUs
  Overview
  Another Note on Code Complexity
  Goal of This Chapter
  Introduction to NVIDIA GPUs and CUDA
  Example: Mutual Inlinks Problem
  Synchronization on GPUs
  R and GPUs
  The Intel Xeon Phi Chip
Thrust and Rth
  Hedging One's Bets
  Thrust Overview
  Rth
  Skipping the C++
  Example: Finding Quantiles
  Introduction to Rth
The Message Passing Paradigm
  Message Passing Overview
  The Cluster Model
  Performance Issues
  Rmpi
  Example: Pipelined Method for Finding Primes
  Memory Allocation Issues
  Message-Passing Performance Subtleties
MapReduce Computation
  Apache Hadoop
  Other MapReduce Systems
  R Interfaces to MapReduce Systems
  An Alternative: "Snowdoop"
Parallel Sorting and Merging
  The Elusive Goal of Optimality
  Sorting Algorithms
  Example: Bucket Sort in R
  Example: Quicksort in OpenMP
  Sorting in Rth
  Some Timing Comparisons
  Sorting on Distributed Data
Parallel Prefix Scan
  General Formulation
  Applications
  General Strategies for Parallel Scan Computation
  Implementations of Parallel Prefix Scan
  Parallel cumsum() with OpenMP
  Example: Moving Average
Parallel Matrix Operations
  Tiled Matrices
  Example: Snowdoop Approach to Matrix Operations
  Parallel Matrix Multiplication
  BLAS Libraries
  Example: A Look at the Performance of OpenBLAS
  Example: Graph Connectedness
  Solving Systems of Linear Equations
  Sparse Matrices
Inherently Statistical Approaches: Subset Methods
  Chunk Averaging
  Bag of Little Bootstraps
  Subsetting Variables
Appendix A: Review of Matrix Algebra
Appendix B: R Quick Start
Appendix C: Introduction to C for R Programmers