This book constitutes the refereed proceedings of the 15th International Conference on Parallel Computing, Euro-Par 2009, held in Delft, The Netherlands, in August 2009.
The 85 revised papers presented were carefully reviewed and selected from 256 submissions. The papers are organized in topical sections on support tools and environments; performance prediction and evaluation; scheduling and load balancing; high performance architectures and compilers; parallel and distributed databases; grid, cluster, and cloud computing; peer-to-peer computing; distributed systems and algorithms; parallel and distributed programming; parallel numerical algorithms; multicore and manycore programming; theory and algorithms for parallel computation; high performance networks; and mobile and ubiquitous computing.
Author(s): Michael Perrone (auth.), Henk Sips, Dick Epema, Hai-Xiang Lin (eds.)
Series: Lecture Notes in Computer Science 5704: Theoretical Computer Science and General Issues
Edition: 1
Publisher: Springer-Verlag Berlin Heidelberg
Year: 2009
Language: English
Pages: 1120
Tags: Computer Systems Organization and Communication Networks; System Performance and Evaluation; Processor Architectures; Computer Communication Networks; Operating Systems; Software Engineering
Front Matter....Pages -
Multicore Programming Challenges....Pages 1-2
Ibis: A Programming System for Real-World Distributed Computing....Pages 3-3
What Is in a Namespace?....Pages 4-4
Front Matter....Pages 5-5
Introduction....Pages 7-8
Atune-IL: An Instrumentation Language for Auto-tuning Parallel Applications....Pages 9-20
Assigning Blame: Mapping Performance to High Level Parallel Programming Abstractions....Pages 21-32
A Holistic Approach towards Automated Performance Analysis and Tuning....Pages 33-44
Pattern Matching and I/O Replay for POSIX I/O in Parallel Programs....Pages 45-56
An Extensible I/O Performance Analysis Framework for Distributed Environments....Pages 57-68
Grouping MPI Processes for Partial Checkpoint and Co-migration....Pages 69-80
Process Mapping for MPI Collective Communications....Pages 81-92
Front Matter....Pages 93-93
Introduction....Pages 95-96
Stochastic Analysis of Hierarchical Publish/Subscribe Systems....Pages 97-109
Characterizing and Understanding the Bandwidth Behavior of Workloads on Multi-core Processors....Pages 110-121
Hybrid Techniques for Fast Multicore Simulation....Pages 122-134
PSINS: An Open Source Event Tracer and Execution Simulator for MPI Applications....Pages 135-148
A Methodology to Characterize Critical Section Bottlenecks in DSM Multiprocessors....Pages 149-161
Front Matter....Pages 163-163
Introduction....Pages 165-165
Dynamic Load Balancing of Matrix-Vector Multiplications on Roadrunner Compute Nodes....Pages 166-177
A Unified Framework for Load Distribution and Fault-Tolerance of Application Servers....Pages 178-190
On the Feasibility of Dynamically Scheduling DAG Applications on Shared Heterogeneous Systems....Pages 191-202
Steady-State for Batches of Identical Task Trees....Pages 203-215
A Buffer Space Optimal Solution for Re-establishing the Packet Order in a MPSoC Network Processor....Pages 216-227
Using Multicast Transfers in the Replica Migration Problem: Formulation and Scheduling Heuristics....Pages 228-240
A New Genetic Algorithm for Scheduling for Large Communication Delays....Pages 241-252
Comparison of Access Policies for Replica Placement in Tree Networks....Pages 253-264
Scheduling Recurrent Precedence-Constrained Task Graphs on a Symmetric Shared-Memory Multiprocessor....Pages 265-280
Energy-Aware Scheduling of Flow Applications on Master-Worker Platforms....Pages 281-292
Front Matter....Pages 293-293
Introduction....Pages 295-296
Last Bank: Dealing with Address Reuse in Non-Uniform Cache Architecture for CMPs....Pages 297-308
Paired ROBs: A Cost-Effective Reorder Buffer Sharing Strategy for SMT Processors....Pages 309-320
REPAS: Reliable Execution for Parallel ApplicationS in Tiled-CMPs....Pages 321-333
Impact of Quad-Core Cray XT4 System and Software Stack on Scientific Computation....Pages 334-344
Front Matter....Pages 345-345
Introduction....Pages 347-348
Unifying Memory and Database Transactions....Pages 349-360
A DHT Key-Value Storage System with Carrier Grade Performance....Pages 361-374
Selective Replicated Declustering for Arbitrary Queries....Pages 375-386
Front Matter....Pages 387-387
Introduction....Pages 389-389
POGGI: Puzzle-Based Online Games on Grid Infrastructures....Pages 390-403
Enabling High Data Throughput in Desktop Grids through Decentralized Data and Metadata Management: The BlobSeer Approach....Pages 404-416
MapReduce Programming Model for .NET-Based Cloud Computing....Pages 417-428
The Architecture of the XtreemOS Grid Checkpointing Service....Pages 429-441
Scalable Transactions for Web Applications in the Cloud....Pages 442-453
Provider-Independent Use of the Cloud....Pages 454-465
MPI Applications on Grids: A Topology Aware Approach....Pages 466-477
Front Matter....Pages 479-479
Introduction....Pages 481-482
A Least-Resistance Path in Reasoning about Unstructured Overlay Networks....Pages 483-497
SiMPSON: Efficient Similarity Search in Metric Spaces over P2P Structured Overlay Networks....Pages 498-510
Uniform Sampling for Directed P2P Networks....Pages 511-522
Adaptive Peer Sampling with Newscast....Pages 523-534
Exploring the Feasibility of Reputation Models for Improving P2P Routing under Churn....Pages 535-547
Selfish Neighbor Selection in Peer-to-Peer Backup and Storage Applications....Pages 548-560
Zero-Day Reconciliation of BitTorrent Users with Their ISPs....Pages 561-573
Surfing Peer-to-Peer IPTV: Distributed Channel Switching....Pages 574-586
Front Matter....Pages 587-587
Introduction....Pages 589-589
Distributed Individual-Based Simulation....Pages 590-601
A Self-stabilizing K-Clustering Algorithm Using an Arbitrary Metric....Pages 602-614
Active Optimistic Message Logging for Reliable Execution of MPI Applications....Pages 615-626
Front Matter....Pages 627-627
Introduction....Pages 629-629
A Parallel Numerical Library for UPC....Pages 630-641
A Multilevel Parallelization Framework for High-Order Stencil Computations....Pages 642-653
Using OpenMP vs. Threading Building Blocks for Medical Imaging on Multi-cores....Pages 654-665
Parallel Skeletons for Variable-Length Lists in SkeTo Skeleton Library....Pages 666-677
Stkm on Sca: A Unified Framework with Components, Workflows and Algorithmic Skeletons....Pages 678-690
Grid-Enabling SPMD Applications through Hierarchical Partitioning and a Component-Based Runtime....Pages 691-703
Reducing Rollbacks of Transactional Memory Using Ordered Shared Locks....Pages 704-715
Front Matter....Pages 717-717
Introduction....Pages 719-720
Wavelet-Based Adaptive Solvers on Multi-core Architectures for the Simulation of Complex Systems....Pages 721-734
Localized Parallel Algorithm for Bubble Coalescence in Free Surface Lattice-Boltzmann Method....Pages 735-746
Fast Implicit Simulation of Oscillatory Flow in Human Abdominal Bifurcation Using a Schur Complement Preconditioner....Pages 747-759
A Parallel Rigid Body Dynamics Algorithm....Pages 760-771
Optimized Stencil Computation Using In-Place Calculation on Modern Multicore Systems....Pages 772-784
Parallel Implementation of Runge–Kutta Integrators with Low Storage Requirements....Pages 785-796
PSPIKE: A Parallel Hybrid Sparse Linear System Solver....Pages 797-808
Out-of-Core Computation of the QR Factorization on Multi-core Processors....Pages 809-820
Adaptive Parallel Householder Bidiagonalization....Pages 821-833
Front Matter....Pages 835-835
Introduction....Pages 837-838
Tile Percolation: An OpenMP Tile Aware Parallelization Technique for the Cyclops-64 Multicore Processor....Pages 839-850
An Extension of the StarSs Programming Model for Platforms with Multiple GPUs....Pages 851-862
StarPU: A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures....Pages 863-874
XJava: Exploiting Parallelism with Object-Oriented Stream Programming....Pages 875-886
JCUDA: A Programmer-Friendly Interface for Accelerating Java Programs with CUDA....Pages 887-899
Fast and Efficient Synchronization and Communication Collective Primitives for Dual Cell-Based Blades....Pages 900-911
Searching for Concurrent Design Patterns in Video Games....Pages 912-923
Parallelization of a Video Segmentation Algorithm on CUDA–Enabled Graphics Processing Units....Pages 924-935
A Parallel Point Matching Algorithm for Landmark Based Image Registration Using Multicore Platform....Pages 936-947
High Performance Matrix Multiplication on Many Cores....Pages 948-959
Parallel Lattice Basis Reduction Using a Multi-threaded Schnorr-Euchner LLL Algorithm....Pages 960-973
Efficient Parallel Implementation of Evolutionary Algorithms on GPGPU Cards....Pages 974-985
Front Matter....Pages 987-987
Introduction....Pages 989-989
Implementing Parallel Google Map-Reduce in Eden....Pages 990-1002
A Lower Bound for Oblivious Dimensional Routing....Pages 1003-1010
Front Matter....Pages 1011-1011
Introduction....Pages 1013-1014
A Case Study of Communication Optimizations on 3D Mesh Interconnects....Pages 1015-1028
Implementing a Change Assimilation Mechanism for Source Routing Interconnects....Pages 1029-1039
Dependability Analysis of a Fault-Tolerant Network Reconfiguring Strategy....Pages 1040-1051
RecTOR: A New and Efficient Method for Dynamic Network Reconfiguration....Pages 1052-1064
NIC-Assisted Cache-Efficient Receive Stack for Message Passing over Ethernet....Pages 1065-1077
A Multipath Fault-Tolerant Routing Method for High-Speed Interconnection Networks....Pages 1078-1088
Hardware Implementation Study of the SCFQ-CA and DRR-CA Scheduling Algorithms....Pages 1089-1100
Front Matter....Pages 1101-1101
Introduction....Pages 1103-1103
Optimal and Near-Optimal Energy-Efficient Broadcasting in Wireless Networks....Pages 1104-1115
Back Matter....Pages -