Large-scale Graph Analysis: System, Algorithm and Optimization

This document was uploaded by one of our users. The uploader already confirmed that they had the permission to publish it. If you are author/publisher or own the copyright of this documents, please report to us by using this DMCA report form.

Simply click on the Download Book button.

Yes, Book downloads on Ebookily are 100% Free.

Sometimes the book is free on Amazon As well, so go ahead and hit "Search on Amazon"

Author(s): Yingxia Shao, Bin Cui, Lei Chen
Publisher: Springer Singapore
Year: 2020

Language: English

Preface
Acknowledgements
Contents
1 Introduction
1.1 Background
1.2 Graph Analysis Tasks
1.2.1 Subgraph Matching and Enumeration
1.2.2 Graph Extraction
1.2.3 Cohesive Subgraph Detection
1.3 The Research Issues
1.4 The Overview of the Book
References
2 Graph Computing Systems for Large-Scale Graph Analysis
2.1 Distributed Graph Computing Systems
2.2 Vertex Programming Abstraction
2.3 Gather–Apply–Scatter Programming Abstraction
2.4 Workload-Aware Cost Model for Optimizations
2.4.1 Cost Model Analysis
2.4.1.1 Cost Model of BSP-Based Systems
2.4.1.2 Cost Model in the Context of Graph
2.4.1.3 The Problem of Workload Balance
2.4.2 The Principles of Optimizations
2.4.2.1 Optimizations with Respect to the Workload Source
2.4.2.2 Optimizations with Respect to Workload Distribution
References
3 Partition-Aware Graph Computing System
3.1 Introduction
3.2 Message Processing in Pregel-Like Systems
3.2.1 The Influence of Graph Algorithms
3.2.2 The Total Cost of a Worker
3.3 PAGE: A Partition-Aware System for Large-Scale Graph Analysis
3.3.1 Graph Algorithm Execution in PAGE
3.3.2 Dual Concurrent Message Processor
3.3.3 Partition-Aware Concurrency Control Model
3.3.3.1 Mathematical Formulation of DCCM
3.3.3.2 Adaptiveness on Various Graph Partitions
3.3.3.3 Implementation of DCCM
3.4 Experiments
3.4.1 Experimental Setup
3.4.2 Evaluation on the Concurrency Control Model
3.4.2.1 Concurrency Determined by DCCM
3.4.2.2 Results by Manual Tuning
3.4.2.3 Adaptivity of DCCM
3.4.3 Comparison with Other Pregel-Like Systems
3.4.3.1 Advantage of PAGE
3.4.3.2 Performance on Various Graph Algorithms
3.4.3.3 Performance by Varying Numbers of Partitions
3.5 Summary
References
4 Efficient Parallel Subgraph Enumeration
4.1 Introduction
4.2 Problem Definition and Backgrounds
4.2.1 Problem and Notations
4.2.2 Partial Subgraph Instance
4.2.3 Automorphism of a Pattern Graph
4.3 Parallel Subgraph Enumeration Framework
4.3.1 Independence Property of Enumeration
4.3.2 PSgL Framework
4.3.3 Partial Subgraph Instance Expansion
4.3.4 Cost Analysis
4.4 The Optimizations of the Framework
4.4.1 Workload Distribution Strategy
4.4.1.1 Workload-Aware Distribution Strategy
4.4.1.2 Naive Distribution Strategies
4.4.2 Partial Subgraph Instance Reduction
4.4.2.1 Automorphism Breaking of the Pattern Graph
4.4.2.2 Initial Pattern Vertex Selection
4.4.2.3 Pruning Invalid Partial Subgraph Instance
4.4.3 Implementation Details
4.5 Experiments
4.5.1 Experimental Setup
4.5.2 Effects of Workload Distribution Strategies
4.5.3 Effects of Partial Subgraph Instance Reduction
4.5.3.1 Importance of the Initial Pattern Vertex
4.5.3.2 Efficiency of the Light-Weight Edge Index
4.5.4 Performance on Various Pattern Graphs
4.5.5 Scalability
4.5.5.1 Scalability on Large Graphs
4.5.5.2 Scalability on the Number of Workers
4.6 Summary
References
5 Efficient Parallel Graph Extraction
5.1 Introduction
5.2 Graph Extraction Problem
5.2.1 Preliminaries
5.2.2 Definition of Homogeneous Graph Extraction
5.2.3 The Characteristics of Graph Extraction
5.2.3.1 Path Enumeration and Pair-Wise Aggregation
5.2.3.2 The Hardness of Graph Extraction
5.3 Parallel Graph Extraction Framework
5.3.1 Primitive Pattern and Path Concatenation Plan
5.3.2 PCP Evaluation Algorithm
5.3.3 Cost Analysis
5.4 Aggregation in Homogeneous Graph Extraction
5.4.1 Distributive, Algebraic, and Holistic Aggregation
5.4.2 Optimization with Partial Aggregation
5.5 Path Concatenation Plan Selection
5.5.1 The Path Size Estimation for PCP
5.5.2 PCP Selection
5.5.2.1 Iteration Optimized Strategy
5.5.2.2 Path Optimized Strategy
5.5.2.3 Hybrid Strategy
5.6 Experiments
5.6.1 Experiment Settings
5.6.2 Effectiveness of Partial Aggregation Technique
5.6.3 Comparison of Different Plans
5.6.4 Comparison of Standalone Solution
5.6.5 Comparison of RPQ-Based Solution
5.6.6 Scalability
5.7 Summary
References
6 Efficient Parallel Cohesive Subgraph Detection
6.1 Introduction
6.2 Problem Definition
6.2.1 Preliminaries
6.2.2 Fundamental Operation
6.2.3 Parallel Computing Context
6.3 The Existing Parallel Algorithms
6.3.1 Improved L. Quick's Algorithm
6.3.2 Limitations Discussion
6.4 The Framework of PeTa
6.4.1 Subgraph-Oriented Model
6.4.2 Triangle Complete Subgraph
6.4.3 The Local Subgraph Algorithm in PeTa
6.4.4 Complexity Analysis of PeTa
6.4.4.1 Space Cost
6.4.4.2 Computation Complexity
6.4.4.3 Communication Cost
6.4.4.4 Number of Iterations
6.5 The Influence of Graph Partitions
6.5.1 Edge-support Law
6.5.2 Partition Influence on PeTa
6.5.3 Implementation Details
6.6 Experiments
6.6.1 Environment Setup
6.6.2 The Influence of Partition Schemes for PeTa
6.6.3 Performance Comparison
6.6.4 Scalability
6.7 Summary
References
7 Conclusions
References