This book presents the state of the art in distributed machine learning algorithms based on gradient optimization methods. In the big data era, large-scale datasets pose enormous challenges for existing machine learning systems. Implementing machine learning algorithms in a distributed environment has therefore become a key technology, and recent research has shown gradient-based iterative optimization to be an effective solution. Focusing on methods that speed up large-scale gradient optimization through both algorithmic improvements and careful system implementation, the book introduces three essential techniques for designing a gradient optimization algorithm that trains a distributed machine learning model: the parallelism strategy, data compression, and the synchronization protocol.
Written in a tutorial style, it covers a range of topics, from fundamentals to carefully designed algorithms and systems for distributed machine learning. It will appeal to a broad audience in the fields of machine learning, artificial intelligence, big data, and database management.
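To make the approach concrete, the following is a minimal sketch (not taken from the book) of the data-parallel, bulk synchronous SGD pattern that recurs throughout Chapters 2 and 3; the least-squares loss, two-worker setup, and function names are illustrative assumptions.

import numpy as np

def local_gradient(w, X, y):
    """Gradient of the least-squares loss (1/2n)*||Xw - y||^2 on one worker's data shard."""
    n = X.shape[0]
    return X.T @ (X @ w - y) / n

def bsp_sgd(shards, dim, lr=0.1, steps=100):
    """At each step, every worker computes a gradient on its shard (a horizontal
    partition of the data); a synchronization barrier then averages the gradients
    before a single global model update (bulk synchronous protocol)."""
    w = np.zeros(dim)
    for _ in range(steps):
        grads = [local_gradient(w, X, y) for X, y in shards]  # runs in parallel in a real system
        w -= lr * np.mean(grads, axis=0)                      # aggregate, then update
    return w

# Toy usage: two workers, each holding a horizontal partition of a linear dataset.
rng = np.random.default_rng(0)
w_true = np.array([2.0, -1.0])
X = rng.normal(size=(200, 2))
y = X @ w_true
shards = [(X[:100], y[:100]), (X[100:], y[100:])]
print(bsp_sgd(shards, dim=2))  # approaches [2, -1]

The same skeleton underlies the asynchronous and stale synchronous variants discussed later; they differ only in when the aggregation step is allowed to proceed.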
Author(s): Jiawei Jiang, Bin Cui, Ce Zhang
Series: Big Data Management
Publisher: Springer
Year: 2022
Language: English
Pages: 178
City: Singapore
Preface
Acknowledgments
Contents
Acronyms
1 Introduction
1.1 Background
1.1.1 Methodology of Machine Learning
1.1.2 Machine Learning Meets Big Data
1.2 Distributed Machine Learning
1.3 Gradient Optimization
1.3.1 First-Order Gradient Optimization Algorithms
1.3.1.1 Batch Gradient Descent
1.3.1.2 Stochastic Gradient Descent
1.3.1.3 Minibatch Gradient Descent
1.3.2 Serial Gradient Optimization
1.3.3 Distributed Gradient Optimization
1.4 Open Problems
References
2 Basics of Distributed Machine Learning
2.1 Anatomy of Distributed Machine Learning
2.2 Parallelism
2.2.1 Data Parallelism
2.2.1.1 Horizontal Partitioning
2.2.1.2 Vertical Partitioning
2.2.2 Model Parallelism
2.2.3 Hybrid Parallelism
2.3 Parameter Sharing
2.3.1 Shared-Nothing
2.3.1.1 Message Passing Interface
2.3.1.2 Remote Procedure Call
2.3.1.3 MapReduce
2.3.2 Shared-Memory
2.4 Synchronization
2.4.1 Bulk Synchronous Protocol
2.4.2 Asynchronous Protocol
2.4.3 Stale Synchronous Protocol
2.5 Communication Optimization
2.5.1 Lower Numerical Precision
2.5.2 Communication Compression
2.5.2.1 Lossless Compression for Integer Numbers
2.5.2.2 Lossless Compression for Sparse Matrices
2.5.2.3 Lossy Compression for Floating-Point Numbers
References
3 Distributed Gradient Optimization Algorithms
3.1 Linear Models
3.1.1 Formalization of Linear Models
3.1.2 Overview of Popular Linear Models
3.1.3 Single-Node Gradient Optimization
3.1.3.1 Serial Gradient Optimization
3.1.3.2 Single-Node Parallel Gradient Optimization
3.1.4 Distributed Gradient Optimization
3.1.4.1 MR-BSP-SGD
3.1.4.2 MR-MA-SGD
3.1.4.3 PS-BSP-SGD
3.1.4.4 PS-SSP-SGD
3.1.4.5 Column-SGD
3.1.4.6 Other Related Works
3.2 Neural Network Models
3.2.1 Formalization of Neural Network
3.2.1.1 Model Definition
3.2.1.2 Back-Propagation
3.2.2 Overview of Popular Neural Network Models
3.2.2.1 AutoEncoder
3.2.2.2 Deep Belief Network
3.2.2.3 Convolutional Neural Network
3.2.2.4 Recurrent Neural Network
3.2.2.5 Other Neural Networks
3.2.3 Distributed Gradient Optimization
3.2.3.1 PS-ASP-SGD
3.2.3.2 Decentralized-PSGD
3.2.3.3 Decentralized-ASP-SGD
3.2.3.4 QSGD
3.2.3.5 Sparsification-SGD
3.2.3.6 Model-Parallel SGD
3.3 Gradient Boosting Decision Tree
3.3.1 Formalization of Gradient Boosting Decision Tree
3.3.2 Distributed Gradient Optimization
References
4 Distributed Machine Learning Systems
4.1 General Machine Learning Systems
4.1.1 MapReduce Systems
4.1.2 Parameter Server Systems
4.2 Specialized Machine Learning Systems
4.3 Deep Learning Systems
4.4 Cloud Machine Learning Systems
4.4.1 Geo-Distributed Systems
4.4.2 Serverless Systems
4.5 In-Database Machine Learning Systems
References
5 Conclusion
5.1 Summary of the Book
5.2 Further Reading
References