Distributed Computing - Introduction, RPC, Time, State, Consensus, Replication, Fault-Tolerance, PAXOS, Transactions, Consistency, Peer-to-Peer, Analytics, Datacenter Computing, Machine Learning, Blockchain, IoT, Edge Computing

Author(s): Various

Language: English
Tags: distributed systems; distributed computing; OS; systems design; operating systems; networking; cloud computing; cloud; paxos; byzantine; fault tolerance; gaia; servers; serverless; tetrisched; cluster; fasst; legoos; lego os; mapreduce; map; reduce; map-reduce; map reduce; chord; memcache; facebook; google; aurora; amazon; spanner; lamport clocks; raft; logical clock; message-passing; chain replication; snapshots; ordering of events; logical time; models; fallacies; fallacy

INTRODUCTION
BACKGROUND AND DEFINITIONS
System Model
Consistent System States
Interactions with the Outside World
In-Transit Messages
Logging Protocols
Stable Storage
Garbage Collection
CHECKPOINT-BASED ROLLBACK RECOVERY
Uncoordinated Checkpointing
Overview
Dependency Graphs and Recovery Line Calculation
The Domino Effect
Coordinated Checkpointing
Overview
Non-blocking Checkpoint Coordination
Checkpointing with Synchronized Clocks
Checkpointing and Communication Reliability
Minimal Checkpoint Coordination
Communication-induced Checkpointing
Overview
Model-based Protocols
Index-based Protocols
LOG-BASED ROLLBACK RECOVERY
The No-Orphans Consistency Condition
Pessimistic Logging
Overview
Techniques for Reducing Performance Overhead
Relaxing Logging Atomicity
Optimistic Logging
Overview
Synchronous vs. Asynchronous Recovery
Causal Logging
Overview
Tracking Causality
Comparison
IMPLEMENTATION ISSUES
Overview
Checkpointing Implementation
Concurrent Checkpointing
Incremental Checkpointing
System-level versus User-level Implementations
Compiler Support
Checkpoint Placement
Checkpointing Protocols in Comparison
Communication Protocols
Location-Independent Identities and Redirection
Reliable Channel Protocols
Log-based Recovery
Message Logging Overhead
Combining Log-Based Recovery with Coordinated Checkpointing
Stable Storage
Support for Nondeterminism
System Calls
Asynchronous Signals
Dependency Tracking
Recovery
Reinstating a Process in its Environment
Behavior During Recovery
Checkpointing and Mobility
Rollback Recovery in Practice
CONCLUDING REMARKS
Introduction
Implementation
Spanserver Software Stack
Directories and Placement
Data Model
TrueTime
Concurrency Control
Timestamp Management
Paxos Leader Leases
Assigning Timestamps to RW Transactions
Serving Reads at a Timestamp
Assigning Timestamps to RO Transactions
Details
Read-Write Transactions
Read-Only Transactions
Schema-Change Transactions
Refinements
Evaluation
Microbenchmarks
Availability
TrueTime
F1
Related Work
Future Work
Conclusions
Paxos Leader-Lease Management
Abstract
1 Introduction
2 Making Writes Efficient
2.1 Aurora System Architecture
2.2 Writes in Aurora
2.3 Storage Consistency Points and Commits
2.4 Crash Recovery in Aurora
3 Making Reads Efficient
3.1 Avoiding quorum reads
3.2 Scaling Reads Using Read Replicas
3.3 Structural Consistency in Aurora Replicas
3.4 Snapshot Isolation and Read View Anchors in Aurora Replicas
4 Failures and Quorum Membership
4.1 Using Quorum Sets to Change Membership
4.2 Using Quorum Sets to Reduce Costs
5 Related Work
6 Conclusions
Acknowledgments
References
Introduction
ALPS Systems and Trade-offs
Causal+ Consistency
Definition
Causal+ vs. Other Consistency Models
Causal+ in COPS
Scalable Causality
System Design of COPS
Overview of COPS
The COPS Key-Value Store
Client Library and Interface
Writing Values in COPS and COPS-GT
Reading Values in COPS
Get Transactions in COPS-GT
Garbage, Faults, and Conflicts
Garbage Collection Subsystem
Fault Tolerance
Conflict Detection
Evaluation
Implementation and Experimental Setup
Microbenchmarks
Dynamic Workloads
Scalability
Related Work
Conclusion
Formal Definition of Causal+
Abstract
1 Introduction
2 Background
2.1 Non-volatile main memory
2.2 High-performance networking
2.3 Goals of this paper
3 Evaluation setup
4 Low-latency writes
4.1 Persistent RDMA background
4.2 Durability guarantee of RDMA
4.3 Measurements
4.4 Newer NICs
4.5 Future RDMA extensions
4.6 Low-latency state machine replication
5 High-bandwidth bulk writes
5.1 Discussion on disabling DDIO
5.2 Improving RDMA bandwidth
5.3 DMA engine background
5.4 IOAT DMA microbenchmarks
5.5 Optimizing RPCs with DMA
6 Persistent log
6.1 Diagnosis: Cache line invalidation
6.2 Rotating counter
6.3 Extension to rotating registers
6.4 End-to-end performance
7 Related work
8 Conclusion
References
Introduction
Disaggregate Hardware Resource
Limitations of Monolithic Servers
Hardware Resource Disaggregation
OSes for Resource Disaggregation
The Splitkernel OS Architecture
LegoOS Design
Abstraction and Usage Model
Hardware Architecture
Process Management
Process Management and Scheduling
ExCache Management
Supporting Linux Syscall Interface
Memory Management
Memory Space Management
Optimization on Memory Accesses
Storage Management
Global Resource Management
Reliability and Failure Handling
LegoOS Implementation
Hardware Emulation
Network Stack
Processor Monitor
Memory Monitor
Storage Monitor
Experience and Discussion
Evaluation
Micro- and Macro-benchmark Results
Application Performance
Failure Analysis
Related Work
Discussion and Conclusion
Introduction
Background
Fast distributed transactions
RDMA
Choosing networking primitives
Advantage of RPCs
Advantage of datagram transport
Performance considerations
On small clusters
On medium-sized clusters
Reliability considerations
Stress tests for packet loss
FaSST RPCs
Coroutines
RPC interface and optimizations
Detecting packet loss
RPC limitations
Single-core RPC performance
Transactions
Handling failures and packet loss
Implementation
Transaction API
Evaluation
Object store
Single-key read-only transactions
Multi-key transactions
TATP
SmallBank
Latency
Future trends
Scalable one-sided RDMA
More queue pairs
Advanced one-sided RDMA
Related work
Conclusion
Introduction
Background
FaaS Workloads
Data Collection
Functions, Applications, and Triggers
Invocation Patterns
Function Execution Times
Memory Usage
Main Takeaways
Managing Cold Starts in FaaS
Design Challenges
Hybrid Histogram Policy
Implementation in Apache OpenWhisk
Evaluation
Methodology
Simulation Results
Experimental results
Production Implementation
Related Work
Conclusion
Abstract
1 Introduction
2 Motivation
3 Challenges
4 Overview of Cartel
5 Design Detail
5.1 Metadata Storage and Aggregation
5.2 Cartel – Three Key Mechanisms
5.3 Cartel Runtime
6 Evaluation
6.1 Experimental Methodology
6.2 Benefits from Cartel
6.3 Effect of Mechanisms
6.4 Use Case - Network Attack
7 Discussion
8 Related Work
9 Conclusion
References
Introduction
Background & Model
Smart-home Platforms
Programming Model
Failures in IoT Environments
Problem Study
Inconsistency
Dependency
Analysis and Findings
Transactuations
Abstraction & API
Chaining transactuations
Relacs
Relacs Store
Execution Model
Relacs Runtime
Fault Tolerance
Implementation
Discussion
Evaluation
Programmability
Correctness
Overhead
Related Work
Conclusion
Acknowledgment