Effective Multi-Tenant Distributed Systems: Challenges and Solutions When Running Complex Environments

Organizations are eager to capitalize on real-time data analysis, move beyond batch processing for time-critical insights, and excel at big data in a predictable, reliable way. But performance has been an issue for distributed systems like Hadoop, especially when the use cases of a single cluster become multi-tenant or multi-workload. The worst part? You may not even know you have a performance issue.

In this report, Chad Carson and Sean Suchter from Pepperdata describe the performance challenges of running multi-tenant distributed computing environments, especially within a Hadoop context. After examining pros and cons of current solutions for these problems, you’ll learn how to use real-time, intelligent software that tracks and dynamically adjusts each application’s usage of physical hardware. Get ahead of your Hadoop operations for faster, better decision-making and faster, better business returns.

With this report, you’ll explore:

- How Hadoop and other multi-tenant distributed systems work, and why performance matters
- Business-visible symptoms of performance problems: late jobs, inconsistent runtimes, and underutilized hardware
- Scheduling challenges in multi-tenant systems
- Symptoms and solutions for CPU performance limitations
- Physical and virtual limits of node memory, and what happens when you run out
- Identifying and solving performance problems due to disk and network performance limits and other typical bottlenecks
- Solutions for monitoring performance and accurately allocating cluster costs among users and business units

Author(s): Chad Carson; Sean Suchter
Publisher: O'Reilly
Year: 2016

Language: English
Pages: 102

Cover
Strata
Copyright
Table of Contents
Chapter 1. Introduction to Multi-Tenant Distributed Systems
The Benefits of Distributed Systems
Performance Problems in Distributed Systems
Scheduling
Hardware Bottlenecks
Lack of Visibility Within Multi-Tenant Distributed Systems
The Impact on Business from Performance Problems
Scope of This Book
Hadoop: An Example Distributed System
Terminology
Chapter 2. Scheduling in Distributed Systems
Introduction
Dominant Resource Fairness Scheduling
Aggressive Scheduling for Busy Queues
Special Scheduling Treatment for Small Jobs
Workload-Specific Scheduling Considerations
Inefficiencies in Scheduling
The Need to be Conservative with Memory
Inability to Effectively Schedule the Use of Other Resources
Deadlock and Starvation
Waste Due to Speculative Execution
Summary
Chapter 3. CPU Performance Considerations
Introduction
Algorithm Efficiency
Kernel Scheduling
Intentional or Accidental Bad Actors
Applying the Control Mechanisms in Multi-Tenant Distributed Systems
I/O Waiting and CPU Cache Impacts
Summary
Chapter 4. Memory Usage in Distributed Systems
Introduction
Physical Versus Virtual Memory
Node Thrashing
Detecting and Avoiding Thrashing
Kernel Out-Of-Memory Killer
Implications of Memory-Intensive Workloads for Multi-Tenant Distributed Systems
Solutions
Summary
Chapter 5. Disk Performance: Identifying and Eliminating Bottlenecks
Introduction
Overview of Disk Performance Limits
Disk Behavior When Using Multiple Disks
Disk Performance in Multi-Tenant Distributed Systems
Controlling Disk I/O Usage to Improve Performance for High-Priority Applications
Basic Disk I/O Prioritization Tools and Their Limitations
Effective Control of Disk I/O Usage
Solid-State Drives and Distributed Systems
Measuring Performance and Diagnosing Problems
Summary
Chapter 6. Network Performance Limits: Causes and Solutions
Introduction
Bandwidth Problems in Distributed Systems
Hadoop’s Solution to Network Bottlenecks: Move Computation to the Data
Why Network Quality of Service Does Not Solve the Problem of Network Bottlenecks
Controlling Network Usage on a Per-Application Basis
Other Network-Related Bottlenecks and Problems
Measuring Network Performance and Debugging Problems
ping and mtr
Retransmissions
Summary
Chapter 7. Other Bottlenecks in Distributed Systems
Introduction
NameNode Contention
ResourceManager Contention
ZooKeeper
Locks
External Databases and Related Systems
DNS Servers
Summary
Chapter 8. Monitoring Performance: Challenges and Solutions
Introduction
Why Monitor?
What to Monitor
Systems and Performance Aspects of Monitoring
Handling Huge Amounts of Metrics Data
Reliability of the Monitoring System
Some Commonly Used Monitoring Systems
Algorithmic and Logical Aspects of Monitoring
Challenges Specific to Multi-Tenant Distributed Systems
Measuring the Effect of Attempted Improvements
Allocating Cluster Costs Across Tenants
Summary
Chapter 9. Conclusion: Performance Challenges and Solutions for Effective Multi-Tenant Distributed Systems
About the Authors