Mastering Distributed Tracing: Analyzing performance in microservices and complex systems

Understand how to apply distributed tracing to microservices-based architectures.

Key Features
• A thorough conceptual introduction to distributed tracing
• An exploration of the most important open standards in the space
• A how-to guide for code instrumentation and operating a tracing infrastructure

Book Description
Mastering Distributed Tracing will equip you to operate and enhance your own tracing infrastructure. Through practical exercises and code examples, you will learn how end-to-end tracing can be used as a powerful application performance management and comprehension tool.

The rise of Internet-scale companies, like Google and Amazon, ushered in a new era of distributed systems operating on thousands of nodes across multiple data centers. Microservices increased that complexity, often exponentially. It is harder to debug these systems, track down failures, detect bottlenecks, or even simply understand what is going on. Distributed tracing focuses on solving these problems for complex distributed systems. Today, tracing standards have developed and we have much faster systems, making instrumentation less intrusive and data more valuable.

Yuri Shkuro, the creator of Jaeger, a popular open-source distributed tracing system, delivers end-to-end coverage of the field in Mastering Distributed Tracing. Review the history and theoretical foundations of tracing; solve the data gathering problem through code instrumentation, with open standards like OpenTracing, W3C Trace Context, and OpenCensus; and discuss the benefits and applications of a distributed tracing infrastructure for understanding, and profiling, complex systems.
What you will learn
• How to get started with a distributed tracing system
• How to get the most value out of end-to-end tracing
• The open standards in the space
• Code instrumentation and operating a tracing infrastructure
• Where distributed tracing fits into microservices as a core function

Who this book is for
Any developer interested in testing large systems will find this book very revealing and, in places, surprising. Every microservice architect and developer should have insight into distributed tracing, and this book will help them on their way. System administrators with some development skills will also benefit. No particular programming language skills are required, although the ability to read Java, while non-essential, will help with the core chapters.

Author(s): Yuri Shkuro
Edition: 1
Publisher: Packt Publishing
Year: 2019

Language: English
Commentary: Vector PDF
Pages: 444
City: Birmingham, UK
Tags: Security; Data Mining; Python; Java; Asynchronous Programming; MySQL; Distributed Systems; Monitoring; Logging; Microservices; Apache Kafka; Docker; ZooKeeper; Deployment; Redis; Kibana; Go; Kubernetes; Software Architecture; Resilience; Cloud-native Applications; Zipkin; OpenTracing; Jaeger; Distributed Tracing; Observability; HotROD; Data Gathering; OpenZipkin; SkyWalking; Service Mesh; Distributed Context Propagation

Cover
Copyright
Mapt upsell
Contributors
Table of Contents
Preface
Part I: Introduction
Chapter 1: Why Distributed Tracing?
Microservices and cloud-native applications
What is observability?
The observability challenge of microservices
Traditional monitoring tools
Metrics
Logs
Distributed tracing
My experience with tracing
Why this book?
Summary
References
Chapter 2: Take Tracing for a HotROD Ride
Prerequisites
Running from prepackaged binaries
Running from Docker images
Running from the source code
Go language development environment
Jaeger source code
Start Jaeger
Meet the HotROD
The architecture
The data flow
Contextualized logs
Span tags versus logs
Identifying sources of latency
Resource usage attribution
Summary
References
Chapter 3: Distributed Tracing Fundamentals
The idea
Request correlation
Black-box inference
Schema-based
Metadata propagation
Anatomy of distributed tracing
Sampling
Preserving causality
Inter-request causality
Trace models
Event model
Span model
Clock skew adjustment
Trace analysis
Summary
References
Part II: Data Gathering Problem
Chapter 4: Instrumentation Basics with OpenTracing
Prerequisites
Project source code
Go development environment
Java development environment
Python development environment
MySQL database
Query tools (curl or wget)
Tracing backend (Jaeger)
OpenTracing
Exercise 1 – the Hello application
Hello application in Go
Hello application in Java
Hello application in Python
Exercise summary
Exercise 2 – the first trace
Step 1 – create a tracer instance
Create a tracer in Go
Create a tracer in Java
Create a tracer in Python
Step 2 – start a span
Start a span in Go
Start a span in Java
Start a span in Python
Step 3 – annotate the span
Annotate the span in Go
Annotate the span in Java
Annotate the span in Python
Exercise summary
Exercise 3 – tracing functions and passing context
Step 1 – trace individual functions
Trace individual functions in Go
Trace individual functions in Java
Trace individual functions in Python
Step 2 – combine multiple spans into a single trace
Combine multiple spans into a single trace in Go
Combine multiple spans into a single trace in Java
Combine multiple spans into a single trace in Python
Step 3 – propagate the in-process context
In-process context propagation in Python
In-process context propagation in Java
In-process context propagation in Go
Exercise summary
Exercise 4 – tracing RPC requests
Step 1 – break up the monolith
Microservices in Go
Microservices in Java
Microservices in Python
Step 2 – pass the context between processes
Passing context between processes in Go
Passing context between processes in Java
Passing context between processes in Python
Step 3 – apply OpenTracing-recommended tags
Standard tags in Go
Standard tags in Java
Standard tags in Python
Exercise summary
Exercise 5 – using baggage
Using baggage in Go
Using baggage in Java
Using baggage in Python
Exercise summary
Exercise 6 – auto-instrumentation
Open source instrumentation in Go
Auto-instrumentation in Java
Auto-instrumentation in Python
Exercise 7 – extra credit
Summary
References
Chapter 5: Instrumentation of Asynchronous Applications
Prerequisites
Project source code
Java development environment
Kafka, Zookeeper, Redis, and Jaeger
The Tracing Talk chat application
Implementation
The lib module
The chat-api service
The storage-service microservice
The giphy-service microservice
Running the application
Observing traces
Instrumenting with OpenTracing
Spring instrumentation
Tracer resolver
Redis instrumentation
Kafka instrumentation
Producing messages
Consuming messages
Instrumenting asynchronous code
Summary
References
Chapter 6: Tracing Standards and Ecosystem
Styles of instrumentation
Anatomy of tracing deployment and interoperability
Five shades of tracing
Know your audience
The ecosystem
Tracing systems
Zipkin and OpenZipkin
Jaeger
SkyWalking
X-Ray, Stackdriver, and more
Standards projects
W3C Trace Context
W3C "Data Interchange Format"
OpenCensus
OpenTracing
Summary
References
Chapter 7: Tracing with Service Meshes
Service meshes
Observability via a service mesh
Prerequisites
Project source code
Java development environment
Kubernetes
Istio
The Hello application
Distributed tracing with Istio
Using Istio to generate a service graph
Distributed context and routing
Summary
References
Chapter 8: All About Sampling
Head-based consistent sampling
Probabilistic sampling
Rate limiting sampling
Guaranteed-throughput probabilistic sampling
Adaptive sampling
Local adaptive sampling
Global adaptive sampling
Implications of adaptive sampling
Extensions
Context-sensitive sampling
Ad-hoc or debug sampling
How to deal with oversampling
Post-collection down-sampling
Throttling
Tail-based consistent sampling
Partial sampling
Summary
References
Part III: Getting Value from Tracing
Chapter 9: Turning the Lights On
Tracing as a knowledge base
Service graphs
Deep, path-aware service graphs
Detecting architectural problems
Performance analysis
Critical path analysis
Recognizing trace patterns
Look for error markers
Look for the longest span on the critical path
Look out for missing details
Avoid sequential execution or "staircase"
Be wary when things finish at exactly the same time
Exemplars
Latency histograms
Long-term profiling
Summary
References
Chapter 10: Distributed Context Propagation
Brown Tracing Plane
Pivot tracing
Chaos engineering
Traffic labeling
Testing in production
Debugging in production
Developing in production
Summary
References
Chapter 11: Integration with Metrics and Logs
Three pillars of observability
Prerequisites
Project source code
Java development environment
Running the servers in Docker
Declaring index pattern in Kibana
Running the clients
The Hello application
Integration with metrics
Standard metrics via tracing instrumentation
Adding context to metrics
Context-aware metrics APIs
Integration with logs
Structured logging
Correlating logs with trace context
Context-aware logging APIs
Capturing logs in the tracing system
Do we need separate logging and tracing backends?
Summary
References
Chapter 12: Gathering Insights with Data Mining
Feature extraction
Components of a data mining pipeline
Tracing backend
Trace completion trigger
Feature extractor
Aggregator
Feature extraction exercise
Prerequisites
Project source code
Running the servers in Docker
Defining index mapping in Elasticsearch
Java development environment
Microservices simulator
Running as a Docker image
Running from source
Verify
Define an index pattern in Kibana
The Span Count job
Trace completion trigger
Feature extractor
Observing trends
Beware of extrapolations
Historical analysis
Ad hoc analysis
Summary
References
Part IV: Deploying and Operating Tracing Infrastructure
Chapter 13: Implementing Tracing in Large Organizations
Why is it hard to deploy tracing instrumentation?
Reduce the barrier to adoption
Standard frameworks
In-house adapter libraries
Tracing enabled by default
Monorepos
Integration with existing infrastructure
Where to start
Building the culture
Explaining the value
Integrating with developer workflows
Tracing Quality Metrics
Troubleshooting guide
Don't be on the critical path
Summary
References
Chapter 14: Under the Hood of a Distributed Tracing System
Why host your own?
Customizations and integrations
Bandwidth cost
Own the data
Bet on emerging standards
Architecture and deployment modes
Basic architecture: agent + collector + query service
Client
Agent
Collector
Query service and UI
Data mining jobs
Streaming architecture
Multi-tenancy
Cost accounting
Complete isolation
Granular access controls
Security
Running in multiple DCs
Capturing origin zone
Cross-zone federation
Monitoring and troubleshooting
Resiliency
Over-sampling
Debug traces
Traffic spikes due to DC failover
Perpetual traces
Very long traces
Summary
References
Afterword
References
Other Books You May Enjoy
Index