Software Telemetry shows you how to efficiently collect, store, and analyze system and application log data so you can monitor and improve your systems.
In Software Telemetry you will learn how to:
• Manage toxic telemetry and confidential records
• Master multi-tenant techniques and transformation processes
• Update your metrics and dashboards to improve their statistical validity
• Make software telemetry emissions easier to parse
• Build easily-auditable logging systems
• Prevent and handle accidental data leaks
• Maintain processes for legal compliance
• Justify increased spend on telemetry software
Software Telemetry teaches you best practices for operating and updating telemetry systems. These vital systems trace, log, and monitor infrastructure by observing and analyzing the events generated by the system. This practical guide is filled with techniques you can apply to any size of organization, with troubleshooting techniques for every eventuality, and methods to ensure your compliance with standards like GDPR.
About the technology
Take advantage of the data generated by your IT infrastructure! Telemetry systems provide feedback on what’s happening inside your data center and applications, so you can efficiently monitor, maintain, and audit them. This practical book guides you through instrumenting your systems, setting up centralized logging, doing distributed tracing, and other invaluable telemetry techniques.
About the book
Manage the pillars of observability (logs, metrics, and traces) in an end-to-end telemetry system that integrates with your existing infrastructure. You’ll discover how software telemetry benefits both small startups and legacy enterprises. And at a time when data audits are increasingly common, you’ll appreciate the thorough coverage of legal compliance processes, so there’s no reason to panic when a discovery request arrives.
What's inside
• Multi-tenant techniques and transformation processes
• Toxic telemetry and confidential records
• Updates to improve the statistical validity of your metrics and dashboards
• Revisions that make software telemetry emissions easier to parse
About the reader
For software developers and infrastructure engineers supporting and building telemetry systems.
About the author
Jamie Riedesel is a staff engineer at Dropbox with over twenty years of experience in IT.
Author(s): Jamie Riedesel
Edition: 1
Publisher: Manning Publications
Year: 2021
Language: English
Pages: 560
City: Shelter Island, NY
Tags: Software Engineering; Monitoring; Logging; Data Aggregation; Software Architecture; Regular Expressions; SNMP; Legal; Distributed Tracing; Metrics; Multitenancy; Telemetry; Software-as-a-Service
Software Telemetry
brief contents
contents
preface
acknowledgments
about this book
Who should read this book
How this book is organized: A road map
About the code
liveBook discussion forum
Other online resources
about the author
about the cover illustration
Chapter 1: Introduction
1.1 Defining the styles of telemetry
1.1.1 Defining centralized logging
1.1.2 Defining metrics
1.1.3 Defining distributed tracing
1.1.4 Defining SIEM
1.2 How telemetry is consumed by different teams
1.2.1 Telemetry use by Operations, DevOps, and SRE teams
1.2.2 Telemetry use by Security and Compliance teams
1.2.3 Telemetry use by Software Engineering and SRE teams
1.2.4 Telemetry use by Customer Support teams
1.2.5 Telemetry use by business intelligence
1.3 Challenges facing telemetry systems
1.3.1 Chronic underinvestment harms decision-making
1.3.2 Diverse needs resist standardization
1.3.3 Information spills and cleaning them up to avoid legal problems
1.3.4 Court orders break your assumptions
1.4 What you will learn
Part 1: Telemetry system architecture
Chapter 2: The Emitting stage: Creating and submitting telemetry
2.1 Emitting from production code
2.1.1 Emitting telemetry into a log file
2.1.2 Emitting telemetry into the system log
2.1.3 Emitting telemetry into standard output
2.1.4 Formatting telemetry for emissions
2.2 Emitting from hardware
2.2.1 Explaining SNMP
2.2.2 Ingesting telemetry from a Cisco ASA firewall
2.3 Emitting from as-a-Service systems
2.3.1 Emitting events from SaaS systems
2.3.2 Emitting events from IaaS systems
Chapter 3: The Shipping stage: Moving and storing telemetry
3.1 Emitter/shipper functions, telemetry from production code
3.1.1 Shipping directly into storage
3.1.2 Shipping through queues and streams
3.1.3 Shipping to SaaS systems
3.2 Shipping between SaaS systems
3.3 Tipping points in Shipping-stage architecture
Chapter 4: The Shipping stage: Unifying diverse telemetry formats
4.1 Shipping locally-emitted telemetry
4.1.1 Shipping telemetry from a log file
4.1.2 Shipping telemetry from the system logger
4.1.3 Shipping telemetry from standard output
4.2 Unifying diverse emitting formats
4.2.1 Encoding telemetry into strings
4.2.2 Picking a shipping format
4.2.3 Converting Syslog to JSON or other object-encoding formats
4.2.4 Designing with cardinality in mind
Chapter 5: The Presentation stage: Displaying telemetry
5.1 Displaying telemetry in metrics systems
5.1.1 Making pretty pictures with telemetry
5.1.2 Feeding the graphs with aggregation functions
5.1.3 Using aggregations with statistical validity
5.2 Displaying telemetry in centralized logging systems
5.2.1 Selecting needed features in a display system for centralized logging
5.2.2 Demonstrating centralized logging display
5.3 Displaying telemetry in security systems
5.4 Displaying telemetry in distributed tracing systems
5.5 Displaying telemetry in large organizations
Chapter 6: Marking up and enriching telemetry
6.1 Markup in the Emitting stage
6.2 Markup and enrichment in the Shipping stage
6.2.1 Applying context-related telemetry in the Shipping stage
6.2.2 Extracting and enriching telemetry in-flight
6.2.3 Converting field types during the Shipping stage
6.3 Enrichment in the Presentation stage
6.4 How telemetry style affects markup and enrichment
6.4.1 Markup and enrichment with centralized logging
6.4.2 Markup and enrichment with SIEM systems
6.4.3 Markup and enrichment with metrics
6.4.4 Markup and enrichment with distributed tracing systems
Chapter 7: Handling multitenancy
7.1 How multitenant architectures come about
7.1.1 Evolving multitenancy in an early-stage startup
7.1.2 Evolving multitenancy in a culture of free sharing
7.1.3 Evolving multitenancy in a culture of strong separation
7.2 Designing multitenant telemetry systems
7.2.1 Multitenancy in the Shipping stage
7.2.2 Multitenancy in the Presentation stage
Part 2: Use cases revisited: Applying architecture concepts
Chapter 8: Growing cloud-based startup
8.1 Telemetry at the small-company stage
8.1.1 Describing the small company’s telemetry system
8.1.2 Analyzing the small company’s telemetry system
8.2 Telemetry at the medium-size company stage
8.2.1 Describing the medium-size company’s telemetry system
8.2.2 Analyzing the medium-size company’s telemetry system
8.3 Telemetry at the large-company stage
8.3.1 Describing the large company’s telemetry system
8.3.2 Analyzing the large company’s telemetry system
8.4 Telemetry at the enterprise stage
8.5 Looking back at all this growth
Chapter 9: Nonsoftware business
9.1 Telemetry use in small organizations
9.2 Telemetry use in medium-size organizations
9.3 Telemetry use in large organizations
9.4 Telemetry use in enterprise organizations
Chapter 10: Long-established business IT
10.1 Telemetry use in medium-size organizations
10.1.1 Telemetry use in office IT
10.1.2 Telemetry use in production systems
10.2 Telemetry use in large organizations
10.3 Telemetry use in global organizations
10.3.1 Telemetry use in the Booking and Passenger Manifest department
10.3.2 Telemetry use in the Loyalty Programs department
Part 3: Techniques for handling telemetry
Chapter 11: Optimizing for regular expressions at scale
11.1 Anchoring expressions for speed
11.2 Building expressions to fail fast
11.3 Digging into the Cisco ASA firewall telemetry
11.4 Refining emissions to speed regular-expression performance
11.5 Additional regular-expression resources
Chapter 12: Standardized logging and event formats
12.1 Implementing structured logging in your code
12.2 Implementing standards in your code
12.3 Implementing standards in the Shipping stage
Chapter 13: Using more nonfile emitting techniques
13.1 Designing for socket- and datagram-based emitters
13.2 Emitting and shipping for container- and serverless-based code
13.2.1 Emitting and shipping from container-based code
13.2.2 Emitting and shipping from serverless-based code
13.3 Encrypting UDP-based telemetry
Chapter 14: Managing cardinality in telemetry
14.1 Identifying cardinality problems
14.1.1 Cardinality in time-series databases
14.1.2 Cardinality in logging databases
14.2 Lowering the cost of cardinality
14.2.1 Use logging standards to contain cardinality
14.2.2 Using storage-side methods to tame cardinality
14.2.3 Make cardinality someone else’s problem
Chapter 15: Ensuring telemetry integrity
15.1 Getting telemetry out of reach of an attacker
15.1.1 Move telemetry too fast to catch
15.1.2 Use ACLs to enforce write-only telemetry
15.1.3 Durable telemetry when using SaaS providers
15.2 Making telemetry harder to mess with
15.2.1 Using access control requirements to defend against attacks
15.2.2 Ensuring configuration integrity in your telemetry systems
15.2.3 Making changes obvious
Chapter 16: Redacting and reprocessing telemetry
16.1 Identifying toxic data and where it comes from
16.2 Redacting toxic information spills
16.3 Reprocessing telemetry to support upgrades
16.4 Isolating toxic data to reduce cleanup costs
Chapter 17: Building policies for telemetry retention and aggregation
17.1 Creating a retention policy
17.1.1 Building a policy for centralized logging
17.1.2 Building a policy for metrics
17.1.3 Building a policy for distributed tracing
17.1.4 Building a policy for SIEM systems
17.2 Creating an aggregation policy
17.3 Using sampling to reduce costs and increase retention
Chapter 18: Surviving legal processes
18.1 Defining the eDiscovery process
18.2 Dealing with records-retention requests
18.2.1 Examining an ELK-based centralized logging system
18.2.2 Examining a Sumo Logic-based centralized logging system
18.3 Dealing with document-production requests
18.3.1 Telemetry in the collection phase
18.3.2 Telemetry in the review phase
18.3.3 Telemetry in the production phase
18.4 Working with lawyers
appendix A: Telemetry storage systems
A.1 Analyzing Elasticsearch
A.1.1 What Elasticsearch is good at
A.1.2 What is challenging for Elasticsearch
A.2 Analyzing Apache Cassandra
A.2.1 What Cassandra is good at
A.2.2 What is challenging for Cassandra
A.3 Analyzing Grafana Labs’ Loki
A.3.1 What Loki is good at
A.3.2 What is challenging for Loki
A.4 Analyzing MongoDB
A.4.1 What MongoDB is good at
A.4.2 What is challenging for MongoDB
A.5 Analyzing Prometheus
A.5.1 What Prometheus is good at
A.5.2 What is challenging for Prometheus
A.6 Analyzing InfluxDB
A.6.1 What InfluxDB is good at
A.6.2 What is challenging for InfluxDB
A.7 Analyzing Jaeger
A.7.1 What Jaeger is good at
A.7.2 What is challenging for Jaeger
appendix B: Recommendation checklist reference
B.1 Telemetry standards, structure, and setting policies
B.2 Presentation-stage recommendations
B.3 Cardinality management
B.4 Telemetry safety and effects
B.5 Legal topics
appendix C: Exercise answers
index