Fundamentals of Data Observability: Implement Trustworthy End-to-End Data Solutions

Quickly detect, troubleshoot, and prevent a wide range of data issues through data observability, a set of best practices that enables data teams to gain greater visibility of data and its usage. If you're a data engineer, data architect, or machine learning engineer who depends on the quality of your data, this book shows you how to focus on the practical aspects of introducing data observability in your everyday work. Author Andy Petrella helps you build the right habits to identify and solve data issues, such as data drifts and poor quality, so you can stop their propagation in data applications, pipelines, and analytics. You'll learn ways to introduce data observability, including setting up a framework for generating and collecting all the information you need.

• Learn the core principles and benefits of data observability
• Use data observability to detect, troubleshoot, and prevent data issues
• Follow the book's recipes to implement observability in your data projects
• Use data observability to create a trustworthy communication framework with data consumers
• Learn how to educate your peers about the benefits of data observability

Author(s): Andy Petrella
Edition: 1
Publisher: O'Reilly Media
Year: 2023

Language: English
Commentary: Publisher's PDF | Published: August 2023 | Revision History: 2023-08-11: First Release
Pages: 267
City: Sebastopol, CA
Tags: Data Ingestion; Data Architecture; Data Observability

Copyright
Table of Contents
Preface
Overview of the Book
Who Should Read This Book
Conventions Used in This Book
Using Code Examples
O’Reilly Online Learning
How to Contact Us
Acknowledgments
Part I. Introducing Data Observability
Chapter 1. Introducing Data Observability
Scaling Data Teams
Challenges of Scaling Data Teams
Segregated Roles and Responsibilities and Organizational Complexity
Anatomy of Data Issues and Consequences
Impact of Data Issues on Data Team Dynamics
Scaling AI Roadblocks
Challenges with Current Data Management Practices
Effects of Data Governance at Scale
Data Observability to the Rescue
The Areas of Observability
How Data Teams Can Leverage Data Observability Now
Low Latency Data Issues Detection
Efficient Data Issues Troubleshooting
Preventing Data Issues
Decentralized Data Quality Management
Complementing Existing Data Governance Capabilities
The Future and Beyond
Conclusion
Chapter 2. Components of Data Observability
Channels of Data Observability Information
Logs
Traces
Metrics
Observations Model
Physical Space
Server
User
Static Space
Dynamic Space
Expectations
Rules
Automatic Anomaly Detection
Prevent Garbage In, Garbage Out
Conclusion
Chapter 3. Roles of Data Observability in a Data Organization
Data Architecture
Where Does Data Observability Fit in a Data Architecture?
Data Architecture with Data Observability
How Data Observability Helps with Data Engineering Undercurrents
Security
Data Management
Support for Data Mesh’s Data as Products
Conclusion
Part II. Implementing Data Observability
Chapter 4. Generate Data Observations
At the Source
Generating Data Observations at the Source
Low-Level API in Python
Description of the Data Pipeline
Definition of the Status of the Data Pipeline
Data Observations for the Data Pipeline
Generate Contextual Data Observations
Generate Data-Related Observations
Generate Lineage-Related Data Observations
Wrap-Up: The Data-Observable Data Pipeline
Using Data Observations to Address Failures of the Data Pipeline
Conclusion
Chapter 5. Automate the Generation of Data Observations
Abstraction Strategies
Event Listeners
Aspect-Oriented Programming
High-Level Applications
No-Code Applications
Low-Code Applications
Differences Among Monitoring Alternatives
Conclusion
Chapter 6. Implementing Expectations
Introducing Expectations
Shift-Left Data Quality
Corner Cases Discovery
Lifting Service Level Indicators
Using Data Profilers
Maintaining Expectations
Overarching Practices
Fail Fast and Fail Safe
Simplify Tests and Extend CI/CD
Conclusion
Part III. Data Observability in Action
Chapter 7. Integrating Data Observability in Your Data Stack
Ingestion Stage
Ingestion Stage Data Observability Recipes
Airbyte Agent
Transformation
Transformation Stage Data Observability Recipes
Apache Spark
dbt Agent
Serving
Recipes
BigQuery in Python
Orchestrated SQL with Airflow
Analytics
Machine Learning Recipes
Business Intelligence Recipes
Conclusion
Chapter 8. Making Opaque Systems Translucent
Data Translucence
Opaque Systems
SaaS
Don’t Touch It; It (Kinda) Works
Inherited Systems
Strategies for Data Translucence
Strategies
The Data Observability Connector
Example: Building a dbt Data Observability Connector (SaaS)
Conclusion
Afterword: Future Observations
Unification of Processing
Generative Milestones
Trustable Expanded Creativity
Conclusion
Index
About the Author
Colophon