Combing the web is simple, but how do you search for data at work? It's difficult and time-consuming, and can sometimes seem impossible. This book introduces a practical solution: the data catalog. Data analysts, data scientists, and data engineers will learn how to create true data discovery in their organizations, making the catalog a key enabler for data-driven innovation and data governance.
Author Ole Olesen-Bagneux explains the benefits of implementing a data catalog. You'll learn how to organize data for your catalog, search for what you need, and manage data within the catalog. Written from both a data management perspective and a library and information science perspective, this book helps you:
- Learn what a data catalog is and how it can help your organization
- Organize data and its sources into domains and describe them with metadata
- Search data using techniques that range from very simple to complex, and learn to browse in domains, data lineage, and graphs
- Manage the data in your company via a data catalog
- Implement a data catalog in a way that exactly matches the strategic priorities of your organization
- Understand what the future has in store for data catalogs
Author(s): Ole Olesen-Bagneux
Edition: 1
Publisher: O'Reilly Media
Year: 2023
Language: English
Commentary: Revision History for the First Edition: 2023-02-15: First Release
Pages: 216
City: Sebastopol, CA
Tags: Data Catalogs; Organizing Data; Data Discovery; Access Data
Foreword
Preface
Who Should Read This Book
Navigating This Book
Conventions Used in This Book
O’Reilly Online Learning
How to Contact Us
Acknowledgments
I. Organizing Data So You Can Search for It
1. Introduction to Data Catalogs
The Core Functionality of a Data Catalog
Create an Overview of the IT Landscape
Organize Data
Enable Search of Company Data
Data Discovery
The Data Discovery Team
Data Architects
Data Engineers
Data Discovery Team Setup
End-User Roles and Responsibilities
Summary
2. Organize Data: Design a Robust Architecture for Search
Organizing Domains in the Data Catalog
Domain Architecture in a Data Catalog
Understanding Domains
Processes and Capabilities
Data Sources
Getting Assets into the Data Catalog
Pull
Push
Organizing Assets in the Domains
Asset Metadata
Metadata Quality
Classification
Summary
3. Understand Search: Concepts, Features, and Mechanics
Why Do You Search in a Data Catalog?
Search Features in a Data Catalog
Searching in Data Versus Searching for Data
How Do You Search a Data Catalog?
Data Catalog Query Language
The Search Features in a Data Catalog Explained
Searching for Everything?
The Mechanics of Search
Recall and Precision
Zipf’s Law
Serendipity
Summary
4. Apply Search: From Simple to Advanced Patterns
Search Like Librarians—Not Like Data Scientists
Search Patterns
Basic Simple Search
Detailed Simple Search
Flexible Simple Search
Range Search
Block Search
Statement Search
Browsing Patterns
Glossary Browsing
Domain Browsing
Lineage Browsing
Graph Browsing
Searching a Graph-Based Data Catalog
Summary
II. Democratizing Data with a Data Catalog
5. Discover Data: Empower End Users and Engage Stakeholders
A Data Catalog Is a Social Network
Active Metadata
Ensure Stakeholder Engagement
Engage Data Governance Leaders
Engage Data Analytics Leaders
Engage Domain Leaders
Seeing All Data Through One Lens
The Operational Backbone and the Data Platform
Summary
6. Access Data: The Keys to Successful Implementation
Choosing a Data Catalog
Vendor Analysis
Some Key Vendors
Catalog of Catalogs
How to Access Data
Data Providers and Data Consumers
Centralized Approach
Decentralized Approach
Combined Approach
Building Domains
Questionnaire No. 1: Domain Owner Description of Domain and Assets
Questionnaire No. 2: Asset Steward Description of Assets in the Domain
Questionnaire No. 3: Asset Steward Description of the Glossary Terms of Their Assets
Summary
7. Manage Data: Improve Lifecycle Management
The Value of Data Lifecycle Management and Why the Data Catalog Is a Game Changer
Various Lifecycles
Data Lifecycle
Using the Data Catalog for Data Lifecycle Management
The Data Asset Lifecycle in the Data Catalog
Glossary Term Lifecycle
Data Source Lifecycle
Lifecycle Influence and Support
Applied Search Based on Lifecycles
Applied Search for Regulatory Compliance
Maintenance Best Practices
Maintenance of the Data Outside the Data Catalog
Maintenance of Metadata Inside the Data Catalog
Improved Data Lifecycle Management
Summary
III. Envisioning the Future of Data Catalogs
8. Looking Ahead: The Company Search Engine and Improved Data Management
The Company Search Engine
The Company Search Engine in Hugin & Munin
From Data to Knowledge
A Medium Theoretical Take on the Company Search Engine
Is the Company Search Engine New?
Will the Company Search Engine Become Reality?
Summary
Afterword
Consider Implementing a Data Catalog
Follow Me
Appendix. Data Catalog Query Language
Index
About the Author