The Enterprise Data Catalog: Improve Data Discovery, Ensure Data Governance, and Enable Innovation

Combing the web is simple, but how do you search for data at work? It's difficult and time-consuming, and can sometimes seem impossible. This book introduces a practical solution: the data catalog. Data analysts, data scientists, and data engineers will learn how to create true data discovery in their organizations, making the catalog a key enabler for data-driven innovation and data governance.

Author Ole Olesen-Bagneux explains the benefits of implementing a data catalog. You'll learn how to organize data for your catalog, search for what you need, and manage data within the catalog. Written from a data management perspective and from a library and information science perspective, this book helps you:

• Learn what a data catalog is and how it can help your organization
• Organize data and its sources into domains and describe them with metadata
• Search data using techniques that range from very simple to complex, and learn to browse in domains, data lineage, and graphs
• Manage the data in your company via a data catalog
• Implement a data catalog in a way that exactly matches the strategic priorities of your organization
• Understand what the future has in store for data catalogs

Author(s): Ole Olesen-Bagneux
Edition: 1
Publisher: O'Reilly Media
Year: 2023

Language: English
Commentary: Publisher's PDF. Revision History for the First Edition: 2023-02-15: First Release
Pages: 216
City: Sebastopol, CA
Tags: Innovations; Searching; Data Management; Data Governance; Data Graphs; Data Architecture; Data Catalog; Data Discovery; Data Lifecycle

Cover
Copyright
Table of Contents
Foreword
Preface
Who Should Read This Book
Navigating This Book
Conventions Used in This Book
O’Reilly Online Learning
How to Contact Us
Acknowledgments
Part I. Organizing Data So You Can Search for It
Chapter 1. Introduction to Data Catalogs
The Core Functionality of a Data Catalog
Create an Overview of the IT Landscape
Organize Data
Enable Search of Company Data
Data Discovery
The Data Discovery Team
Data Architects
Data Engineers
Data Discovery Team Setup
End-User Roles and Responsibilities
Summary
Chapter 2. Organize Data: Design a Robust Architecture for Search
Organizing Domains in the Data Catalog
Domain Architecture in a Data Catalog
Understanding Domains
Processes and Capabilities
Data Sources
Getting Assets into the Data Catalog
Pull
Push
Organizing Assets in the Domains
Asset Metadata
Metadata Quality
Classification
Summary
Chapter 3. Understand Search: Concepts, Features, and Mechanics
Why Do You Search in a Data Catalog?
Search Features in a Data Catalog
Searching in Data Versus Searching for Data
How Do You Search a Data Catalog?
Data Catalog Query Language
The Search Features in a Data Catalog Explained
Searching for Everything?
The Mechanics of Search
Recall and Precision
Zipf’s Law
Serendipity
Summary
Chapter 4. Apply Search: From Simple to Advanced Patterns
Search Like Librarians—Not Like Data Scientists
Search Patterns
Basic Simple Search
Detailed Simple Search
Flexible Simple Search
Range Search
Block Search
Statement Search
Browsing Patterns
Glossary Browsing
Domain Browsing
Lineage Browsing
Graph Browsing
Searching a Graph-Based Data Catalog
Summary
Part II. Democratizing Data with a Data Catalog
Chapter 5. Discover Data: Empower End Users and Engage Stakeholders
A Data Catalog Is a Social Network
Active Metadata
Ensure Stakeholder Engagement
Engage Data Governance Leaders
Engage Data Analytics Leaders
Engage Domain Leaders
Seeing All Data Through One Lens
The Operational Backbone and the Data Platform
Summary
Chapter 6. Access Data: The Keys to Successful Implementation
Choosing a Data Catalog
Vendor Analysis
Some Key Vendors
Catalog of Catalogs
How to Access Data
Data Providers and Data Consumers
Centralized Approach
Decentralized Approach
Combined Approach
Building Domains
Questionnaire No. 1: Domain Owner Description of Domain and Assets
Questionnaire No. 2: Asset Steward Description of Assets in the Domain
Questionnaire No. 3: Asset Steward Description of the Glossary Terms of Their Assets
Summary
Chapter 7. Manage Data: Improve Lifecycle Management
The Value of Data Lifecycle Management and Why the Data Catalog Is a Game Changer
Various Lifecycles
Data Lifecycle
Using the Data Catalog for Data Lifecycle Management
The Data Asset Lifecycle in the Data Catalog
Glossary Term Lifecycle
Data Source Lifecycle
Lifecycle Influence and Support
Applied Search Based on Lifecycles
Applied Search for Regulatory Compliance
Maintenance Best Practices
Maintenance of the Data Outside the Data Catalog
Maintenance of Metadata Inside the Data Catalog
Improved Data Lifecycle Management
Summary
Part III. Envisioning the Future of Data Catalogs
Chapter 8. Looking Ahead: The Company Search Engine and Improved Data Management
The Company Search Engine
The Company Search Engine in Hugin & Munin
From Data to Knowledge
A Medium Theoretical Take on the Company Search Engine
Is the Company Search Engine New?
Will the Company Search Engine Become Reality?
Summary
Afterword
Consider Implementing a Data Catalog
Follow Me
Appendix. Data Catalog Query Language
Index
About the Author
Colophon