"Extract, transform, load" (ETL) is at the center of every application of data, from business intelligence to AI. Recent shifts in the data landscape—including the emergence of lakehouse architectures and the rising importance of high-scale real-time data—mean that today's data practitioners must approach ETL a bit differently.
This technical guide offers data engineers, engineering managers, and architects an overview of the modern ETL process, along with the challenges you're likely to face and the strategic patterns that will help you overcome them. You will be equipped to make informed decisions when implementing ETL and choose the technology stack that will help you succeed.
Author(s): Matt Palmer
Publisher: O’Reilly Media, Inc.
Year: 2023
Language: English
City: Sebastopol
Preface
The Bread and Butter of Data Engineering
The Brave New World of AI
A Changing Data Landscape
What About ELT (and Other Flavors)?
1. Data Ingestion
Data Ingestion: Now vs. Then
Sources and Targets
The Source
Examining Sources
Source Checklist
The Destination
Examining Destinations
Staging Ingested Data
Change Data Capture (CDC)
Destination Checklist
Ingestion Considerations
Frequency
Batch
Micro Batch
Streaming
Methods
Message Services
Stream Processing Engines
Simplifying Stream Processing
Payload
Volume
Structure and Shape
Unstructured
Semi-structured
Structured
Format
Variety
Choosing a Solution
Declarative Solutions
Cost to build/maintain
Extensibility
Cost to switch
Imperative Solutions
Extensibility
Cost to build/maintain
Cost to switch
Hybrid Solutions
Data Ingestion Checklist