Understanding ETL: Data Pipelines for Modern Data Architectures (Early Release)

"Extract, transform, load" (ETL) is at the center of every application of data, from business intelligence to AI. Recent shifts in the data landscape—including the emergence of lakehouse architectures and the rising importance of high-scale real-time data—mean that today's data practitioners must approach ETL a bit differently. This technical guide offers data engineers, engineering managers, and architects an overview of the modern ETL process, along with the challenges you're likely to face and the strategic patterns that will help you overcome them. You will be equipped to make informed decisions when implementing ETL and choose the technology stack that will help you succeed.

Author(s): Matt Palmer
Publisher: O’Reilly Media, Inc.
Year: 2023

Language: English
City: Sebastopol

Preface
The Bread and Butter of Data Engineering
The Brave New World of AI
A Changing Data Landscape
What About ELT (and Other Flavors)?
1. Data Ingestion
Data Ingestion: Now vs. Then
Sources and Targets
The Source
Examining Sources
Source Checklist
The Destination
Examining Destinations
Staging Ingested Data
Change Data Capture (CDC)
Destination Checklist
Ingestion Considerations
Frequency
Batch
Micro Batch
Streaming
Methods
Message Services
Stream Processing Engines
Simplifying Stream Processing
Payload
Volume
Structure and Shape
Unstructured
Semi-structured
Structured
Format
Variety
Choosing a Solution
Declarative Solutions
Cost to build/maintain
Extensibility
Cost to switch
Imperative Solutions
Extensibility
Cost to build/maintain
Cost to switch
Hybrid Solutions
Data Ingestion Checklist