Kafka Connect: Build and Run Data Pipelines

Used by more than 80% of Fortune 100 companies, Apache Kafka has become the de facto event streaming platform. Kafka Connect is a key component of Kafka that lets you move data between your existing systems and Kafka so you can process it in real time. With this practical guide, authors Mickael Maison and Kate Stanley show data engineers, site reliability engineers, and application developers how to build data pipelines between Kafka clusters and a variety of data sources and sinks. Kafka Connect allows you to adopt Kafka quickly by tapping into existing data, and it enables many advanced use cases. No matter where you are in your event streaming journey, Kafka Connect is the ideal tool for building a modern data pipeline.

• Learn Kafka Connect's capabilities, main concepts, and terminology
• Design data and event streaming pipelines that use Kafka Connect
• Configure and operate Kafka Connect environments at scale
• Deploy secured and highly available Kafka Connect clusters
• Build source and sink connectors, single message transforms, and converters

Author(s): Mickael Maison, Kate Stanley
Publisher: O’Reilly Media, Inc.
Year: 2023

Language: English
Pages: 403

Cover
Copyright
Table of Contents
Foreword
Preface
Who Should Read This Book
Kafka Versions
Navigating This Book
Conventions Used in This Book
O’Reilly Online Learning
How to Contact Us
Acknowledgements
Part I. Introduction to Kafka Connect
Chapter 1. Meet Kafka Connect
Kafka Connect Features
Pluggable Architecture
Scalability and Reliability
Declarative Pipeline Definition
Part of Apache Kafka
Use Cases
Capturing Database Changes
Mirroring Kafka Clusters
Building Data Lakes
Aggregating Logs
Modernizing Legacy Systems
Alternatives to Kafka Connect
Summary
Chapter 2. Apache Kafka Basics
A Distributed Event Streaming Platform
Open Source
Distributed
Event Streaming
Platform
Kafka Concepts
Publish-Subscribe
Brokers and Records
Topics and Partitions
Replication
Retention and Compaction
KRaft and ZooKeeper
Interacting with Kafka
Producers
Consumers
Kafka Streams
Getting Started with Kafka
Starting Kafka
Sending and Receiving Records
Running a Kafka Streams Application
Summary
Part II. Developing Data Pipelines with Kafka Connect
Chapter 3. Components in a Kafka Connect Data Pipeline
Kafka Connect Runtime
Running Kafka Connect
Kafka Connect REST API
Installing Plug-Ins
Deployment Modes
Source and Sink Connectors
Connectors and Tasks
Configuring Connectors
Running Connectors
Converters
Data Format and Schemas
Configuring Converters
Using Converters
Transformations and Predicates
Transformation Use Cases
Predicates
Configuring Transformations and Predicates
Using Transformations and Predicates
Summary
Chapter 4. Designing Effective Data Pipelines
Choosing a Connector
Pipeline Direction
Licensing and Support
Connector Features
Defining Data Models
Data Transformation
Mapping Data Between Systems
Formatting Data
Data Formats
Schemas
Exploring Kafka Connect Internals
Internal Topics
Group Membership
Rebalance Protocols
Handling Failures in Kafka Connect
Worker Failure
Connector/Task Failure
Kafka/External Systems Failure
Dead Letter Queues
Understanding Processing Semantics
Sink Connectors
Source Connectors
Summary
Chapter 5. Connectors in Action
Confluent S3 Sink Connector
Configuring the Connector
Exactly-Once Semantics
Running the Connector
Confluent JDBC Source Connector
Configuring the Connector
Running the Connector
Debezium MySQL Source Connector
Configuring the Connector
Event Formats
Running the Connector
Summary
Chapter 6. Mirroring Clusters with MirrorMaker
Introduction to Mirroring
Exploring Mirroring Use Cases
Mirroring in Practice
Introduction to MirrorMaker
Common Concepts
Deployment Modes
MirrorMaker Connectors
MirrorSourceConnector
MirrorCheckpointConnector
MirrorHeartbeatConnector
Running MirrorMaker
Disaster Recovery Example
Geo-Replication Example
Summary
Part III. Running Kafka Connect in Production
Chapter 7. Deploying and Operating Kafka Connect Clusters
Preparing the Kafka Connect Environment
Building a Kafka Connect Environment
Installing Plug-Ins
Networking and Permissions
Worker Plug-Ins
Configuration Providers
REST Extensions
Connector Client Configuration Override Policies
Sizing and Planning Capacity
Understanding Kafka Connect Resource Utilization
How Many Workers and Tasks?
Operating Kafka Connect Clusters
Adding Workers
Removing Workers
Upgrading and Applying Maintenance to Workers
Restarting Failed Tasks and Connectors
Resetting Offsets of Connectors
Administering Kafka Connect Using the REST API
Creating and Deleting a Connector
Connector and Task Configuration
Controlling the Lifecycle of Connectors
Listing Connector Offsets
Debugging Issues
Summary
Chapter 8. Configuring Kafka Connect
Configuring the Runtime
Configurations for Production
Fine-Tuning Configurations
Configuring Connectors
Topic Configurations
Client Overrides
Configurations for Exactly-Once
Configurations for Error Handling
Configuring Kafka Connect Clusters for Security
Securing the Connection to Kafka
Configuring Permissions
Securing the REST API
Summary
Chapter 9. Monitoring Kafka Connect
Monitoring Logs
Logging Configuration
Understanding Startup Logs
Analyzing Logs
Monitoring Metrics
Metrics Reporters
Analyzing Metrics
Exploring Metrics
Key Metrics
Kafka Connect Runtime Metrics
Other System Metrics
Summary
Chapter 10. Administering Kafka Connect on Kubernetes
Introduction to Kubernetes
Virtualization Technologies
Kubernetes Fundamentals
Running Kafka Connect on Kubernetes
Container Image
Deploying Workers
Networking and Monitoring
Configuration
Using a Kubernetes Operator to Deploy Kafka Connect
Introduction to Kubernetes Operators
Kubernetes Operators for Kafka Connect
Strimzi
Getting a Kubernetes Environment
Starting the Operator
Kafka Connect CRDs
Deploying a Kafka Connect Cluster and Connectors
MirrorMaker CRD
Summary
Part IV. Building Custom Connectors and Plug-Ins
Chapter 11. Building Source and Sink Connectors
Common Concepts and APIs
Building a Custom Connector
The Connector API
Configurations
The Task API
Kafka Connect Records
The ConnectorContext API
Implementing Source Connectors
The SourceTask API
Source Records
The SourceConnectorContext and SourceTaskContext APIs
Exactly-Once Support
Implementing Sink Connectors
The SinkTask API
Sink Records
The SinkConnectorContext and SinkTaskContext APIs
Summary
Chapter 12. Extending Kafka Connect with Connector and Worker Plug-Ins
Implementing Connector Plug-Ins
The Transformation API
The Predicate API
The Converter and HeaderConverter APIs
Implementing Worker Plug-Ins
The ConfigProvider API
The ConnectorClientConfigOverridePolicy API
The ConnectRestExtension APIs
Summary
Index
About the Authors
Colophon