Mastering Kafka Streams and ksqlDB: Building Real-Time Data Systems by Example

Working with unbounded and fast-moving data streams has historically been difficult. But with Kafka Streams and ksqlDB, building stream processing applications is easy and fun. This practical guide shows data engineers how to use these tools to build highly scalable stream processing applications for moving, enriching, and transforming large amounts of data in real time.

Mitch Seymour, data services engineer at Mailchimp, explains important stream processing concepts against a backdrop of several interesting business problems. You'll learn the strengths of both Kafka Streams and ksqlDB to help you choose the best tool for each unique stream processing project. Non-Java developers will find the ksqlDB path to be an especially gentle introduction to stream processing.

• Learn the basics of Kafka and the pub/sub communication pattern
• Build stateless and stateful stream processing applications using Kafka Streams and ksqlDB
• Perform advanced stateful operations, including windowed joins and aggregations
• Understand how stateful processing works under the hood
• Learn about ksqlDB's data integration features, powered by Kafka Connect
• Work with different types of collections in ksqlDB and perform push and pull queries
• Deploy your Kafka Streams and ksqlDB applications to production

Author(s): Mitch Seymour
Edition: 1
Publisher: O'Reilly Media
Year: 2021

Language: English
Commentary: Vector PDF
Pages: 434
City: Sebastopol, CA
Tags: SQL; Relational Databases; Monitoring; Stream Processing; Apache Kafka; Deployment; Data Modeling; Testing; Stateless Applications; Data Integration; Data Engineering; Stateful Applications; ksqlDB

Cover
Copyright
Table of Contents
Foreword
Preface
Who Should Read This Book
Navigating This Book
Source Code
Kafka Streams Version
ksqlDB Version
Conventions Used in This Book
Using Code Examples
O’Reilly Online Learning
How to Contact Us
Acknowledgments
Part I. Kafka
Chapter 1. A Rapid Introduction to Kafka
Communication Model
How Are Streams Stored?
Topics and Partitions
Events
Kafka Cluster and Brokers
Consumer Groups
Installing Kafka
Hello, Kafka
Summary
Part II. Kafka Streams
Chapter 2. Getting Started with Kafka Streams
The Kafka Ecosystem
Before Kafka Streams
Enter Kafka Streams
Features at a Glance
Operational Characteristics
Scalability
Reliability
Maintainability
Comparison to Other Systems
Deployment Model
Processing Model
Kappa Architecture
Use Cases
Processor Topologies
Sub-Topologies
Depth-First Processing
Benefits of Dataflow Programming
Tasks and Stream Threads
High-Level DSL Versus Low-Level Processor API
Introducing Our Tutorial: Hello, Streams
Project Setup
Creating a New Project
Adding the Kafka Streams Dependency
DSL
Processor API
Streams and Tables
Stream/Table Duality
KStream, KTable, GlobalKTable
Summary
Chapter 3. Stateless Processing
Stateless Versus Stateful Processing
Introducing Our Tutorial: Processing a Twitter Stream
Project Setup
Adding a KStream Source Processor
Serialization/Deserialization
Building a Custom Serdes
Defining Data Classes
Implementing a Custom Deserializer
Implementing a Custom Serializer
Building the Tweet Serdes
Filtering Data
Branching Data
Translating Tweets
Merging Streams
Enriching Tweets
Avro Data Class
Sentiment Analysis
Serializing Avro Data
Registryless Avro Serdes
Schema Registry–Aware Avro Serdes
Adding a Sink Processor
Running the Code
Empirical Verification
Summary
Chapter 4. Stateful Processing
Benefits of Stateful Processing
Preview of Stateful Operators
State Stores
Common Characteristics
Persistent Versus In-Memory Stores
Introducing Our Tutorial: Video Game Leaderboard
Project Setup
Data Models
Adding the Source Processors
KStream
KTable
GlobalKTable
Registering Streams and Tables
Joins
Join Operators
Join Types
Co-Partitioning
Value Joiners
KStream to KTable Join (players Join)
KStream to GlobalKTable Join (products Join)
Grouping Records
Grouping Streams
Grouping Tables
Aggregations
Aggregating Streams
Aggregating Tables
Putting It All Together
Interactive Queries
Materialized Stores
Accessing Read-Only State Stores
Querying Nonwindowed Key-Value Stores
Local Queries
Remote Queries
Summary
Chapter 5. Windows and Time
Introducing Our Tutorial: Patient Monitoring Application
Project Setup
Data Models
Time Semantics
Timestamp Extractors
Included Timestamp Extractors
Custom Timestamp Extractors
Registering Streams with a Timestamp Extractor
Windowing Streams
Window Types
Selecting a Window
Windowed Aggregation
Emitting Window Results
Grace Period
Suppression
Filtering and Rekeying Windowed KTables
Windowed Joins
Time-Driven Dataflow
Alerts Sink
Querying Windowed Key-Value Stores
Summary
Chapter 6. Advanced State Management
Persistent Store Disk Layout
Fault Tolerance
Changelog Topics
Standby Replicas
Rebalancing: Enemy of the State (Store)
Preventing State Migration
Sticky Assignment
Static Membership
Reducing the Impact of Rebalances
Incremental Cooperative Rebalancing
Controlling State Size
Deduplicating Writes with Record Caches
State Store Monitoring
Adding State Listeners
Adding State Restore Listeners
Built-in Metrics
Interactive Queries
Custom State Stores
Summary
Chapter 7. Processor API
When to Use the Processor API
Introducing Our Tutorial: IoT Digital Twin Service
Project Setup
Data Models
Adding Source Processors
Adding Stateless Stream Processors
Creating Stateless Processors
Creating Stateful Processors
Periodic Functions with Punctuate
Accessing Record Metadata
Adding Sink Processors
Interactive Queries
Putting It All Together
Combining the Processor API with the DSL
Processors and Transformers
Putting It All Together: Refactor
Summary
Part III. ksqlDB
Chapter 8. Getting Started with ksqlDB
What Is ksqlDB?
When to Use ksqlDB
Evolution of a New Kind of Database
Kafka Streams Integration
Connect Integration
How Does ksqlDB Compare to a Traditional SQL Database?
Similarities
Differences
Architecture
ksqlDB Server
ksqlDB Clients
Deployment Modes
Interactive Mode
Headless Mode
Tutorial
Installing ksqlDB
Running a ksqlDB Server
Precreating Topics
Using the ksqlDB CLI
Summary
Chapter 9. Data Integration with ksqlDB
Kafka Connect Overview
External Versus Embedded Connect
External Mode
Embedded Mode
Configuring Connect Workers
Converters and Serialization Formats
Tutorial
Installing Connectors
Creating Connectors with ksqlDB
Showing Connectors
Describing Connectors
Dropping Connectors
Verifying the Source Connector
Interacting with the Kafka Connect Cluster Directly
Introspecting Managed Schemas
Summary
Chapter 10. Stream Processing Basics with ksqlDB
Tutorial: Monitoring Changes at Netflix
Project Setup
Source Topics
Data Types
Custom Types
Collections
Creating Source Collections
With Clause
Working with Streams and Tables
Showing Streams and Tables
Describing Streams and Tables
Altering Streams and Tables
Dropping Streams and Tables
Basic Queries
Insert Values
Simple Selects (Transient Push Queries)
Projection
Filtering
Flattening/Unnesting Complex Structures
Conditional Expressions
Coalesce
IFNULL
Case Statements
Writing Results Back to Kafka (Persistent Queries)
Creating Derived Collections
Putting It All Together
Summary
Chapter 11. Intermediate and Advanced Stream Processing with ksqlDB
Project Setup
Bootstrapping an Environment from a SQL File
Data Enrichment
Joins
Windowed Joins
Aggregations
Aggregation Basics
Windowed Aggregations
Materialized Views
Clients
Pull Queries
Curl
Push Queries
Push Queries via Curl
Functions and Operators
Operators
Showing Functions
Describing Functions
Creating Custom Functions
Additional Resources for Custom ksqlDB Functions
Summary
Part IV. The Road to Production
Chapter 12. Testing, Monitoring, and Deployment
Testing
Testing ksqlDB Queries
Testing Kafka Streams
Behavioral Tests
Benchmarking
Kafka Cluster Benchmarking
Final Thoughts on Testing
Monitoring
Monitoring Checklist
Extracting JMX Metrics
Deployment
ksqlDB Containers
Kafka Streams Containers
Container Orchestration
Operations
Resetting a Kafka Streams Application
Rate-Limiting the Output of Your Application
Upgrading Kafka Streams
Upgrading ksqlDB
Summary
Appendix A. Kafka Streams Configuration
Configuration Management
Configuration Properties
Consumer-Specific Configurations
Appendix B. ksqlDB Configuration
Query Configurations
Server Configurations
Security Configurations
Index
About the Author
Colophon