Mastering Apache Pulsar: Cloud Native Event Streaming at Scale

Every enterprise application creates data, including log messages, metrics, user activity, and outgoing messages. Learning how to move these items is almost as important as the data itself. If you’re an application architect, developer, or production engineer new to Apache Pulsar, this practical guide shows you how to use this open source event streaming platform to handle real-time data feeds. Jowanza Joseph, staff software engineer at Finicity, explains how to deploy production Pulsar clusters, write reliable event streaming applications, and build scalable real-time data pipelines with this platform. Through detailed examples, you’ll learn Pulsar’s design principles, reliability guarantees, key APIs, and architecture details, including the replication protocol, the load manager, and the storage layer.

This book helps you:
• Understand how event streaming fits in the big data ecosystem
• Explore Pulsar producers, consumers, and readers for writing and reading events
• Build scalable data pipelines by connecting Pulsar with external systems
• Simplify event-streaming application building with Pulsar Functions
• Manage Pulsar to perform monitoring, tuning, and maintenance tasks
• Use Pulsar’s operational measurements to secure a production cluster
• Process event streams using Flink and query event streams using Presto

Author(s): Jowanza Joseph
Edition: 1
Publisher: O'Reilly Media
Year: 2021

Language: English
Commentary: Vector PDF
Pages: 240
City: Sebastopol, CA
Tags: Java; Stream Processing; Messages; Real-Time Systems; Event Brokers; Apache Pulsar; Apache BookKeeper; Apache ZooKeeper; Pulsar SQL

Cover
Copyright
Table of Contents
Preface
Why I Wrote This Book
Who This Book Is For
How I Organized This Book
Conventions Used in This Book
Using Code Examples
O’Reilly Online Learning
How to Contact Us
Acknowledgments
Chapter 1. The Value of Real-Time Messaging
Data in Motion
Resource Efficiency
Interesting Applications
Banking
Medical
Security
Internet of Things
Summary
Chapter 2. Event Streams and Event Brokers
Publish/Subscribe
Queues
Failure Modes
Push Versus Poll
The Need for Pulsar
Unification
Modularity
Performance
Summary
Chapter 3. Pulsar
Origins of Pulsar
Pulsar Design Principles
Multitenancy
Geo-Replication
Performance
Modularity
Pulsar Ecosystem
Pulsar Functions
Pulsar IO
Pulsar SQL
Pulsar Success Stories
Yahoo! JAPAN
Splunk
Iterable
Summary
Chapter 4. Pulsar Internals
Brokers
Message Cache
BookKeeper and ZooKeeper Communication
Schema Validation
Inter-Broker Communication
Pulsar Functions and Pulsar IO
Apache BookKeeper
Write-Ahead Logging
Message Storing
Object/Blob Storage
Pravega
Majordodo
Apache ZooKeeper
Naming Service
Configuration Management
Leader Election
Notification System
Apache Kafka
Apache Druid
Pulsar Proxy
Java Virtual Machine (JVM)
Netty
Apache Spark
Apache Lucene
Summary
Chapter 5. Consumers
What Does It Mean to Be a Consumer?
Subscriptions
Exclusive
Shared
Key_Shared
Failover
Acknowledgments
Individual Ack
Cumulative Ack
Schemas
Consumer Schema Management
Consumption Modes
Batching
Chunking
Advanced Configuration
Delayed Messages
Retention Policy
Backlog Quota
Configuring a Consumer
Replay
Dead Letter Topics
Retry Letter Topics
Summary
Chapter 6. Producers
Synchronous Producers
Asynchronous Producers
Producer Routing
Round-Robin Routing
Single Partition Routing
Custom Partition Routing
Producer Configuration
topicName
producerName
sendTimeoutMs
blockIfQueueFull
maxPendingMessages
maxPendingMessagesAcrossPartitions
messageRoutingMode
hashingScheme
cryptoFailureAction
batchingMaxPublishDelayMicros
batchingMaxMessages
batchingEnabled
compressionType
Schema on Write
Using the Schema Registry
Nonpersistent Topics
Use Cases
Using Nonpersistent Topics
Transactions
Summary
Chapter 7. Pulsar IO
Pulsar IO Architecture
Runtime
Performance Considerations
Use Cases
Simple Event Processing Pipelines
Change Data Capture
Considerations
Message Serialization
Pipeline Stability
Failure Handling
Examples
Elasticsearch
Netty
Writing Your Connector
TimescaleDB
Summary
Chapter 8. Pulsar Functions
Stream Processing
Pulsar Functions Architecture
Runtime
Isolation
Isolation with Kubernetes Function Deployments
Use Cases
Creating Pulsar Functions
Simple Event Processing
Topic Hygiene
Topic Accounting
Summary
Chapter 9. Tiered Storage
Storing Data in the Cloud
Object Storage
Use Cases
Replication
CQRS
Disaster Recovery
Offloading Data
Pulsar Offloaders
Retrieving Offloaded Data
Interacting with Object Store Data
Repopulating Topics
Utilizing Pulsar Client
Summary
Chapter 10. Pulsar SQL
Streams as Tables
SQL-on-Anything Engines
Apache Flink: An Alternative Perspective
Presto/Trino
How Pulsar SQL Works
Configuring Pulsar SQL
Performance Considerations
Summary
Chapter 11. Deploying Pulsar
Docker
Bare Metal
Minimum Requirements
Getting Started
Deploying ZooKeeper
Starting BookKeeper
Starting Pulsar
Public Cloud Providers
AWS
Azure
Google Cloud Platform
Kubernetes
Summary
Chapter 12. Operating Pulsar
Apache BookKeeper Metrics
Server Metrics
Journal Metrics
Storage Metrics
Apache ZooKeeper Metrics
Server Metrics
Request Metrics
Topic Metrics
Consumer Metrics
Pulsar Transaction Metrics
Pulsar Function Metrics
Advanced Operating Techniques
Interceptors and Tracing
Pulsar SQL Metrics
Metrics Forwarding
Dashboards
Summary
Chapter 13. The Future
Programming Language Support
Extension Interface
Enhancements to Pulsar Functions
Architectural Simplification/Expansion
Messaging Platform Bridges
Summary
Appendix A. Pulsar Admin API
Use Cases
Examples
Creating a Partitioned Topic
Deleting a Partitioned Topic
Creating a Namespace with Specific Policies
Deleting a Namespace
Summary
Appendix B. Pulsar Admin CLI
CLI API
Examples
Creating a Partitioned Topic
Creating a Pulsar IO Source
Creating a Pulsar IO Sink
Uploading a Schema
Deleting a Schema
Creating a Namespace
Deleting a Namespace
Summary
Appendix C. Geo-Replication
Synchronous Replication
Asynchronous Replication
Replication Patterns
Mesh
Aggregation
Standby
Admin- and Producer-Level Control
Summary
Appendix D. Security, Authentication, and Authorization in Pulsar
Encryption in Transit
Encryption at Rest
Authentication
Authorization
Summary
Index
About the Author
Colophon