Cassandra: The Definitive Guide, (Revised) Third Edition: Distributed Data at Web Scale

Imagine what you could do if scalability wasn't a problem. With this hands-on guide, you'll learn how the Cassandra database management system handles hundreds of terabytes of data while remaining highly available across multiple data centers. This revised third edition, updated for Cassandra 4.0 and new developments in the Cassandra ecosystem (including deployments in Kubernetes with K8ssandra), provides technical details and practical examples to help you put this database to work in a production environment. Authors Jeff Carpenter and Eben Hewitt demonstrate the advantages of Cassandra's nonrelational design, with special attention to data modeling. Developers, DBAs, and application architects looking to solve a database scaling issue or future-proof an application will learn how to harness Cassandra's speed and flexibility.

• Understand Cassandra's distributed and decentralized structure
• Use the Cassandra Query Language (CQL) and cqlsh (the CQL shell)
• Create a working data model and compare it with an equivalent relational model
• Design and develop applications using client drivers
• Explore cluster topology and learn how nodes exchange data
• Maintain a high level of performance in your cluster
• Deploy Cassandra onsite, in the cloud, or with Docker and Kubernetes
• Integrate Cassandra with Spark, Kafka, Elasticsearch, Solr, and Lucene

Author(s): Jeff Carpenter, Eben Hewitt
Edition: 3
Publisher: O'Reilly Media
Year: 2022

Language: English
Commentary: Publisher's PDF
Pages: 430
City: Sebastopol, CA
Tags: Security; NoSQL; Distributed Systems; Monitoring; Logging; Microservices; Queries; Application Development; Kubernetes; Performance Tuning; Data Modeling; K8ssandra; Cassandra; CQL

Copyright
Table of Contents
Foreword
Preface
Why Apache Cassandra?
Is This Book for You?
What’s in This Book?
New for the Third Edition
Note on the Revised Third Edition
Conventions Used in This Book
Using Code Examples
O’Reilly Interactive Katacoda Scenarios
O’Reilly Online Learning
How to Contact Us
Acknowledgments
Chapter 1. Beyond Relational Databases
What’s Wrong with Relational Databases?
A Quick Review of Relational Databases
Transactions, ACID-ity, and Two-Phase Commit
Schema
Sharding and Shared-Nothing Architecture
Web Scale
The Rise of NoSQL
Summary
Chapter 2. Introducing Cassandra
The Cassandra Elevator Pitch
Cassandra in 50 Words or Less
Distributed and Decentralized
Elastic Scalability
High Availability and Fault Tolerance
Tuneable Consistency
Brewer’s CAP Theorem
Row-Oriented
High Performance
Where Did Cassandra Come From?
Is Cassandra a Good Fit for My Project?
Large Deployments
Lots of Writes, Statistics, and Analysis
Geographical Distribution
Hybrid Cloud and Multicloud Deployment
Getting Involved
Summary
Chapter 3. Installing Cassandra
Installing the Apache Distribution
Extracting the Download
What’s in There?
Building from Source
Additional Build Targets
Running Cassandra
Setting the Environment
Starting the Server
Stopping Cassandra
Other Cassandra Distributions
Running the CQL Shell
Basic cqlsh Commands
cqlsh Help
Describing the Environment in cqlsh
Creating a Keyspace and Table in cqlsh
Writing and Reading Data in cqlsh
Running Cassandra in Docker
Summary
Chapter 4. The Cassandra Query Language
The Relational Data Model
Cassandra’s Data Model
Clusters
Keyspaces
Tables
Columns
CQL Types
Numeric Data Types
Textual Data Types
Time and Identity Data Types
Other Simple Data Types
Collections
Tuples
User-Defined Types
Summary
Chapter 5. Data Modeling
Conceptual Data Modeling
RDBMS Design
Design Differences Between RDBMS and Cassandra
Defining Application Queries
Logical Data Modeling
Hotel Logical Data Model
Reservation Logical Data Model
Physical Data Modeling
Hotel Physical Data Model
Reservation Physical Data Model
Evaluating and Refining
Calculating Partition Size
Calculating Size on Disk
Breaking Up Large Partitions
Defining Database Schema
Cassandra Data Modeling Tools
Summary
Chapter 6. The Cassandra Architecture
Data Centers and Racks
Gossip and Failure Detection
Snitches
Rings and Tokens
Virtual Nodes
Partitioners
Replication Strategies
Consistency Levels
Queries and Coordinator Nodes
Hinted Handoff
Anti-Entropy, Repair, and Merkle Trees
Lightweight Transactions and Paxos
Memtables, SSTables, and Commit Logs
Bloom Filters
Caching
Compaction
Deletion and Tombstones
Managers and Services
Cassandra Daemon
Storage Engine
Storage Service
Storage Proxy
Messaging Service
Stream Manager
CQL Native Transport Server
System Keyspaces
Summary
Chapter 7. Designing Applications with Cassandra
Hotel Application Design
Cassandra and Microservice Architecture
Microservice Architecture for a Hotel Application
Identifying Bounded Contexts
Identifying Services
Designing Microservice Persistence
Extending Designs
Secondary Indexes
Materialized Views
Reservation Service: A Sample Microservice
Design Choices for a Java Microservice
Deployment and Integration Considerations
Services, Keyspaces, and Clusters
Data Centers and Load Balancing
Interactions Between Microservices
Summary
Chapter 8. Application Development with Drivers
DataStax Java Driver
Development Environment Configuration
Connecting to a Cluster
Statements
Simple Statements
Prepared Statements
Query Builder
Object Mapper
Asynchronous Execution
Driver Configuration
Metadata
Debugging and Monitoring
DataStax Python Driver
DataStax Node.js Driver
DataStax C# Driver
Other Cassandra Drivers
Summary
Chapter 9. Writing and Reading Data
Writing
Write Consistency Levels
The Cassandra Write Path
Writing Files to Disk
Lightweight Transactions
Batches
Reading
Read Consistency Levels
The Cassandra Read Path
Read Repair
Range Queries, Ordering and Filtering
Paging
Deleting
Summary
Chapter 10. Configuring and Deploying Cassandra
Cassandra Cluster Manager
Creating a Cluster
Adding Nodes to a Cluster
Dynamic Ring Participation
Node Configuration
Seed Nodes
Snitches
Partitioners
Tokens and Virtual Nodes
Network Interfaces
Data Storage
Startup and JVM Settings
Planning a Cluster Deployment
Cluster Topology and Replication Strategies
Sizing Your Cluster
Selecting Instances
Storage
Network
Cloud Deployment
Amazon Web Services
Google Cloud Platform
Microsoft Azure
Summary
Chapter 11. Monitoring
Monitoring Cassandra with JMX
Cassandra’s MBeans
Database MBeans
Cluster-Related MBeans
Internal MBeans
Monitoring with nodetool
Getting Cluster Information
Getting Statistics
Virtual Tables
System Virtual Schema
System Views
Metrics
Logging
Examining Log Files
Full Query Logging
Summary
Chapter 12. Maintenance
Health Check
Common Maintenance Tasks
Flush
Cleanup
Repair
Rebuilding Indexes
Moving Tokens
Adding Nodes
Adding Nodes to an Existing Data Center
Adding a Data Center to a Cluster
Handling Node Failure
Repairing Failed Nodes
Replacing Nodes
Removing Nodes
Upgrading Cassandra
Backup and Recovery
Taking a Snapshot
Clearing a Snapshot
Enabling Incremental Backup
Restoring from Snapshot
SSTable Utilities
Maintenance Tools
Netflix Priam
DataStax OpsCenter
Cassandra Sidecars
Cassandra Kubernetes Operators
Summary
Chapter 13. Performance Tuning
Managing Performance
Setting Performance Goals
Benchmarking and Stress Testing
Monitoring Performance
Analyzing Performance Issues
Tracing
Tuning Methodology
Caching
Key Cache
Row Cache
Chunk Cache
Counter Cache
Saved Cache Settings
Memtables
Commit Logs
SSTables
Hinted Handoff
Compaction
Concurrency and Threading
Networking and Timeouts
JVM Settings
Memory
Garbage Collection
Summary
Chapter 14. Security
Authentication and Authorization
Password Authenticator
Using CassandraAuthorizer
Role-Based Access Control
Encryption
SSL, TLS, and Certificates
Node-to-Node Encryption
Client-to-Node Encryption
JMX Security
Securing JMX Access
Security MBeans
Audit Logging
Summary
Chapter 15. Migrating and Integrating
Knowing When to Migrate
Adapting the Data Model
Translating Entities
Translating Relationships
Adapting the Application
Refactoring Data Access
Maintaining Consistency
Migrating Stored Procedures
Planning the Deployment
Migrating Data
Zero-Downtime Migration
Bulk Loading
Common Integrations
Managing Data Flow with Apache Kafka
Searching with Apache Lucene, SOLR, and Elasticsearch
Analyzing Data with Apache Spark
Summary
Index
About the Authors
Colophon