Managing Cloud Native Data on Kubernetes: Architecting Cloud Native Data Services Using Open Source Technology (Final)

Is Kubernetes ready for stateful workloads? This open source system has become the primary platform for deploying and managing cloud native applications. But because it was originally designed for stateless workloads, working with data on Kubernetes has been challenging. If you want to avoid the inefficiencies and duplicative costs of having separate infrastructure for applications and data, this practical guide can help.

Author(s): Jeff Carpenter and Patrick McFadin
Publisher: O'Reilly Media, Inc.
Year: 2023

Language: English
Pages: 331

Table of Contents
Foreword
Preface
Why We Wrote This Book
Who Is This Book For?
How to Read This Book
Conventions Used in This Book
Using Code Examples
O’Reilly Online Learning
How to Contact Us
Acknowledgments
1. Introduction to Cloud Native Data Infrastructure: Persistence, Streaming, and Batch Analytics
Infrastructure Types
What Is Cloud Native Data?
More Infrastructure, More Problems
Kubernetes Leading the Way
Managing Compute on Kubernetes
Managing Network on Kubernetes
Managing Storage on Kubernetes
Cloud Native Data Components
Looking Forward
Getting Ready for the Revolution
Adopt an SRE Mindset
Embrace Distributed Computing
Principles of Cloud Native Data Infrastructure
Principle 1: Leverage compute, network, and storage as commodity APIs
Principle 2: Separate the control and data planes
Principle 3: Make observability easy
Principle 4: Make the default configuration secure
Principle 5: Prefer declarative configuration
Summary
2. Managing Data Storage on Kubernetes
Docker, Containers, and State
Managing State in Docker
Bind Mounts
Volumes
Tmpfs Mounts
Volume Drivers
Kubernetes Resources for Data Storage
Pods and Volumes
Ephemeral volumes
Configuration volumes
hostPath volumes
Cloud volumes
Additional volume providers
PersistentVolumes
Local PersistentVolumes
PersistentVolumeClaims
StorageClasses
Kubernetes Storage Architecture
Flexvolume
Container Storage Interface
Container Attached Storage
OpenEBS
Longhorn
Rook and Ceph
Container Object Storage Interface
Summary
3. Databases on Kubernetes the Hard Way
The Hard Way
Prerequisites for Running Data Infrastructure on Kubernetes
Running MySQL on Kubernetes
ReplicaSets
Deployments
Services
Accessing MySQL
Running Apache Cassandra on Kubernetes
StatefulSets
Defining StatefulSets
StatefulSet lifecycle management
Accessing Cassandra
Summary
4. Automating Database Deployment on Kubernetes with Helm
Deploying Applications with Helm Charts
Using Helm to Deploy MySQL
How Helm Works
Labels
ServiceAccounts
Secrets
ConfigMaps
Updating Helm Charts
Uninstalling Helm Charts
Using Helm to Deploy Apache Cassandra
Affinity and Anti-Affinity
Helm, CI/CD, and Operations
Summary
5. Automating Database Management on Kubernetes with Operators
Extending the Kubernetes Control Plane
Extending Kubernetes Clients
Extending Kubernetes Control Plane Components
Extending Kubernetes Worker Node Components
The Operator Pattern
Controllers
Events
Custom Resources
Operators
Managing MySQL in Kubernetes Using the Vitess Operator
Vitess Overview
PlanetScale Vitess Operator
Installing the Vitess Operator
Roles and RoleBindings
PriorityClasses
Creating a VitessCluster
A Growing Ecosystem of Operators
Choosing Operators
Building Operators
Summary
6. Integrating Data Infrastructure in a Kubernetes Stack
K8ssandra: Production-Ready Cassandra on Kubernetes
K8ssandra Architecture
Installing the K8ssandra Operator
Creating a K8ssandraCluster
Managing Cassandra in Kubernetes with Cass Operator
Enabling Developer Productivity with Stargate APIs
Unified Monitoring Infrastructure with Prometheus and Grafana
Performing Repairs with Cassandra Reaper
Backing Up and Restoring Data with Cassandra Medusa
Creating a Backup
Restoring from Backup
Deploying Multicluster Applications in Kubernetes
Summary
7. The Kubernetes Native Database
Why a Kubernetes Native Approach Is Needed
Hybrid Data Access at Scale with TiDB
TiDB Architecture
Deploying TiDB in Kubernetes
Installing the TiDB CRDs
Installing the TiDB Operator
Creating a TidbCluster
Serverless Cassandra with DataStax Astra DB
What to Look for in a Kubernetes Native Database
Basic Requirements
The Future of Kubernetes Native
Scalability through multidimensional architectures
Community-focused innovation through open source and cloud services
Summary
8. Streaming Data on Kubernetes
Introduction to Streaming
Types of Delivery
Delivery Guarantees
Feature Scope
The Role of Streaming in Kubernetes
Streaming on Kubernetes with Apache Pulsar
Preparing Your Environment
Securing Communications by Default with cert-manager
Using Helm to Deploy Apache Pulsar
Stream Analytics with Apache Flink
Deploying Apache Flink on Kubernetes
Summary
9. Data Analytics on Kubernetes
Introduction to Analytics
Deploying Analytic Workloads in Kubernetes
Introduction to Apache Spark
Deploying Apache Spark in Kubernetes
Build Your Custom Container
Submit and Run Your Application
Kubernetes Operator for Apache Spark
Alternative Schedulers for Kubernetes
Apache YuniKorn
Volcano
Analytic Engines for Kubernetes
Dask
Ray
Summary
10. Machine Learning and Other Emerging Use Cases
The Cloud Native AI/ML Stack
AI/ML Definitions
Defining an AI/ML Stack
Real-Time Model Serving with KServe
Full Lifecycle Feature Management with Feast
Vector Similarity Search with Milvus
Efficient Data Movement with Apache Arrow
Versioned Object Storage with lakeFS
Summary
11. Migrating Data Workloads to Kubernetes
The Vision: Application-Aware Platforms
Charting Your Path to Success
People
Critical people roles for cloud native data
Communities to fast-track your innovation
Technology
Selecting cloud native data projects
New architectures for cloud native data
Deploy services, not servers
Process
DevOps practices
Basic Kubernetes maturity
Deploy stateful workloads
Continually optimize your deployments
The Future of Cloud Native Data
Summary
Index