Managing Cloud Native Data on Kubernetes: Architecting Cloud Native Data Services Using Open Source Technology

Is Kubernetes ready for stateful workloads? This open source system has become the primary platform for deploying and managing cloud native applications. But because it was originally designed for stateless workloads, working with data on Kubernetes has been challenging. If you want to avoid the inefficiencies and duplicative costs of having separate infrastructure for applications and data, this practical guide can help.

Using Kubernetes as your platform, you'll learn open source technologies that are designed and built for the cloud. Authors Jeff Carpenter and Patrick McFadin provide case studies to help you explore new use cases and avoid the pitfalls others have faced. You'll get an insider's view of what's coming from innovators who are creating next-generation architectures and infrastructure.

With this book, you will:

  • Learn how to use basic Kubernetes resources to compose data infrastructure
  • Automate the deployment and operations of data...

Author(s): Jeff Carpenter, Patrick McFadin
Publisher: O'Reilly Media
Year: 2023
Language: English
Pages: 329

    Foreword
    Preface
    Why We Wrote This Book
    Who Is This Book For?
    How to Read This Book
    Conventions Used in This Book
    Using Code Examples
    O’Reilly Online Learning
    How to Contact Us
    Acknowledgments
    1. Introduction to Cloud Native Data Infrastructure: Persistence, Streaming, and Batch Analytics
    Infrastructure Types
    What Is Cloud Native Data?
    More Infrastructure, More Problems
    Kubernetes Leading the Way
    Managing Compute on Kubernetes
    Managing Network on Kubernetes
    Managing Storage on Kubernetes
    Cloud Native Data Components
    Looking Forward
    Getting Ready for the Revolution
    Adopt an SRE Mindset
    Embrace Distributed Computing
    Principles of Cloud Native Data Infrastructure
    Principle 1: Leverage compute, network, and storage as commodity APIs
    Principle 2: Separate the control and data planes
    Principle 3: Make observability easy
    Principle 4: Make the default configuration secure
    Principle 5: Prefer declarative configuration
    Summary
    2. Managing Data Storage on Kubernetes
    Docker, Containers, and State
    Managing State in Docker
    Bind Mounts
    Volumes
    Tmpfs Mounts
    Volume Drivers
    Kubernetes Resources for Data Storage
    Pods and Volumes
    Ephemeral volumes
    Configuration volumes
    hostPath volumes
    Cloud volumes
    Additional volume providers
    PersistentVolumes
    Local PersistentVolumes
    PersistentVolumeClaims
    StorageClasses
    Kubernetes Storage Architecture
    Flexvolume
    Container Storage Interface
    Container Attached Storage
    OpenEBS
    Longhorn
    Rook and Ceph
    Container Object Storage Interface
    Summary
    3. Databases on Kubernetes the Hard Way
    The Hard Way
    Prerequisites for Running Data Infrastructure on Kubernetes
    Running MySQL on Kubernetes
    ReplicaSets
    Deployments
    Services
    Accessing MySQL
    Running Apache Cassandra on Kubernetes
    StatefulSets
    Defining StatefulSets
    StatefulSet lifecycle management
    Accessing Cassandra
    Summary
    4. Automating Database Deployment on Kubernetes with Helm
    Deploying Applications with Helm Charts
    Using Helm to Deploy MySQL
    How Helm Works
    Labels
    ServiceAccounts
    Secrets
    ConfigMaps
    Updating Helm Charts
    Uninstalling Helm Charts
    Using Helm to Deploy Apache Cassandra
    Affinity and Anti-Affinity
    Helm, CI/CD, and Operations
    Summary
    5. Automating Database Management on Kubernetes with Operators
    Extending the Kubernetes Control Plane
    Extending Kubernetes Clients
    Extending Kubernetes Control Plane Components
    Extending Kubernetes Worker Node Components
    The Operator Pattern
    Controllers
    Events
    Custom Resources
    Operators
    Managing MySQL in Kubernetes Using the Vitess Operator
    Vitess Overview
    PlanetScale Vitess Operator
    Installing the Vitess Operator
    Roles and RoleBindings
    PriorityClasses
    Creating a VitessCluster
    A Growing Ecosystem of Operators
    Choosing Operators
    Building Operators
    Summary
    6. Integrating Data Infrastructure in a Kubernetes Stack
    K8ssandra: Production-Ready Cassandra on Kubernetes
    K8ssandra Architecture
    Installing the K8ssandra Operator
    Creating a K8ssandraCluster
    Managing Cassandra in Kubernetes with Cass Operator
    Enabling Developer Productivity with Stargate APIs
    Unified Monitoring Infrastructure with Prometheus and Grafana
    Performing Repairs with Cassandra Reaper
    Backing Up and Restoring Data with Cassandra Medusa
    Creating a Backup
    Restoring from Backup
    Deploying Multicluster Applications in Kubernetes
    Summary
    7. The Kubernetes Native Database
    Why a Kubernetes Native Approach Is Needed
    Hybrid Data Access at Scale with TiDB
    TiDB Architecture
    Deploying TiDB in Kubernetes
    Installing the TiDB CRDs
    Installing the TiDB Operator
    Creating a TidbCluster
    Serverless Cassandra with DataStax Astra DB
    What to Look for in a Kubernetes Native Database
    Basic Requirements
    The Future of Kubernetes Native
    Scalability through multidimensional architectures
    Community-focused innovation through open source and cloud services
    Summary
    8. Streaming Data on Kubernetes
    Introduction to Streaming
    Types of Delivery
    Delivery Guarantees
    Feature Scope
    The Role of Streaming in Kubernetes
    Streaming on Kubernetes with Apache Pulsar
    Preparing Your Environment
    Securing Communications by Default with cert-manager
    Using Helm to Deploy Apache Pulsar
    Stream Analytics with Apache Flink
    Deploying Apache Flink on Kubernetes
    Summary
    9. Data Analytics on Kubernetes
    Introduction to Analytics
    Deploying Analytic Workloads in Kubernetes
    Introduction to Apache Spark
    Deploying Apache Spark in Kubernetes
    Build Your Custom Container
    Submit and Run Your Application
    Kubernetes Operator for Apache Spark
    Alternative Schedulers for Kubernetes
    Apache YuniKorn
    Volcano
    Analytic Engines for Kubernetes
    Dask
    Ray
    Summary
    10. Machine Learning and Other Emerging Use Cases
    The Cloud Native AI/ML Stack
    AI/ML Definitions
    Defining an AI/ML Stack
    Real-Time Model Serving with KServe
    Full Lifecycle Feature Management with Feast
    Vector Similarity Search with Milvus
    Efficient Data Movement with Apache Arrow
    Versioned Object Storage with lakeFS
    Summary
    11. Migrating Data Workloads to Kubernetes
    The Vision: Application-Aware Platforms
    Charting Your Path to Success
    People
    Critical people roles for cloud native data
    Communities to fast-track your innovation
    Technology
    Selecting cloud native data projects
    New architectures for cloud native data
    Deploy services, not servers
    Process
    DevOps practices
    Basic Kubernetes maturity
    Deploy stateful workloads
    Continually optimize your deployments
    The Future of Cloud Native Data
    Summary
    Index