Modern System Administration: Managing Reliable and Sustainable Systems

Early system administration required in-depth knowledge of a variety of services on individual systems. Now, the job is increasingly complex and differs from one company to the next, with an ever-growing list of technologies and third-party services to integrate. How does any one individual stay relevant in systems and services? This practical guide helps anyone in operations—sysadmins, automation engineers, IT professionals, and site reliability engineers—understand the essential concepts of the role today.

Collaboration, automation, and the evolution of systems change the fundamentals of operations work. No matter where you are in your journey, this book provides you with the information to craft your own path toward advancing essential system administration skills. Author Jennifer Davis provides examples of modern practices and tools, along with recommended materials for further study.

Topics include:

  • Development and testing: Version control, fundamentals of...

Author(s): Jennifer Davis
Publisher: O'Reilly Media
Year: 2022
Language: English
Pages: 325

    Foreword
    Preface
    Who Should Read This Book?
    What This Book Is Not
    Scope of This Book
    If I Could Tell You Only One Thing
    If I Could Tell You Only One More Thing
    Conventions Used in This Book
    O’Reilly Online Learning
    How to Contact Us
    Acknowledgments
    Introducing Modern System Administration
    Map Your Journey
    Embrace a Mindset Shift
    What Is the Job?
    Flavors of System Administration
    Embrace Evolving Practices
    Embrace Collaboration
    Embrace Sustainability
    Wrapping Up
    I. Reasoning About Systems
    1. Patterns and Interconnections
    How to Connect Things
    How Things Communicate
    Application Layer
    Transport Layer
    Network Layer
    Data Link Layer
    Physical Layer
    Wrapping Up
    2. Computing Environments
    Common Workloads
    Choosing the Location of Your Workloads
    On-Prem
    Cloud Computing
    Compute Options
    Serverless
    Unikernels
    Functions
    App services
    Containers
    Virtual Machines
    Guidelines for Choosing Compute
    Wrapping Up
    3. Storage
    Why Care About Storage?
    Key Characteristics
    Storage Categories
    Block Storage
    File Storage
    Object Storage
    Database Storage
    Considerations for Your Storage Strategy
    Anticipate Your Capacity and Latency Requirements
    Retain Your Data as Long as Is Reasonably Necessary
    Respect the Privacy Concerns of Your Users
    Defend Your Data
    Be Prepared to Handle Disaster Recovery Situations
    Wrapping Up
    4. Network
    Caring About Networks
    Key Characteristics of Networks
    Build a Network
    Virtualization
    Software-Defined Networks
    Content Distribution Networks
    Guidelines to Your Network Strategy
    Wrapping Up
    II. Practices
    5. Sysadmin Toolkit
    What Is Your Digital Toolkit?
    The Components of Your Toolkit
    Choosing an Editor
    Integrated static code analysis
    Code completion
    Establish and validate team conventions
    Integrate workflow with Git
    Choosing Programming Languages
    Frameworks and Libraries
    Other Helpful Utilities
    Wrapping Up
    6. Version Control
    What Is Version Control?
    Benefits of Version Control
    Organizing Infra Projects
    Wrapping Up
    7. Testing
    You’re Already Testing
    Common Types of Testing
    Linting
    Unit Tests
    Integration Tests
    End-to-End Tests
    Explicit Testing Strategy
    Improving Your Tests; Learning from Failure
    Next Steps
    Wrapping Up
    8. Infrastructure Security
    What Is Infrastructure Security?
    Share Security Responsibilities
    Borrow the Attacker Lens
    Design for Security Operability
    Categorize Discovered Issues
    Wrapping Up
    9. Documentation
    Know Your Audience
    Dimensions of Documentation
    Organization Practices
    Organizing a Topic
    Organizing a Site
    Recommendations for Quality Documentation
    Wrapping Up
    10. Presentations
    Know Your Audience
    Choose Your Channel
    Choose Your Story Type
    Storytelling in Practice
    Case #1: Charts Are Worth a Thousand Words
    Case #2: Telling the Same Story with a Different Audience
    Team dashboard
    Manager dashboard
    Customer dashboards
    The Key Takeaways
    Know Your Visuals
    Visual Cues
    Chart Types
    Data tables
    Bar charts
    Line charts
    Area charts
    Heat maps
    Flame graphs
    Treemaps
    Recommended Visualization Practices
    Wrapping Up
    III. Assembling the System
    11. Scripting Infrastructure
    Why Script Your Infrastructure?
    Three Lenses to Model Your Infrastructure
    Code to Build Machine Images
    Code to Provision Infrastructure
    Code to Configure Infrastructure
    Getting Started
    Wrapping Up
    12. Managing Your Infrastructure
    Infrastructure as Code
    Treating Your Infrastructure as Data
    Getting Started with Infrastructure Management
    Linting
    Writing Unit Tests
    Writing Integration Tests
    Writing End-to-End Tests
    Wrapping Up
    13. Securing Your Infrastructure
    Assessing Attack Vectors
    Manage Identity and Access
    How Should You Control Access to Your System?
    Who Should Have Access to Your System?
    Manage Secrets
    Password Managers and Secret Management Software
    Defending Secrets and Monitoring Usage
    Securing Your Computing Environment
    Securing Your Network
    Security Recommendations for Your Infrastructure Management
    Wrapping Up
    IV. Monitoring the System
    14. Monitoring Theory
    Why Monitor?
    How Do Monitoring and Observability Differ?
    Monitoring Building Blocks
    Events
    Monitors
    Data: Metrics, Logs, and Tracing
    First-Level Monitoring
    Event Detection
    Data Collection
    Data Reduction
    Data Analysis
    Data Presentation
    Second-Level Monitoring
    Wrapping Up
    15. Compute and Software Monitoring in Practice
    Identify Your Desired Outputs
    What Should You Monitor?
    Do What You Can Now
    Monitors That Matter
    Plan for a Monitoring Project
    What Alerts Should You Set?
    Examine Monitoring Platforms
    Choose a Monitoring Tool or Platform
    Wrapping Up
    16. Managing Monitoring Data
    What Is Monitoring Data?
    Metrics
    Logs
    Structured Logs
    Tracing
    Distributed Tracing
    Choose Your Data Types
    Retain Log Data
    Analyze Log Data
    Monitoring Data at Scale
    Wrapping Up
    17. Monitor Your Work
    Why Should You Monitor Your Work?
    Manage Your Work with Kanban
    Choose a Platform
    Find the Interesting Information
    Wrapping Up
    V. Scaling the System
    18. Capacity Management
    What Is Capacity?
    The Capacity Management Model
    Resource Procurement
    Justification
    Management
    Monitoring
    The Framework for Capacity Planning
    Do You Need Capacity Planning with Cloud Computing?
    Wrapping Up
    19. Developing On-Call Resilience
    What Is On-Call?
    Humane On-Call Processes
    Check Your On-Call Policies
    Preparing for On-Call
    One Week Out
    The Night Before
    Your On-Call Rotation
    On-Call Handoff
    The Day After On-Call
    Monitor the On-Call Experience
    Wrapping Up
    20. Managing Incidents
    What Is an Incident?
    What Is Incident Management?
    Planning and Preparing for Incidents
    Set Up and Document Communication Channels
    Train for Effective Communication
    Create Templates
    Maintain Documentation
    Document the Risks
    Practice Failure
    Understand Your Tools
    Clearly Define Roles and Responsibilities
    Understand Severity Levels and Escalation Protocols
    Responding to Incidents
    Learning from the Incident
    How Deep Should You Dig?
    Aiding Discovery
    Documenting Incidents Effectively
    Distributing the Information
    Next Steps
    Wrapping Up
    21. Leading Sustainable Teams
    Collective Leadership
    Adopt a Whole-Team Approach
    Build Resilient On-Call Teams
    Update On-Call Processes
    Monitor the Team’s Work
    Why Monitor the Team?
    What Should You Monitor?
    What are the team’s objectives?
    What is the team’s definition of a task?
    What is the team’s definition of a project?
    What is the service catalog that your team offers?
    Examine the work
    Measure Impact on the Team
    Support Team Infrastructure with Documentation
    Budget a Learning Culture
    Adapt to Challenges
    Wrapping Up
    Conclusion
    A. Protocols in Practice
    Hypertext Transfer Protocol
    QUIC
    Domain Name System
    B. Resolving Test Failures
    Test Failure Type #1: Environment Problems
    Test Failure Type #2: Flawed Test Logic
    Test Failure Type #3: Changing Assumptions
    Test Failure Type #4: Flaky Tests
    Test Failure Type #5: Code Defects
    Index