Snowflake: The Definitive Guide: Architecting, Designing, and Deploying on the Snowflake Data Cloud

This document was uploaded by one of our users. The uploader already confirmed that they had the permission to publish it. If you are author/publisher or own the copyright of this documents, please report to us by using this DMCA report form.

Simply click on the Download Book button.

Yes, Book downloads on Ebookily are 100% Free.

Sometimes the book is free on Amazon As well, so go ahead and hit "Search on Amazon"

Snowflake's ability to eliminate data silos and run workloads from a single platform creates opportunities to democratize data analytics, allowing users at all levels within an organization to make data-driven decisions. Whether you're an IT professional working in data warehousing or data science, a business analyst or technical manager, or an aspiring data professional wanting to get more hands-on experience with the Snowflake platform, this book is for you. You'll learn how Snowflake users can build modern integrated data applications and develop new revenue streams based on data. Using hands-on SQL examples, you'll also discover how the Snowflake Data Cloud helps you accelerate data science by avoiding replatforming or migrating data unnecessarily. You'll be able to: • Efficiently capture, store, and process large amounts of data at an amazing speed • Ingest and transform real-time data feeds in both structured and semistructured formats and deliver meaningful data insights within minutes • Use Snowflake Time Travel and zero-copy cloning to produce a sensible data recovery strategy that balances system resilience with ongoing storage costs • Securely share data and reduce or eliminate data integration costs by accessing ready-to-query datasets available in the Snowflake Marketplace

Author(s): Joyce Avila
Edition: 1
Publisher: O'Reilly Media
Year: 2023

Language: English
Commentary: Publisher's PDF
Pages: 465
City: Sebastopol, CA
Tags: Cloud Computing; Data Visualization; SQL; Query Performance Optimization; Data Warehouse; Data Governance; Access Control; Snowflake

Cover
Copyright
Table of Contents
Preface
Origin of the Book
Who Is This Book For?
Goals of the Book
Navigating this Book
Using Code Examples
Conventions Used in This Book
O’Reilly Online Learning
How to Contact Us
Acknowledgments
Chapter 1. Getting Started
Snowflake Web User Interfaces
Prep Work
Snowsight Orientation
Snowsight Preferences
Navigating Snowsight Worksheets
Context Setting
Improved Productivity
Snowflake Community
Snowflake Certifications
Snowday and Snowflake Summit Events
Important Caveats About Code Examples in the Book
Code Cleanup
Summary
Knowledge Check
Chapter 2. Creating and Managing the Snowflake Architecture
Prep Work
Traditional Data Platform Architectures
Shared-Disk (Scalable) Architecture
Shared-Nothing (Scalable) Architecture
NoSQL Alternatives
The Snowflake Architecture
The Cloud Services Layer
Managing the Cloud Services Layer
Billing for the Cloud Services Layer
The Query Processing (Virtual Warehouse) Compute Layer
Virtual Warehouse Size
Scaling Up a Virtual Warehouse to Process Large Data Volumes and Complex Queries
Scaling Out with Multicluster Virtual Warehouses to Maximize Concurrency
Creating and Using Virtual Warehouses
Separation of Workloads and Workload Management
Billing for the Virtual Warehouse Layer
Centralized (Hybrid Columnar) Database Storage Layer
Introduction to Zero-Copy Cloning
Introduction to Time Travel
Billing for the Storage Layer
Snowflake Caching
Query Result Cache
Metadata Cache
Virtual Warehouse Local Disk Cache
Code Cleanup
Summary
Knowledge Check
Chapter 3. Creating and Managing Snowflake Securable Database Objects
Prep Work
Creating and Managing Snowflake Databases
Creating and Managing Snowflake Schemas
INFORMATION_SCHEMA
ACCOUNT_USAGE Schema
Schema Object Hierarchy
Introduction to Snowflake Tables
Creating and Managing Views
Introduction to Snowflake Stages: File Format Included
Extending SQL with Stored Procedures and UDFs
User-Defined Function (UDF): Task Included
Secure SQL UDTF That Returns Tabular Value (Market Basket Analysis Example)
Stored Procedures
Introduction to Pipes, Streams, and Sequences
Snowflake Streams (Deep Dive)
Snowflake Tasks (Deep Dive)
Code Cleanup
Summary
Knowledge Check
Chapter 4. Exploring Snowflake SQL Commands, Data Types, and Functions
Prep Work
Working with SQL Commands in Snowflake
DDL Commands
DCL Commands
DML Commands
TCL Commands
DQL Command
SQL Query Development, Syntax, and Operators in Snowflake
SQL Development and Management
Query Syntax
Query Operators
Long-Running Queries, and Query Performance and Optimization
Snowflake Query Limits
Introduction to Data Types Supported by Snowflake
Numeric Data Types
String and Binary Data Types
Date and Time Input/Output Data Types
Semi-Structured Data Types
Unstructured Data Types
How Snowflake Supports Unstructured Data Use
Snowflake SQL Functions and Session Variables
Using System-Defined (Built-In) Functions
Creating SQL and JavaScript UDFs and Using Session Variables
External Functions
Code Cleanup
Summary
Knowledge Check
Chapter 5. Leveraging Snowflake Access Controls
Prep Work
Creating Snowflake Objects
Snowflake System-Defined Roles
Creating Custom Roles
Functional-Level Business and IT Roles
System-Level Service Account and Object Access Roles
Role Hierarchy Assignments: Assigning Roles to Other Roles
Granting Privileges to Roles
Assigning Roles to Users
Testing and Validating Our Work
User Management
Role Management
Snowflake Multi-Account Strategy
Managing Users and Groups with SCIM
Code Cleanup
Summary
Knowledge Check
Chapter 6. Data Loading and Unloading
Prep Work
Basics of Data Loading and Unloading
Data Types
File Formats
Data File Compression
Frequency of Data Processing
Snowflake Stage References
Data Sources
Data Loading Tools
Snowflake Worksheet SQL Using INSERT INTO and INSERT ALL Commands
Web UI Load Data Wizard
SnowSQL CLI SQL PUT and COPY INTO Commands
Data Pipelines
Third-Party ETL and ELT Tools
Alternatives to Loading Data
Tools to Unload Data
Data Loading Best Practices for Snowflake Data Engineers
Select the Right Data Loading Tool and Consider the Appropriate Data Type Options
Avoid Row-by-Row Data Processing
Choose the Right Snowflake Virtual Warehouse Size and Split Files as Needed
Transform Data in Steps and Use Transient Tables for Intermediate Results
Code Cleanup
Summary
Knowledge Check
Chapter 7. Implementing Data Governance, Account Security, and Data Protection and Recovery
Prep Work
Snowflake Security
Controlling Account Access
Monitoring Activity with the Snowflake ACCESS_HISTORY Account Usage View
Data Protection and Recovery
Replication and Failover
Democratizing Data with Data Governance Controls
INFORMATION_SCHEMA Data Dictionary
Object Tagging
Classification
Data Masking
Row Access Policies and Row-Level Security
External Tokenization
Secure Views and UDFs
Object Dependencies
Code Cleanup
Summary
Knowledge Check
Chapter 8. Managing Snowflake Account Costs
Prep Work
Snowflake Monthly Bill
Storage Fees
Data Transfer Costs
Compute Credits Consumed
Creating Resource Monitors to Manage Virtual Warehouse Usage and Reduce Costs
Resource Monitor Credit Quota
Resource Monitor Credit Usage
Resource Monitor Notifications and Other Actions
Resource Monitor Rules for Assignments
DDL Commands for Creating and Managing Resource Monitors
Using Object Tagging for Cost Centers
Querying the ACCOUNT_USAGE View
Using BI Partner Dashboards to Monitor Snowflake Usage and Costs
Snowflake Agile Software Delivery
Why Do We Need DevOps?
Continuous Data Integration, Continuous Delivery, and Continuous Deployment
What Is Database Change Management?
How Zero-Copy Cloning Can Be Used to Support Dev/Test Environments
Code Cleanup
Summary
Knowledge Check
Chapter 9. Analyzing and Improving Snowflake Query Performance
Prep Work
Analyzing Query Performance
QUERY_HISTORY Profiling
HASH() Function
Web UI History
Understanding Snowflake Micro-Partitions and Data Clustering
Partitions Explained
Snowflake Micro-Partitions Explained
Snowflake Data Clustering Explained
Clustering Width and Depth
Choosing a Clustering Key
Creating a Clustering Key
Reclustering
Performance Benefits of Materialized Views
Exploring Other Query Optimization Techniques
Search Optimization Service
Query Optimization Techniques Compared
Summary
Code Cleanup
Knowledge Check
Chapter 10. Configuring and Managing Secure Data Sharing
Snowflake Architecture Data Sharing Support
The Power of Snowgrid
Data Sharing Use Cases
Snowflake Support for Unified ID 2.0
Snowflake Secure Data Sharing Approaches
Prep Work
Snowflake’s Direct Secure Data Sharing Approach
Creating Outbound Shares
How Inbound Shares Are Used by Snowflake Data Consumers
How to List and Shop on the Public Snowflake Marketplace
Snowflake Marketplace for Providers
Standard Versus Personalized Data Listings
Harnessing the Power of a Snowflake Private Data Exchange
Snowflake Data Clean Rooms
Important Design, Security, and Performance Considerations
Share Design Considerations
Share Security Considerations
Share Performance Considerations
Difference Between Database Sharing and Database Cloning
Data Shares and Time Travel Considerations
Sharing of Data Shares
Summary
Code Cleanup
Knowledge Check
Chapter 11. Visualizing Data in Snowsight
Prep Work
Data Sampling in Snowsight
Fixed-Size Sampling Based on a Specific Number of Rows
Fraction-Based Sampling Based on Probability
Previewing Fields and Data
Sampling Examples
Using Automatic Statistics and Interactive Results
Snowsight Dashboard Visualization
Creating a Dashboard and Tiles
Working with Chart Visualizations
Aggregating and Bucketing Data
Editing and Deleting Tiles
Collaboration
Sharing Your Query Results
Using a Private Link to Collaborate on Dashboards
Summary
Code Cleanup
Knowledge Check
Chapter 12. Workloads for the Snowflake Data Cloud
Prep Work
Data Engineering
Data Warehousing
Data Vault 2.0 Modeling
Transforming Data within Snowflake
Data Lake
Data Collaboration
Data Monetization
Regulatory and Compliance Requirements for Data Sharing
Data Analytics
Advanced Analytics for the Finance Industry
Advanced Analytics for the Healthcare Industry
Advanced Analytics for the Manufacturing Industry and Logistics Services
Marketing Analytics for Retail Verticals and the Communications and Media Industry
Data Applications
Data Science
Snowpark
Streamlit
Cybersecurity Using Snowflake as a Security Data Lake
Overcoming the Challenges of a SIEM-Only Architecture
Search Optimization Service Versus Clustering
Unistore
Transactional Workload Versus Analytical Workload
Hybrid Tables
Summary
Code Cleanup
Knowledge Check
Appendix A. Answers to the Knowledge Check Questions
Chapter 1
Chapter 2
Chapter 3
Chapter 4
Chapter 5
Chapter 6
Chapter 7
Chapter 8
Chapter 9
Chapter 10
Chapter 11
Chapter 12
Appendix B. Snowflake Object Naming Best Practices
General (Character Related)
General (Not Character Related)
Standard Label Abbreviations
Appendix C. Setting Up a Snowflake Trial Account
Index
About the Author
Colophon