Official Google Cloud Certified Professional Data Engineer Study Guide

The proven study guide that prepares you for this new Google Cloud exam.

The Google Cloud Certified Professional Data Engineer Study Guide provides everything you need to prepare for this important exam and master the skills necessary to land that coveted Google Cloud Professional Data Engineer certification. The book begins with an assessment quiz to evaluate what you know before you start. Each chapter features exam objectives and review questions, and the online learning environment includes additional complete practice tests. Written by Dan Sullivan, a popular and experienced online course author for machine learning, big data, and cloud topics, Google Cloud Certified Professional Data Engineer Study Guide is your ace in the hole for deploying and managing analytics and machine learning applications.

• Build and operationalize storage systems, pipelines, and compute infrastructure
• Understand machine learning models and learn how to select pre-built models
• Monitor and troubleshoot machine learning models
• Design analytics and machine learning applications that are secure, scalable, and highly available

This exam guide is designed to help you develop an in-depth understanding of data engineering and machine learning on Google Cloud Platform.

Author(s): Dan Sullivan
Edition: 1
Publisher: Sybex
Year: 2020

Language: English
Commentary: Vector PDF
Pages: 352
City: Indianapolis, IN
Tags: Google Cloud Platform;Machine Learning;Deep Learning;Unsupervised Learning;Reinforcement Learning;Anomaly Detection;Security;Supervised Learning;Apache Spark;Reliability;Monitoring;Scalability;High Availability;Best Practices;Encryption;Troubleshooting;AutoML;Model Evaluation;Database Design;Data Warehouse;Compliance;Google Cloud SQL;Distributed Processing;Google BigQuery;Google Bigtable;Google Spanner;Google Storage;Data Processing;Data Pipelines;Data Engineering;Model Training;Model Deployment

Official Google Cloud Certified Professional Data Engineer Study Guide
Acknowledgments
About the Author
About the Technical Editor
Contents at a Glance
Contents
Introduction
What Does This Book Cover?
Interactive Online Learning Environment and TestBank
Additional Resources
Objective Map
Assessment Test
Answers to Assessment Test
Chapter 1. Selecting Appropriate Storage Technologies
From Business Requirements to Storage Systems
Ingest
Store
Process and Analyze
Explore and Visualize
Technical Aspects of Data: Volume, Velocity, Variation, Access, and Security
Volume
Velocity
Variation in Structure
Data Access Patterns
Security Requirements
Types of Structure: Structured, Semi-Structured, and Unstructured
Structured: Transactional vs. Analytical
Semi-Structured: Fully Indexed vs. Row Key Access
Unstructured Data
Google’s Storage Decision Tree
Schema Design Considerations
Relational Database Design
NoSQL Database Design
Exam Essentials
Review Questions
Chapter 2. Building and Operationalizing Storage Systems
Cloud SQL
Configuring Cloud SQL
Improving Read Performance with Read Replicas
Importing and Exporting Data
Cloud Spanner
Configuring Cloud Spanner
Replication in Cloud Spanner
Database Design Considerations
Importing and Exporting Data
Cloud Bigtable
Configuring Bigtable
Database Design Considerations
Importing and Exporting
Cloud Firestore
Cloud Firestore Data Model
Indexing and Querying
Importing and Exporting
BigQuery
BigQuery Datasets
Loading and Exporting Data
Clustering, Partitioning, and Sharding Tables
Streaming Inserts
Monitoring and Logging in BigQuery
BigQuery Cost Considerations
Tips for Optimizing BigQuery
Cloud Memorystore
Cloud Storage
Organizing Objects in a Namespace
Storage Tiers
Cloud Storage Use Cases
Data Retention and Lifecycle Management
Unmanaged Databases
Exam Essentials
Review Questions
Chapter 3. Designing Data Pipelines
Overview of Data Pipelines
Data Pipeline Stages
Types of Data Pipelines
GCP Pipeline Components
Cloud Pub/Sub
Cloud Dataflow
Cloud Dataproc
Cloud Composer
Migrating Hadoop and Spark to GCP
Exam Essentials
Review Questions
Chapter 4. Designing a Data Processing Solution
Designing Infrastructure
Choosing Infrastructure
Availability, Reliability, and Scalability of Infrastructure
Hybrid Cloud and Edge Computing
Designing for Distributed Processing
Distributed Processing: Messaging
Distributed Processing: Services
Migrating a Data Warehouse
Assessing the Current State of a Data Warehouse
Designing the Future State of a Data Warehouse
Migrating Data, Jobs, and Access Controls
Validating the Data Warehouse
Exam Essentials
Review Questions
Chapter 5. Building and Operationalizing Processing Infrastructure
Provisioning and Adjusting Processing Resources
Provisioning and Adjusting Compute Engine
Provisioning and Adjusting Kubernetes Engine
Provisioning and Adjusting Cloud Bigtable
Provisioning and Adjusting Cloud Dataproc
Configuring Managed Serverless Processing Services
Monitoring Processing Resources
Stackdriver Monitoring
Stackdriver Logging
Stackdriver Trace
Exam Essentials
Review Questions
Chapter 6. Designing for Security and Compliance
Identity and Access Management with Cloud IAM
Predefined Roles
Custom Roles
Using Roles with Service Accounts
Access Control with Policies
Using IAM with Storage and Processing Services
Cloud Storage and IAM
Cloud Bigtable and IAM
BigQuery and IAM
Cloud Dataflow and IAM
Data Security
Encryption
Key Management
Ensuring Privacy with the Data Loss Prevention API
Detecting Sensitive Data
Running Data Loss Prevention Jobs
Inspection Best Practices
Legal Compliance
Health Insurance Portability and Accountability Act (HIPAA)
Children’s Online Privacy Protection Act
FedRAMP
General Data Protection Regulation
Exam Essentials
Review Questions
Chapter 7. Designing Databases for Reliability, Scalability, and Availability
Designing Cloud Bigtable Databases for Scalability and Reliability
Data Modeling with Cloud Bigtable
Designing Row-keys
Designing for Time Series
Use Replication for Availability and Scalability
Designing Cloud Spanner Databases for Scalability and Reliability
Relational Database Features
Interleaved Tables
Primary Keys and Hotspots
Database Splits
Secondary Indexes
Query Best Practices
Designing BigQuery Databases for Data Warehousing
Schema Design for Data Warehousing
Clustered and Partitioned Tables
Querying Data in BigQuery
External Data Access
BigQuery ML
Exam Essentials
Review Questions
Chapter 8. Understanding Data Operations for Flexibility and Portability
Cataloging and Discovery with Data Catalog
Searching in Data Catalog
Tagging in Data Catalog
Data Preprocessing with Dataprep
Cleansing Data
Discovering Data
Enriching Data
Importing and Exporting Data
Structuring and Validating Data
Visualizing with Data Studio
Connecting to Data Sources
Visualizing Data
Sharing Data
Exploring Data with Cloud Datalab
Jupyter Notebooks
Managing Cloud Datalab Instances
Adding Libraries to Cloud Datalab Instances
Orchestrating Workflows with Cloud Composer
Airflow Environments
Creating DAGs
Airflow Logs
Exam Essentials
Review Questions
Chapter 9. Deploying Machine Learning Pipelines
Structure of ML Pipelines
Data Ingestion
Data Preparation
Data Segregation
Model Training
Model Evaluation
Model Deployment
Model Monitoring
GCP Options for Deploying Machine Learning Pipeline
Cloud AutoML
BigQuery ML
Kubeflow
Spark Machine Learning
Exam Essentials
Review Questions
Chapter 10. Choosing Training and Serving Infrastructure
Hardware Accelerators
Graphics Processing Units
Tensor Processing Units
Choosing Between CPUs, GPUs, and TPUs
Distributed and Single Machine Infrastructure
Single Machine Model Training
Distributed Model Training
Serving Models
Edge Computing with GCP
Edge Computing Overview
Edge Computing Components and Processes
Edge TPU
Cloud IoT
Exam Essentials
Review Questions
Chapter 11. Measuring, Monitoring, and Troubleshooting Machine Learning Models
Three Types of Machine Learning Algorithms
Supervised Learning
Unsupervised Learning
Anomaly Detection
Reinforcement Learning
Deep Learning
Engineering Machine Learning Models
Model Training and Evaluation
Operationalizing ML Models
Common Sources of Error in Machine Learning Models
Data Quality
Unbalanced Training Sets
Types of Bias
Exam Essentials
Review Questions
Chapter 12. Leveraging Prebuilt Models as a Service
Sight
Vision AI
Video AI
Conversation
Dialogflow
Cloud Text-to-Speech API
Cloud Speech-to-Text API
Language
Translation
Natural Language
Structured Data
Recommendations AI API
Cloud Inference API
Exam Essentials
Review Questions
Appendix. Answers to Review Questions
Chapter 1: Selecting Appropriate Storage Technologies
Chapter 2: Building and Operationalizing Storage Systems
Chapter 3: Designing Data Pipelines
Chapter 4: Designing a Data Processing Solution
Chapter 5: Building and Operationalizing Processing Infrastructure
Chapter 6: Designing for Security and Compliance
Chapter 7: Designing Databases for Reliability, Scalability, and Availability
Chapter 8: Understanding Data Operations for Flexibility and Portability
Chapter 9: Deploying Machine Learning Pipelines
Chapter 10: Choosing Training and Serving Infrastructure
Chapter 11: Measuring, Monitoring, and Troubleshooting Machine Learning Models
Chapter 12: Leveraging Prebuilt Models as a Service
Index
Online Test Bank
EULA