Genomics in the AWS Cloud: Performing Genome Analysis Using Amazon Web Services

Perform genome analysis and sequencing of data with Amazon Web Services

Genomics in the AWS Cloud: Analyzing Genetic Code Using Amazon Web Services enables a person who has moderate familiarity with AWS Cloud to perform full genome analysis and research. Using the information in this book, you'll be able to take a FASTQ file containing raw data from a lab or a BAM file from a service provider and perform genome analysis on it. You'll also be able to identify potentially pathogenic gene sequences.

• Get an introduction to Whole Genome Sequencing (WGS)
• Make sense of WGS on AWS
• Master AWS services for genome analysis

A key advantage of using AWS for genomic analysis is the wide choice of compute services that can process diverse datasets in analysis pipelines. Genomic sequencers that generate raw data files are located in on-premises labs, and AWS provides solutions that let customers transfer these files to AWS reliably and securely. Storing genomic and medical (e.g., imaging) data at different stages requires enormous amounts of cost-effective storage. Amazon Simple Storage Service (Amazon S3), Amazon Glacier, and Amazon Elastic Block Store (Amazon EBS) provide the necessary solutions to securely store, manage, and scale genomic file storage. Moreover, these storage services can interface with various AWS compute services to process the files.

Whether you're just getting started or have already been analyzing genomics data using the AWS Cloud, this book provides the information you need to use AWS services and features in the ways that make the most sense for your genomic research.

Author(s): Catherine Vacher, David Wall
Edition: 1
Publisher: Wiley
Year: 2023

Language: English
Commentary: Publisher's PDF
Pages: 336
City: Hoboken, NJ
Tags: Linux; Amazon Web Services; Cloud Computing; Bioinformatics; Data Visualization; Docker; Lambda Functions; Genomics; DNA; Proteomics; Genome; Biology; AWS Glacier; AWS Elastic Compute Cloud; AWS Simple Storage Service; Data Processing; DNA Sequencing; Cancer; AlphaFold; Containers

Cover
Title Page
Copyright Page
Contents at a Glance
Contents
Introduction
Who Should Read This Book
Genomics
Cloud Computing and AWS
What You’ll Learn from This Book
How This Book Is Organized
How to Use This Book
Our Story
Getting Under Way
How to Contact Wiley and the Authors
Chapter 1 Why Do Genome Analysis Yourself When Commercial Offerings Exist?
Commercial Sequencing Services
Typical Results
Summary
Chapter 2 A Crash Course in Molecular Biology
DNA
DNA at Work: RNA and Proteins
Inheritance
Summary
Chapter 3 Obtaining Your Genome
Preparing to Have Your Genome Sequenced
Can It Affect My Insurance?
Privacy
Humility and Levelheadedness
Validation with a Clinically Accredited Test
Alternatives to Using Your Own Genome
Specifying Lab Work
Depth
Sample Type
Type of Output Files
Sequencing Technology
Genome vs. Exome vs. SNP Arrays
Engaging a Laboratory
Getting a Tissue Sample for DNA Extraction
Rules and Regulations
Do-It-Yourself Phlebotomy
Legal Considerations
Shipping the Sample
Receiving the Results
Sequences and Quality Control Information
Alignment Information
Variation Information
Summary
Chapter 4 The Bioinformatics Workflow
Extraction of DNA
Deriving Nucleated Cells from Whole Blood
Processing Nucleated Cells
FASTA Files
FASTQ Files
Phred Scores
ASCII Encoding of Phred Scores
Alignment to a Reference Genome
Reference Genomes
Quality Control
Trimming
The Alignment Process
Marking Duplicates
Recalibrating Base Quality Score
Calling SNVs and Indel Variants
Annotating SNVs and Indel Variants
Prioritizing Variants
Inheritance Analysis
Identifying SVs and CNVs
Bioinformatics Workflow
Summary
Chapter 5 AWS Services for Genome Analysis
General Concepts
Networking
AWS Functionalities
AWS Accounts
Virtual Private Cloud
Subnets
Elastic IP Addresses
Custom Environments
Storage
S3
Glacier
Computing
Elastic Compute Cloud
Containers
Lambda Functions
Workflow Management
AWS Batch
AWS Step Functions
Simple Workflow Service
Third-Party Solutions
Summary
Chapter 6 Building Your Environment in the AWS Cloud
Setting Up a Virtual Private Cloud
Setting Up and Launching an EC2 Instance
Shutting Down an Instance to Save Money
Setting Up S3 Buckets
Configuring Your Account Securely
Turning On Multifactor Authentication
Establishing an AWS IAM Password Policy
Creating Groups
Creating Users
Setting Up Your Client Environment
Connecting to an EC2 Instance
Connecting from macOS or Unix/Linux
Connecting from Windows
Making S3 Buckets Available Locally
Mounting an S3 Bucket as a Windows Drive
Mounting an S3 Bucket Under macOS and Linux
Summary
Chapter 7 Linux and AWS Command-Line Basics for Genomics
Selecting a Linux Distribution
Accessing Your AWS Linux Instance from Your Local Computer
From Windows
From macOS
Options for Setting Up Linux on Your Personal Computer
Getting Familiar with the Command Line
Absolute and Relative References
Manipulating Files
Transferring Files to and from Your AWS Instance
Keyboard Shortcuts
Running Programs in the Background
Understanding File Permissions
Compressing and Archiving Files
Compression
Grep
Pipes and Redirection Operators
Text Processing Utilities: awk and sed
Managing Linux
Package Management Systems
The AWS Command-Line Interface
Installing the AWS CLI Environment
Windows
macOS and Linux
Configuring the AWS CLI
Setting the Configuration at the Command Line
Storing the Configuration in the Configuration File
Testing Your Installation
AWS CLI Essentials
An Alternative Approach: AWS Systems Manager
Summary
Chapter 8 Processing the Sequencing Data
Getting from Data to Information
Aligning to the Reference Genome
Making Adjustments and Refinements to the Aligned Reads in the BAM File
Identifying the Small Differences and Recording Them in the VCF File
Making Adjustments and Refinements to the Variants in the VCF File
Annotating the SNVs and Indels
Prioritizing the Variants to Identify the Most Consequential Ones
Trio Analysis and Inheritance Analysis
Identifying and Annotating SVs and CNVs
Setting Up AWS Services and Data Storage
Copying the FASTQ Files
Installing Docker and Containers
Summary
Chapter 9 Visualizing the Genome
Introducing Genome Visualizers
Installing the IGV Desktop Visualizer
Connecting the IGV Visualizer to Our AWS Data
Loading Data into the IGV Visualizer
Visualizing Aligned Sequencing Reads in IGV
Have a CIGAR
Analyzing Variants in IGV
Summary
Chapter 10 Containerizing Your Workflow on the Desktop
Introducing Containerization
Understanding and Using Docker
Installing Docker on Your Local Machine
Downloading a Docker Image
Viewing Available Docker Images
Running a Docker Container Interactively
Removing a Docker Image
More on Using the Docker Hub
Containers for Genomics Work
Summary
Chapter 11 Variants and Applications
Polygenic Risk Scores
Genome-wide Association Studies
Calculating a Polygenic Score
Metagenomics
AlphaFold
Predicting Protein Structure from Protein Sequence—A 50-Year Puzzle
Installing and Running AlphaFold
Viewing and Comparing AlphaFold Results
Summary
Chapter 12 Cancer Genomics
Somatic Genomes
Cancer
Oncogenes
Tumor Suppressors
The Promise and Reality of Cancer Precision Medicine
Somatic or Germline? Cancer Predisposition
Chromothripsis
Epigenetics of Cancer
Mechanisms of Cancer
Samples
Somatic Variant Analysis
Copy Number Changes
Measuring Tumor Genomic Instability
Summary
Notes
Index
EULA