Statistical Analysis of Proteomic Data: Methods and Tools

This book explores the most important processing steps of proteomics data analysis and presents practical guidelines, as well as software tools, that are both user-friendly and state-of-the-art in chemo- and biostatistics. Beginning with methods to control the false discovery rate (FDR), the volume continues with chapters devoted to software suites for constructing quantitation data tables, issues related to missing values, differential analysis software, and more. Written for the highly successful Methods in Molecular Biology series, chapters include the kind of detail and implementation advice that leads to successful results.
Authoritative and practical, Statistical Analysis of Proteomic Data: Methods and Tools serves as an ideal guide for proteomics researchers looking to extract the best of their data with state-of-the-art tools while also deepening their understanding of data analysis.

Author(s): Thomas Burger
Series: Methods in Molecular Biology, 2426
Publisher: Humana Press
Year: 2022

Language: English
Pages: 397
City: New York

Preface
Contents
Contributors
Chapter 1: Unveiling the Links Between Peptide Identification and Differential Analysis FDR Controls by Means of a Practical I...
1 Introduction
2 Notations
2.1 Classical Notations in Biostatistics
2.2 Classical Notations in Proteomics
2.3 Other Notations Used in This Protocol
3 Material
3.1 R version
3.2 Packages
3.3 Data Format
3.4 Data Loading from cp4p
3.5 Data Simulation
4 Methods
4.1 Original Knockoff Procedure
4.2 Scoring Methods Based on Forward Stagewise Regression and t-Test
4.3 Sensitivity of FDR Control to Knockoff Used
5 Notes
References
Chapter 2: A Pipeline for Peptide Detection Using Multiple Decoys
1 Introduction
2 Material
2.1 Software Requirements
2.2 Software Installation
2.3 Data Format
3 Methods
3.1 Parameter Specification
3.2 Obtaining the Discoveries
3.3 Example
4 Notes
References
Chapter 3: Enhanced Proteomic Data Analysis with MetaMorpheus
1 Introduction
2 Material
2.1 Mass Spectra Requirements
2.2 Protein Database Requirements
2.3 System Requirements
2.4 Download and Installation
3 Methods
3.1 Starting MetaMorpheus
3.2 Loading Protein Databases
3.3 Loading Spectra Files
3.4 Set File-Specific Settings
3.5 Mass Calibration
3.6 Global Post-Translational Modification Discovery
3.7 Search
3.8 Multi-Protease Protein Inference
3.9 Crosslink Search
3.10 Glycopeptide Search
3.11 Starting Analysis in MetaMorpheus
3.12 Spectrum Annotation with MetaDraw
3.13 Data Visualization with MetaDraw
3.14 Command-Line Operation of MetaMorpheus
3.15 Parameters
4 Notes
References
Chapter 4: Validation of MS/MS Identifications and Label-Free Quantification Using Proline
1 Introduction
2 Material
2.1 Requirements
2.2 Data Format
2.3 Software Install
2.4 Sample Data
3 Methods
3.1 Starting and Configuration
3.2 Import Search Results
3.3 Combine Identification Results
3.4 Validate Identification Results
3.5 Navigate Through Identification Summaries
3.6 MS1 Label-Free Quantification
3.7 Navigate Through Quantification Datasets
3.8 Customized Graphical Display
3.9 Export Data
4 Notes
References
Chapter 5: Integrating Identification and Quantification Uncertainty for Differential Protein Abundance Analysis with Triqler
1 Introduction
2 Material
2.1 Requirements
2.2 Software Install
2.3 Data Type
2.4 Data Size: Number of Samples
2.5 Data Size: Number of Proteins
2.6 Input Format
3 Methods
3.1 Generating Triqler Input from MaxQuant
3.2 Generating Triqler Input from Quandenser
3.3 Generating Triqler Input from Dinosaur
3.4 Triqler Interface
3.5 Running Triqler
3.6 Interpreting the Triqler Output
3.7 Visualizing and Interpreting Posterior Distributions
3.8 Visualizing and Interpreting Hyperparameter Estimation
4 Notes
References
Chapter 6: Left-Censored Missing Value Imputation Approach for MS-Based Proteomics Data with GSimp
1 Introduction
2 Material
2.1 Hardware and Package Requirements
2.2 Data Format and Source Code
2.3 Software Installation
3 Methods
3.1 Data Processing
3.2 GSimp Input Arguments
3.3 GSimp Outputs
3.4 GSimp in Practice
4 Notes
References
Chapter 7: Towards a More Accurate Differential Analysis of Multiple Imputed Proteomics Data with mi4limma
1 Introduction
2 Material
2.1 Requirements
2.2 Data Format: Quantitative Data
2.3 Data Format: Experimental Data
2.4 Data Format: Imputed Data
2.5 Package Installation and Loading
3 Methods
3.1 Multiple Imputation
3.2 Estimation
3.3 Projection
3.4 Moderated t-Test
3.5 Complete Workflow
3.6 Example Use Case
4 Notes
References
Chapter 8: Uncertainty-Aware Protein-Level Quantification and Differential Expression Analysis of Proteomics Data with seaMass
1 Introduction
2 Material
2.1 Data Type
2.2 Data Format
2.3 Hardware Requirements
2.4 Software Requirements
2.5 Software Installation
3 Methods
3.1 Loading seaMass
3.2 Data Loading
3.3 Fractionation
3.4 Experimental Design
3.5 Protein Group Quantification and Normalization
3.6 Protein Group Quantification Output
3.7 seaMass-sigma and seaMass-theta Output Plots
3.8 Differential Expression and FDR Estimation
3.9 Differential Expression Output
3.10 seaMass-delta Output Plots
4 Notes
References
Chapter 9: Statistical Analysis of Quantitative Peptidomics and Peptide-Level Proteomics Data with Prostar
1 Introduction
2 Material
2.1 Live Demo Mode
2.2 Hardware Requirements
2.3 Software Requirements
2.4 Software Install
2.5 Data Type
2.6 Data Size: Number of Peptides
2.7 Data Size: Number of Samples
2.8 Data Format
3 Methods
3.1 Starting Prostar
3.2 Data Loading
3.3 Data Export
3.4 Descriptive Statistics
3.5 Peptide-Protein Graph
3.6 Filtering
3.7 Navigating Through the Dataset Versions
3.8 Normalization
3.9 Missing Values Imputation
3.10 Hypothesis Testing for Peptidomics Data
3.11 Aggregation
3.12 Aggregated Protein Dataset Preprocessing
3.13 Hypothesis Testing for Aggregated Protein Dataset
3.14 Differential Analysis
4 Notes
References
Chapter 10: msmsEDA & msmsTests: Label-Free Differential Expression by Spectral Counts
1 Introduction
1.1 GLMs for Inference with Counts
1.2 Batch Effects in Label-Free Proteomics
1.3 Normalization Strategies
1.4 Reproducibility
2 Material
2.1 Requirements
2.2 Packages Installation
2.3 Data Format
2.4 Example Data
3 Methods
3.1 Experimental Design and EDA
3.2 Inference
3.3 Reproducibility
3.4 Visualizing and Checking the Results
3.5 Normalizing Secretomes
3.6 Inspecting the Results
4 Notes
References
Chapter 11: Exploring Protein Interactome Data with IPinquiry: Statistical Analysis and Data Visualization by Spectral Counts
1 Introduction
2 Material
2.1 Considerations for IP-MS Approaches
2.2 Requirements
2.3 Software Installation
2.4 Data Format
2.5 Example Datasets
2.6 Data Loading
3 Methods
3.1 Visualization of the Overall Variability Between Samples
3.2 Statistical Analysis for Differential Analysis
3.3 Retrieve Annotations for Each Protein
3.4 Create and Export an Html Table
3.5 Export Result Table as Excel or Text File
3.6 Create Interactive Volcanoplot
3.7 Create ggplot2 Based Volcanoplot
3.8 Create Heatmap
4 Notes
References
Chapter 12: Statistical Analysis of Post-Translational Modifications Quantified by Label-Free Proteomics Across Multiple Biolo...
1 Introduction
2 Material
2.1 R and RStudio Installation
2.2 R Packages
2.3 Data Type
2.4 Case Study Dataset
3 Methods
3.1 Data Pre-processing Steps
3.2 Checking the Reproducibility of Identifications
3.3 Checking the Reproducibility of Quantified Values
3.4 Is the Number of Replicates Sufficient in Each Condition?
3.5 Normalization Step
3.6 Dealing with Missing Values
3.7 Mapping the Quantified Values of the Unmodified Protein to the Ones of Modified Peptides
3.8 Extracting Modified Peptides with Specific Detection Profiles Relatively to Their Unmodified Protein
3.9 Extracting Modified Peptides Evolving Significantly Differently from Their Unmodified Protein
3.10 Clustering of Modified Peptides Using Their Dynamics Relatively to Their Unmodified Protein
3.11 Subsequent Functional Analysis
4 Notes
References
Chapter 13: Fast, Free, and Flexible Peptide and Protein Quantification with FlashLFQ
1 Introduction
2 Material
2.1 Data Inputs
2.2 Accepted Data Formats
2.3 Hardware Requirements
2.4 Software Requirements
2.5 Installation
3 Methods
3.1 Adding Identification Files
3.2 Adding Spectra Files
3.3 Settings
3.4 Run
3.5 Output
3.6 Using the Docker Image
4 Notes
References
Chapter 14: Robust Prediction and Protein Selection with Adaptive PENSE
1 Introduction
2 Material
2.1 Data Format
2.2 Data Size
2.3 Hardware and Software Requirements
2.4 Installation
3 Methods
3.1 Starting PENSE
3.2 Loading Data
3.3 Fitting a Predictive Model
3.4 Selecting Hyper-Parameters
4 Notes
References
Chapter 15: Multivariate Analysis with the R Package mixOmics
1 Introduction
2 Material
2.1 Hardware and Software Requirements
2.2 Data Pre-processing
2.2.1 Normalization
2.2.2 Data Filtering
2.2.3 Managing Zeros and Missing Values
2.3 Get Started with mixOmics
3 Methods
3.1 The Mechanics of mixOmics
3.1.1 What Do the Methods in mixOmics Aim to Achieve?
3.1.2 How Are These Components Calculated?
3.1.3 How Are the Loading Vectors Calculated?
3.1.4 How Do We Identify Important Variables?
3.2 Interpreting Graphical Outputs
3.2.1 Sample Plots
3.2.2 Variable Plots
3.2.3 Correlation Circle Plots
3.2.4 Loading Plots
4 Case Studies
4.1 Unsupervised Exploration of One Data Set: PCA and sPCA
4.2 Supervised Exploration of One Data Set: PLS-DA and sPLS-DA
4.3 Unsupervised Exploration of Two Data Sets: PLS and sPLS
4.4 Supervised Exploration of More Than Two Data Sets: Multi-Block sPLS-DA
5 Notes
References
Chapter 16: Integrating Multiple Quantitative Proteomic Analyses Using MetaMSD
1 Introduction
2 Material
2.1 Hardware Requirements
2.2 Software Installation
2.3 Software Installation Trouble Shooting
2.4 Data Preparation
2.5 Data Description
3 Methods
3.1 Integrative Quantitative Proteomics Data Analysis
3.2 MetaMSD Output Files
3.3 Meta-analysis Parameter Option
3.4 Q-Value Threshold Option
3.5 Top N Differential Protein List Option
3.6 Input/Output Folder Name Option
3.7 Help Message Option
4 Notes
References
Chapter 17: Application of WGCNA and PloGO2 in the Analysis of Complex Proteomic Data
1 Introduction
2 Material
2.1 Data Type
2.2 Data Size
2.3 Data Format
2.4 Software Installation
3 Methods
3.1 Workflow
3.2 Load Necessary Packages
3.3 Prepare the Input Files
3.4 Pathway Analysis
3.5 Results
3.6 Adding Abundance
3.7 Gene Ontology Analysis
3.8 WGCNA Analysis for Proteomic Data
4 Notes
References
Index