Proteomics Data Analysis

This thorough book collects methods and strategies to analyze proteomics data. It is intended to describe how data obtained by gel-based or gel-free proteomics approaches can be inspected, organized, and interpreted to extrapolate biological information. Organized into four sections, the volume explores strategies to analyze proteomics data obtained by gel-based approaches, different data analysis approaches for gel-free proteomics experiments, bioinformatic tools for the interpretation of proteomics data to obtain biologically significant information, as well as methods to integrate proteomics data with other omics datasets including genomics, transcriptomics, metabolomics, and other types of data. Written for the highly successful Methods in Molecular Biology series, chapters include the kind of detailed implementation advice that will ensure high-quality results in the lab.

Authoritative and practical, Proteomics Data Analysis serves as an ideal guide to introduce researchers, both experienced and novice, to new tools and approaches for data analysis to encourage the further study of proteomics.

Author(s): Daniela Cecconi
Series: Methods in Molecular Biology, 2361
Publisher: Humana
Year: 2021

Language: English
Pages: 339
City: New York

Dedication
Preface
Contents
Contributors
Part I: Data Analysis for Gel-Based Proteomics
Chapter 1: Two-Dimensional Gel Electrophoresis Image Analysis
1 Introduction
2 Image Preprocessing
3 Spot Detection and Spot Quantification
4 Gel Warping and Matching
5 Conclusions
References
Chapter 2: Chemometric Tools for 2D-PAGE Data Analysis
1 Introduction
2 Data Arrangement and Scaling
3 Pattern Recognition Methods
3.1 Principal Component Analysis (PCA)
3.2 Clustering Methods
4 Classification Methods
4.1 Evaluation of Classification Performances
4.2 Linear Discriminant Analysis (LDA)
4.3 Partial Least Squares Discriminant Analysis (PLS-DA)
4.4 Soft-Independent Model of Class Analogy (SIMCA)
4.5 Ranking-PCA
5 Concluding Remarks
References
Part II: Data Analysis for Gel-Free Proteomics
Chapter 3: Software Options for the Analysis of MS-Proteomic Data
1 Introduction
2 Software Tools for MS Data Processing
2.1 Preprocessing of Raw Data
2.2 Search Engines and Scoring Algorithms
2.3 Sequence Databases for Proteomics
2.4 Database Interrogation for Proteomics Identification
2.5 Quantification Algorithms
2.6 Format of Output Results
2.7 Integrated Analysis Platforms
3 Conclusions
4 Notes
References
Chapter 4: Analysis of Label-Based Quantitative Proteomics Data Using IsoProt
1 Introduction
2 Materials
2.1 Installing Docker
2.2 Preparing the Input Files
2.3 Launching IsoProt
3 Methods
4 Results and Interpretation
4.1 Quality Control
4.2 Data Interpretation
5 Notes
References
Chapter 5: Quantification of Changes in Protein Expression Using SWATH Proteomics
1 Introduction
2 Materials
2.1 Equipment
2.2 Protein Digestion
2.3 Samples
2.4 Chromatography
2.5 Mass Spectrometry
2.6 Data Processing
3 Methods
3.1 Cell Lysis and Protein Digestion
3.2 Preparation of Test Samples
3.3 Instrument Setup and Data Acquisition
3.4 SWATH Relative Abundance Quantification
3.5 Assessment of SWATH Data Quality
3.6 Data Analysis of the Full Dataset
4 Conclusions
5 Notes
References
Chapter 6: Data Processing and Analysis for DIA-Based Phosphoproteomics Using Spectronaut
1 Introduction
2 Materials
3 Methods
3.1 Create Spectral Library
3.2 Set Up DIA Analysis in Spectronaut
3.2.1 DIA Search Using Prerecorded Spectral Libraries
3.2.2 DIA Search Without Libraries or directDIA Search
3.2.3 Export DIA Results
3.3 Collapse to Phospho-Sites: Perseus Plugin
3.4 Differential Regulation Analysis with Prostar
4 Notes
References
Chapter 7: Glycan Compositions with GlyConnect Compozitor to Enhance Glycopeptide Identification
1 Introduction
2 Glycopeptide Identification Software
2.1 An Ever-Growing Catalog
2.2 Glycan Composition File Selection
3 Practical Examples
3.1 Exploring the Urine O-Glycome in GlyConnect
3.2 The N-Glycome of Erythropoietin
3.3 Using GlyConnect Metadata
4 Conclusion
5 Notes
References
Chapter 8: Elaboration Pipeline for the Management of MALDI-MS Imaging Datasets
1 Introduction
2 Materials
2.1 Data Acquisition
2.2 Computer Specifications
2.3 Software and Web Resources
3 Methods
3.1 Data Visualization and Tissue Annotation
3.2 Data Import
3.3 Data Preprocessing and Feature Selection
3.4 Unsupervised Statistical Analysis
3.5 Supervised Statistical Analysis
3.6 Internal Calibration and Protein ID Assignment
4 Notes
References
Chapter 9: Features Selection and Extraction in Statistical Analysis of Proteomics Datasets
1 Introduction
2 Inductive Reasoning, Dimensionality and Sparsity
3 Data Processing Before Feature Selection
4 Feature Selection
4.1 Linear Discriminant Analysis (LDA)
4.2 Partial Least Squares Discriminant Analysis (PLS-DA)
4.3 Principal Component Analysis (PCA)
4.4 Clustering Methods
5 Cross-Validation and Performance Estimation of a Proteomics Signature
5.1 k-Fold CV
5.2 Leave-One-Out CV (LOOCV)
5.3 Performance Estimation
6 Some Examples of Statistical Workflows Applied to Proteomics
6.1 Clinical Proteomics: Diagnostic Protein Signatures
6.2 Classification Methods for High-Dimension Proteomics Data of Mixed Quality
7 Concluding Remarks
8 Notes
References
Part III: Proteomics Data Interpretation
Chapter 10: ORA, FCS, and PT Strategies in Functional Enrichment Analysis
1 Introduction
2 Materials
2.1 Data Sources
2.2 Software and Apps
3 Methods
3.1 Metadata
3.2 Process Discovery Matrix
3.3 Differential Expression (DE) Analysis Using Limma
3.4 Functional Enrichment Analysis: ORA
3.5 Functional Class Scoring (FCS) Using GSEA
3.6 Pathway-Topology (PT) Analysis
4 Notes
References
Chapter 11: GO Enrichment Analysis for Differential Proteomics Using ProteoRE
1 Introduction
2 Materials: ProteoRE Tools and Content
2.1 ProteoRE Tools
2.2 Data Files
2.3 History, Datasets, and Availability of Analyses
3 Methods: Using ProteoRE Tools to Perform the Functional Analysis of a Proteomics Dataset
3.1 Access the ProteoRE Web Interface
3.2 Upload Datasets in Your History
3.3 Conversion of Identifiers (ID)
3.4 Filtering Proteins with No Identifier
3.5 Annotating the List of Differentially Expressed Proteins
3.6 Building a Breast Cancer Proteome as a Reference Background for GO Enrichment Analysis
3.7 Performing GO Singular Enrichment Analysis (Fisher's Exact Test)
3.8 Performing GO Modular Enrichment Analysis Using "Weight" Algorithm
3.9 Running GO Terms Enrichment Comparison Between Up- and Downregulated Proteins
4 Notes
References
Chapter 12: Protein Subcellular Localization Prediction
1 Introduction
2 Methods for Subcellular Extraction of Proteins
3 Mass Spectrometry Qualitative and Quantitative Analysis
4 Data Analysis for Subcellular Localization
4.1 Analysis of Protein Subcellular Localization by BUSCA Software
4.1.1 Input of Data
4.1.2 Output of Analysis
5 Recent Spatial Proteomics Approaches
6 Conclusion
References
Chapter 13: Protein Secretion Prediction Tools and Extracellular Vesicles Databases
1 Introduction
2 Materials
3 Methods
3.1 Classical Protein Secretion Prediction: SignalP 5.0
3.2 Transmembrane Protein Prediction: TMHMM
3.3 Non-classical Protein Secretion Prediction: SecretomeP
3.4 ExoCarta and Vesiclepedia
4 Notes
References
Chapter 14: Databases for Protein-Protein Interactions
1 Introduction
2 Materials
3 Methods
3.1 MINT
3.2 STRING
3.3 BioGRID
3.4 IntAct
3.5 DIP
3.6 HPRD
3.7 I2D
3.8 BIND
3.9 MPact
4 Computational Methods for Predicting Protein-Protein Interactions
4.1 Struct2Net
4.2 HOMCOS
4.3 ENTS
4.4 Comparison of Database Features
References
Chapter 15: Machine and Deep Learning for Prediction of Subcellular Localization
1 Introduction
2 Materials
2.1 Benchmark Datasets
2.2 Evaluation Criteria
3 Methods
3.1 Evolution Information
3.2 Feature Condensing
3.3 Convolutional Neural Network
3.4 Multi-label Classification
4 Notes
References
Chapter 16: Deep Learning for Protein-Protein Interaction Site Prediction
1 Introduction
2 Materials
2.1 Computing Resources
2.1.1 Software Installations
2.1.2 Machine Learning Frameworks
2.2 Databases and Datasets
2.3 Tools for Computing Features and Representations
2.3.1 Sequence-Based
2.3.2 Sequence Embeddings
2.3.3 Structure-Based
3 Methods
3.1 Data
3.1.1 Curation
3.1.2 Train-Test Split Strategies
3.1.3 Representation
3.1.4 Input Features
3.1.5 Pre-processing
3.2 Model Evaluation
3.2.1 Hyperparameter Tuning
3.2.2 Evaluation Metrics
3.2.3 Overfitting
3.2.4 Attribution
3.3 Alternative Training Regimes for Future Model Development
3.3.1 Multi-modal Input
3.3.2 Transfer Learning
3.3.3 Multi-task Learning
3.3.4 Learning Using Privileged Information
3.3.5 Uncertainty Modeling and Active Learning
3.3.6 Attention Mechanisms
3.3.7 Ensembling
4 Notes
References
Part IV: Proteomics Data Integration with Other -Omics
Chapter 17: Integrative Analysis of Incongruous Cancer Genomics and Proteomics Datasets
1 Introduction
2 Materials
2.1 Data Sources
2.2 Software and Apps
3 Methods
4 Notes
References
Chapter 18: Integration of Proteomics and Other Omics Data
1 Introduction
2 Materials
3 Methods
3.1 Unsupervised Analysis for Identifying Protein Functions
3.2 Unsupervised Analysis for Identifying Sample Heterogeneity or Protein Subgrouping
3.2.1 Sample Heterogeneity Analysis
3.2.2 Protein Clustering Analysis
3.3 Supervised Analysis for Identifying Proteomics Markers That Are Associated with Outcomes/Phenotypes
3.4 Supervised Analysis for Constructing Predictive Models for Outcomes/Phenotypes
3.5 Application Notes
4 Concluding Remarks
References
Index