Computational Methods for Predicting Post-Translational Modification Sites

This volume describes computational approaches for predicting a wide range of post-translational modification (PTM) sites. Chapters cover algorithms, state-of-the-art deep learning approaches, hand-crafted features, physicochemical features, issues related to obtaining negative training data, sequence-based features, and structure-based features. Written in the format of the highly successful Methods in Molecular Biology series, each chapter includes an introduction to the topic, a list of necessary materials and reagents, step-by-step, readily reproducible protocols, and tips on troubleshooting and known pitfalls.

Authoritative and cutting-edge, Computational Methods for Predicting Post-Translational Modification Sites aims to be a useful guide for researchers who are interested in the field of PTM site prediction.

Author(s): Dukka B. KC
Series: Methods in Molecular Biology, 2499
Publisher: Humana
Year: 2022

Language: English
Pages: 336
City: New York

Dedication
Preface
Acknowledgments
Contents
Contributors
Chapter 1: Maximizing Depth of PTM Coverage: Generating Robust MS Datasets for Computational Prediction Modeling
1 Introduction
2 Sample Preparation
2.1 Proteolysis
2.2 PTM Enrichment
2.2.1 Phosphorylation
2.2.2 Oxidation
2.2.3 Glycosylation
2.2.4 Ubiquitination
2.2.5 Acetylation and Methylation
2.3 Orthogonal Fractionation
3 Instrument Methodology
3.1 Improvements in Modern Technology
3.2 Fragmentation Methods
4 Bioinformatics
4.1 Peptide Identification
4.2 Site Localization
5 Computational Prediction Models: Considerations
6 Validation of Computational Predictions
7 Conclusions
References
Chapter 2: PLDMS: Phosphopeptide Library Dephosphorylation Followed by Mass Spectrometry Analysis to Determine the Specificity...
1 Introduction
2 Materials
2.1 Fluorenylmethyloxycarbonyl (Fmoc) Solid-Phase Peptide Synthesis (SPPS)
2.2 Dephosphorylation Assays
2.3 Size-Exclusion Chromatography (SEC) Purification
2.4 Mass Spectrometry Analysis
2.5 Data Processing Software
2.6 Data Analysis
3 Methods
3.1 Library Design
3.2 Library Synthesis
3.3 Library Evaluation and Optimization
3.4 Library Measurement
3.5 MS Data Processing
3.6 Data Filtering
3.7 Performing the Dephosphorylation Assay for MS Analysis
3.8 Size Exclusion Chromatography to Separate the Library from Recombinant Phosphatase
3.9 MS Measurement and Data Processing of Phosphatase-Treated Phosphopeptide Libraries
3.10 Amino Acid Preference Heatmap Generation
3.11 Peptide Dephosphorylation Rate
4 Notes
References
Chapter 3: FEPS: A Tool for Feature Extraction from Protein Sequence
1 Introduction
2 Protein Feature Descriptors
2.1 Amino Acid Composition
2.2 Dipeptide Composition
2.3 Tripeptide Composition
2.4 Position Weight Amino Acid Composition
2.5 One-Hot Encoding
2.6 Conjoint Triad Descriptor
2.7 Overlapping Properties
2.8 Sequence-Order-Coupling Numbers
2.9 Quasi Sequence-Order Number
2.10 Composition, Transition and Distribution
2.10.1 Composition
2.10.2 Transition
2.10.3 Distribution
2.11 Average Cumulative Hydrophobicity
2.12 High-Quality Physicochemical Indices
2.13 Composition of k-Spaced Amino Acid Pairs
2.14 Autocorrelation
2.14.1 Moran Autocorrelation
2.14.2 Geary Autocorrelation
2.14.3 Moreau-Broto Autocorrelation
2.15 Pseudo Amino Acid Composition
2.16 Amphiphilic Pseudo Amino Acid Composition
2.17 Profile-Based Derived by Position-Specific Scoring Matrix
2.17.1 PSSM Composition
2.17.2 PSSM Autocovariance Transformation
2.18 Shannon Entropy
2.19 Relative Entropy
2.20 Information Gain
3 Current Feature Extraction Tools from Protein Sequence/Structure
4 Description of Feature Extraction from Protein Sequences (FEPS) Toolkit
4.1 Choosing the Input Fasta Files
4.2 Selecting the Feature Type and Options
4.3 The Feature File Format
4.4 Downloading Feature and Description Files
5 FEPS_CFS: A Wrapper in Python for FEPS
5.1 Protocol for Feature Extraction Using FEPS_CFS Wrapper
5.2 FEPS_CFS Details
6 Conclusion
References
Chapter 4: A Pretrained ELECTRA Model for Kinase-Specific Phosphorylation Site Prediction
1 Introduction
2 Materials and Methods
2.1 Dataset
2.2 Methods
2.2.1 Pretraining ELECTRA Model
2.2.2 Fine-Tune ELECTRA Model for General Phosphorylation Site Prediction
2.2.3 Fine-Tune ELECTRA Model for Kinase-Specific Phosphorylation Site Prediction
3 Results
3.1 Evaluation of the ELECTRA Model for Kinase-Specific Phosphorylation Site Prediction
3.2 Efficiency Analysis
3.3 Effect of Pretrained ELECTRA Model on Small-Sample Data Learning
3.4 Tool
4 Notes
5 Discussion
References
Chapter 5: iProtGly-SS: A Tool to Accurately Predict Protein Glycation Site Using Structural-Based Features
1 Introduction
2 Materials and Algorithm
2.1 Datasets
2.2 Algorithm
2.3 Webserver
3 Notes
References
Chapter 6: Functions of Glycosylation and Related Web Resources for Its Prediction
1 Introduction
2 Materials
2.1 Brief Introduction to Glycobiology
2.2 Glycan-Related Databases
2.2.1 Glycan Structures
2.2.2 Glycoproteins
2.2.3 Glycoenzymes and Diseases
3 Methods
3.1 Glycan Biosynthesis Prediction
3.2 Glycosylation Site Prediction
References
Chapter 7: Analysis of Posttranslational Modifications in Arabidopsis Proteins and Metabolic Pathways Using the FAT-PTM Databa...
1 Introduction
2 Materials
3 Methods
3.1 Protein Search Tool
3.2 Metabolic Pathway Tool
3.3 Co-modification Tool
4 Notes
References
Chapter 8: Bioinformatic Analyses of Peroxiredoxins and RF-Prx: A Random Forest-Based Predictor and Classifier for Prxs
1 Introduction
2 Methods
2.1 Dataset and Preprocessing
2.2 Features Construction
2.3 Feature Selection
2.4 Model Construction and Assessment
3 Results and Discussion
3.1 Phase 1. Defining Prx Proteins from Non-Prx
3.2 Phase 2. Class Assignment
3.3 Feature Analysis and Importance
3.4 Comparison with Different Machine Learning Algorithms
4 Conclusions
References
Chapter 9: Computational Prediction of N- and O-Linked Glycosylation Sites for Human and Mouse Proteins
1 Introduction
2 Material and Algorithm
2.1 Dataset
2.2 Method
2.2.1 N-Linked Glycosylation Site Prediction
2.2.2 O-Linked Glycosylation Site Prediction
2.2.3 Cross-Species Site Prediction
3 Web Server and Stand-Alone Package
4 Notes
5 Summary
References
Chapter 10: iPTMnet RESTful API for Post-translational Modification Network Analysis
1 Introduction
2 Methods
2.1 Overview of the RESTful API Architecture
2.2 Examples of Using API Endpoints
2.2.1 Setting up a Development Environment
2.2.2 Searching the iPTMnet Database
Listing 1 Search the Database for smad2 Gene
2.2.3 Getting Information for a Particular Entry
Listing 2 Get Detailed Info for the SMAD2 Gene
Listing 3 Result Obtained by Using the Get Info Function for the SMAD2 Gene
2.2.4 Getting PTM Events for a Protein Acting as a Substrate
Listing 4 Get Substrates for Q15796
2.2.5 Getting Proteoforms for a Protein
Listing 5 Get Proteoforms for Q15796
2.2.6 Getting Protein-Protein Interactions for Proteoforms of a Protein
Listing 6 Get PPI for Proteoforms of Q15796
2.2.7 Getting PTM Dependent Protein-Protein Interactions for a Protein
Listing 7 Get PTM Dependent PPI for Proteoforms of Q15796
2.2.8 Getting PTM Sites Affected in Variants of a Protein
Listing 8 Get PTM Sites Affected in Variants of Q15796
2.2.9 Getting Enzymes Corresponding to a List of PTM Sites
Listing 9 Python Dictionary Object Representing a PTM Site
Listing 10 CSV File Containing a List of PTM Sites
Listing 11 Get PTM Enzymes for a List of PTM Sites
Listing 12 Get PTM Enzymes from a File Containing a List of PTM Sites
2.2.10 Getting PTM Dependent PPI for a List of PTM Sites
Listing 13 Get PTM Dependent PPI from a List of PTM Sites
Listing 14 Get PTM Dependent PPI from a File Containing a List of PTM Sites
3 Conclusion
4 Notes
References
Chapter 11: Systematic Characterization of Lysine Post-translational Modification Sites Using MUscADEL
1 Introduction
2 Overview of Existing Methods and Tools
3 Materials
4 Methods
4.1 The Model Architecture
4.2 Model Training
4.3 Implementation of the MUscADEL Web Server
4.4 Usage of the MUscADEL Web Server
5 Performance Evaluation
6 Notes
References
Chapter 12: Enhancing the Discovery of Functional Post-Translational Modification Sites with Machine Learning Models - Develop...
1 Introduction
2 Materials
2.1 Sources of Experimentally Observed and Functional PTMs
2.2 Protein Family Associations and Protein Sequence Retrieval
2.3 Multiple Sequence Alignment
2.4 Model Input Features
2.5 Machine Learning
2.6 Graphical and Statistical Data Analysis
3 Methods
3.1 Overview and Rationale of PTM Site Function Prediction
3.2 Model Feature Extraction and Aggregation
3.3 Defining Optimal Model Architecture and Model Training
3.4 Model Performance Evaluation Using Aggregated Literature Evidence-Known Function Source Count (KFSC)
3.5 Optimization of PTM-Agnostic Recommendation Thresholding
3.6 Model Validation with External Data Sources
3.6.1 Evaluation of Expanded Datasets
3.6.2 Evaluating Model Performance for PTM Sites That Undergo a Change in Functional Status-Pseudoexperimental Validation
3.6.3 Precision/Recall Analysis of PTM Function Predictions Using Site-by-Site Experimental Data
3.6.4 Experimental Validation
3.7 Additional Mechanisms for Model Validation
3.7.1 Disease-Linked Variant Correlation
3.7.2 SLiM Association
3.8 Interpreting How Machine Learning Models Arrive at their Predictions-LIME
3.9 Continuing Challenges and Additional Considerations When Developing PTM Function Prediction Models
3.9.1 What is a Functional PTM?
3.9.2 Inherent Biases
3.9.3 Negative Data
3.9.4 PTM Stoichiometry and Dynamics
3.9.5 PTM and Enzyme Interactions-Functional Association
4 Notes
References
Chapter 13: Exploration of Protein Posttranslational Modification Landscape and Cross Talk with CrossTalkMapper
1 Introduction
2 Materials
2.1 Resources
2.2 Software
2.3 Preparation of the R Environment for CrossTalkMapper
3 Methods
3.1 Data Preparation
3.2 Visualization
3.2.1 Overview of the PTM Landscape
Bar Plot
Heatmaps
3.2.2 In-Depth Visualization of PTM Cross Talk and Abundance Variations
Cross Talk Maps
Further Investigation of PTM Cross Talk with Automatic Generation of Cross Talk Maps and Line Plots
4 Notes
References
Chapter 14: PTM-X: Prediction of Post-Translational Modification Crosstalk Within and Across Proteins
1 Introduction
2 Materials
3 Methods
3.1 Description of PTM-X
3.2 Data Preparation for PTM-X
3.3 Usage of PTM-X Software Locally
3.3.1 Part 1: Setup Computing Environment
3.3.2 Part 2: Using PTM-X to Extract Features and Perform Prediction
3.4 Direct Usage of PTM-X Web Server
4 Notes
References
Chapter 15: Deep Learning-Based Advances in Protein Posttranslational Modification Site and Protein Cleavage Prediction
1 Introduction
1.1 Computational Prediction of PTM Sites
1.2 Proteolytic Processing/Cleavage
1.2.1 Computational Prediction of Proteolytic Cleavage
2 Deep-Learning Based Advances for (Single) Posttranslational Modification Site Prediction
2.1 Deep-Learning Based Advances for Acetylation Site Prediction
2.1.1 DeepACEt
2.1.2 DNNAce
2.1.3 Deep-PLA
2.2 Deep Learning-Based Approaches for Protein Glycosylation Site Predictor
2.2.1 SPRINT-Gly
2.3 Deep Learning-Based Approaches for Hydroxylation Site Prediction
2.4 Deep Learning-Based Approaches for Malonylation Site Prediction
2.4.1 DeepMal
2.4.2 DL-MaloSite
2.4.3 Kmalo
2.5 Deep-Learning Based Approaches for Phosphorylation Site Prediction
2.5.1 MusiteDeep
2.5.2 DeepPhos
2.5.3 Chlamy-EnPhosSite
2.5.4 DeepPSP
2.5.5 PROSPECT
2.5.6 DTL-DephosSite
2.6 Deep-Learning Based Approaches for Succinylation Site Prediction
2.6.1 CNN-SuccSite
2.6.2 DeepSuccinylSite
2.6.3 MDCAN-Lys
2.7 Deep Learning-Based Approaches for Ubiquitination Site Prediction
2.7.1 DeepUbi
2.7.2 DeepUbiquitylation
2.7.3 UbiComb
2.7.4 DL-plant-ubsites
2.7.5 DeepTL-Ubi
3 Deep-Learning Based Approaches for Multiple PTM Site Prediction
3.1 MusiteDeep (Web Server)
3.2 pCysMod
3.3 MUscADEL
3.4 MultiLyGAN
3.5 Some Other Notable DL-Based Approaches for Various PTM Site Predictors
3.5.1 Deep-KCR: Histone Lysine Crotonylation (Kcr) Site Predictor
3.5.2 nhKCR: Nonhistone Crotonylation Site Predictor
3.5.3 DeepRMethylSite: Arginine Methylation Site Predictor
3.5.4 DeepNitro: Nitrosylation Sites Predictor
3.5.5 GPS-Palm: S-palmitoylation Site Predictor
4 Recent Advances in Proteolytic Cleavage Prediction
4.1 Recent ML-Based Approaches for Proteolytic Cleavage Prediction
4.1.1 Procleave
4.2 Deep Learning-Based Approaches for Proteolytic Cleavage Prediction
4.2.1 DeepCalpain
4.2.2 DeepCleave
4.2.3 DeepDigest
4.2.4 MPSC: Predicting Proteolysis in Complex Proteomes Using Deep Learning
5 Conclusion and Discussion
5.1 Development of Multimodal Approaches for PTM Prediction
5.2 Positive-Unlabeled Learning Approaches
5.3 Deep Transfer Learning-Based Approaches
5.4 Multitask Multilabel Deep Learning-Based Approaches
5.5 Interpretable and Explainable DL-Based Approaches
References
Index