The analysis of protein-protein interactions is fundamental to the understanding of cellular organization, processes, and functions. Recent large-scale investigations of protein-protein interactions using such techniques as two-hybrid systems, mass spectrometry, and protein microarrays have enriched the available protein interaction data and facilitated the construction of integrated protein-protein interaction networks. The resulting large volume of protein-protein interaction data has posed a challenge to experimental investigation. This book provides a comprehensive understanding of the computational methods available for the analysis of protein-protein interaction networks. It offers an in-depth survey of a range of approaches, including statistical, topological, data-mining, and ontology-based methods. The author discusses the fundamental principles underlying each of these approaches and their respective benefits and drawbacks, and she offers suggestions for future research.
Author(s): Aidong Zhang
Edition: 1
Publisher: Cambridge University Press
Year: 2009
Language: English
Pages: 294
City: Leiden
Cover......Page 1
Half-title......Page 3
Title......Page 5
Copyright......Page 6
Dedication......Page 7
Contents......Page 9
Preface......Page 15
1.1 Rapid Growth of Protein-Protein Interaction Data......Page 17
1.2 Computational Analysis of PPI Networks......Page 19
1.2.1 Topological Features of PPI Networks......Page 20
1.2.2 Modularity Analysis......Page 21
1.2.3 Prediction of Protein Functions in PPI Networks......Page 22
1.3 Significant Applications......Page 23
1.4 Organization of this Book......Page 25
1.5 Summary......Page 26
2.2 The Y2H System......Page 27
2.3 Mass Spectrometry (MS) Approaches......Page 29
2.5.1 Experimental PPI Data Sets......Page 31
2.5.2 Public PPI Databases......Page 32
2.5.3 Functional Analysis of PPI Data......Page 33
2.6 Summary......Page 36
3.2 Genome-Scale Approaches......Page 37
3.3 Sequence-Based Approaches......Page 41
3.4 Structure-Based Approaches......Page 42
3.5 Learning-Based Approaches......Page 43
3.6 Network Topology-Based Approaches......Page 45
3.7 Summary......Page 48
4.2 Representation of PPI Networks......Page 49
4.3 Basic Concepts......Page 50
4.4.2 Distance-Based Centralities......Page 51
4.4.3 Current-Flow-Based Centrality......Page 53
4.4.4 Random-Walk-Based Centrality......Page 56
4.4.5 Feedback-Based Centrality......Page 57
4.5 Characteristics of PPI Networks......Page 60
4.6 Summary......Page 65
5.1 Introduction......Page 66
5.2.2 Cores......Page 67
5.2.3 Degree-Based Index......Page 68
5.3 Methods for Clustering Analysis of Protein Interaction Networks......Page 69
5.3.1 Traditional Clustering Methods......Page 70
5.3.2 Nontraditional Clustering Methods......Page 71
5.4.1 Clustering Coefficient......Page 72
5.4.2 Validation Based on Agreement with Annotated Protein Function Databases......Page 73
5.4.3 Validation Based on the Definition of Clustering......Page 75
5.4.4 Topological Validation......Page 76
5.4.6 Statistical Validation......Page 77
5.5 Summary......Page 78
6.1 Introduction......Page 79
6.2.1 Error and Attack Tolerance of Complex Networks......Page 80
6.2.2 Role of High-Degree Nodes in Biological Networks......Page 83
6.2.3 Betweenness, Connectivity, and Centrality......Page 85
6.3 Bridging Centrality Measurements......Page 89
6.3.1 Performance of Bridging Centrality with Synthetic and Real-World Networks......Page 91
6.3.2 Assessing Network Disruption, Structural Integrity, and Modularity......Page 93
6.4 Network Modularization using the Bridge Cut Algorithm......Page 100
6.5 Use of Bridging Nodes in Drug Discovery......Page 103
6.5.1 Biological Correlates of Bridging Centrality......Page 104
6.5.2 Results from Drug Discovery-Relevant Human Networks......Page 108
6.5.3 Comparison to Alternative Approaches: Yeast Cell Cycle State Space Network......Page 110
6.5.4 Potential of Bridging Centrality as a Drug Discovery Tool......Page 111
6.6.1 Weighted PPI Network......Page 113
6.6.2 Protein Connectivity and Interaction Reliability......Page 114
6.6.3 PathStrength and PathRatio Measurements......Page 115
6.6.4 Analysis of the PathRatio Topological Measurement......Page 116
6.6.5 Experimental Results......Page 117
6.7 Summary......Page 124
7.2 Topological Distance Measurement Based on Coefficients......Page 125
7.3.1 PathRatio Method......Page 128
7.3.2 Averaging the Distances......Page 129
7.4 Ensemble Method......Page 130
7.4.1 Similarity Metrics......Page 131
7.4.3 Consensus Methods......Page 132
7.5 UVCLUSTER......Page 134
7.6 Similarity Learning Method......Page 136
7.7.1 Sequence Similarity-Based Measurements......Page 140
7.7.2 Structural Similarity-Based Measurements......Page 141
7.7.3 Gene Expression Similarity-Based Measurements......Page 143
7.8 Summary......Page 144
8.2.1 Enumeration of Complete Subgraphs......Page 146
8.2.2 Monte Carlo Optimization......Page 147
8.2.3 Molecular Complex Detection......Page 148
8.2.4 Clique Percolation......Page 149
8.2.5 Merging by Statistical Significance......Page 150
8.2.6 Super-Paramagnetic Clustering......Page 152
8.3.1 Recursive Minimum Cut......Page 153
8.3.2 Restricted Neighborhood Search Clustering (RNSC)......Page 154
8.3.4 Markov Clustering......Page 156
8.3.5 Line Graph Generation......Page 159
8.4.1 Graph Reduction......Page 160
8.4.2 Hierarchical Modularization......Page 162
8.4.4 k Effects on Graph Reduction......Page 163
8.4.5 Hierarchical Structure of Modules......Page 165
8.5 Summary......Page 166
9.1 Introduction......Page 168
9.2 Protein Function Prediction Using the FunctionalFlow Algorithm......Page 169
9.3 CASCADE: A Dynamic Flow Simulation for Modularity Analysis......Page 171
9.3.1 Occurrence Probability and Related Models......Page 172
9.3.2 The CASCADE Algorithm......Page 174
9.3.3 Analysis of Prototypical Data......Page 176
9.3.4 Significance of Individual Clusters......Page 178
9.3.5 Analysis of Functional Annotation......Page 180
9.3.6 Comparative Assessment of CASCADE with Other Approaches......Page 185
9.3.8 Analysis of Computational Complexity......Page 191
9.3.9 Advantages of the CASCADE Method......Page 192
9.4 Functional Flow Analysis in Weighted PPI Networks......Page 193
9.4.1 Functional Influence Model......Page 194
9.4.2 Functional Flow Simulation Algorithm......Page 195
9.4.3 Time Complexity of Flow Simulation......Page 196
9.4.4 Detection of Overlapping Modules......Page 197
9.4.5 Detection of Disjoint Modules......Page 205
9.4.6 Functional Flow Pattern Mining......Page 207
9.5 Summary......Page 214
10.1 Introduction......Page 215
10.2 Applications of Markov Random Field and Belief Propagation for Protein Function Prediction......Page 216
10.3 Protein Function Prediction Using Kernel-based Statistical Learning Methods......Page 223
10.4 Protein Function Prediction using Bayesian Networks......Page 227
10.5 Improving Protein Function Prediction using Bayesian Integrative Methods......Page 229
10.6 Summary......Page 230
11.1 Introduction......Page 232
11.2.1 GO Annotations......Page 233
11.3 Semantic Similarity-Based Integration......Page 234
11.3.1 Structure-Based Methods......Page 235
11.3.2 Information Content-Based Methods......Page 236
11.3.3 Combination of Structure and Information Content......Page 237
11.5 Estimate of Interaction Reliability......Page 239
11.5.1 Functional Co-occurrence......Page 240
11.5.2 Topological Significance......Page 241
11.5.3 Protein Lethality......Page 242
11.6.1 Statistical Assessment......Page 243
11.6.2 Supervised Validation......Page 245
11.7.1 GO Index-Based Probabilistic Method......Page 247
11.7.2 Semantic Similarity-Based Probabilistic Method......Page 251
11.8 Summary......Page 257
12.2 Integration of Gene Expression with PPI Networks......Page 259
12.3 Integration of Protein Domain Information with PPI Networks......Page 260
12.4 Integration of Protein Localization Information with PPI Networks......Page 261
12.5.1 Kernel-Based Methods......Page 263
12.6 Summary......Page 265
13 Conclusion......Page 267
Bibliography......Page 271
Index......Page 289