Univ. College, London, UK. Covers both traditional approaches, including gene and protein sequence analysis and structure prediction, and more recent technologies such as data mining, to provide insights into cellular mechanisms. Written specifically for advanced-level undergraduate courses. Softcover.
Author(s): Christine Orengo, David Jones, Janet Thornton
Series: Advanced Texts
Edition: 1
Publisher: Taylor & Francis
Year: 2003
Language: English
Pages: 322
Tags: Biological disciplines; Mathematical methods and modeling in biology; Bioinformatics
Book Cover......Page 1
Half-Title......Page 2
Title......Page 3
Copyright......Page 4
Contents......Page 5
Abbreviations......Page 10
Contributors......Page 13
Foreword......Page 14
1.1.1 A brief history of the gene......Page 16
1.1.2 What is information?......Page 18
1.1.3.1 The algorithmic nature of molecular evolution......Page 21
1.1.3.2 Causes of genetic variation......Page 22
1.1.3.4 Substitutional mutation......Page 23
1.2.1 Protein families in eukaryotic genomes......Page 24
1.2.2 Gene duplication......Page 25
1.2.4 The concept of homology......Page 27
1.3 Outlook: Evolution takes place at all levels of biological organization......Page 30
References and further reading......Page 32
2.1 Concepts......Page 33
2.2.1 Mapping and sequencing bacterial genomes......Page 34
2.3.1 Approaches to gene finding in eukaryotes......Page 35
2.3.1.1 Genetic and physical maps......Page 36
2.3.1.2 Whole genome sequencing......Page 37
2.3.1.3 Finding individual genes......Page 38
2.3.2.2 Recognition of eukaryotic genes......Page 40
2.4 Detecting non-coding RNA genes......Page 41
References and further reading......Page 42
3.1 Concepts......Page 43
3.2.1 Databases......Page 44
3.3.1 Challenges faced when aligning sequences......Page 45
3.3.1.1 Dot plots and similarity matrices for comparing protein sequences......Page 48
3.3.1.2 Dynamic programming......Page 49
3.3.1.4 Local implementation of dynamic programming......Page 52
3.4 Fast database search methods......Page 53
3.4.2 BLAST......Page 54
3.5 Assessing the statistical significance of sequence similarity......Page 56
3.6 Intermediate sequence searching......Page 58
3.8 Multiple sequence alignment......Page 59
3.8.1 Sequence weighting......Page 60
3.8.2 Deriving the consensus sequence and aligning other sequences against it......Page 61
References and further reading......Page 62
4.2.1 Neutralist model......Page 63
4.3.1 Conservative substitutions......Page 64
4.3.3 Elements of a substitution matrix......Page 65
4.3.3.1 Log odds ratio......Page 68
4.3.3.2 Log odds ratio of substitution......Page 69
4.3.4.1 Construction of the raw PAM matrix......Page 70
4.3.4.3 The mutation probability matrix......Page 71
4.3.4.4 Calculating the log-odds matrix......Page 72
4.4 Scoring residue conservation......Page 73
4.4.2 Guidelines for making a conservation score......Page 74
4.5.1 Simple scores......Page 75
4.5.2 Stereochemical property scores......Page 76
4.5.4 Sequence-weighted scores......Page 79
References and further reading......Page 80
5.1 Overview......Page 81
5.3.1 Identification of sequence homologs......Page 82
5.3.2 Identification of conserved domains and functional sites......Page 84
Regular expressions......Page 86
Fingerprints......Page 88
Profiles......Page 89
5.3.3.4 Hidden Markov Models......Page 90
5.3.3.5 Statistical significance of profile and HMM hits......Page 92
5.4 Outlook: Context-dependence of protein function......Page 93
References and further reading......Page 94
6.1.1 Single domains and multidomain proteins......Page 96
6.1.2.3 Analysis of structural variation in protein families......Page 97
6.1.3 Expansion of the Protein Structure Databank......Page 99
6.3 Algorithms......Page 100
6.3.1 Approaches for comparing 3-D structures......Page 102
6.3.2 Intermolecular approaches which compare geometric properties (rigid body superposition methods)......Page 103
6.3.2.2 Superposition methods: coping with indels by comparing secondary structures......Page 106
6.3.3 Intramolecular methods which compare geometric relationships......Page 107
6.3.3.1 Distance plots......Page 108
6.3.3.2 Comparing intramolecular relationships: coping with indels by comparing secondary structures: Graph theory methods......Page 110
6.3.3.4 Comparing intramolecular relationships: coping with indels by applying dynamic programming techniques......Page 111
6.3.4 Combining intermolecular superposition and comparison of intramolecular relationships......Page 113
6.3.5 Searching for common structural domains and motifs......Page 114
6.4 Statistical methods for assessing structural similarity......Page 115
6.5 Multiple structure comparison and 3-D templates for structural families......Page 116
References......Page 117
7.1 Concepts......Page 118
7.2 Data resources......Page 119
7.3.2 Identifying domain boundaries......Page 120
7.3.3.1 Pairwise structure alignment methods......Page 124
7.4 Descriptions of the structural classification hierarchy......Page 126
7.4.2 Comparisons between the different classification resources......Page 128
7.4.3 Distinguishing between homologs and analogs......Page 132
7.5.1 Populations of different levels in the classification hierarchy......Page 133
References......Page 134
8.1 Concepts......Page 136
8.2 Why do comparative modeling?......Page 137
8.3.1 Building a model: Traditional method......Page 138
8.3.1.3 Identify the structurally conserved and structurally variable regions......Page 139
8.3.1.4 Inherit the SCRs from the parent(s)......Page 140
8.3.1.5 Build the SVRs......Page 141
8.3.1.6 Build the sidechains......Page 142
8.3.1.7 Refining the model......Page 143
8.3.2 Building a model: Using MODELLER......Page 145
8.4 Evaluation of model quality......Page 146
8.5 Factors influencing model quality......Page 148
8.6 Insights and conclusions......Page 149
References......Page 150
9.1 Concepts......Page 151
9.2.2.2 Simulation methods......Page 152
9.2.3 Fold recognition or threading......Page 153
9.3.2 Intrinsic propensities for secondary structure formation of the amino acids......Page 154
9.3.3 Hydropathy methods and transmembrane helix prediction......Page 155
9.3.4 Predicting secondary structure from multiple sequence alignments......Page 156
9.3.5 Predicting secondary structure with neural networks......Page 157
9.3.6 Secondary structure prediction using ANNs......Page 158
9.3.7 Training the network......Page 159
9.3.9 Using sequence profiles to predict secondary structure......Page 160
9.3.10 Measures of accuracy in secondary structure prediction......Page 161
9.4.2 Profile methods......Page 162
9.4.4 Potentials of mean force......Page 164
9.5 Ab initio prediction methods......Page 165
9.6 Critically assessing protein structure prediction......Page 166
References......Page 167
10.1 Introduction......Page 168
10.3 Challenges of inferring function from structure......Page 169
10.4.2 Gene fusion......Page 170
10.4.3 One gene, two or more functions......Page 171
10.5 Functional classifications......Page 172
10.5.1.1 Hierarchical structure......Page 173
10.6.1.1 PDBsum......Page 174
10.6.2 Structural class......Page 175
10.6.3.1 Homologous relationship......Page 176
10.6.3.2 Fold similarity and structural analogs......Page 178
10.6.3.3 Structural motifs and functional analogs......Page 179
10.7 Evolution of protein function from a structural perspective......Page 181
10.7.1 Substrate specificity......Page 183
10.7.2.2 Semi-conserved chemistry......Page 184
10.7.2.4 Variation in chemistry......Page 185
10.7.3 Catalytic residues......Page 186
10.7.4 Domain enlargement......Page 187
10.7.6 Summary......Page 188
10.8.1.3 M. jannaschii IMPase: bifunctional protein......Page 189
10.9 Conclusions......Page 191
References and further reading......Page 192
11.1 Concepts......Page 193
11.3.1 Methods of assignment of structures to protein sequences......Page 194
11.3.2 Databases of structural assignments to proteomes......Page 196
11.3.3 Computational and experimental structural genomics: Target selection......Page 198
11.4.1.1 Common and specific domain families in the three kingdoms of life......Page 199
11.4.2.1 Selection for a small proportion of all possible combinations of domains......Page 200
11.4.2.3 Conservation of N-to-C terminal orientation of domain pairs......Page 201
11.4.2.4 Many combinations are specific to one phylogenetic group......Page 202
11.5 Evolution of enzymes and metabolic pathways by structural annotation of genomes......Page 203
11.5.1 Small molecule metabolism in E. coli: an enzyme mosaic......Page 204
11.5.3 Duplications within and across pathways......Page 206
11.5.4 Conclusions from structural assignments to E. coli enzymes......Page 208
References......Page 209
12.1 Concepts......Page 211
12.2 Protein-protein interactions......Page 212
12.3.2 Purification of protein complexes followed by mass spectrometry......Page 213
12.4 Structural analyses of domain interactions......Page 214
12.4.2 The geometry of domain combinations......Page 215
12.5.1 Conservation of gene order......Page 216
12.5.2 Gene fusions......Page 217
12.7 Summary and outlook......Page 218
References and further reading......Page 219
13.2 Why predict molecular interactions?......Page 220
13.3.3 What is needed?......Page 221
13.4.1.1 Grid representation......Page 222
13.4.2.1 Hydrophobicity......Page 224
13.4.3 Molecular mechanics and knowledge-based force fields......Page 225
13.5.1 Constraint-based methods......Page 226
13.5.1.2 Matching......Page 227
13.5.2.1 The Fourier transform method......Page 230
13.6.1 Protein-ligand docking......Page 231
13.6.1.1 Multiple conformation rigid-body method......Page 232
13.6.1.3 Combinatorial search methods......Page 233
13.7 Evaluation of models......Page 234
References and further reading......Page 235
14.1 Concepts......Page 236
14.1.1 The cellular transcriptome......Page 237
14.3.1 Performing a microarray experiment......Page 238
14.3.2 Scanning of microarrays......Page 239
14.4 Properties and processing of array data......Page 240
14.5.1 Normalizing arrays......Page 242
14.5.2 Normalizing genes......Page 243
14.6 Microarray standards and databases......Page 245
References......Page 247
15.1.1 Data analysis......Page 248
15.1.2 New challenges and opportunities......Page 249
15.3 Clustering......Page 250
15.3.1 Hierarchical clustering......Page 252
15.3.1.1 Single linkage clustering......Page 253
15.3.1.2 Complete linkage clustering......Page 254
15.3.1.3 Clustering gene expression data......Page 255
15.3.2 K-means......Page 257
15.3.3 Self-organizing maps......Page 258
15.3.4 Discussion......Page 260
15.4 Classification......Page 261
15.4.1 Support vector machines (SVM)......Page 265
15.5 Conclusion and future research......Page 266
References and further reading......Page 267
16.1 The proteome......Page 268
16.2.3 The role of bioinformatics in proteomics......Page 269
16.3 Technology platforms in proteomics......Page 270
16.3.1.1 2-D gel electrophoresis......Page 271
16.3.2.1 Protein annotation by mass spectrometry......Page 272
16.3.2.2 Combined HPLC and MS......Page 276
16.3.3 Protein chips......Page 277
16.4.1 Case studies in expression proteomics......Page 278
16.4.2 Case studies in cell-map proteomics......Page 279
References and further reading......Page 280
17.1 Concepts......Page 282
17.2.2 Why are DBMSs useful?......Page 283
17.2.3 Relational and other databases......Page 285
17.3.1 Accessing a database......Page 287
17.3.2.2 Physical design......Page 289
17.3.3 Overcoming performance problems......Page 291
17.3.4 Accessing data from remote sites......Page 292
17.4 Challenges arising from biological data......Page 294
References and further reading......Page 295
18.1 Concepts......Page 296
18.2.1 HTML and CSS......Page 297
18.2.2 XML......Page 300
18.2.4 Remote procedure invocation......Page 303
18.3 Insights and conclusions......Page 304
References and further reading......Page 305
Glossary......Page 306
Index......Page 314