Knowledge-Based Bioinformatics: From analysis to interpretation

The rapid growth of biological data in genomic and proteomic research calls for broader coverage of recent developments in knowledge-based systems and their applications, both to manage that data and to present it effectively to a wider audience. Most current texts are either outdated or omit some aspects of knowledge- and data-driven representation, integration, analysis, and interpretation. This collection aims to address that gap by providing comprehensive coverage of knowledge-driven approaches to bioinformatics.

Author(s): Gil Alterovitz, Marco Ramoni
Publisher: Wiley
Year: 2010

Language: English
Pages: 397
Tags: Biological disciplines; Mathematical methods and modeling in biology; Bioinformatics

Knowledge-Based Bioinformatics......Page 5
Contents......Page 7
Preface......Page 15
List of Contributors......Page 19
PART I FUNDAMENTALS......Page 23
Section 1 Knowledge-Driven Approaches......Page 25
1.1 Introduction......Page 27
1.2 Formal reasoning for bioinformatics......Page 29
1.4 Collecting explicit knowledge......Page 32
1.5 Representing common knowledge......Page 33
1.7 Knowledge discovery applications......Page 37
1.8 Semantic harmonization: the power and limitation of ontologies......Page 40
1.9 Text mining and extraction......Page 41
1.10 Gene expression......Page 42
1.11 Pathways and mechanistic knowledge......Page 44
1.12 Genotypes and phenotypes......Page 46
1.13 The Web’s role in knowledge mining......Page 47
1.14.2 Information aggregation......Page 48
1.14.4 Information articulation......Page 50
1.14.5 Next-generation knowledge discovery......Page 52
1.15 References......Page 53
2.1.1 The genomic era and systems biology......Page 55
2.1.2 The exponential growth of biomedical knowledge......Page 56
2.1.3 The challenges of finding and interacting with biomedical knowledge......Page 57
2.2.1 We need to read; development of automatic methods to extract data housed in the biomedical literature......Page 59
2.2.2 Implicit and implied knowledge; the forgotten data source......Page 63
2.2.3 Humans are visual beings: so should their knowledge be......Page 64
2.3 Current knowledge-based bioinformatics tools......Page 65
2.3.1 Enrichment tools......Page 66
2.3.2 Integration and expansion: from gene lists to networks......Page 68
2.3.3 Expanding the concept of an interaction......Page 70
2.4 3R systems: reading, reasoning and reporting the way towards biomedical discovery......Page 72
2.4.1 3R knowledge networks populated by reading and reasoning......Page 74
2.4.2 Implied association results in uncertainty......Page 75
2.4.3 Reporting: using 3R knowledge networks to tell biological stories......Page 76
2.5 The Hanalyzer: a proof of 3R concept......Page 77
2.7 References......Page 84
3.1 Introduction......Page 89
3.2 Knowledge representation languages and tools for building bio-ontologies......Page 90
3.2.1 RDF (resource description framework)......Page 93
3.2.2 OWL (Web ontology language)......Page 94
3.2.3 OBO format......Page 99
3.3.1 Define the scope of the bio-ontology......Page 100
3.3.3 Commit to agreed ontological principles......Page 101
3.3.5 Ontology Design Patterns (ODPs)......Page 102
3.3.6 Ontology evaluation......Page 103
3.4 Conclusion......Page 105
3.6 References......Page 106
4.1 Introduction......Page 109
4.2.2 Data submitted by external users and collaborators......Page 112
4.3 Design of knowledge bases......Page 113
4.3.1 Understanding your end users and understanding their data......Page 114
4.4.1 Choosing a database architecture......Page 115
4.4.2 Good programming practices......Page 118
4.4.3 Implementation of interfaces......Page 119
4.5.1 Manual curation and auto-annotation......Page 120
4.5.2 Clever pipelines and data flows......Page 123
4.5.3 Lessening data maintenance overheads......Page 126
4.7 References......Page 127
Section 2 Data-Analysis Approaches......Page 129
5.2 Significance testing......Page 131
5.2.1 Multiple testing and false discovery rate......Page 132
5.2.2 Correlated errors......Page 133
5.3.1 Clustering......Page 134
5.3.2 Principal components......Page 138
5.3.3 Multidimensional scaling (MDS)......Page 139
5.4 Classification and prediction......Page 141
5.4.2 Modern procedures......Page 142
5.5 References......Page 144
6.1 Introduction......Page 147
6.2 Bayes theorem and some simple applications......Page 148
6.3 Inference of population structure from genetic marker data......Page 151
6.4 Inference of protein binding motifs from sequence data......Page 152
6.5 Inference of transcriptional regulatory networks from joint analysis of protein–DNA binding data and gene expression data......Page 153
6.6 Inference of protein and domain interactions from yeast two-hybrid data......Page 154
6.7 Conclusions......Page 156
6.9 References......Page 157
7.1 Introduction......Page 159
7.1.1 Knowledge discovery through text mining......Page 160
7.1.2 Need for processing biomedical texts......Page 161
7.1.3 Developing text mining solutions......Page 163
7.2.1 Efficient analysis of normalized information......Page 164
7.2.2 Interactive seeking of textual information......Page 167
7.3.1 Components......Page 169
7.3.2 Methods......Page 172
7.4 Development issues......Page 174
7.4.2 Corpus construction......Page 175
7.4.4 Integration framework......Page 176
7.4.5 Evaluation......Page 177
7.5.1 Interactive literature analysis......Page 178
7.5.2 Integration into bioinformatics solutions......Page 179
7.5.3 Discovery of knowledge from the literature......Page 180
7.6 Conclusion......Page 181
7.7 References......Page 182
PART II APPLICATIONS......Page 191
Section 3 Gene and Protein Information......Page 193
8.1 Introduction......Page 195
8.1.2 Value-added curation......Page 196
8.2.1 Gene Ontology and the annotation of the human proteome......Page 197
8.2.3 GO annotation methods......Page 198
8.2.5 Ontology development......Page 205
8.3.1 Manual methods of transferring functional annotation......Page 208
8.3.2 Electronic methods of transferring functional annotation......Page 209
8.3.3 Electronic annotation methods......Page 210
8.4 Community annotation......Page 211
8.4.3 Community annotation workshops......Page 212
8.5.1 GO cannot capture all relevant biological aspects......Page 213
8.5.5 Manual curation is expensive......Page 214
8.6 Accessing GO annotations......Page 215
8.6.1 Tools for browsing the GO......Page 216
8.6.2 Functional classification......Page 221
8.6.3 GO slims......Page 224
8.7 Conclusions......Page 225
8.8 References......Page 226
9.1.1 Introduction to gene annotation......Page 231
9.1.3 Annotation based on transcribed evidence......Page 233
9.1.4 A comparison of annotation processes......Page 235
9.1.5 The CCDS project......Page 236
9.1.6 Pseudogene annotation......Page 237
9.1.7 The annotation of non-coding genes......Page 240
9.2.1 The annotation of multispecies genomes......Page 242
9.2.2 Community annotation......Page 244
9.2.3 Alternative splicing and new transcriptomics data......Page 245
9.2.4 The annotation of human genome variation......Page 247
9.2.5 The annotation of polymorphic gene families......Page 248
9.3 References......Page 250
10.1 Introduction......Page 255
10.2 Batch-learning SOM (BLSOM) adapted for genome informatics......Page 257
10.3.1 BLSOMs for 13 eukaryotic genomes......Page 259
10.3.2 Diagnostic oligonucleotides for phylotype-specific clustering......Page 260
10.3.3 A large-scale BLSOM constructed with all sequences available from species-known genomes......Page 262
10.3.4 Phylogenetic estimation for environmental DNA sequences and microbial community comparison using the BLSOM......Page 264
10.3.5 Reassociation of environmental genomic fragments according to species......Page 267
10.4 Conclusions and discussion......Page 269
10.5 References......Page 270
Section 4 Biomolecular Relationships and Meta-Relationships......Page 273
11.1 Introduction......Page 275
11.2.1 Global structure of molecular networks: scale-free, small-world, disassortative, and modular......Page 276
11.2.3 Applications of topology analysis......Page 280
11.2.4 Challenges and future directions of topology analysis......Page 284
11.3.2 Applications of motif analysis......Page 285
11.3.3 Challenges and future directions of motif analysis......Page 288
11.4 Network modular analysis and applications......Page 289
11.4.1 Density-based clustering methods......Page 290
11.4.2 Partition-based clustering methods......Page 291
11.4.3 Centrality-based clustering methods......Page 292
11.4.4 Hierarchical clustering methods......Page 293
11.4.5 Applications of modular analysis......Page 294
11.4.6 Challenges and future directions of modular analysis......Page 295
11.5.1 Network comparison algorithms: from computer science to systems biology......Page 296
11.5.2 Network comparison algorithms for molecular networks......Page 297
11.5.3 Applications of molecular network comparison......Page 299
11.5.4 Challenges and future directions of network comparison......Page 300
11.7 Summary......Page 301
11.9 References......Page 304
12.1 Biological pathway analysis and pathway knowledge bases......Page 311
12.2 Overview of high-throughput data capture technologies and data repositories......Page 312
12.3.1 Reactome......Page 315
12.3.2 KEGG......Page 318
12.3.4 NCI-Pathway Interaction Database......Page 319
12.3.5 NCBI-BioSystems......Page 320
12.3.7 PharmGKB......Page 321
12.4 How does information get into pathway knowledge bases?......Page 322
12.5.1 SBML......Page 323
12.5.2 BioPAX......Page 324
12.5.4 Comparison of data exchange formats for different pathway knowledge bases......Page 325
12.6 Visualization tools......Page 326
12.7 Use case: pathway analysis in Reactome using statistical analysis of high-throughput data sets......Page 327
12.8 Discussion: challenges and future directions of pathway knowledge bases......Page 332
12.9 References......Page 333
13.1 Complex traits: clinical phenomenology and molecular background......Page 337
13.2 Why it is challenging to infer relationships between genes and phenotypes in complex traits?......Page 339
13.3 Bottom-up or top-down: which approach is more useful in delineating complex traits key drivers?......Page 347
13.4 High-throughput technologies and their applications in complex traits genetics......Page 349
13.5 Integrative systems biology: a comprehensive approach to mining high-throughput data......Page 350
13.6 Methods applying systems biology approach in the identification of functional relationships from gene expression data......Page 353
13.6.1 Methods using quantitative expression data to identify correlations in expression between genes (clustering)......Page 356
13.6.2 Methods integrating functional genomics into cellular functional classes......Page 360
13.6.3 Methods combining functional genomics results and existing biological information to construct novel biological networks......Page 365
13.7 Advantages of networks exploration in molecular biology and drug discovery......Page 375
13.8 Practical examples of applying systems biology approaches and network exploration in the identification of functional modules and disease-causing genes in complex phenotypes/diseases......Page 376
13.9 Challenges and future directions......Page 385
13.10 References......Page 386
Trends and conclusion......Page 389
Index......Page 391