XML for Bioinformatics aims to provide biologists, software engineers, and bioinformatics professionals with a comprehensive introduction to XML and its current applications in bioinformatics. The book assumes no background in XML and takes readers from basic to intermediate XML concepts. Core topics include the fundamentals of XML, creating XML grammars, web services via SOAP, and parsing XML documents in Perl and Java.
Author(s): Ethan Cerami
Edition: 1
Publisher: Springer
Year: 2005
Language: English
Pages: 321
Cover......Page 1
XML for Bioinformatics......Page 3
ISBN 9780387230283......Page 4
Preface......Page 8
Contents......Page 12
1. Introduction to XML for Bioinformatics......Page 18
1.1.1 XML Defined......Page 19
1.1.2 Origins of XML......Page 21
1.1.3 The XML Family of Specifications......Page 22
1.1.4 Web Services Defined......Page 23
1.2 Using XML for Biological Data Exchange......Page 24
1.2.1 Case Study: The Distributed Annotation System......Page 25
1.2.2 XML Formats for Bioinformatics......Page 28
1.3.1 Advantages of XML......Page 29
1.3.2 Disadvantages of XML......Page 30
1.4.1 Articles......Page 31
1.4.2 Web Site and Web Resources......Page 32
2.1 Getting Started with BSML......Page 34
2.1.1 Using Genomic Workspace™......Page 37
2.2.1 Working with Elements......Page 39
2.2.2 Working with Attributes......Page 40
2.2.5 Processing Instructions......Page 41
2.2.6 Character Encoding......Page 42
2.2.7 CDATA Sections......Page 43
2.2.8 Creating Well-Formed XML Documents......Page 44
2.2.9 Creating Valid XML Documents......Page 45
2.2.10 Working with XML Parsers......Page 47
2.3.1 Why We Need XML Namespaces......Page 48
2.3.2 Declaring and Using XML Namespaces......Page 50
2.3.3 Declaring a Default Namespace......Page 51
2.4 Fundamentals of BSML......Page 52
2.4.2 BSML Document Structure......Page 53
2.4.3 Representing Sequences......Page 55
2.4.4 Representing Sequence Features......Page 56
2.4.5 Retrieving Live BSML Data via XEMBL......Page 62
2.5 Useful Resources......Page 64
3.1 Introduction to DTDs......Page 66
3.1.1 A Bird's-Eye View: Protein DTD......Page 67
3.1.2 Validating XML Documents......Page 69
3.2 Document Type Declarations......Page 72
3.3.1 EMPTY......Page 74
3.3.3 #PCDATA......Page 75
3.3.4 Child Elements......Page 76
3.3.5 Mixed Content......Page 77
3.4 Declaring Attributes......Page 78
3.4.1 Attribute Types......Page 79
3.4.2 Attribute Behaviors......Page 82
3.5.1 General Entities......Page 83
3.5.2 Parameter Entities......Page 86
3.5.4 Conditional DTD Sections......Page 87
3.6.1 NCBI and XML......Page 89
3.6.2 The TinySeq DTD......Page 90
4.1 Introduction to XML Schemas......Page 98
4.2 Essential Concepts: Representing Protein Data......Page 99
4.2.1 The <schema> element......Page 101
4.2.4 Global Elements vs. Local Elements......Page 103
4.2.5 Creating Instance Documents......Page 104
4.2.6 Validating Instance Documents......Page 105
4.3.1 Built-in Schema Types......Page 106
4.3.2 Working with Facets......Page 108
4.4.1 Introduction to Complex Types......Page 111
4.4.2 Declaring Empty Element Types......Page 113
4.4.3 Declaring Mixed Element Types......Page 114
4.4.4 Occurrence Constraints......Page 115
4.4.5 Declaring Default Values......Page 116
4.4.6 Compositors: Sequence and Choice......Page 117
4.4.7 Defining Named Complex Types......Page 119
4.5 Basic Namespace Issues......Page 120
4.6 Case Study: The HUPO PSI Molecular Interaction Format......Page 124
4.6.1 PSI-MI Schema Overview......Page 125
4.6.2 A Sample PSI-MI Instance Document......Page 126
4.6.3 Working with the PSI-MI Controlled Vocabulary......Page 130
5.1 Introduction to XML Parsing in Perl......Page 132
5.1.1 Tree-Based vs. Event-Based XML Parsers......Page 133
5.1.2 Installing Modules via CPAN......Page 134
5.2.2 SAX and Bioinformatics Applications......Page 135
5.2.4 Introduction to XML::SAX......Page 136
5.2.5 Using NCBI EFetch and XML::SAX......Page 142
5.3.1 DOM Traversal with XML::LibXML......Page 146
5.3.4 Using NCBI EFetch and XML::LibXML......Page 149
6.1 Genome Annotation......Page 154
6.2 Introduction to DAS......Page 157
6.3 DAS Protocol Overview......Page 158
6.3.1 Getting Started......Page 161
6.3.2 DAS Requests......Page 162
6.3.3 DAS Responses......Page 163
6.3.4 X-DAS-Capabilities Header......Page 165
6.4.1 Retrieving Data Sources......Page 166
6.4.2 Retrieving Entry Points......Page 168
6.4.3 Retrieving Sequence Data......Page 170
6.4.4 Retrieving Annotations......Page 172
6.5 Working with Reference Maps......Page 185
6.5.1 Traversing the Ensembl Reference Map......Page 186
6.5.2 Working with Evolving Reference Maps......Page 188
6.6 The Future of DAS......Page 189
7.1.1 A First Example......Page 192
7.1.2 The XMLReader Interface......Page 196
7.1.3 The ContentHandler Interface......Page 199
7.1.4 Extending the DefaultHandler......Page 201
7.1.5 Using InputSource Objects......Page 203
7.2.1 Checking for Well-Formedness......Page 205
7.2.2 Validating XML Documents: Overview......Page 207
7.2.4 The ErrorHandler Interface......Page 208
7.2.5 Validating against XML Schemas......Page 213
7.3.1 Working with Elements and Namespaces......Page 214
7.3.2 Working with Attributes......Page 219
7.4.1 Parsing DAS Feature Data......Page 221
7.4.2 Integrating with BioJava......Page 225
8.1.1 JDOM Package Overview......Page 232
8.1.2 Parsing XML Documents with JDOM......Page 233
8.2.1 Introduction to the JDOM Element API......Page 238
8.2.2 Traversing DAS Documents......Page 241
8.2.3 Parsing DAS dsn Documents......Page 246
8.3.1 Creating New Documents......Page 250
8.3.2 Creating New Elements......Page 251
8.3.3 A Complete Example......Page 252
8.4.1 Using JDAS......Page 255
8.4.2 The JDAS Source Code......Page 260
9.1.1 Web Services Defined......Page 264
9.1.2 Architectural Options......Page 267
9.2 Case Study: Introduction to the NCI caBIO Project......Page 268
9.2.1 Background: Connecting to caBIO via the Java RMI Interface......Page 270
9.3.1 Introduction to REST......Page 274
9.3.2 Connecting to the caBIO REST Interface......Page 275
9.3.3 Example Application: Command Line caBIO Browser......Page 279
9.4 Introduction to SOAP......Page 284
9.4.1 SOAP Overview......Page 285
9.4.2 Constructing SOAP Messages......Page 287
9.4.3 Transporting SOAP via HTTP......Page 290
9.5 Introduction to Apache Axis......Page 292
9.5.1 Building a Web Service with Axis......Page 293
9.5.2 Connecting to caBIO with Axis......Page 298
2 Amino Acid Codes......Page 300
Bibliography......Page 302
B......Page 308
C......Page 309
D......Page 310
E......Page 311
F......Page 312
H......Page 313
L......Page 314
N......Page 315
P......Page 316
S......Page 317
U......Page 319
W......Page 320
X......Page 321