XML for Bioinformatics aims to provide biologists, software engineers, and bioinformatics professionals with a comprehensive introduction to XML and current XML applications in bioinformatics. The book assumes no background in XML and takes readers from basic to intermediate XML concepts. Core topics include the fundamentals of XML, creating XML grammars, web services via SOAP, and parsing XML documents in Perl and Java.
Author(s): Ethan Cerami
Edition: 1
Publisher: Springer
Year: 2005
Language: English
Pages: 311
City: New York
Preface
Contents
Introduction to XML for Bioinformatics
1.1.1 XML Defined
1.1.2 Origins of XML
1.1.3 The XML Family of Specifications
1.1.4 Web Services Defined
1.2 Using XML for Biological Data Exchange
1.2.1 Case Study: The Distributed Annotation System
1.2.2 XML Formats for Bioinformatics
1.3.1 Advantages of XML
1.3.2 Disadvantages of XML
1.4.1 Articles
1.4.2 Web Site and Web Resources
2.1 Getting Started with BSML
2.1.1 Using Genomic Workspace
2.2.1 Working with Elements
2.2.2 Working with Attributes
2.2.5 Processing Instructions
2.2.6 Character Encoding
2.2.7 CDATA Sections
2.2.8 Creating Well-Formed XML Documents
2.2.9 Creating Valid XML Documents
2.2.10 Working with XML Parsers
2.3.1 Why We Need XML Namespaces
2.3.2 Declaring and Using XML Namespaces
2.3.3 Declaring a Default Namespace
2.4 Fundamentals of BSML
2.4.2 BSML Document Structure
2.4.3 Representing Sequences
2.4.4 Representing Sequence Features
2.4.5 Retrieving Live BSML Data via XEMBL
2.5 Useful Resources
3.1 Introduction to DTDs
3.1.1 A Bird’s-Eye View: Protein DTD
3.1.2 Validating XML Documents
3.2 Document Type Declarations
3.3.1 EMPTY
3.3.3 #PCDATA
3.3.4 Child Elements
3.3.5 Mixed Content
3.4 Declaring Attributes
3.4.1 Attribute Types
3.4.2 Attribute Behaviors
3.5.1 General Entities
3.5.2 Parameter Entities
3.5.4 Conditional DTD Sections
3.6.1 NCBI and XML
3.6.2 The TinySeq DTD
4.1 Introduction to XML Schemas
4.2 Essential Concepts: Representing Protein Data
4.2.1 The element
4.2.4 Global Elements vs. Local Elements
4.2.5 Creating Instance Documents
4.2.6 Validating Instance Documents
4.3.1 Built-in Schema Types
4.3.2 Working with Facets
4.4.1 Introduction to Complex Types
4.4.2 Declaring Empty Element Types
4.4.3 Declaring Mixed Element Types
4.4.4 Occurrence Constraints
4.4.5 Declaring Default Values
4.4.6 Compositors: Sequence and Choice
4.4.7 Defining Named Complex Types
4.5 Basic Namespace Issues
4.6 Case Study: The HUPO PSI Molecular Interaction Format
4.6.1 PSI-MI Schema Overview
4.6.2 A Sample PSI-MI Instance Document
4.6.3 Working with the PSI-MI Controlled Vocabulary
5.1 Introduction to XML Parsing in Perl
5.1.1 Tree-Based vs. Event-Based XML Parsers
5.1.2 Installing Modules via CPAN
5.2.2 SAX and Bioinformatics Applications
5.2.4 Introduction to XML::SAX
5.2.5 Using NCBI EFetch and XML::SAX
5.3.1 DOM Traversal with XML::LibXML
5.3.4 Using NCBI EFetch and XML::LibXML
6.1 Genome Annotation
6.2 Introduction to DAS
6.3 DAS Protocol Overview
6.3.1 Getting Started
6.3.2 DAS Requests
6.3.3 DAS Responses
6.3.4 X-DAS-Capabilities Header
6.4.1 Retrieving Data Sources
6.4.2 Retrieving Entry Points
6.4.3 Retrieving Sequence Data
6.4.4 Retrieving Annotations
6.5 Working with Reference Maps
6.5.1 Traversing the Ensembl Reference Map
6.5.2 Working with Evolving Reference Maps
6.6 The Future of DAS
7.1.1 A First Example
7.1.2 The
7.1.3 The
7.1.4 Extending the
7.1.5 Using
7.2.1 Checking for Well-Formedness
7.2.2 Validating XML Documents: Overview
7.2.4 The
7.2.5 Validating against XML Schemas
7.3.1 Working with Elements and Namespaces
7.3.2 Working with Attributes
7.4.1 Parsing DAS Feature Data
7.4.2 Integrating with BioJava
8.1.1 JDOM Package Overview
8.1.2 Parsing XML Documents with JDOM
8.2.1 Introduction to the JDOM Element API
8.2.2 Traversing DAS Documents
8.2.3 Parsing DAS
8.3.1 Creating New Documents
8.3.2 Creating New Elements
8.3.3 A Complete Example
8.4.1 Using JDAS
8.4.2 The JDAS Source Code
9.1.1 Web Services Defined
9.1.2 Architectural Options
9.2 Case Study: Introduction to the NCI caBIO Project
9.2.1 Background: Connecting to caBIO via the Java RMI Interface
9.3.1 Introduction to REST
9.3.2 Connecting to the caBIO REST Interface
9.3.3 Example Application: Command Line caBIO Browser
9.4 Introduction to SOAP
9.4.1 SOAP Overview
9.4.2 Constructing SOAP Messages
9.4.3 Transporting SOAP via HTTP
9.5 Introduction to Apache Axis
9.5.1 Building a Web Service with Axis
9.5.2 Connecting to caBIO with Axis
2 Amino Acid Codes
Bibliography
Index