"In this book, Andy Baxevanis and Francis Ouellette . . . have undertaken the difficult task of organizing the knowledge in this field in a logical progression and presenting it in a digestible form. And they have done an excellent job. This fine text will make a major impact on biological research and, in turn, on progress in biomedicine. We are all in their debt."
-Eric Lander, from the Foreword

Reviews from the First Edition

"...provides a broad overview of the basic tools for sequence analysis ... For biologists approaching this subject for the first time, it will be a very useful handbook to keep on the shelf after the first reading, close to the computer."
-Nature Structural Biology

"...should be in the personal library of any biologist who uses the Internet for the analysis of DNA and protein sequence data."
-Science

"...a wonderful primer designed to navigate the novice through the intricacies of in scripto analysis ... The accomplished gene searcher will also find this book a useful addition to their library ... an excellent reference to the principles of bioinformatics."
-Trends in Biochemical Sciences

This new edition of the highly successful Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins provides a sound foundation of basic concepts, with practical discussions and comparisons of both computational tools and databases relevant to biological research. Equipping biologists with the modern tools necessary to solve practical problems in sequence data analysis, the Second Edition covers the broad spectrum of topics in bioinformatics, ranging from Internet concepts to predictive algorithms used on sequence, structure, and expression data. With chapters written by experts in the field, this up-to-date reference thoroughly covers vital concepts and is appropriate for both the novice and the experienced practitioner.
Written in clear, simple language, the book is accessible to users without an advanced mathematical or computer science background.

This new edition includes:
* All new end-of-chapter Web resources, bibliographies, and problem sets
* Accompanying Web site containing the answers to the problems, as well as links to relevant Web resources
* New coverage of comparative genomics, large-scale genome analysis, sequence assembly, and expressed sequence tags
* A glossary of commonly used terms in bioinformatics and genomics

Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins, Second Edition is essential reading for researchers, instructors, and students of all levels in molecular biology and bioinformatics, as well as for investigators involved in genomics, positional cloning, clinical research, and computational biology.
Author(s): Andreas D. Baxevanis, B. F. Francis Ouellette
Series: Methods of Biochemical Analysis
Edition: 2
Publisher: Wiley-Interscience
Year: 2001
Language: English
Pages: 505
Front Cover......Page 1
THE GENBANK SEQUENCE DATABASE 45......Page 8
GENOMIC MAPPING AND MAPPING DATABASES 111......Page 9
CREATION AND ANALYSIS OF PROTEIN MULTIPLE SEQUENCE ALIGNMENTS 215......Page 10
EXPRESSED SEQUENCE TAGS (ESTs) 283......Page 11
COMPARATIVE GENOME ANALYSIS 359......Page 12
USING PERL TO FACILITATE BIOLOGICAL ANALYSIS 413......Page 13
FOREWORD......Page 14
PREFACE......Page 16
CONTRIBUTORS......Page 18
1 BIOINFORMATICS AND THE INTERNET......Page 20
INTERNET BASICS......Page 21
Copper Wires, Coaxial Cables, and Fiber Optics......Page 23
Content Providers vs. ISPs......Page 25
ELECTRONIC MAIL......Page 26
FILE TRANSFER PROTOCOL......Page 29
Navigation on the World Wide Web......Page 32
Finding Information on the World Wide Web......Page 33
INTERNET RESOURCES FOR TOPICS PRESENTED IN CHAPTER 1......Page 35
REFERENCES......Page 36
INTRODUCTION......Page 38
Why Use a Data Model?......Page 38
Some Examples of the Model......Page 39
What to Define?......Page 42
PUBs: PUBLICATIONS OR PERISH......Page 43
Authors......Page 44
Patents......Page 45
MEDLINE and PubMed Identifiers......Page 46
Accession Number......Page 47
gi Number......Page 48
Accession Numbers on Protein Sequences......Page 49
BIOSEQs: SEQUENCES......Page 50
Sequences are Different......Page 51
Nucleotide/Protein Sets......Page 53
Seq-feat: Features......Page 54
The Sequence Is Not the Alignment......Page 57
Data Representations of Alignments......Page 58
MolInfo: Molecule Information......Page 59
BLAST......Page 60
Sequin......Page 61
REFERENCES......Page 62
INTRODUCTION......Page 64
FORMAT VS. CONTENT: COMPUTERS VS. HUMANS......Page 66
THE GENBANK FLATFILE: A DISSECTION......Page 68
The Header......Page 69
The Feature Table......Page 74
INTERNET RESOURCES FOR TOPICS PRESENTED IN CHAPTER 3......Page 77
APPENDICES......Page 78
Appendix 3.1. Example of GenBank Flatfile Format......Page 78
Appendix 3.2. Example of EMBL Flatfile Format......Page 80
Appendix 3.3. Example of a Record in CON Division......Page 82
INTRODUCTION......Page 84
WHY, WHERE, AND WHAT TO SUBMIT?......Page 85
DNA/RNA......Page 86
Coding Sequence(s)......Page 87
PROTEIN-ONLY SUBMISSIONS......Page 88
HOW TO SUBMIT WITH SEQUIN......Page 89
Submission Made Easy......Page 90
Entering a Single Nucleotide Sequence and its Protein Products......Page 91
Entering an Aligned Set of Sequences......Page 92
Viewing the Sequence Record......Page 93
Validation......Page 94
Advanced Annotation and Editing Functions......Page 95
CONSEQUENCES OF THE DATA MODEL......Page 96
Using Sequin as a Workbench......Page 97
CONCLUDING REMARKS......Page 98
INTERNET RESOURCES FOR TOPICS PRESENTED IN CHAPTER 4......Page 99
REFERENCES......Page 100
INTRODUCTION TO STRUCTURES......Page 102
Coordinates, Sequences, and Chemical Graphs......Page 103
Atoms, Bonds, and Completeness......Page 104
PDB Query and Reporting......Page 106
Sequences from Structure Records......Page 108
Validating PDB Sequences......Page 109
MMDB: MOLECULAR MODELING DATABASE AT NCBI......Page 110
Entrez Neighboring: Known Sequence Similarities......Page 111
mmCIF......Page 113
VISUALIZING STRUCTURAL INFORMATION......Page 114
Multiple Representation Styles......Page 114
NMR Models and Ensembles......Page 116
Local Dynamics......Page 118
DATABASE STRUCTURE VIEWERS......Page 119
MMDB Viewer: Cn3D......Page 120
Making Presentation Graphics......Page 121
STRUCTURE SIMILARITY SEARCHING......Page 122
INTERNET RESOURCES FOR TOPICS PRESENTED IN CHAPTER 5......Page 125
REFERENCES......Page 126
6 GENOMIC MAPPING AND MAPPING DATABASES......Page 130
INTERPLAY OF MAPPING AND SEQUENCING......Page 131
Polymorphic Markers......Page 132
DNA Clones......Page 133
Genetic Linkage Maps......Page 134
Transcript Maps......Page 136
Physical Maps......Page 137
Integrated Maps......Page 138
COMPLEXITIES AND PITFALLS OF MAPPING......Page 139
GDB......Page 141
NCBI......Page 142
MAPPING PROJECTS AND ASSOCIATED RESOURCES......Page 146
Cytogenetic Resources......Page 147
Genetic Linkage Map Resources......Page 149
Radiation Hybrid Map Resources......Page 150
STS Content Maps and Resources......Page 153
DNA Sequence......Page 154
Integrated Maps and Genomic Cataloguing......Page 155
Comparative Resources......Page 157
Single-Chromosome and Regional Map Resources......Page 159
Defining a Genomic Region......Page 161
Determining and Ordering the Contents of a Defined Region......Page 162
Defining a Map Position From a Clone or DNA Sequence......Page 164
INTERNET RESOURCES FOR TOPICS PRESENTED IN CHAPTER 6......Page 165
REFERENCES......Page 168
7 INFORMATION RETRIEVAL FROM BIOLOGICAL DATABASES......Page 174
Neighboring......Page 175
Implementations......Page 177
The Entrez Discovery Pathway: Examples......Page 178
LOCUSLINK......Page 191
SEQUENCE DATABASES BEYOND NCBI......Page 197
MEDICAL DATABASES......Page 200
INTERNET RESOURCES FOR TOPICS PRESENTED IN CHAPTER 7......Page 202
REFERENCES......Page 204
INTRODUCTION......Page 206
THE EVOLUTIONARY BASIS OF SEQUENCE ALIGNMENT......Page 207
THE MODULAR NATURE OF PROTEINS......Page 209
OPTIMAL ALIGNMENT METHODS......Page 212
SUBSTITUTION SCORES AND GAP PENALTIES......Page 214
DATABASE SIMILARITY SEARCHING......Page 217
FASTA......Page 219
BLAST......Page 221
DATABASE SEARCHING ARTIFACTS......Page 223
POSITION-SPECIFIC SCORING MATRICES......Page 227
SPLICED ALIGNMENTS......Page 228
CONCLUSIONS......Page 229
REFERENCES......Page 231
INTRODUCTION......Page 234
STRUCTURAL ALIGNMENT OR EVOLUTIONARY ALIGNMENT?......Page 235
HOW TO MULTIPLY ALIGN SEQUENCES......Page 236
Assessing Quality of Alignment......Page 237
Hierarchical Methods......Page 238
More Rigorous Nonhierarchical Methods......Page 240
TOOLS TO ASSIST THE ANALYSIS OF MULTIPLE ALIGNMENTS......Page 241
Subalignments—AMAS......Page 242
Secondary Structure Prediction and the Prediction of Buried Residues From Multiple Sequence Alignment......Page 244
COLLECTIONS OF MULTIPLE ALIGNMENTS......Page 246
INTERNET RESOURCES FOR TOPICS PRESENTED IN CHAPTER 9......Page 247
REFERENCES......Page 249
10 PREDICTIVE METHODS USING DNA SEQUENCES......Page 252
GRAIL......Page 254
FGENEH/FGENES......Page 255
MZEF......Page 257
GENSCAN......Page 259
PROCRUSTES......Page 260
GeneParser......Page 264
HOW WELL DO THE METHODS WORK?......Page 265
STRATEGIES AND CONSIDERATIONS......Page 267
INTERNET RESOURCES FOR TOPICS PRESENTED IN CHAPTER 10......Page 269
REFERENCES......Page 270
11 PREDICTIVE METHODS USING PROTEIN SEQUENCES......Page 272
AACompIdent and AACompSim (ExPASy)......Page 273
PROPSEARCH......Page 274
PHYSICAL PROPERTIES BASED ON SEQUENCE......Page 276
Compute pI/MW and ProtParam (ExPASy)......Page 276
TGREASE......Page 277
MOTIFS AND PATTERNS......Page 278
ProfileScan......Page 279
BLOCKS......Page 280
CDD......Page 281
SECONDARY STRUCTURE AND FOLDING CLASSES......Page 282
nnpredict......Page 283
PredictProtein......Page 284
PREDATOR......Page 286
Comparison of Methods......Page 287
Coiled Coils......Page 288
Transmembrane Regions......Page 290
Signal Peptides......Page 291
Nonglobular Regions......Page 292
TERTIARY STRUCTURE......Page 293
INTERNET RESOURCES FOR TOPICS PRESENTED IN CHAPTER 11......Page 296
REFERENCES......Page 298
12 EXPRESSED SEQUENCE TAGS (ESTs)......Page 302
WHAT IS AN EST?......Page 303
How to Access ESTs......Page 304
Limitations of EST Data......Page 305
UniGene......Page 307
STACK......Page 312
THE HUMAN GENE MAP......Page 313
GENE PREDICTION IN GENOMIC DNA......Page 314
CGAP......Page 315
Microarrays......Page 316
INTERNET RESOURCES FOR TOPICS PRESENTED IN CHAPTER 12......Page 317
REFERENCES......Page 318
13 SEQUENCE ASSEMBLY AND FINISHING METHODS......Page 322
THE USE OF BASE CALL ACCURACY ESTIMATES OR CONFIDENCE VALUES......Page 324
GLOBAL ASSEMBLY......Page 325
FILE FORMATS......Page 326
Phrapview......Page 327
THE CONTIG SELECTOR......Page 330
THE CONTIG COMPARATOR......Page 331
THE TEMPLATE DISPLAY......Page 332
THE CONTIG EDITOR......Page 335
EXPERIMENT SUGGESTION AND AUTOMATION......Page 338
INTERNET RESOURCES FOR TOPICS PRESENTED IN CHAPTER 13......Page 340
REFERENCES......Page 341
14 PHYLOGENETIC ANALYSIS......Page 342
FUNDAMENTAL ELEMENTS OF PHYLOGENETIC MODELS......Page 344
PHYLOGENETIC DATA ANALYSIS: THE FOUR STEPS......Page 346
ALIGNMENT: BUILDING THE DATA MODEL......Page 348
ALIGNMENT: EXTRACTION OF A PHYLOGENETIC DATA SET......Page 352
Models of Substitution Rates Between Bases......Page 354
Models of Among-Site Substitution Rate Heterogeneity......Page 356
Models of Substitution Rates Between Amino Acids......Page 357
Which Substitution Model to Use?......Page 358
TREE-BUILDING METHODS......Page 359
Distance-Based Methods......Page 360
Character-Based Methods......Page 362
Searching for Trees......Page 364
Randomized Trees (Skewness Test)......Page 365
Bootstrap......Page 366
PHYLOGENETICS SOFTWARE......Page 367
PHYLIP......Page 368
PAUP......Page 371
PUZZLE or TREE-PUZZLE......Page 372
INTERNET-ACCESSIBLE PHYLOGENETIC ANALYSIS SOFTWARE......Page 373
BLAST2 & Orthologue Search Server......Page 374
INTERNET RESOURCES FOR TOPICS PRESENTED IN CHAPTER 14......Page 375
REFERENCES......Page 376
15 COMPARATIVE GENOME ANALYSIS......Page 378
General-Purpose Databases for Comparative Genomics......Page 379
Organism-Specific Databases......Page 384
GENOME ANALYSIS AND ANNOTATION......Page 385
Using Genome Comparison for Prediction of Protein Functions......Page 386
APPLICATION OF COMPARATIVE GENOMICS—RECONSTRUCTION OF METABOLIC PATHWAYS......Page 401
Glycolysis Step-by-Step......Page 402
Error Propagation and Incomplete Information in Databases......Page 404
Genome, Protein, and Organismal Context as a Source of Errors......Page 405
INTERNET RESOURCES FOR TOPICS PRESENTED IN CHAPTER 15......Page 406
REFERENCES......Page 409
INTRODUCTION......Page 412
TECHNOLOGIES FOR LARGE-SCALE GENE EXPRESSION Measurements......Page 413
Informatics Aspects of Microarray Production......Page 414
What is Actually Measured?......Page 415
COMPUTATIONAL TOOLS FOR EXPRESSION ANALYSIS......Page 418
Public Databases......Page 418
HIERARCHICAL CLUSTERING......Page 426
PROSPECTS FOR THE FUTURE......Page 428
REFERENCES......Page 429
17 USING PERL TO FACILITATE BIOLOGICAL ANALYSIS......Page 432
GETTING STARTED......Page 433
HOW SCRIPTS WORK......Page 435
STRINGS, NUMBERS, AND VARIABLES......Page 436
ARITHMETIC......Page 437
VARIABLE INTERPOLATION......Page 438
BASIC INPUT AND OUTPUT......Page 439
FILEHANDLES......Page 441
MAKING DECISIONS......Page 443
CONDITIONAL BLOCKS......Page 446
LOOPS......Page 449
COMBINING LOOPS WITH INPUT......Page 451
STANDARD INPUT AND OUTPUT......Page 452
FINDING THE LENGTH OF A SEQUENCE FILE......Page 454
PATTERN MATCHING......Page 455
EXTRACTING PATTERNS......Page 459
ARRAYS......Page 460
SPLIT AND JOIN......Page 463
HASHES......Page 464
A REAL-WORLD EXAMPLE......Page 465
SUGGESTED READING......Page 468
GLOSSARY......Page 470
INDEX......Page 476