Bioinformatics: An Introduction

This invaluable textbook presents a self-contained introduction to the field of bioinformatics. Providing a comprehensive breadth of coverage while remaining accessibly concise, the text promotes a deep understanding of the field, supported by basic mathematical concepts, an emphasis on biological knowledge, and a holistic approach that highlights the connections unifying bioinformatics with other areas of science.

The thoroughly revised and enhanced fourth edition features new chapters focusing on regulation and control networks, the origins of life, evolution, statistics and causation, viruses, the microbiome, single cell analysis, drug discovery and forensic applications. This edition additionally includes new and updated material on the ontology of bioinformatics, data mining, ecosystems, and phenomics. Also covered are new developments in sequencing technologies, gene editing methods, and modelling of the brain, as well as state-of-the-art medical applications. Of special topicality is a new chapter on bioinformatics aspects of the coronavirus pandemic.

Topics and features:

  • Explains the fundamentals of set theory, combinatorics, probability, likelihood, causality, clustering, pattern recognition, randomness, complexity, systems, and networks
  • Discusses topics on ontogeny, phylogeny, genome structure, and regulation, as well as aspects of molecular biology
  • Critically examines the most significant practical applications, offering detailed descriptions of both the experimental process and the analysis of the data
  • Provides a varied selection of problems throughout the book, to stimulate further thinking
  • Encourages further reading through the inclusion of an extensive bibliography

This classic textbook builds upon the successful formula of previous editions with coverage of the latest advances in this exciting and fast-moving field. With its interdisciplinary scope, this unique guide will prove to be an essential study companion to a broad audience of undergraduate and beginning graduate students, spanning computer scientists focusing on bioinformatics, students of the physical sciences seeking a helpful primer on biology, and biologists desiring to better understand the theory underlying important applications of information science in biology.

Dr. Jeremy Ramsden is Hon. Prof. of Nanotechnology in the Department of Biomedical Research at the University of Buckingham, UK.

Author(s): Jeremy Ramsden
Series: Computational Biology, 21
Edition: 4
Publisher: Springer
Year: 2023

Language: English
Commentary: Publisher PDF
Pages: 426
City: Cham
Tags: Algorithms; Bioinformatics; Biology; Complexity; Genome; Knowledge Representation; Sequence Analysis

Preface to the Fourth Edition
Preface to the Third Edition
Preface to the Second Edition
Preface to the First Edition
Contents
1 Introduction
1.1 What is Bioinformatics?
1.2 What Can Bioinformatics Do?
1.3 An Ontology of Bioinformatics
1.4 The Organization of This Book
References
Part I Overview
2 Genotype, Phenotype, and Environment
References
3 Regulation and Control
3.1 The Concept of Machine
3.2 Regulation
3.3 Cybernetics
3.4 Adaptation
3.5 The Integrating Rôle of Directive Correlation
3.6 Timescales of Adaptation
3.7 The Architecture of Functional Systems
3.8 Autonomy and Heterarchical Architecture
3.9 Biological Information Processing
References
4 Evolution
4.1 Phylogeny and Evolution
4.1.1 Group and Kin Selection
4.1.2 Models of Evolution
4.2 Evolutionary Systems
4.3 Evolutionary Computing
4.4 Concluding Remarks on Evolution
References
5 Origins of Life and Earth Prehistory
References
Part II Information
6 The Nature of Information
6.1 Structure and Quantity
6.1.1 The Generation of Information
6.1.2 Conditional and Unconditional Information
6.1.3 Experiments and Observations
6.2 Constraint
6.2.1 The Value of Information
6.2.2 The Quality of Information
6.3 Accuracy, Meaning, and Effect
6.3.1 Accuracy
6.3.2 Meaning
6.3.3 Effect
6.3.4 Significs
6.4 Further Remarks on Information Generation and Reception
6.5 Summary
References
7 The Transmission of Information
7.1 The Capacity of a Channel
7.2 Coding
7.3 Decoding
7.4 Compression
7.4.1 Use of Compression to Measure Distance
7.4.2 Ergodicity
7.5 Noise
7.6 Error Correction
7.7 Summary
References
8 Sets and Combinatorics
8.1 The Notion of Set
8.2 Combinatorics
8.2.1 Ordered Sampling with Replacement
8.2.2 Ordered Sampling Without Replacement
8.2.3 Unordered Sampling Without Replacement
8.2.4 Unordered Sampling With Replacement
8.3 The Binomial Theorem
9 Probability and Likelihood
9.1 The Notion of Probability
9.2 Fundamentals
9.2.1 Generalized Union
9.2.2 Conditional Probability
9.2.3 Bernoulli Trials
9.3 Moments of Distributions
9.3.1 Runs
9.3.2 The Hypergeometric Distribution
9.3.3 The Law of Large Numbers
9.3.4 Additive and Multiplicative Processes
9.4 Likelihood
References
10 Statistics and Causation
10.1 A Brief Outline of Statistics
10.2 The Calculus of Causation
References
11 Randomness and Complexity
11.1 Random Processes
11.2 Markov Chains
11.3 Random Walks
11.4 The Generation of Noise
11.5 Complexity
11.6 Biological Complexity
References
12 Systems and Networks
12.1 General Systems Theory
12.1.1 Automata
12.1.2 Cellular Automata
12.1.3 Percolation
12.1.4 Systems Biology
12.2 Networks (Graphs)
12.2.1 Trees
12.2.2 Complexity Parameters of Networks
12.2.3 Dynamical Properties
12.3 Synergetics
12.4 Self-organization
References
13 Useful Algorithms
13.1 Pattern Recognition
13.2 Botryology
13.2.1 Clustering
13.2.2 Principal Component and Linear Discriminant Analyses
13.2.3 Wavelets
13.3 Multidimensional Scaling and Seriation
13.4 Visualization
13.5 The Maximum Entropy Method
References
Part III Biology
14 The Nature of Living Things
14.1 The Cell
14.2 Mitochondria
14.3 Metabolism
14.4 The Cell Cycle
14.4.1 The Chromosome
14.4.2 The Structures of Genome and Genes
14.4.3 The C-Value Paradox
14.4.4 The Structure of the Chromosome
14.5 Cancer
14.6 The Immune System
14.7 Molecular Mechanisms
14.7.1 Replication
14.7.2 Proofreading and Repair
14.7.3 Recombination
14.7.4 Summary of Sources of Genome Variation
14.8 Gene Expression
14.8.1 Transcription
14.8.2 Regulation of Transcription
14.8.3 Prokaryotic Transcriptional Regulation
14.8.4 Eukaryotic Transcriptional Regulation
14.8.5 mRNA Processing
14.8.6 Translation
14.9 Ontogeny (Development)
14.9.1 Stem cells
14.9.2 Epigenesis
14.9.3 The Epigenetic Landscape
14.9.4 r and K Selection
14.9.5 Homeotic Genes
References
15 The Molecules of Life
15.1 Molecules and Supramolecular Structure
15.2 Water
15.3 DNA
15.4 RNA
15.5 Proteins
15.5.3 Protein Structure Determination
15.6 Polysaccharides
15.7 Lipids
References
16 Environment and Ecology
16.1 Susceptibility to Disease
16.2 Toxicogenomics
16.3 Ecosystems Management
References
Part IV Omics
17 Genomics
17.1 DNA Sequencing
17.1.1 Extraction of Nucleic Acids
17.1.2 The Polymerase Chain Reaction
17.1.3 Sequencing
17.1.4 Expressed Sequence Tags
17.1.5 Next Generation Sequencing
17.2 DNA Methylation Profiling
17.3 Gene Identification
17.4 Extrinsic Methods
17.4.1 Database Reliability
17.4.2 Sequence Comparison and Alignment
17.4.3 Trace, Alignment, and Listing
17.4.4 Dynamic Programming Algorithms
17.5 Intrinsic Methods
17.5.1 Signals
17.5.2 Hidden Markov Models
17.6 Minimalist Approaches to Deciphering DNA
17.7 Phylogenies
17.8 Metagenomics
References
18 Transcriptomics and Proteomics
18.1 Transcriptomics
18.2 Proteomics
18.2.1 Two-Dimensional Gel Electrophoresis
18.2.2 Column Chromatography
18.2.3 Other Kinds of Electrophoresis
18.3 Protein Identification
18.4 Isotope-Coded Affinity Tags
18.5 Protein Microarrays
18.6 Protein Expression Patterns—Temporal and Spatial
18.7 The Kinome
References
19 Microbiomics
References
20 Viruses
20.1 Virus Structure and Life Cycle
20.2 Viruses as Pathogens
20.3 Virus Genome Sequencing
References
21 Single Cell Analysis and Multiomics
21.1 Experimental Methods
21.2 Applications to Disease and Other Phenomena
21.3 Beyond Sequence
References
22 Biological Signalling
22.1 The Complexity of Signal Transduction
22.2 Anatomy of Signal Transduction
22.3 Signalling Channel Capacities
22.4 Molecular Mechanism of Recognition and Actuation
22.5 Overcoming Noise
References
23 Regulatory Networks
23.1 Interactomics
23.2 Network Modelling
23.3 A Simple Example—Operons
23.4 Inference of Regulatory Networks
23.5 The Physical Chemistry of Interactions
23.6 Intermolecular Interactions
23.7 In Vivo Experimental Methods for Interactions
23.7.1 The Yeast Two-Hybrid Assay
23.7.2 Crosslinking
23.7.3 Correlated Expression
23.7.4 Other Methods
23.8 In Vitro Experimental Methods
23.8.1 Chromatography
23.8.2 Direct Affinity Measurement
23.8.3 Protein Chips
23.9 Interactions from Sequence
23.10 Global Statistics of Interactions
23.11 Metabolomics and Metabonomics
23.12 Data Collection
23.13 Data Analysis
23.14 Metabolic Regulation
23.14.1 Metabolic Control Analysis
23.14.2 The Metabolic Code
23.15 Metabolic Networks
References
24 The Nervous System
24.1 The Neuron and Neural Networks
24.2 Outstanding Problems
24.3 Artificial Neural Networks
24.4 Neurocomputation
References
25 Phenomics
25.1 Enzyme Activity-Based Protein Profiling
25.2 Phenotype Microarrays
25.3 Ethomics
25.4 Actimetry
25.5 Modeling Life
References
Part V Applications
26 Medicine and Disease
26.1 Infectious Diseases
26.2 Noninfectious Diseases
26.3 Personalized Medicine
26.4 Toward Automated Diagnosis
References
27 Drug Discovery
27.1 Routes to Discovery
27.2 Protein–Protein Interactions
27.3 Enhancing Control of Specificity
27.4 Drug–Drug Interactions
27.5 Nanodrugs
27.6 High-Throughput Experimental Approaches
27.7 Behaviour-Based Testing
References
28 Forensic Investigation
28.1 DNA Forensics in Criminal Investigations
28.2 Tracing Genetically Modified Ingredients in Food
References
29 Pandemics
References
30 Domestication
References
31 The Organization of Knowledge
31.1 Ontology
31.2 The Classification of Knowledge
31.3 Knowledge Representation
31.4 Data Mining
31.5 Text Mining
31.6 The Automation of Research
31.7 Big Data
References
Appendix Bibliography
Index