Heredity performs literal communication of immensely long genomes through immensely long time intervals. Genomes nevertheless incur sporadic errors, referred to as mutations, which have significant and often dramatic effects after a time interval as short as a human life. How can faithfulness at a very large timescale and unfaithfulness at a very short one be reconciled? The engineering problem of literal communication was completely solved during the second half of the twentieth century. Originating in 1948 from Claude Shannon's seminal work, information theory provided means to measure quantities of information and proved that reliable communication through an unreliable channel is possible (by means left unspecified) up to a sharp limit referred to as its capacity, beyond which it becomes impossible. The quest for engineering means of reliable communication, named error-correcting codes, did not succeed in closely approaching capacity until 1993, when Claude Berrou and Alain Glavieux invented turbocodes. Today, the electronic devices that have invaded our daily lives (e.g., CD, DVD, mobile phone, digital television) could not work without highly efficient error-correcting codes. Reliable communication through unreliable channels, up to the limit of what is theoretically possible, has become a practical reality: an outstanding achievement, however little publicized. As an engineering problem that nature solved aeons ago, heredity is relevant to information theory. The capacity of DNA is easily shown to vanish exponentially fast, which entails that error-correcting codes must be used to regenerate genomes so as to faithfully transmit the hereditary message. Moreover, assuming that such codes exist explains basic and conspicuous features of the living world, e.g., the existence of discrete species and their hierarchical taxonomy, the necessity of successive generations, and even the trend of evolution towards increasingly complex beings.
The primary goal of this book is to provide geneticists with an introduction to information theory and error-correcting codes as necessary tools of hereditary communication. Some biological consequences of their use are also discussed, and guesses about hypothesized genomic codes are presented. Another goal is to prompt communication engineers to take an interest in genetics and biology, thereby broadening their horizon far beyond the technological field and learning from the most outstanding engineer: Nature.
Table of Contents: Foreword / Introduction / A Brief Overview of Molecular Genetics / An Overview of Information Theory / More on Molecular Genetics / More on Information Theory / An Outline of Error-Correcting Codes / DNA is an Ephemeral Memory / A Toy Living World / Subsidiary Hypothesis, Nested System / Soft Codes / Biological Reality Conforms to the Hypotheses / Identification of Genomic Codes / Conclusion and Perspectives
Author(s): Gerard Battail
Series: Synthesis Lectures on Biomedical Engineering
Publisher: Morgan and Claypool Publishers
Year: 2008
Language: English
Pages: 205
Contents
Foreword
I An Informal Overview
Introduction
Genetics and communication engineering
A static view of the living world: species and taxonomy
A dynamic view of the living world: evolution
Regeneration versus replication
DNA structure and replication
DNA directs the construction of a phenotype
From DNA to protein, and from a genome to a phenotype
Genomes are very long
Introduction
Shannon's paradigm
Single occurrence of events
Entropy of a source
Average mutual information, capacity of a channel
Variants of Shannon's paradigm
Source coding
Channel coding
Fundamental theorems
Reception in the presence of errors
Variant of Shannon's paradigm intended to genetics
Computing an upper bound of DNA capacity
Summary of the next chapters
II Facts of Genetics and Information Theory
More on Molecular Genetics
Structure of double-strand DNA
DNA as a long-lasting support of information
Error-correction coding as an implicit hypothesis
Principle of DNA replication
Amino-acids and polypeptidic chains
Synthesis of a polypeptidic chain
A genome instructs the development and maintenance of a phenotype
DNA recombination and crossing over
Memoryless sources, Markovian sources, and their entropy
A fundamental property of stationary ergodic sources
Source coding using a source extension
Kraft-McMillan inequality
Fundamental theorem of source coding
Fundamental theorem of channel coding
Coding for the binary symmetric channel
Principle of the algorithmic information theory
Algorithmic complexity and its relation to randomness and entropy
Sequences generated by random programs
Information and its relationship to semantics
Appendices
Defining a message
Describing a channel
Error patterns on repeated symbols and their probability
Decision on a repeated symbol by majority voting
Soft decision on a repeated symbol
A simple example
Decoding the code taken as example using the syndrome
Replication decoding of the code taken as example
Designing easily decodable codes: low-density parity check codes
Soft decoding of other block codes
An outlook on the fundamental theorem of channel coding
A geometrical interpretation
Designing good error-correcting codes
Convolutional encoding
Systematic convolutional codes and their decoding
The trellis diagram and its use for decoding
Description and properties
Symbol-by-symbol SISO decoding of turbocodes
Variants and comments
Conclusion
III Necessity of Genomic Error Correcting Codes and its Consequences
DNA is an Ephemeral Memory
Symbol erasure probability
Capacity computations, single-strand DNA
Estimating the error frequency before correction
Paradoxically, a permanent memory is ephemeral
A simple model
Computing statistical quantities …
The initial memory content is progressively forgotten
Introducing natural selection in the toy living world
Example of a toy living world using a very simple code
Evolution in the toy living world; phyletic graphs
Description of a nested system
Rate and length of component codes
Distances in the nested system
Consequences of the subsidiary hypothesis
Soft Codes
Introducing codes defined by a set of constraints
Identifying the alphabets
Potential genomic soft codes
Biological soft codes form nested systems
Further comments about genomic soft codes
Is a eukaryotic gene a systematic codeword?
Genomes are very redundant
Taxonomy and phylogeny
Correcting ability of genomic codes
Nature must proceed with successive regenerations
Joint implementation of replication and regeneration
Saltationism in evolution
Saltationism depends on the layer depth in the nested system
Evolution is contingent
Neighborhood in genomic and phenotypic spaces
On genome comparisons expressed as percentages
Identification of Genomic Codes
A necessary collaboration of engineers and biologists
Identifying component codes of the nested system
Identifying regeneration means
Genome distinction and conservation
Difficulties with sexual reproduction
Bibliography
Biography
Index