This book covers a wide range of subjects in applying machine learning approaches to bioinformatics projects. It is distinguished by two key features. First, it introduces the most widely used machine learning approaches in bioinformatics and discusses, with evaluations from real case studies, how they are used in individual bioinformatics projects. Second, it introduces state-of-the-art bioinformatics research methods. The theoretical and practical parts are well integrated, so readers can follow the existing procedures in their own research. Unlike most bioinformatics books on the market, its coverage is not limited to a single subject. A broad spectrum of relevant topics in bioinformatics, including systematic data mining and computational systems biology research, is brought together in this book, offering an efficient and convenient platform for teaching. An essential reference for final-year undergraduates and graduate students, as well as a comprehensive handbook for new researchers, this book will also serve as a practical guide for software development in relevant bioinformatics projects.
Author(s): Yang Z.R.
Series: Science, Engineering, and Biology Informatics
Publisher: WS
Year: 2010
Language: English
Pages: 337
Tags: Biological disciplines;Mathematical methods and modeling in biology;Bioinformatics;
Preface......Page 6
Contents......Page 10
1 Introduction......Page 16
1.1 Brief history of bioinformatics......Page 18
1.2 Database application in bioinformatics......Page 21
1.3 Web tools and services for sequence homology alignment......Page 23
1.3.1 Web tools and services for protein functional site identification......Page 24
1.4 Pattern analysis......Page 25
1.5 The contribution of information technology......Page 26
1.6 Chapters......Page 27
2 Introduction to Unsupervised Learning......Page 30
3.1 Histogram approach......Page 39
3.2 Parametric approach......Page 40
3.3.1 K-nearest neighbour approach......Page 43
3.3.2 Kernel approach......Page 44
Summary......Page 51
4.1 General......Page 53
4.2 Principal component analysis......Page 54
4.3 An application of PCA......Page 57
4.4 Multi-dimensional scaling......Page 61
4.5 Application of the Sammon algorithm to gene data......Page 63
Summary......Page 65
5.1 Hierarchical clustering......Page 67
5.2 K-means......Page 70
5.3 Fuzzy C-means......Page 73
5.4 Gaussian mixture models......Page 75
5.5 Application of clustering algorithms to the Burkholderia pseudomallei gene expression data......Page 79
Summary......Page 82
6.1 Vector quantization......Page 84
6.2 SOM structure......Page 88
6.3 SOM learning algorithm......Page 90
6.4 Using SOM for classification......Page 94
6.5.1 Sequence analysis......Page 96
6.5.2 Gene expression data analysis......Page 98
6.6 A case study of gene expression data analysis......Page 101
6.7 A case study of sequence data analysis......Page 103
Summary......Page 105
7.1 General concepts......Page 107
7.2 General definition......Page 109
7.3 Model evaluation......Page 111
7.4 Data organisation......Page 116
Summary......Page 118
8.1 Linear discriminant analysis......Page 119
8.2 Generalised discriminant analysis......Page 124
8.3 K-nearest neighbour......Page 126
Summary......Page 133
9.1 Introduction......Page 135
9.2 Basic principle for constructing a classification tree......Page 136
9.3 Classification and regression tree......Page 140
9.4 CART for compound pathway involvement prediction......Page 141
9.5 The random forest algorithm......Page 143
9.6 RF for analyzing Burkholderia pseudomallei gene expression profiles......Page 144
Summary......Page 147
10.1 Introduction......Page 148
10.2.2 Learning rules......Page 152
10.3.1 Regression......Page 160
10.3.2 Classification......Page 161
10.3.3 Procedure......Page 162
10.4.1 Bio-chemical data analysis......Page 163
10.4.3 Protein structure data analysis......Page 164
10.5 A case study on Burkholderia pseudomallei gene expression data......Page 165
Summary......Page 168
11.1 Introduction......Page 169
11.2 Radial-basis function neural network (RBFNN)......Page 171
11.3 Bio-basis function neural network......Page 177
11.4 Support vector machine......Page 183
11.5 Relevance vector machine......Page 188
Summary......Page 191
12.1 Markov model......Page 192
12.2.1 General definition......Page 194
12.2.2 Handling HMM......Page 198
12.2.3 Evaluation......Page 199
12.2.4 Decoding......Page 203
12.2.5 Learning......Page 204
12.3 HMM for sequence classification......Page 206
Summary......Page 209
13.1 Built-in strategy......Page 210
13.1.1 Lasso regression......Page 211
13.1.2 Ridge regression......Page 214
13.1.3 Partial least square regression (PLS) algorithm......Page 215
13.3 Heuristic strategy – orthogonal least square approach......Page 219
13.4 Criteria for feature selection......Page 223
13.4.1 Correlation measure......Page 224
13.4.3 Mutual information approach......Page 225
Summary......Page 227
14 Feature Extraction (Biological Data Coding)......Page 228
14.1 Molecular sequences......Page 229
14.2 Chemical compounds......Page 230
14.4.1 Peptide feature extraction......Page 231
14.4.2 Whole sequence feature extraction......Page 237
Summary......Page 239
15.1 Nitration site prediction......Page 240
15.2 Plant promoter region prediction......Page 245
Summary......Page 252
16.1 Gene regulatory network......Page 253
16.2 Causal networks, networks, graphs......Page 256
16.3 A brief review of the probability......Page 257
16.4 Discrete Bayesian network......Page 260
16.5 Inference with discrete Bayesian network......Page 261
16.7 Bayesian networks for gene regulatory networks......Page 262
16.8 Bayesian networks for discovering peptide patterns......Page 263
16.9 Bayesian networks for analysing Burkholderia pseudomallei gene data......Page 264
Summary......Page 267
17.1 Michaelis-Menten change law......Page 268
17.2 S-system......Page 271
17.3 Simplification of an S-system......Page 274
17.4.1 Neural network approach......Page 275
17.4.2 Simulated annealing approach......Page 276
17.5 Steady-state analysis of an S-system......Page 277
17.6 Sensitivity of an S-system......Page 282
Summary......Page 283
18 Future Directions......Page 284
18.1 Multi-source data......Page 285
18.2 Gene regulatory network construction......Page 287
18.3 Building models using incomplete data......Page 289
18.4 Biomarker detection from gene expression data......Page 290
Summary......Page 293
References......Page 294
Index......Page 334