Feedforward Neural Network Methodology (Springer Series in Statistics)

This decade has seen an explosive growth in computational speed and memory and a rapid enrichment of our understanding of artificial neural networks. These two factors provide systems engineers and statisticians with the ability to build models of physical, economic, and information-based time series and signals. This book provides a thorough and coherent introduction to the mathematical properties of feedforward neural networks and to the intensive methodology that has enabled their highly successful application to complex problems.

Author(s): Terrence L. Fine
Edition: 1
Year: 1999

Language: English
Pages: 357

Preface......Page 6
Contents......Page 10
List of Figures......Page 16
1.1 Objectives and Setting......Page 18
1.2.1 Biological and Philosophical......Page 21
1.2.2 Computational......Page 22
1.2.3 Systems Applications......Page 23
1.3 Historical Background......Page 26
1.4.1 Neuronal Properties......Page 27
1.4.2 Mathematical Model of a Neuron......Page 30
1.5.1 Role of Networks in Function Representation......Page 31
1.5.2 Issues in Function Representation......Page 32
1.5.3 Order of Treatment......Page 33
2.1 Objectives and Setting......Page 34
2.2.1 Definitions......Page 36
2.2.2 Characterizations through Convexity......Page 37
2.2.3 Properties of Solutions......Page 39
2.3.1 The Upper Bound D(n, d)......Page 40
2.4 Probabilistic Interpretation, Asymptotics, and Capacity......Page 44
2.5 Complexity of Implementations......Page 46
2.6.1 The PTA......Page 48
2.6.2 Convergence of the PTA......Page 49
2.6.3 First Alternatives to PTA......Page 52
2.7.1 Behavior of the PTA......Page 53
2.7.2 Further Alternatives to PTA......Page 54
2.8 The Augmented Perceptron—Support Vector Machines......Page 55
2.9 Generalization Ability and Applications of Perceptrons......Page 58
2.10.1 Binary-Valued Inputs......Page 59
2.10.2 Sigmoidal Nodes and Radial Basis Functions......Page 60
2.11 Appendix 1: Useful Facts about Binomial Coefficients and Their Sums......Page 61
2.12 Appendix 2: Proofs of Results from Section 2.3......Page 62
2.13 Appendix 3: MATLAB Program for Perceptron Training......Page 67
3.1 Objectives and Setting......Page 70
3.2.2 Architecture......Page 71
3.2.3 Notation......Page 74
3.2.4 Purpose of the Network......Page 75
3.3 Boolean Functions......Page 76
3.4.1 Error-Free Learning......Page 78
3.4.2 Learning with Errors......Page 81
3.5.1 Growth Function and VC Capacity......Page 83
3.5.2 VC Capacity of Networks......Page 85
3.6 Partitioning the Space of Inputs......Page 88
3.6.1 Enumeration......Page 89
3.6.2 Construction of Partitions......Page 90
3.6.3 Limitations of a Single Hidden Layer Architecture......Page 91
3.7 Approximating to Functions......Page 93
3.8 Appendix: MATLAB Program for the Sandwich Construction......Page 94
3.9 Appendix: Proof of Theorem 3.5.1......Page 95
4.1.1 Objectives......Page 98
4.1.3 Single Hidden Layer Functions......Page 99
4.1.4 Multiple Hidden Layers–Multilayer Perceptrons......Page 102
4.2.1 Uniqueness of Network Representations of Functions......Page 103
4.2.2 Closure under Affine Combinations and Input Transformations......Page 104
4.2.4 Stability: Taylor’s Series......Page 105
4.2.5 Approximating to Step, Bounded Variation, and Continuous Functions......Page 107
4.3 Formalization of Function Approximation......Page 109
4.4 Outline of Approaches......Page 110
4.5 Implementation via Kolmogorov’s Solution to Hilbert’s 13th Problem......Page 112
4.6 Implementation via Stone-Weierstrass Theorem......Page 113
4.7 Single Hidden Layer Network Approximations to Continuous Functions......Page 116
4.8.1 Measurable Functions......Page 120
4.8.2 Enumerating the Constructable Partition Functions......Page 121
4.8.3 Constructing Integrable Functions......Page 122
4.8.4 Implementing Partially Specified Functions......Page 123
4.9.1 Approximating to Derivatives......Page 124
4.9.2 Approximating to Inverses......Page 125
4.10 Choice of Node Functions......Page 127
4.11.1 A Hilbert Space Setting......Page 128
4.11.2 Convex Classes of Functions......Page 130
4.11.3 Upper Bound to Approximation Error......Page 131
4.11.4 A Class of Functions for which d_ā = 0......Page 132
4.12 Fundamental Limits to Nonlinear Function Approximation......Page 134
4.14 Appendix 4.1: Linear Vector Spaces......Page 137
4.15 Appendix 4.2: Metrics and Norms......Page 138
4.16 Appendix 4.3: Topology......Page 140
4.17 Appendix 4.4: Proof of Theorem 4.11.1......Page 141
5.1.1 Error Surface......Page 146
5.1.2 Taylor’s Series Expansion of the Error Function......Page 148
5.1.4 Outline of Algorithmic Approaches......Page 151
5.2.1 Notation......Page 153
5.2.2 Gradients for a Single Hidden Layer Network......Page 155
5.2.3 Gradients for MLP: Backpropagation......Page 156
5.2.4 Calculational Efficiency of Backpropagation......Page 160
5.3.1 Overview and Startup Issues......Page 161
5.3.2 Iterative Descent Algorithms......Page 164
5.3.3 Approximate Line Search......Page 168
5.3.4 Search Termination......Page 171
5.4.1 Locally Optimal Steepest Descent......Page 172
5.4.2 Choice of Constant Learning Rate α......Page 175
5.4.4 Adaptively Chosen Step Size......Page 178
5.4.5 Summary of Steepest Descent Algorithms......Page 179
5.4.6 Momentum Smoothing......Page 180
5.5.1 Conjugacy and Its Implications......Page 181
5.5.2 Selecting Conjugate Gradient Directions......Page 183
5.5.4 Summary of the Conjugate Gradient Directions Algorithm......Page 186
5.6.1 Outline of Approach......Page 187
5.7 Levenberg-Marquardt Algorithms......Page 190
5.8 Remarks on Computing the Hessian......Page 192
5.9 Training Is Intrinsically Difficult......Page 193
5.10 Learning from Several Networks......Page 194
5.11 Appendix 1: Positive Definite Matrices......Page 195
5.12 Appendix 2: Proof of Theorem 5.5.3......Page 196
5.13.1 1HL Network Response, Gradient, and Sum-Squared Error......Page 198
5.13.2 1HL Steepest Descent......Page 201
5.13.3 1HL Conjugate Gradient......Page 204
5.13.4 1HL Quasi-Newton......Page 207
5.13.5 1HL Levenberg-Marquardt......Page 211
5.14 Appendix 4: MATLAB Listings of Multiple-Output Quasi-Newton Training Algorithms......Page 214
6.1.1 The Issue......Page 220
6.1.2 Formulation as Model Selection......Page 221
6.1.3 Outline of Approaches to Model Selection......Page 222
6.2.1 Setting......Page 223
6.2.2 Priors......Page 225
6.2.3 Likelihood and Posterior......Page 226
6.2.4 Bayesian Model and Architecture Selection......Page 229
6.3.1 General Theory for Inverse Problems......Page 232
6.3.2 Lagrange Multiplier Formulation......Page 236
6.3.3 Application to Neural Networks......Page 237
6.4.1 Kolmogorov Complexity......Page 238
6.4.2 Application to Neural Networks......Page 241
6.5.1 Code Length Measure of Complexity......Page 243
6.5.2 Application to Neural Networks......Page 245
6.6 Overfitting Control......Page 248
6.7.1 Growing a Network......Page 249
6.7.2 Pruning a Network......Page 250
7.1 Introduction and Network Specification......Page 252
7.3 Gradient-Based Training Algorithms......Page 255
7.4 Expected Error Terms......Page 256
7.5 Data Sets......Page 260
7.6.1 Limiting Behavior......Page 262
7.6.2 Central Limit Theorem Estimates of Fluctuation......Page 263
7.6.3 Upper Bounds to Fluctuation Measures......Page 264
7.7 Creating Independence: Cross-Validation and Bootstrap......Page 267
7.7.1 Cross-Validation......Page 269
7.7.2 Bootstrap......Page 272
7.8.1 Introduction......Page 273
7.8.2 The Method of Vapnik-Chervonenkis......Page 274
7.8.3 Bounds Uniform Only over the Network Parameters......Page 279
7.8.4 Fat-Shattering Dimension......Page 280
7.9.1 Outline of the Argument......Page 283
7.9.2 Properties of the Minima of Generalization Error......Page 284
7.9.3 Vapnik-Chervonenkis Theory and Uniform Bounds......Page 286
7.9.4 Almost Sure Convergence of Estimated Parameters into the Set of Minima of Generalization Error......Page 287
7.10 Asymptotic Distribution of Training Algorithm Parameter Estimates......Page 288
7.11 Asymptotics of Generalization Error: Learning Curves......Page 294
7.12 Asymptotics of Empirical/Training Error......Page 296
7.13 Afterword......Page 297
7.14 Appendix 1: Proof of Theorem 7.8.2......Page 299
A.1 Overview......Page 302
A.2 Exercises for the Chapters......Page 303
A.3 MATLAB Program Listings......Page 323
References......Page 326
Index......Page 346