Author(s): Benjamin M. Marlin
Language: English
Pages: 164
Tags: Computer Science and Engineering;Artificial Intelligence;
1 Introduction......Page 9
1.1 Outline and Contributions......Page 10
1.2.1 Notation for Missing Data......Page 12
1.2.2 Notation and Conventions for Vector and Matrix Calculus......Page 13
2.1 Optimal Prediction and Minimizing Expected Loss......Page 15
2.2 The Bayesian Framework......Page 16
2.2.2 Bayesian Computation......Page 17
2.3.1 MAP Approximation to The Prediction Function......Page 19
2.3.2 MAP Computation......Page 20
2.4.1 Function Approximation as Optimization......Page 21
2.4.2 Function Approximation and Regularization......Page 22
2.5.2 Validation Loss......Page 23
2.5.3 Cross Validation Loss......Page 24
3.1 Categories of Missing Data......Page 25
3.2 The Missing at Random Assumption and Multivariate Data......Page 26
3.3 Impact of Incomplete Data on Inference......Page 28
3.4 Missing Data, Inference, and Model Misspecification......Page 29
4.1 Finite Mixture Models......Page 33
4.1.1 Maximum A Posteriori Estimation......Page 35
4.2 Dirichlet Process Mixture Models......Page 37
4.2.1 Properties of The Dirichlet Process......Page 38
4.2.2 Bayesian Inference and the Conjugate Gibbs Sampler......Page 40
4.2.3 Bayesian Inference and the Collapsed Gibbs Sampler......Page 42
4.2.4 Predictive Distribution and the Conjugate Gibbs Sampler......Page 43
4.2.5 Predictive Distribution and the Collapsed Gibbs Sampler......Page 44
4.3 Factor Analysis and Probabilistic Principal Components Analysis......Page 45
4.3.1 Joint, Conditional, and Marginal Distributions......Page 46
4.3.2 Maximum Likelihood Estimation......Page 47
4.3.3 Predictive Distribution......Page 49
4.4.2 Maximum Likelihood Estimation......Page 50
4.4.3 Predictive Distribution......Page 53
5 Unsupervised Learning with Non-Random Missing Data......Page 54
5.1 The Yahoo! Music Data Set......Page 55
5.1.1 User Survey......Page 56
5.1.2 Rating Data Analysis......Page 57
5.1.3 Experimental Protocols for Rating Prediction......Page 59
5.2.1 Experimental Protocols for Rating Prediction......Page 60
5.4 The Finite Mixture/CPT-v Model......Page 62
5.4.1 Conditional Identifiability......Page 64
5.4.2 Maximum A Posteriori Estimation......Page 67
5.4.4 Experimentation and Results......Page 71
5.5 The Dirichlet Process Mixture/CPT-v Model......Page 76
5.5.1 An Auxiliary Variable Gibbs Sampler......Page 77
5.5.2 Rating Prediction for Training Cases......Page 80
5.5.3 Rating Prediction for Novel Cases......Page 81
5.5.4 Experimentation and Results......Page 82
5.6 The Finite Mixture/Logit-vd Model......Page 83
5.6.1 Maximum A Posteriori Estimation......Page 85
5.6.2 Rating Prediction......Page 87
5.6.3 Experimentation and Results......Page 89
5.7.1 Restricted Boltzmann Machines and Complete Data......Page 90
5.7.2 Conditional Restricted Boltzmann Machines and Missing Data......Page 93
5.7.3 Conditional Restricted Boltzmann Machines and Non User-Selected Items......Page 97
5.7.4 Experimentation and Results......Page 100
5.8 Comparison of Results and Discussion......Page 102
6.1 Frameworks for Classification With Missing Features......Page 107
6.1.3 Classification and Imputation......Page 108
6.1.4 Classification in Sub-spaces: Reduced Models......Page 109
6.2.1 Fisher's Linear Discriminant Analysis......Page 110
6.2.3 Quadratic Discriminant Analysis......Page 112
6.2.4 Regularized Discriminant Analysis......Page 113
6.2.5 LDA and Missing Data......Page 115
6.2.6 Discriminatively Trained LDA and Missing Data......Page 116
6.2.7 Synthetic Data Experiments and Results......Page 120
6.3.1 The Logistic Regression Model......Page 122
6.3.2 Maximum Likelihood Estimation for Logistic Regression......Page 123
6.3.4 Logistic Regression and Missing Data......Page 124
6.3.5 An Equivalence Between Missing Data Strategies for Linear Classification......Page 126
6.3.6 Synthetic Data Experiments and Results......Page 127
6.4.1 Perceptrons......Page 132
6.4.2 Hard Margin Support Vector Machines......Page 133
6.4.4 Soft Margin Support Vector Machine via Loss + Penalty......Page 134
6.5 Basis Expansion and Kernel Methods......Page 135
6.5.2 Kernel Methods......Page 136
6.5.3 Kernel Support Vector Machines and Kernel Logistic Regression......Page 137
6.5.4 Kernels For Missing Data Classification......Page 138
6.5.5 Synthetic Data Experiments and Results......Page 141
6.6.1 Feed-Forward Neural Network Architecture......Page 143
6.6.2 One Hidden Layer Neural Networks for Classification......Page 144
6.6.4 Regularization in Neural Networks......Page 145
6.6.5 Neural Network Classification and Missing Data......Page 146
6.6.6 Synthetic Data Experiments and Results......Page 147
6.7.1 Hepatitis Data Set......Page 148
6.7.2 Thyroid - AllHypo Data Set......Page 149
6.7.3 Thyroid - Sick Data Set......Page 150
6.7.4 MNIST Data Set......Page 151
7.1 Unsupervised Learning with Non-Random Missing Data......Page 154
Bibliography......Page 156