Introduction to High-Dimensional Statistics

Praise for the first edition:

"[This book] succeeds singularly at providing a structured introduction to this active field of research. … it is arguably the most accessible overview yet published of the mathematical ideas and principles that one needs to master to enter the field of high-dimensional statistics. … recommended to anyone interested in the main results of current research in high-dimensional statistics as well as anyone interested in acquiring the core mathematical skills to enter this area of research."
Journal of the American Statistical Association

Introduction to High-Dimensional Statistics, Second Edition preserves the philosophy of the first edition: to be a concise guide for students and researchers discovering the area and interested in the mathematics involved. The main concepts and ideas are presented in simple settings, thereby avoiding inessential technicalities. High-dimensional statistics is a fast-evolving field, and much progress has been made on a large variety of topics, providing new insights and methods. Offering a succinct presentation of the mathematical foundations of high-dimensional statistics, this new edition:

  • Offers revised chapters from the previous edition, with substantial additional material on important topics, including compressed sensing, estimation with convex constraints, the Slope estimator, simultaneously low-rank and row-sparse linear regression, and aggregation of a continuous set of estimators.
  • Introduces three new chapters on iterative algorithms, clustering, and minimax lower bounds.
  • Provides enhanced appendices, mainly with the addition of the Davis–Kahan perturbation bound and of two simple versions of the Hanson–Wright concentration inequality.
  • Covers cutting-edge statistical methods including model selection, sparsity and the Lasso, iterative hard thresholding, aggregation, support vector machines, and learning theory.
  • Provides detailed exercises at the end of every chapter with collaborative solutions on a wiki site.
  • Illustrates concepts with simple but clear practical examples.

Author(s): Christophe Giraud
Series: Chapman & Hall/CRC Monographs on Statistics and Applied Probability 168
Edition: 2
Publisher: Chapman and Hall/CRC
Year: 2021

Language: English
Pages: 346
Tags: High-Dimensional Statistics; Statistics; Applied Probability; Probability

Cover
Half Title
Title Page
Copyright Page
Table of Contents
Preface, Second Edition
Preface
Acknowledgments
1 Introduction
1.1 High-Dimensional Data
1.2 Curse of Dimensionality
1.2.1 Lost in the Immensity of High-Dimensional Spaces
1.2.2 Fluctuations Cumulate
1.2.3 Accumulation of Rare Events May Not Be Rare
1.2.4 Computational Complexity
1.3 High-Dimensional Statistics
1.3.1 Circumventing the Curse of Dimensionality
1.3.2 A Paradigm Shift
1.3.3 Mathematics of High-Dimensional Statistics
1.4 About This Book
1.4.1 Statistics and Data Analysis
1.4.2 Purpose of This Book
1.4.3 Overview
1.5 Discussion and References
1.5.1 Take-Home Message
1.5.2 References
1.6 Exercises
1.6.1 Strange Geometry of High-Dimensional Spaces
1.6.2 Volume of a p-Dimensional Ball
1.6.3 Tails of a Standard Gaussian Distribution
1.6.4 Principal Component Analysis
1.6.5 Basics of Linear Regression
1.6.6 Concentration of Square Norm of Gaussian Random Variable
1.6.7 A Simple Proof of Gaussian Concentration
2 Model Selection
2.1 Statistical Setting
2.2 Selecting among a Collection of Models
2.2.1 Models and Oracle
2.2.2 Model Selection Procedures
2.3 Risk Bound for Model Selection
2.3.1 Oracle Risk Bound
2.4 Computational Issues
2.5 Illustration
2.6 Alternative Point of View on Model Selection
2.7 Discussion and References
2.7.1 Take-Home Message
2.7.2 References
2.8 Exercises
2.8.1 Orthogonal Design
2.8.2 Risk Bounds for the Different Sparsity Settings
2.8.3 Collections of Nested Models
2.8.4 Segmentation with Dynamic Programming
2.8.5 Goldenshluger–Lepski Method
2.8.6 Estimation under Convex Constraints
3 Minimax Lower Bounds
3.1 Minimax Risk
3.2 Recipe for Proving Lower Bounds
3.2.1 Fano’s Lemma
3.2.2 From Fano’s Lemma to a Lower Bound over a Finite Set
3.2.3 Back to the Original Problem: Finding a Good Discretization
3.3 Illustration
3.4 Minimax Risk for Coordinate-Sparse Regression
3.4.1 Lower Bound on the Minimax Risk
3.4.2 Minimax Optimality of the Model Selection Estimator
3.4.3 Frontier of Estimation in High Dimensions
3.5 Discussion and References
3.5.1 Take-Home Message
3.5.2 References
3.6 Exercises
3.6.1 Kullback–Leibler Divergence between Gaussian Distributions
3.6.2 Spreading Points in a Sparse Hypercube
3.6.3 Some Other Minimax Lower Bounds
3.6.4 Non-Parametric Lower Bounds
3.6.5 Data Processing Inequality and Generalized Fano Inequality
4 Aggregation of Estimators
4.1 Introduction
4.2 Gibbs Mixing of Estimators
4.3 Oracle Risk Bound
4.4 Numerical Approximation by Metropolis–Hastings
4.5 Numerical Illustration
4.6 Discussion and References
4.6.1 Take-Home Message
4.6.2 References
4.7 Exercises
4.7.1 Gibbs Distribution
4.7.2 Orthonormal Setting with Power Law Prior
4.7.3 Group-Sparse Setting
4.7.4 Gain of Combining
4.7.5 Online Aggregation
4.7.6 Aggregation with Laplace Prior
5 Convex Criteria
5.1 Reminder on Convex Multivariate Functions
5.1.1 Subdifferentials
5.1.2 Two Useful Properties
5.2 Lasso Estimator
5.2.1 Geometric Insights
5.2.2 Analytic Insights
5.2.3 Oracle Risk Bound
5.2.4 Computing the Lasso Estimator
5.2.5 Removing the Bias of the Lasso Estimator
5.3 Convex Criteria for Various Sparsity Patterns
5.3.1 Group-Lasso for Group Sparsity
5.3.2 Sparse-Group Lasso for Sparse-Group Sparsity
5.3.3 Fused-Lasso for Variation Sparsity
5.4 Discussion and References
5.4.1 Take-Home Message
5.4.2 References
5.5 Exercises
5.5.1 When Is the Lasso Solution Unique?
5.5.2 Support Recovery via the Witness Approach
5.5.3 Lower Bound on the Compatibility Constant
5.5.4 On the Group-Lasso
5.5.5 Dantzig Selector
5.5.6 Projection on the l1-Ball
5.5.7 Ridge and Elastic-Net
5.5.8 Approximately Sparse Linear Regression
5.5.9 Slope Estimator
5.5.10 Compressed Sensing
6 Iterative Algorithms
6.1 Iterative Hard Thresholding
6.1.1 Reminder on the Proximal Method
6.1.2 Iterative Hard-Thresholding Algorithm
6.1.3 Reconstruction Error
6.2 Iterative Group Thresholding
6.3 Discussion and References
6.3.1 Take-Home Message
6.3.2 References
6.4 Exercises
6.4.1 Linear versus Non-Linear Iterations
6.4.2 Group Thresholding
6.4.3 Risk Bound for Iterative Group Thresholding
7 Estimator Selection
7.1 Estimator Selection
7.2 Cross-Validation Techniques
7.3 Complexity Selection Techniques
7.3.1 Coordinate-Sparse Regression
7.3.2 Group-Sparse Regression
7.3.3 Multiple Structures
7.4 Scale-Invariant Criteria
7.5 Discussion and References
7.5.1 Take-Home Message
7.5.2 References
7.6 Exercises
7.6.1 Expected V-Fold CV l2-Risk
7.6.2 Proof of Corollary 7.5
7.6.3 Some Properties of Penalty (7.4)
7.6.4 Selecting the Number of Steps for the Forward Algorithm
8 Multivariate Regression
8.1 Statistical Setting
8.2 Reminder on Singular Values
8.3 Low-Rank Estimation
8.3.1 When the Rank of A* Is Known
8.3.2 When the Rank of A* Is Unknown
8.4 Low Rank and Sparsity
8.4.1 Row-Sparse Matrices
8.4.2 Criterion for Row-Sparse and Low-Rank Matrices
8.4.3 Convex Criterion for Low-Rank Matrices
8.4.4 Computationally Efficient Algorithm for Row-Sparse and Low-Rank Matrices
8.5 Discussion and References
8.5.1 Take-Home Message
8.5.2 References
8.6 Exercises
8.6.1 Hard Thresholding of the Singular Values
8.6.2 Exact Rank Recovery
8.6.3 Rank Selection with Unknown Variance
9 Graphical Models
9.1 Reminder on Conditional Independence
9.2 Graphical Models
9.2.1 Directed Acyclic Graphical Models
9.2.2 Non-Directed Models
9.3 Gaussian Graphical Models
9.3.1 Connection with the Precision Matrix and the Linear Regression
9.3.2 Estimating g by Multiple Testing
9.3.3 Sparse Estimation of the Precision Matrix
9.3.4 Estimation by Regression
9.4 Practical Issues
9.5 Discussion and References
9.5.1 Take-Home Message
9.5.2 References
9.6 Exercises
9.6.1 Factorization in Directed Models
9.6.2 Moralization of a Directed Graph
9.6.3 Convexity of −log(det(K))
9.6.4 Block Gradient Descent with the l1/l2 Penalty
9.6.5 Gaussian Graphical Models with Hidden Variables
9.6.6 Dantzig Estimation of Sparse Gaussian Graphical Models
9.6.7 Gaussian Copula Graphical Models
9.6.8 Restricted Isometry Constant for Gaussian Matrices
10 Multiple Testing
10.1 Introductory Example
10.1.1 Differential Expression of a Single Gene
10.1.2 Differential Expression of Multiple Genes
10.2 Statistical Setting
10.2.1 p-Values
10.2.2 Multiple-Testing Setting
10.2.3 Bonferroni Correction
10.3 Controlling the False Discovery Rate
10.3.1 Heuristics
10.3.2 Step-Up Procedures
10.3.3 FDR Control under the WPRD Property
10.4 Illustration
10.5 Discussion and References
10.5.1 Take-Home Message
10.5.2 References
10.6 Exercises
10.6.1 FDR versus FWER
10.6.2 WPRD Property
10.6.3 Positively Correlated Normal Test Statistics
11 Supervised Classification
11.1 Statistical Modeling
11.1.1 Bayes Classifier
11.1.2 Parametric Modeling
11.1.3 Semi-Parametric Modeling
11.1.4 Non-Parametric Modeling
11.2 Empirical Risk Minimization
11.2.1 Misclassification Probability of the Empirical Risk Minimizer
11.2.2 Vapnik–Chervonenkis Dimension
11.2.3 Dictionary Selection
11.3 From Theoretical to Practical Classifiers
11.3.1 Empirical Risk Convexification
11.3.2 Statistical Properties
11.3.3 Support Vector Machines
11.3.4 AdaBoost
11.3.5 Classifier Selection
11.4 Discussion and References
11.4.1 Take-Home Message
11.4.2 References
11.5 Exercises
11.5.1 Linear Discriminant Analysis
11.5.2 VC Dimension of Linear Classifiers in Rd
11.5.3 Linear Classifiers with Margin Constraints
11.5.4 Spectral Kernel
11.5.5 Computation of the SVM Classifier
11.5.6 Kernel Principal Component Analysis
12 Clustering
12.1 Proximity-Separation-Based Clustering
12.1.1 Clustering According to Proximity and Separation
12.1.2 Kmeans Paradigm
12.1.3 Hierarchical Clustering Algorithms
12.2 Model-Based Clustering
12.2.1 Gaussian Sub-Population Model
12.2.2 MLE Estimator
12.3 Kmeans
12.3.1 Kmeans Algorithm
12.3.2 Bias of Kmeans
12.3.3 Debiasing Kmeans
12.4 Semi-Definite Programming Relaxation
12.5 Lloyd Algorithm
12.5.1 Bias of Lloyd Algorithm
12.6 Spectral Algorithm
12.7 Recovery Bounds
12.7.1 Benchmark: Supervised Case
12.7.2 Analysis of Spectral Clustering
12.7.3 Analysis of Lloyd Algorithm
12.8 Discussion and References
12.8.1 Take-Home Message
12.8.2 References
12.9 Exercises
12.9.1 Exact Recovery with Hierarchical Clustering
12.9.2 Bias of Kmeans
12.9.3 Debiasing Kmeans
12.9.4 Bayes Classifier for the Supervised Case
12.9.5 Cardinality of an ε-Net
12.9.6 Operator Norm of a Random Matrix
12.9.7 Large Deviation for the Binomial Distribution
12.9.8 Stirling Numbers of the Second Kind
12.9.9 Gaussian Mixture Model and EM Algorithm
Appendix A Gaussian Distribution
A.1 Gaussian Random Vectors
A.2 Chi-Square Distribution
A.3 Gaussian Conditioning
Appendix B Probabilistic Inequalities
B.1 Basic Inequalities
B.2 Concentration Inequalities
B.2.1 McDiarmid Inequality
B.2.2 Gaussian Concentration Inequality
B.2.3 Concentration of Quadratic Forms of Gaussian Vectors
B.3 Symmetrization and Contraction Lemmas
B.3.1 Symmetrization Lemma
B.3.2 Contraction Principle
B.4 Birgé’s Inequality
Appendix C Linear Algebra
C.1 Singular Value Decomposition
C.2 Moore–Penrose Pseudo-Inverse
C.3 Matrix Norms
C.4 Matrix Analysis
C.4.1 Characterization of the Singular Values
C.4.2 Best Low-Rank Approximation
C.5 Perturbation Bounds
C.5.1 Weyl Inequality
C.5.2 Eigenspaces Localization
Appendix D Subdifferentials of Convex Functions
D.1 Subdifferentials and Subgradients
D.2 Examples of Subdifferentials
Appendix E Reproducing Kernel Hilbert Spaces
Notations
Bibliography
Index