The Energy of Data and Distance Correlation

Energy distance is a statistical distance between the distributions of random vectors, and it characterizes equality of distributions: it is zero if and only if the distributions are identical. The name "energy" derives from Newton's gravitational potential energy, and there is an elegant relation to the notion of potential energy between statistical observations. Energy statistics (E-statistics) are functions of distances between statistical observations in metric spaces. The authors hope this book will spark the interest of statisticians who have not yet explored E-statistics and would like to apply these methods using R. The Energy of Data and Distance Correlation is intended for teachers and students looking for dedicated material on energy statistics, but it can also serve as a supplement to a wide range of courses and areas, such as Monte Carlo methods, U-statistics or V-statistics, measures of multivariate dependence, goodness-of-fit tests, nonparametric methods, and distance-based methods.

• E-statistics provide powerful methods for problems in multivariate inference and analysis.

• Methods are implemented in R, and readers can apply them immediately using the freely available energy package for R.

• The book provides an overview of the state of the art in the development of energy statistics and an overview of applications.

• The background and literature review are valuable for anyone considering further research or application in energy statistics.
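
As a rough illustration of the statement that energy statistics are functions of distances between observations, the sample energy distance between two samples can be sketched as 2·mean|x − y| − mean|x − x'| − mean|y − y'|, with all pairwise Euclidean distances averaged (the V-statistic form). The Python sketch below is not taken from the book and is not the API of the energy package for R; the function names `mean_pairwise_distance` and `energy_distance` are hypothetical, chosen only for this example.

```python
import numpy as np


def mean_pairwise_distance(a, b):
    """Average Euclidean distance between every row of a and every row of b."""
    diffs = a[:, None, :] - b[None, :, :]          # shape (n, m, d)
    return np.sqrt((diffs ** 2).sum(axis=-1)).mean()


def energy_distance(x, y):
    """Sample energy distance (V-statistic form) between samples x and y.

    Illustrative sketch only: computes
    2*mean|x - y| - mean|x - x'| - mean|y - y'| over all pairs.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Treat 1-D input as n observations of a single variable.
    if x.ndim == 1:
        x = x[:, None]
    if y.ndim == 1:
        y = y[:, None]
    return (2.0 * mean_pairwise_distance(x, y)
            - mean_pairwise_distance(x, x)
            - mean_pairwise_distance(y, y))
```

For identical samples the three averages cancel and the value is zero; the value grows as the two samples move apart. A permutation test of the kind listed in Chapter 4 would recompute this quantity over random relabelings of the pooled sample.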

Author(s): Gábor J. Székely, Maria L. Rizzo
Series: Chapman & Hall/CRC Monographs on Statistics and Applied Probability
Publisher: CRC Press/Chapman & Hall
Year: 2023

Language: English
Pages: 466
City: Boca Raton

Cover
Half Title
Series Page
Title Page
Copyright Page
Contents
Preface
Authors
Notation
I. The Energy of Data
1. Introduction
1.1. Distances of Data
1.2. Energy of Data: Distance Science of Data
2. Preliminaries
2.1. Notation
2.2. V-statistics and U-statistics
2.2.1. Examples
2.2.2. Representation as a V-statistic
2.2.3. Asymptotic Distribution
2.2.4. E-statistics as V-statistics vs U-statistics
2.3. A Key Lemma
2.4. Invariance
2.5. Exercises
3. Energy Distance
3.1. Introduction: The Energy of Data
3.2. The Population Value of Statistical Energy
3.3. A Simple Proof of the Inequality
3.4. Energy Distance and Cramér's Distance
3.5. Multivariate Case
3.6. Why is Energy Distance Special?
3.7. Infinite Divisibility and Energy Distance
3.8. Freeing Energy via Uniting Sets in Partitions
3.9. Applications of Energy Statistics
3.10. Exercises
4. Introduction to Energy Inference
4.1. Introduction
4.2. Testing for Equal Distributions
4.3. Permutation Distribution and Test
4.4. Goodness-of-Fit
4.5. Energy Test of Univariate Normality
4.6. Multivariate Normality and other Energy Tests
4.7. Exercises
5. Goodness-of-Fit
5.1. Energy Goodness-of-Fit Tests
5.2. Continuous Uniform Distribution
5.3. Exponential and Two-Parameter Exponential
5.4. Energy Test of Normality
5.5. Bernoulli Distribution
5.6. Geometric Distribution
5.7. Beta Distribution
5.8. Poisson Distribution
5.8.1. The Poisson E-test
5.8.2. Probabilities in Terms of Mean Distances
5.8.3. The Poisson M-test
5.8.4. Implementation of Poisson Tests
5.9. Energy Test for Location-Scale Families
5.10. Asymmetric Laplace Distribution
5.10.1. Expected Distances
5.10.2. Test Statistic and Empirical Results
5.11. The Standard Half-Normal Distribution
5.12. The Inverse Gaussian Distribution
5.13. Testing Spherical Symmetry; Stolarsky Invariance
5.14. Proofs
5.15. Exercises
6. Testing Multivariate Normality
6.1. Energy Test of Multivariate Normality
6.1.1. Simple Hypothesis: Known Parameters
6.1.2. Composite Hypothesis: Estimated Parameters
6.1.3. On the Asymptotic Behavior of the Test
6.1.4. Simulations
6.2. Energy Projection-Pursuit Test of Fit
6.2.1. Methodology
6.2.2. Projection Pursuit Results
6.3. Proofs
6.3.1. Hypergeometric Series Formula
6.3.2. Original Formula
6.4. Exercises
7. Eigenvalues for One-Sample E-Statistics
7.1. Introduction
7.2. Kinetic Energy: The Schrödinger Equation
7.3. CF Version of the Hilbert-Schmidt Equation
7.4. Implementation
7.5. Computation of Eigenvalues
7.6. Computational and Empirical Results
7.6.1. Results for Univariate Normality
7.6.2. Testing Multivariate Normality
7.6.3. Computational Efficiency
7.7. Proofs
7.8. Exercises
8. Generalized Goodness-of-Fit
8.1. Introduction
8.2. Pareto Distributions
8.2.1. Energy Tests for Pareto Distribution
8.2.2. Test of Transformed Pareto Sample
8.2.3. Statistics for the Exponential Model
8.2.4. Pareto Statistics
8.2.5. Minimum Distance Estimation
8.3. Cauchy Distribution
8.4. Stable Family of Distributions
8.5. Symmetric Stable Family
8.6. Exercises
9. Multi-sample Energy Statistics
9.1. Energy Distance of a Set of Random Variables
9.2. Multi-sample Energy Statistics
9.3. Distance Components: A Nonparametric Extension of ANOVA
9.3.1. The DISCO Decomposition
9.3.2. Application: Decomposition of Residuals
9.4. Hierarchical Clustering
9.5. Case Study: Hierarchical Clustering
9.6. K-groups Clustering
9.6.1. K-groups Objective Function
9.6.2. K-groups Clustering Algorithm
9.6.3. K-means as a Special Case of K-groups
9.7. Case Study: Hierarchical and K-groups Cluster Analysis
9.8. Further Reading
9.8.1. Bayesian Applications
9.9. Proofs
9.9.1. Proof of Theorem 9.1
9.9.2. Proof of Proposition 9.1
9.10. Exercises
10. Energy in Metric Spaces and Other Distances
10.1. Metric Spaces
10.1.1. Review of Metric Spaces
10.1.2. Examples of Metrics
10.2. Energy Distance in a Metric Space
10.3. Banach Spaces
10.4. Earth Mover's Distance
10.4.1. Wasserstein Distance
10.4.2. Energy vs. Earth Mover's Distance
10.5. Minimum Energy Distance (MED) Estimators
10.6. Energy in Hyperbolic Spaces and in Spheres
10.7. The Space of Positive Definite Symmetric Matrices
10.8. Energy and Machine Learning
10.9. Minkowski Kernel and Gaussian Kernel
10.10. On Some Non-Energy Distances
10.11. Topological Data Analysis
10.12. Exercises
II. Distance Correlation and Dependence
11. On Correlation and Other Measures of Association
11.1. The First Measure of Dependence: Correlation
11.2. Distance Correlation
11.3. Other Dependence Measures
11.4. Representations by Uncorrelated Random Variables
12. Distance Correlation
12.1. Introduction
12.2. Characteristic Function Based Covariance
12.3. Dependence Coefficients
12.3.1. Definitions
12.4. Sample Distance Covariance and Correlation
12.4.1. Derivation of $V_n^2$
12.4.2. Equivalent Definitions for $V_n^2$
12.4.3. Theorem on dCov Statistic Formula
12.5. Properties
12.6. Distance Correlation for Gaussian Variables
12.7. Proofs
12.7.1. Finiteness of $\|f_{X,Y}(t,s) - f_X(t) f_Y(s)\|^2$
12.7.2. Proof of Theorem 12.1
12.7.3. Proof of Theorem 12.2
12.7.4. Proof of Theorem 12.4
12.8. Exercises
13. Testing Independence
13.1. The Sampling Distribution of $nV_n^2$
13.1.1. Expected Value and Bias of Distance Covariance
13.1.2. Convergence
13.1.3. Asymptotic Properties of $nV_n^2$
13.2. Testing Independence
13.2.1. Implementation as a Permutation Test
13.2.2. Rank Test
13.2.3. Categorical Data
13.2.4. Examples
13.2.5. Power Comparisons
13.3. Mutual Independence
13.4. Proofs
13.4.1. Proof of Proposition 13.1
13.4.2. Proof of Theorem 13.1
13.4.3. Proof of Corollary 13.3
13.4.4. Proof of Theorem 13.2
13.5. Exercises
14. Applications and Extensions
14.1. Applications
14.1.1. Nonlinear and Non-monotone Dependence
14.1.2. Identify and Test for Nonlinearity
14.1.3. Exploratory Data Analysis
14.1.4. Identify Influential Observations
14.2. Some Extensions
14.2.1. Affine and Monotone Invariant Versions
14.2.2. Generalization: Powers of Distances
14.2.3. Distance Correlation for Dissimilarities
14.2.4. An Unbiased Distance Covariance Statistic
14.3. Distance Correlation in Metric Spaces
14.3.1. Hilbert Spaces and General Metric Spaces
14.3.2. Testing Independence in Separable Metric Spaces
14.3.3. Measuring Associations in Banach Spaces
14.4. Distance Correlation with General Kernels
14.5. Further Reading
14.5.1. Variable Selection, DCA and ICA
14.5.2. Nonparametric MANOVA Based on dCor
14.5.3. Tests of Independence with Ranks
14.5.4. Projection Correlation
14.5.5. Detection of Periodicity via Distance Correlation
14.5.6. dCov Goodness-of-fit Test of Dirichlet Distribution
14.6. Exercises
15. Brownian Distance Covariance
15.1. Introduction
15.2. Weighted $L_2$ Norm
15.3. Brownian Covariance
15.3.1. Definition of Brownian Covariance
15.3.2. Existence of Brownian Covariance Coefficient
15.3.3. The Surprising Coincidence: BCov(X,Y) = dCov(X,Y)
15.4. Fractional Powers of Distances
15.5. Proofs of Statements
15.5.1. Proof of Theorem 15.1
15.6. Exercises
16. U-statistics and Unbiased $\mathrm{dCov}^2$
16.1. An Unbiased Estimator of Squared dCov
16.2. The Hilbert Space of U-centered Distance Matrices
16.3. U-statistics and V-statistics
16.3.1. Definitions
16.3.2. Examples
16.4. Jackknife Invariance and U-statistics
16.5. The Inner Product Estimator is a U-statistic
16.6. Asymptotic Theory
16.7. Relation between dCov U-statistic and V-statistic
16.7.1. Deriving the Kernel of dCov V-statistic
16.7.2. Combining Kernel Functions for $V_n$
16.8. Implementation in R
16.9. Proofs
16.10. Exercises
17. Partial Distance Correlation
17.1. Introduction
17.2. Hilbert Space of U-centered Distance Matrices
17.2.1. U-centered Distance Matrices
17.2.2. Properties of Centered Distance Matrices
17.2.3. Additive Constant Invariance
17.3. Partial Distance Covariance and Correlation
17.4. Representation in Euclidean Space
17.5. Methods for Dissimilarities
17.6. Population Coefficients
17.6.1. Distance Correlation in Hilbert Spaces
17.6.2. Population pdCov and pdCor Coefficients
17.6.3. On Conditional Independence
17.7. Empirical Results and Applications
17.8. Proofs
17.9. Exercises
18. The Numerical Value of dCor
18.1. Cor and dCor: How Much Can They Differ?
18.2. Relation Between Pearson and Distance Correlation
18.3. Conjecture
19. The dCor t-test of Independence in High Dimension
19.1. Introduction
19.1.1. Population dCov and dCor Coefficients
19.1.2. Sample dCov and dCor
19.2. On the Bias of the Statistics
19.3. Modified Distance Covariance Statistics
19.4. The t-test for Independence in High Dimension
19.5. Theory and Properties
19.6. Application to Time Series
19.7. Dependence Metrics in High Dimension
19.8. Proofs
19.8.1. On the Bias of Distance Covariance
19.8.2. Proofs of Lemmas
19.8.3. Proof of Propositions
19.8.4. Proof of Theorem
19.9. Exercises
20. Computational Algorithms
20.1. Linearize Energy Distance of Univariate Samples
20.1.1. L-statistics Identities
20.1.2. One-sample Energy Statistics
20.1.3. Energy Test for Equality of Two or More Distributions
20.2. Distance Covariance and Correlation
20.3. Bivariate Distance Covariance
20.3.1. An O(n log n) Algorithm for Bivariate Data
20.3.2. Bias-Corrected Distance Correlation
20.4. Alternate Bias-Corrected Formula
20.5. Randomized Computational Methods
20.5.1. Random Projections
20.5.2. Algorithm for Squared dCov
20.5.3. Estimating Distance Correlation
20.6. Appendix: Binary Search Algorithm
20.6.1. Computation of the Partial Sums
20.6.2. The Binary Tree
20.6.3. Informal Description of the Algorithm
20.6.4. Algorithm
21. Time Series and Distance Correlation
21.1. Yule's “nonsense correlation” is Contagious
21.2. Auto dCor and Testing for iid
21.3. Cross and Auto-dCor for Stationary Time Series
21.4. Martingale Difference dCor
21.5. Distance Covariance for Discretized Stochastic Processes
21.6. Energy Distance with Dependent Data: Time Shift Invariance
22. Axioms of Dependence Measures
22.1. Rényi's Axioms and Maximal Correlation
22.2. Axioms for Dependence Measures
22.3. Important Dependence Measures
22.4. Invariances of Dependence Measures
22.5. The Erlangen Program of Statistics
22.6. Multivariate Dependence Measures
22.7. Maximal Distance Correlation
22.8. Proofs
22.9. Exercises
23. Earth Mover's Correlation
23.1. Earth Mover's Covariance
23.2. Earth Mover's Correlation
23.3. Population eCor for Mutual Dependence
23.4. Metric Spaces
23.5. Empirical Earth Mover's Correlation
23.6. Dependence, Similarity, and Angles
A. Historical Background
B. Prehistory
B.1. Introductory Remark
B.2. Thales and the Ten Commandments
Bibliography
Index