Advances and Innovations in Statistics and Data Science

This book highlights selected papers from the 4th ICSA-Canada Chapter Symposium, as well as invited articles from established researchers in the areas of statistics and data science. It covers a variety of topics, including methodology development in data science, such as methodology in the analysis of high dimensional data, feature screening in ultra-high dimensional data and natural language ranking; statistical analysis challenges in sampling, multivariate survival models and contaminated data, as well as applications of statistical methods. With this book, readers can make use of frontier research methods to tackle their problems in research, education, training and consultation.

Author(s): Wenqing He, Liqun Wang, Jiahua Chen, Chunfang Devon Lin
Series: ICSA Book Series in Statistics
Publisher: Springer
Year: 2022

Language: English
Pages: 337
City: Cham

Preface
Part I: Methodology Development in Data Science (Chapters 1–6)
Part II: Challenges in Statistical Analysis (Chapters 7–15)
Contents
Contributors
Part I Methodology Development in Data Science
MiRNA–Gene Activity Interaction Networks (miGAIN): Integrated Joint Models of miRNA–Gene Targeting and Disturbance in Signaling Pathways
1 Introduction
2 Data
3 Methods
4 Results
4.1 Simulations
4.2 Data Analysis
5 Discussion
References
Robust Feature Screening for Ultrahigh-Dimensional Censored Data Subject to Measurement Error
1 Introduction
2 Notation and Framework
2.1 Survival Data
2.2 Distance Correlation of Two Random Variables
2.3 Feature Screening for Censored Data with Precise Measurements
3 Feature Screening for Censored Data with Error-Prone Covariates
3.1 Measurement Error Model
3.2 Feature Screening with Measurement Error Effects Accommodated
3.3 Asymptotic Results
4 Iteration Algorithm
5 Numerical Studies
5.1 Simulation Setup
5.2 Simulation Results
5.3 Analysis of Mantle Cell Lymphoma Microarray Data
6 Discussion
Appendix
A. Technical Lemmas
B. Proofs of Main Theorems
B.1 Proof of Theorem 1
B.2 Proof of Theorem 2
References
Simultaneous Control of False Discovery Rate and Sensitivity Using Least Angle Regressions in High-Dimensional Data Analysis
1 Introduction
2 Methodology
2.1 Cosine Distribution
2.2 Selection Criteria
3 Numerical Studies
3.1 Simulation Studies
3.1.1 Scenario I: Compound Symmetry Structure of
3.1.2 Scenario II: Auto-Regressive Correlation
3.2 A Real-Data Application
4 Discussion
References
Minimum Wasserstein Distance Estimator Under Finite Location-Scale Mixtures
1 Introduction
2 Wasserstein Distance and the Minimum Distance Estimator
2.1 Wasserstein Distance
2.2 Minimum Wasserstein Distance Estimator
2.3 Consistency of MWDE
2.4 Numerical Solution to MWDE
2.5 Penalized Maximum Likelihood Estimator
3 Experiments
3.1 Performance Measure
3.2 Performance Under Homogeneous Model
3.3 Efficiency and Robustness Under Finite Location-Scale Mixtures
3.3.1 Efficiency
3.3.2 Robustness
3.4 Image Segmentation
4 Conclusion
Appendix
Numerically Friendly Expression of W₂(F_N, F(·|G))
References
An Entropy-Based Comment Ranking Method with Word Embedding Clustering
1 Introduction
1.1 How to Judge a Comment's Quality?
1.2 Ranking Comments Using Entropy
1.3 Text Representation
2 Bag-of-Words Model with Word Embedding Clusters
3 Ranking Comments with General Entropy
4 Experiment with Amazon Review Data
References
A Robust Approach to Statistical Quality Control for High-Dimensional Non-Normal Data
1 Introduction
2 Notations and Preliminaries
3 Modification and Robustness
3.1 Model and Assumptions
3.2 Statistic, Its Limit, and Robustness
4 Simulations
5 Discussion
6 Basic Moments
7 Proof of Lemma 1
8 Proof of Theorem 1
9 Proof of Theorem 2
References
Part II Challenges in Statistical Analysis
Functional Linear Regression for Partially Observed Functional Data
1 Introduction
2 Functional Linear Model
3 Estimation Methods
3.1 Partially Observed Functional Data Without Measurement Error
3.2 Partially Observed Functional Data with Measurement Error
4 Simulation Studies
5 Real Data Analysis
6 Discussion
Appendix
References
Profile Estimation of Generalized Semiparametric Varying-Coefficient Additive Models for Longitudinal Data with Within-Subject Correlations
1 Introduction
2 Profile GEE Estimation Procedures
2.1 Model Estimation Using Fixed Working Covariance Matrices
2.2 Estimation of the Within-Subject Covariance Matrix
2.2.1 Estimation of Marginal Variance
2.2.2 Estimation of Correlation Coefficients
2.3 Profile Estimation via Quadratic Inference Function
2.4 Computational Algorithms
2.5 Choices of Kernel Function, Bandwidth, and Link Function
3 Simulation Studies
3.1 Continuous Longitudinal Responses
3.2 Discrete Longitudinal Responses
4 Concluding Remarks
References
Sieve Estimation of Semiparametric Linear Transformation Model with Left-Truncated and Current Status Data
1 Introduction
2 Sieve Maximum Likelihood Estimation
3 Efficient Estimation
4 Asymptotic Properties
5 Simulation Study
6 Real Data Analysis
7 Concluding Remarks and Future Work
Appendix
References
A Review of Flexible Transformations for Modeling Compositional Data
1 Introduction
2 Transformations for Compositional Data
2.1 Isometric Log-Ratio Transformation
2.2 α-Transformation
2.3 α-Folding Transformation
3 Real-life Data Applications
4 Conclusion
References
Identifiability and Estimation of Autoregressive ARCH Models with Measurement Error
1 Introduction
2 The Model and Identifiability
3 GMM Estimation
4 Testing for Measurement Error
5 Impact of Measurement Error
6 Finite Sample Properties
7 Conclusions and Discussion
Appendix
Regularity Assumptions and Mathematical Proofs
Regularity Assumptions
Proof of Theorem 1
Proof of Theorem 2
References
Modal Regression for Skewed, Truncated, or Contaminated Data with Outliers
1 Introduction
2 Linear Modal Regression
2.1 Introduction of Linear Modal Regression
2.2 Asymptotic Properties
2.3 Estimation Algorithm
2.4 Prediction Intervals Based on Modal Regression
3 Nonparametric Modal Regression
3.1 Estimating Uni-Modal Regression
3.2 Estimating Multi-Modal Regression
4 Semiparametric Modal Regression
5 Discussion
Appendix
References
Spatial Multilevel Modelling in the Galveston Bay Recovery Study Survey
1 Introduction
2 Sampling Design
3 Survey Weights
4 Inferring Inclusion Probabilities from the Weights
5 Spatial Multilevel Model
6 A Bayesian Analysis
7 Frequentist Composite Likelihood Analysis
7.1 Estimating Function System for Mean Parameters
7.2 Decomposition of the Error Term
7.3 Estimating Equation System for Variance Components
7.4 Point Estimation
7.5 Uncertainty Estimation
8 Discussion and Conclusions
References
Efficient Experimental Design for Lasso Regression
1 Introduction
2 Methodology
2.1 A Construction Method Using Orthogonal Arrays
2.2 A Construction Method Using the Kronecker Product
3 Numerical Illustration
4 Discussion
References
A Selective Overview of Statistical Methods for Identification of the Treatment-Sensitive Subsets of Patients
1 Introduction
2 Statistical Methods for Treatment-Sensitive Subset Identification Based on Survival Times
2.1 An Approach Based on a Biomarker-Adaptive Threshold Design
2.2 A Hierarchical Bayesian Method
2.3 A Procedure Based on a Single-index Threshold Cox Model
2.4 An Interaction Tree Approach
3 Statistical Methods for Treatment-Sensitive Subset Identification Based on Longitudinal Measurements
3.1 A Procedure Based on Multilevel Models
3.2 A Prediction Model Approach
3.3 A Procedure Based on a Threshold Linear Mixed Model
4 Discussions and Future Work
References
Index