Building Bridges between Soft and Statistical Methodologies for Data Science

Data analysis is becoming an increasingly appealing topic owing to the emergence of new data types, dimensions, and sources. This motivates the development of probabilistic and statistical approaches and tools to cope with such data. Different communities of experts, namely statisticians, mathematicians, computer scientists, engineers, econometricians, and psychologists, are increasingly interested in facing this challenge. As a consequence, there is a clear need to build bridges between all these communities for Data Science.

This book contains more than fifty selected recent contributions that aim to establish these bridges. They address a wide range of relevant topics, such as imprecise probabilities, information theory, random sets and random fuzzy sets, belief functions, possibility theory, dependence modelling and copulas, clustering, depth concepts, dimensionality reduction of complex data, and robustness.

Author(s): Luis A. García-Escudero, Alfonso Gordaliza, Agustín Mayo, María Asunción Lubiano Gomez, Maria Angeles Gil, Przemyslaw Grzegorzewski, Olgierd Hryniewicz
Series: Advances in Intelligent Systems and Computing, 1433
Publisher: Springer
Year: 2022

Language: English
Pages: 420
City: Cham

Preface
Organization
General Chairs
Executive Board (Core SMPS Group)
Program Committee
Additional Referees
Organizing Local Committee
Publication Chair
Contents
Multi-dimensional Maximal Coherent Subsets Made Easy: Illustration on an Estimation Problem
1 Introduction
2 Maximal Coherent Subsets
3 Enumerating the MCSs of Axis-Aligned Hyperrectangles
4 Illustration on Linear Estimation
4.1 Estimating Possible Linear Models with MCS and Rectangles
4.2 Application
5 Conclusion
References
On Convergence in Distribution of Fuzzy Random Variables
1 Introduction
2 Preliminaries
3 Main Results
References
A Fuzzy Survival Tree (FST)
1 Introduction
2 Fuzzy Survival Tree Algorithm
2.1 FST Learning Process
2.2 FST Notation
2.3 FST Algorithm Steps
3 Results
3.1 Behaviour and Stability of the FST with Missing Data
3.2 Comparison of FST with Other Survival Algorithms
3.3 FST Performance Compared to ST(c_index)
4 Conclusions
References
Robust Rao-Type Tests for Non-destructive One-Shot Device Testing Under Step-Stress Model with Exponential Lifetimes
1 Introduction
2 Robust Rao-Type Test
3 Influence Function Analysis
4 Simulation Study
References
An Imprecise Label Ranking Method for Heterogeneous Data
1 Introduction
2 The Plackett-Luce Model
2.1 Hierarchical Model
2.2 Parameter Estimation
3 Illustration
4 Conclusion
References
The Choice of an Appropriate Stochastic Order to Aggregate Random Variables
1 Introduction
2 Basic Concepts
2.1 Aggregation Functions
2.2 Stochastic Orders
3 Aggregation of Random Variables
3.1 The Choice of the Stochastic Order
3.2 Induced Aggregation of Random Variables
4 Conclusions
References
A Framework for Probabilistic Reasoning on Knowledge Graphs
1 Introduction
2 Probabilistic Knowledge Graphs
3 The MCMC-Chase Algorithm
4 Conclusion
References
Biological Age Imputation by Data Depth
1 Introduction
2 Preliminaries, Data Depth
3 Biological Age Imputation by Data Depth
3.1 Sample Balancing
4 Data and Results
5 Conclusions and Future Work
References
Monitoring Tools in Robust CWM for the Analysis of Crime Data
1 Introduction and Notation
2 Crime Dataset
3 Conclusions
References
Penalized Model-Based Clustering with Group-Dependent Shrinkage Estimation
1 Introduction and Motivation
2 Proposed Solution
3 Application to Handwritten Digits Recognition
4 Conclusion and Discussion
References
Robust Diagnostics for Linear Mixed Models with the Forward Search
1 Introduction and Background
2 Data, Their Features and Economic Characteristics
3 LMM for Repeated Measures
4 The Forward Search for LMM
5 Illustration for Simulated and Trade Coffee Data
6 Final Remarks
References
K-Partitioning with Imprecise Probabilistic Edges
1 Introduction
2 Notations and Problem Definition
3 Analysis
3.1 Computational Complexity
3.2 Easy Cases
4 Heuristic
4.1 Pattern and Reduction Rules
4.2 Algorithm Description
5 Numerical Experiments
5.1 Dataset
5.2 Results
6 Conclusion
References
Decision-Making with E-Admissibility Given a Finite Assessment of Choices
1 Introduction
2 Setting and Choice Functions
3 E-Admissibility
4 Assessments and Extensions
5 A Characterisation of the E-Admissible Extension
6 An Algorithmic Approach
7 Conclusion
References
Copula-Based Divergence Measures for Dependence Between Random Vectors
1 Introduction
2 Axioms
3 -Dependence Measures
4 Example
References
The Winning Probability Relation of Parametrized Families of Random Vectors
1 Introduction
2 Preliminaries
3 Multivariate Normal Distribution
4 Multivariate T-Distribution
5 Compounding the Multivariate Normal Distribution
6 Discussion
References
Convergence of Copulas Revisited: Different Notions of Convergence and Their Interrelations
1 Introduction
2 Notation and Preliminaries
3 Interrelation of Metrics and Topologies
3.1 Relations Between the Operator Metrics OPp and DR
3.2 Relations Between the Operator Metrics OPp and Weak Conditional Convergence
References
An INDSCAL-Type Approach for Three-Way Spectral Clustering
1 Introduction
2 Background Results
3 A Three-Way Extension of Spectral Clustering Method Through an INDSCAL-Type Approach
4 A Real Illustrative Application: Blue Crabs Data
References
On Clustering of Star-Shaped Sets with a Fuzzy Approach: An Application to the Clasts in the Cantabrian Coast
1 Preliminaries on Star-Shaped Sets
2 Fuzzy Clustering Proposal
3 Empirical Analysis
4 Concluding Remarks
References
Cluster Validity Measures for Fuzzy Two-Mode Clustering
1 Background on Fuzzy Two-Mode Clustering
2 Cluster Validity Measures
3 Empirical Results
4 Concluding Remarks
References
The Simplifying Assumption in Pair-Copula Constructions from an Analytic Perspective
1 Introduction
2 Simplified Copulas
3 Optimality and Continuity Results of Partial Vine Copulas (PVCs)
References
On Positive Dependence Properties for Archimedean Copulas
1 Introduction and Preliminaries
2 Dependence Properties
3 Archimedean Copulas
References
Advances in Robust Constrained Model Based Clustering
1 Trimmed Mixture Likelihood with Constraints
2 The New Constraints and Algorithm
3 Automatic Choice of the Constraints
References
Trimmed Spatio-Temporal Variogram Estimator
1 Introduction
2 -Trimmed Spatio-Temporal Variogram Estimator
3 VOM+SAD Approximation of the Distribution of the -Trimmed Spatio-Temporal Variogram Estimator
4 Example
5 Conclusions and Further Research
References
Two Notions of Depth in the Fuzzy Setting
1 Introduction
2 Definition
3 Relationship Between the Two Notions of Depth
3.1 Comparison
4 Concluding Remarks
References
Tukey Depth for Fuzzy Sets
1 Introduction and Preliminaries
2 On the Extension of the Tukey Depth for Fuzzy Spaces
3 Real Data Illustration
References
Making Data Fair Through Optimal Trimmed Matching
1 Introduction and Notations
2 Experiments and Discussion
References
Paired Sample Test for Fuzzy Data
1 Introduction
2 Fuzzy Data
3 Fuzzy Random Variables
4 Two-Sample Permutation Test for Paired Fuzzy Data
5 Simulation Study
6 Conclusions
References
Monitoring of Possibilistically Aggregated Complex Time Series
1 Introduction
2 Monitoring of Processes with a CUSUM Control Chart
3 Possibilistic Aggregation of Segmented Processes
4 Statistical Properties of the CUSUM Chart for Aggregated Process Data
5 Conclusions
References
Tk-Merge: Computationally Efficient Robust Clustering Under General Assumptions
1 Introduction
2 Methodology
3 Tk-Merge
4 Experiments
5 Next Steps
References
A Markov Kernel Approach to Multivariate Archimedean Copulas
1 Introduction
2 Notation and Preliminaries
3 Results
References
Using Fuzzy Cluster Analysis to Find Interesting Clusters
1 Introduction: What Is Cluster Analysis?
2 Fuzzy Cluster Analysis
3 Technical Advantages of Fuzzy Clustering
4 Identifying Clusters Step by Step
5 Application to Indirect Methods to Compute Reference Intervals in Laboratory Medicine
References
Learning Control Limits for Monitoring of Multiple Processes with Neural Network
1 Introduction
2 Learning Control Limits with CONNF Neural Network
3 Simulation Results
4 Experiment on Real-Life Data from Smartphone-Based Monitoring of Bipolar Disorder Patients
5 Conclusions
References
Testing the Homogeneity of Topic Distribution Between Documents of a Corpus
1 Introduction
2 Topic Modelling
2.1 Pre-processing Step
2.2 Latent Dirichlet Allocation
2.3 Kullback Leibler Divergence
3 Topic Distribution Equality Test
4 Simulations
5 Application
6 Conclusion
References
Circular Ordering Methods for Timing and Visualization of Oscillatory Signals
1 Introduction
2 Methods
2.1 The FMM Model
2.2 The CPCA
3 Application 1: Human Circadian Atlas
4 Application 2: Neuronal Cell-Type Taxonomy
References
The Extended Version of Cohen's d Index for Interval-Valued Data
1 Introduction
2 Preliminary Concepts
3 Standardized Mean Difference for Interval-Valued Data
4 Real-Life Data Example
5 Conclusions and Future Research
References
Copulas, Lower Probabilities and Random Sets: How and When to Apply Them?
1 Introduction
2 Preliminaries
2.1 Imprecise Models
2.2 Copulas
3 Applying Copulas to Credal Sets
3.1 Robust Method on Dominated Probabilities
3.2 Joint Masses from Copulas
3.3 Copula Applied to the Lower Probabilities
4 Special Cases
5 Conclusion
References
Complex Dimensionality Reduction: Ultrametric Models for Mixed-Type Data
1 Introduction
2 Background
3 Statistical Relationships Between Mixed-Type Variables
4 Methods
5 Application
6 Discussion
References
Partial Calibrated Multi-label Ranking
1 Introduction
2 Preliminaries
2.1 Multi-Label Classification
2.2 Calibrated Label Ranking
3 Partial Calibrated Label Ranking
3.1 Justification of Partial Calibrated Label Ranking
4 Experiments
4.1 Experimental Settings
4.2 Results and Discussion
5 Conclusions and Future Research
References
Imprecise Learning from Misclassified and Incomplete Categorical Data with Unknown Error Structure
1 Introduction
2 Misclassified and Incomplete Categorical Data
3 Learning from Misclassified Categorical Data
3.1 Incorporating the Knowledge Gained from Double Sampling
3.2 Learning Under the State of Quasi-Near-Ignorance
4 Example
5 Discussion
References
Note on Efron's Monotonicity Property Under Given Copula Structures
1 Introduction and Background
2 Efron's Marginal Monotonicity in Terms of the Copula
3 Examples of Copulas
3.1 Farlie-Gumbel-Morgenstern Copula
3.2 Clayton Copula
3.3 Ali-Mikhail-Haq Copula
3.4 Frank Copula
4 Other Examples: Generalized Order Statistics (GOSs)
References
A Minimizing Problem of Distances Between Random Variables with Proportional Reversed Hazard Rate Functions
1 Introduction
2 Preliminaries
3 Determining $\min_{Y \in \mathcal{F}_X} E_C|X-Y|$ with X and C Fixed
4 Examples
References
Polytopes of Discrete Copulas and Applications
1 Introduction
2 Polytopes of Bivariate Discrete Copulas
2.1 Discrete Copulas on Arbitrary Grid Domains
2.2 Empirical Copulas as Extreme Points
3 Polytopes of d-Dimensional Discrete Copulas, d>2
4 Discussion
References
Penalized Estimation of a Finite Mixture of Linear Regression Models
1 Introduction
2 Likelihood Unboundedness
3 Covariate Selection
4 Bounded Covariate Selection
5 Concluding Remarks
References
Robust Bayesian Regression for Mislabeled Binary Outcomes
1 Introduction
2 Model Formulation
2.1 A Bayesian Model for Misclassified Binary Data
3 Misclassification Probability
3.1 Common Misclassification Probability
3.2 Subject-Specific Misclassification Probabilities
3.3 Spike-and-Slab Prior for Misclassification Probabilities
4 Posterior Computation
5 Outlier Detection
6 Simulation Study
7 Concluding Remarks
References
Case-Wise and Cell-Wise Outliers Detection Based on Statistical Depth Filters
1 Introduction
2 Half-Space Depth-Filter
3 Outlier Detection Based on Half-Space Depth-Filters
4 Simulation Study
References
The d-Depth-Based Interval Trimmed Mean
1 Introduction
2 The Space $\mathcal{K}_c(\mathbb{R})$
3 Central Tendency Measures for Random Intervals
4 The d-Depth-Based Interval Trimmed Mean
4.1 Comparative Simulation Study
5 Concluding Remarks
References
Characterization of Extreme Points of p-Boxes via Their Normal Cones
1 Introduction
2 General Structure of Maximal Elementary Simplicial Cones of p-Boxes
3 Relating Extreme Points to Their Normal Cones
References
Measurability and Products of Random Sets
1 Introduction
2 Preliminaries
3 Comparison of Measurability Conditions
4 Measurability of Products of Random Sets
5 Random Functions
References
Fast CP Model Fitting with Integrated ASD-ALS Procedure
1 Introduction
2 The CP Model and Its ALS Estimation
3 The ASD and INT-3 Algorithms
4 Simulation Study
5 Summary and Conclusions
References
On Quantifying and Estimating Directed Dependence
1 Introduction
2 Notation and Preliminaries
3 Empirical Checkerboard Estimators
3.1 Simulations
4 Tackling the Multivariate Setting
References
Explaining Cautious Random Forests via Counterfactuals
1 Introduction
2 Background
2.1 Cautious Random Forests
2.2 Counterfactual-Based Explanations
3 Explaining Imprecision Using Counterfactuals
3.1 Extracting Determinate Counterfactuals
3.2 Region Filtering and Counterfactual Initialization
4 Experimental Results
4.1 Counterfactual Extraction Efficiency
4.2 Case Studies
5 Conclusion
References
Remarks on Martingale Representation Theorem for Set-Valued Martingales
1 Introduction
2 Notations and Preliminaries
3 Main Result
References
Author Index