Statistical Modeling and Simulation for Experimental Design and Machine Learning Applications: Selected Contributions from SimStat 2019 and Invited Papers

This volume presents a selection of articles on statistical modeling and simulation, with a focus on different aspects of statistical estimation and testing problems, the design of experiments, reliability and queueing theory, inventory analysis, and the interplay between statistical inference, machine learning methods and related applications. The refereed contributions originate from the 10th International Workshop on Simulation and Statistics, SimStat 2019, which was held in Salzburg, Austria, September 2–6, 2019, and were either presented at the conference or developed afterwards, relating closely to the topics of the workshop. The book is intended for statisticians and Ph.D. students who seek current developments and applications in the field.

Author(s): Jürgen Pilz; Viatcheslav B. Melas; Arne Bathke
Series: Contributions to Statistics
Edition: 1
Publisher: Springer Nature Switzerland
Year: 2023

Language: English
Pages: x; 265
City: Cham
Tags: Statistical Theory and Methods; Statistics and Computing/Statistics Programs; Machine Learning; Applied Statistics

Preface
Contents
Part I Invited Papers
1 Likelihood Ratios in Forensics: What They Are and What They Are Not
1.1 Introduction
1.2 Lindley's Likelihood Ratio (LLR)
1.2.1 Notations
1.2.2 A Frequentist Framework for Lindley's Likelihood Ratio (LLR)
1.3 Score-Based Likelihood Ratio (SLR)
1.3.1 The Expression of the SLR
1.3.2 The Glass Example
1.4 Discussion
References
2 MANOVA for Large Number of Treatments
2.1 Introduction
2.2 Notations and Model Setup
2.3 Simulations
2.3.1 MANOVA Tests for Large g
2.3.2 Special Case: ANOVA for Large g
2.4 Discussion and Outlook
References
3 Pollutant Dispersion Simulation by Means of a Stochastic Particle Model and a Dynamic Gaussian Plume Model
3.1 Introduction
3.2 Meteorological Monitoring Network
3.3 Wind Field Modeling
3.3.1 Mass Correction of the Wind Field
3.3.2 Plume Rise
3.4 Stochastic Particle Model
3.4.1 Deposition
3.4.2 Implementation
3.5 Dynamic Gaussian Plume Model
3.6 Implementation on the Server
3.7 A Real-World Example with Application to an Alpine Valley
3.8 Conclusions and Outlook
References
4 On an Alternative Trigonometric Strategy for Statistical Modeling
4.1 Introduction
4.2 The Alternative Sine Distribution
4.2.1 Presentation
4.2.2 Moment Properties
4.2.3 Parametric Extensions
4.3 AS Generated Family
4.3.1 Definition
4.3.2 Series Expansions
4.3.3 Example: The ASE Exponential Distribution
4.3.4 Moment Properties
4.4 Application to a Famous Cancer Data
4.5 Conclusion
References
Part II Design of Experiments
5 Incremental Construction of Nested Designs Based on Two-Level Fractional Factorial Designs
5.1 Introduction
5.2 Greedy Coffee-House Design
5.3 Two-Level Fractional Factorial Designs
5.3.1 Half Fractions: m=1
5.3.2 Several Generators
5.3.2.1 Defining Relations
5.3.2.2 Resolution
5.3.2.3 Word Length Pattern
5.3.3 Minimum Size
5.4 Two-Level Factorial Designs and Error-Correcting Codes
5.4.1 Definitions and Properties
5.4.2 Examples
5.5 Maximin Distance Properties of Two-Level Factorial Designs
5.5.1 Neighbouring Pattern and Distant Site Pattern
5.5.2 Optimal Selection of Generators by Simulated Annealing
5.5.2.1 SA Algorithm for the Maximisation of ρH
5.6 Covering Properties of Two-Level Factorial Designs
5.6.1 Bounds on CRH(Xn)
5.6.2 Calculation of CRH(Xn)
5.6.2.1 Algorithmic Construction of a Lower Bound on CRH(Xn)
5.7 Greedy Constructions Based on Fractional Factorial Designs
5.7.1 Base Designs
5.7.2 Rescaled Designs
5.7.3 Projection Properties
5.8 Summary and Future Work
Appendix
References
6 A Study of L-Optimal Designs for the Two-Dimensional Exponential Model
6.1 Introduction
6.2 Equivalence Theorem for L-Optimal Designs
6.3 General Case
6.4 Excess and Saturated Designs
References
7 Testing for Randomized Block Single-Case Designs by Combined Permutation Tests with Multivariate Mixed Data
7.1 Introduction
7.2 Randomized Block Single-Case Designs and NPC
7.3 Simulation Study
7.4 A Real Case Study
7.5 Conclusions
References
8 Adaptive Design Criteria Motivated by a Plug-In Percentile Estimator
8.1 Introduction
8.2 Problem Formulation and Background
8.2.1 Problem Formulation
8.2.2 Background
8.3 The Plug-In Estimator
8.4 Adaptive "Plug-In" Criteria
8.4.1 Monte Carlo Approximation
8.4.2 Monte Carlo Approximation Assuming Independency
8.4.3 Assuming Independency and Neglecting Uncertainty
8.4.4 Using SUR Design Criterion for Exceedance Probability
8.5 Numerical Implementation
8.6 Numerical Study
8.6.1 Comparison Study
8.6.2 Methodology
8.6.2.1 Case Studies
8.6.2.2 Performance Indicators
8.6.3 Numerical Results
8.6.3.1 Estimators Performance
8.6.3.2 Implementation
8.6.3.3 Criteria
8.7 Conclusions
Appendix 1
Posterior Mean and Variance of f Under the Gaussian Process Assumption
SUR Design Criteria for Exceedance Probability Estimation
Appendix 2
References
Part III Queueing and Inventory Analysis
9 On a Parametric Estimation for a Convolution of Exponential Densities
9.1 Introduction
9.2 Convolution of the Exponential Densities
9.3 ML Estimation of the Parameters
9.4 Parameter's Estimation by the Moments' Method
9.5 Approximation of the Density
9.6 Experimental Study
9.7 Application to a Single Queueing System M/G/1/k
9.8 Conclusions
References
10 Statistical Estimation with a Known Quantile and Its Application in a Modified ABC-XYZ Analysis
10.1 Introduction
10.2 Methods
10.2.1 Statistical Estimation with a Known Quantile
10.2.2 ABC-XYZ Analysis
10.3 ABC-XYZ Analysis Modified with a Known Quantile
10.4 Conclusions
References
Part IV Machine Learning and Applications
11 A Study of Design of Experiments and Machine Learning Methods to Improve Fault Detection Algorithms
11.1 Introduction
11.2 Design of Experiments and Machine Learning Modelling
11.3 Application to Fault Detection
11.3.1 Design of Experiments Step
11.3.2 Machine Learning Modelling Step
11.3.2.1 Refrigerant Undercharge: Fault Detection
11.3.2.2 Condenser Fouling: Fault Detection
11.4 Conclusions
References
12 Microstructure Image Segmentation Using Patch-Based Clustering Approach
12.1 Introduction
12.2 Input Data
12.3 Previous Work
12.4 Grain Segmentation
12.4.1 Seeded Region Growing (SRG)
12.4.2 Image Denoising and Patch Determination
12.4.3 Feature Extraction
12.4.4 Patch Clustering
12.4.5 Implementation
12.5 Results
12.6 Conclusion and Outlook
References
13 Clustering and Symptom Analysis in Binary Data with Application
13.1 Introduction
13.2 The Symptom Analysis
13.2.1 The Symptom and Syndrome Definition
13.2.2 Impulse Vector and Super-symptoms
13.2.3 Prefigurations of Super-symptom
13.2.4 The Super-symptom Recovery by Vector β
13.2.5 Clustering in Dichotomous Space and Symptom Analysis
13.3 The Medical Application of the Clustering and Symptom Analysis in Binary Data
13.3.1 Dataset
13.3.2 Result and Discussion
13.4 Conclusion
References
14 Big Data for Credit Risk Analysis: Efficient Machine Learning Models Using PySpark
14.1 Introduction
14.2 Data Processing
14.2.1 Data Treatment
14.2.2 Data Storage and Distribution
14.2.3 Munge Data
14.2.4 Creating New Measures
14.2.5 Missing Values Imputation and Outliers Treatment
14.2.6 One-Hot Code and Dummy Variables
14.2.7 Final Dataset
14.3 Method and Models
14.3.1 Method
14.3.2 Model Building
14.4 Results and Credit Scorecard Conversion
14.5 Conclusion
Appendix 1
Appendix 2
References