Artificial Intelligence, Big Data and Data Science in Statistics: Challenges and Solutions in Environmetrics, the Natural Sciences and Technology

This book discusses the interplay between statistics, data science, machine learning and artificial intelligence, with a focus on environmental science, the natural sciences, and technology. It covers the state of the art from both a theoretical and a practical viewpoint and describes how to successfully apply machine learning methods, demonstrating the benefits of statistics for modeling and analyzing high-dimensional and big data. The book’s expert contributions include theoretical studies of machine learning methods, expositions of general methodologies for sound statistical analyses of data, and novel approaches to modeling and analyzing data for specific problems and areas. In terms of applications, the contributions deal with data arising in industrial quality control, autonomous driving, transportation and traffic, chip manufacturing, photovoltaics, football, transmission of infectious diseases, Covid-19 and public health. The book will appeal to statisticians and data scientists, as well as to engineers and computer scientists working in related fields or applications.

Author(s): Ansgar Steland, Kwok-Leung Tsui
Publisher: Springer
Year: 2022

Language: English
Pages: 377
City: Cham

Preface
Contents
Part I Methodologies and Theoretical Studies
One-Round Cross-Validation and Uncertainty Determination for Randomized Neural Networks with Applications to Mobile Sensors
1 Introduction
2 Classical Neural Networks and Extreme Learning Machines
2.1 Hidden Layer Feedforward Networks
2.2 Training Neural Networks and Extreme Learning Machines
2.3 Approximation and Generalization Bounds
3 Comparing and Cross-Validating Randomized Networks
3.1 Model Comparison and Evaluation
3.2 A Simulation Experiment
3.3 One-Round Cross-Validation for Randomized Networks
3.4 An Uncertainty Interval for the Mean Sample Prediction Error with Minimal Computational Costs
4 Application to Vehicle Integrated Photovoltaics and Data Analysis
4.1 Vehicle Mounted Data Logger
4.2 Data Analysis
Appendix: Proof of Theorem 1
References
Scale Invariant and Robust Pattern Identification in Univariate Time Series, with Application to Growth Trend Detection in Music Streaming Data
1 Introduction
2 Methodology
2.1 Sliding Windows
2.2 Scale Invariance
2.3 Pattern Identification
2.4 Model Diagnostics
3 Simulation Experiments
3.1 Simulation Design
3.2 Analyses of the Simulated Data
3.3 Example Analyses of Simulated Data
3.4 Selection of the Pattern Type and Window
3.5 Simulation Results
4 Examples from the Digital Music Industry
5 Summary and Conclusions
6 Appendix
References
Fine-Tuned Parallel Piecewise Sequential Confidence Interval and Point Estimation Strategies for the Mean of a Normal Population: Big Data Context
1 Introduction
1.1 A Brief Literature Review
1.2 An Outline of the Paper
2 An Overview of FWCI and MRPE Problems
2.1 A Purely Sequential FWCI Strategy
2.2 A Purely Sequential MRPE Strategy
3 Fine-Tuned Parallel Piecewise Sequential Strategies with Asymptotically Unbiased Sample Size Estimation
3.1 Parallel Piecewise Sequential FWCI Strategies
3.2 Parallel Piecewise Sequential MRPE Strategies
3.3 Other Selected Second-Order Results
3.3.1 FWCI Problem: Asymptotic Second-Order Expansion of the Coverage Probability
3.3.2 MRPE Problem: Asymptotic Second-Order Expansion of the Regret
4 Simulation Data Analysis
4.1 FWCI Strategies
4.1.1 Descriptions of Tables 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, and 13
4.1.2 An Overview of Comments on Simulated Performances: Tables 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, and 13
4.2 MRPE Strategies
4.2.1 Descriptions of Tables 15, 16, 17, 18, 19, and 20
4.2.2 An Overview of Comments on Simulated Performances: Tables 15, 16, 17, 18, 19, and 20
5 A Real Data Illustration: The Framingham Heart Study
5.1 Illustration of FWCI
5.2 Illustration of MRPE
6 Concluding Thoughts
References
Statistical Learning for Change Point and Anomaly Detection in Graphs
1 Introduction
2 What Is a Graph?
3 Change Point and Anomaly Detection in Network Data
3.1 What Is a Change Point?
3.2 Methods for Network Monitoring
3.3 How Can We Specify Anomaly Detection in Terms of Network Monitoring?
4 Graph Representation Learning
4.1 Shallow Embedding Methods
4.2 Graph Convolutional Networks
4.2.1 Feedforward Fully Connected Neural Networks and Convolutional Neural Networks
4.2.2 Application Phases of Graph Convolutional Networks
5 Simulation Study
5.1 Motivation
5.2 Network Definition
5.3 Generation of the Response Time Data
5.4 Road Conditions
5.5 Calibration of the Control Chart for Quantile Function Values
5.6 Construction and Training of the Graph Convolutional Network
5.7 Phase II Analysis
6 Conclusion and Discussion
References
On the Robustness of Kernel-Based Pairwise Learning
1 Introduction
2 Mathematical Prerequisites
3 Main Results
4 Discussion
Appendix
A.1 Important Definitions, Theorems, and Lemmas
A.2 Proofs
References
Global Sensitivity Analysis for the Interpretation of Machine Learning Algorithms
1 Introduction
2 Global Sensitivity Analysis
2.1 Global Sensitivity Indices
2.2 Shapley Values
2.3 Derivative-Based Global Sensitivity Measures
2.4 Estimation of Indices
3 Visualizing Interaction Structures by FANOVA Graphs
3.1 General Idea of FANOVA Graphs
3.2 Estimation and Thresholding
4 Fields of Applications
4.1 Kriging Model of a Piston Simulator
4.2 Neural Net Model of Resistance of Sailing Yachts
5 Summary
References
Improving Gaussian Process Emulators with Boundary Information
1 Introduction
2 A Motivating Example
3 Gaussian Process Modeling with Boundary Information
3.1 Review of Standard GP Emulator
3.2 Boundary Modified GP Emulator
3.3 BMGP Model for More General Cases
4 Numerical Examples
4.1 Example 1
4.2 Example 2
5 Conclusions
References
Part II Challenges and Solutions in Applications
An Overview and General Framework for Spatiotemporal Modeling and Applications in Transportation and Public Health
1 Introduction
2 Literature Review
2.1 Statistical Approaches to Spatiotemporal Modeling
2.2 Machine Learning/Deep Learning-Based Approaches to Spatiotemporal Modeling
2.3 Spatiotemporal Modeling in Transportation
2.4 Spatiotemporal Modeling in Public Health
2.5 Challenges and Opportunities
3 A General Framework for Spatiotemporal Modeling
3.1 Mechanistic/Simulation Approach
3.2 Data-Driven Approach
3.2.1 Feature Engineering
3.2.2 Feature Selection
3.2.3 Prediction Modeling
3.3 Combining the Mechanistic and the Data-Driven Approach
3.4 Evaluation Metrics and Methods
4 Examples of the General Framework for Spatiotemporal Modeling
4.1 Spatiotemporal Modeling for Road Traffic
4.2 Spatiotemporal Modeling for Transit Passenger Flow
4.3 Spatiotemporal Modeling for Air Traffic
4.4 Spatiotemporal Modeling for Infectious Disease Transmission
5 Conclusion
References
Introduction to Wafer Tomography: Likelihood-Based Prediction of Integrated-Circuit Yield
1 Introduction
2 Tomographic Models
2.1 Notation
2.2 Parameters
2.3 Effect of Classified Defects
2.4 Effect of Unclassified Defects on Inspected Layers
2.5 Effect of Uninspected Layers
2.6 Survival of All the Observed and Unobserved Defects and Other Causes
3 Computational Aspects and Optimization
3.1 Initialization of Parameter Estimates
3.2 M-Step
3.3 E-Step
3.4 Modification and Directional D-Step
3.5 Scalability and Computational Complexity
4 Goodness of Fit and Its Theoretical Limitations
5 Performance of the Algorithm and the Case Study
6 Summary and Conclusions
Appendix
Gradient of the Log Likelihood for the M-Step
Derivation of the E-Step
Prediction on Good and Failed Chips: Proof of Lemma 1
References
Uncertainty Quantification Based on Bayesian Neural Networks for Predictive Quality
1 Prediction of Quality Characteristics
2 Definition of Prediction of Quality Characteristics
3 State of Uncertainty Quantification for Predictive Quality
4 Application of Bayesian Neural Networks to the Prediction of Quality Characteristics
4.1 Interpretation in the Industrial Context
5 Concluding Remarks
References
Two Statistical Degradation Models of Batteries Under Different Operating Conditions
1 Introduction
2 Piecewise Degradation Model for Batteries in View of Capacity Recovery Phenomenon
2.1 Theory of Piecewise Degradation Modeling
2.2 Case Study A
2.2.1 Dataset Description
2.2.2 Piecewise Model Construction and Monitoring Locations of CRPs
2.2.3 RUL Prediction for Batteries
3 State-Space-Based Capacity Degradation Model for Batteries Under Different Discharge Rates
3.1 Theory of Degradation Modeling Under Different Discharge Rates
3.2 Case Study B
3.2.1 Experimental Dataset Description and Capacity Model Construction
3.2.2 RUL Prediction for Batteries at Different Discharge Rates
4 Conclusions
References
Detecting Diamond Breakouts of Diamond Impregnated Tools for Core Drilling of Concrete by Force Measurements
1 Introduction
2 Experimental Setup and the Data
3 Identification of Periods of Active Drilling
4 Identification of the Rotations
5 Calculation of Differences Between Rotations
6 Feature Generation and Classification
7 Discussion
Acknowledgments
References
Visualising Complex Data Within a Data Science Loop: A Spatio-Temporal Example from Football
1 Introduction
2 Data
3 Pitfalls and Recommendations
3.1 Raw Data Might Not Be Suitable for the Underlying Question
3.2 Raw Data Might Not Be Given on a Suitable Scale
3.3 Visualising Derived Information from (Raw) Data
3.4 Limitations of Data
4 Visualising the Data
4.1 Interactive Visualisations
4.2 Static Visualisations
5 Conclusion
References
Application of the Singular Spectrum Analysis on Electroluminescence Images of Thin-Film Photovoltaic Modules
1 Introduction
2 Methods Overview
2.1 SSA
2.2 ESPRIT
2.3 Amplitude, Phase Estimation
2.4 Implementation: Comparison to Alternative Methods
3 Data
4 Results
4.1 Image Decomposition
4.2 Interconnection Line Detection
4.3 Inverse Characteristic Length Estimation
4.4 Stitched Image Correction
5 Conclusions
Appendix
2D-SSA Embedding and Projection
SVD
Finite Rank Signal
Computational Complexity
References
The Impact of the Lockdown Restrictions on Air Quality During COVID-19 Pandemic in Lombardy, Italy
1 Introduction
2 The Relationship Between the COVID-19 Pandemic and Air Quality in the World
3 Air Quality in Lombardy During the COVID-19 Lockdown
3.1 The COVID-19 Lockdown in Italy
4 Air Quality and Weather Data
5 Statistical Modelling
6 Model Fitting and Selection
6.1 LASSO Performances
6.2 Meteorology and Long-Run Trend
6.3 Model Fitting and Diagnostic Checks
7 Lockdown Results
7.1 Evaluation of the Lockdown Effect Based on Area Type
7.2 Geographical Distribution of the Lockdown Effect
8 Conclusions and Future Developments
References
Author Index