Data Fusion Methodology and Applications

Data Fusion Methodology and Applications explores the data-driven discovery paradigm in science and the need to handle large amounts of diverse data. Drivers of this change include the increased availability and accessibility of hyphenated analytical platforms and imaging techniques, the explosion of omics data, and the development of information technology. Because data-driven research takes an inductive approach, extracting information and building models that infer the underlying phenomena from the data itself, this book examines the challenges of integrating data from multiple sources, analytical platforms, modalities, and timescales, and the methodologies used to do so.

Author(s): Marina Cocchi
Series: Data Handling in Science and Technology, 31
Publisher: Elsevier
Year: 2019

Language: English
Pages: 396
City: Amsterdam

Front Cover
DATA FUSION METHODOLOGY AND APPLICATIONS
Copyright
Contents
Contributors
Preface
1 - Introduction: Ways and Means to Deal With Data From Multiple Sources
1. MOTIVATION
2. CONTEXT, DEFINITION
3. MAIN APPROACHES
3.1 Low-Level DF
3.2 Mid-Level DF
3.3 High-Level DF
4. REMARKS IN THE USER'S PERSPECTIVE
References
2 - A Framework for Low-Level Data Fusion
1. INTRODUCTION AND MOTIVATION
1.1 Data Integration
1.2 Model-Based Data Fusion
1.3 Goals of Data Fusion
1.4 Motivating Examples
2. DATA STRUCTURES
3. FRAMEWORK FOR LOW-LEVEL DATA FUSION
3.1 Submodel per Data Block
3.1.1 Quantifications of Data Block Modes
3.1.2 Block-Specific Association Rule
3.2 Linking Structure Between Different Submodels
3.3 Examples of the Framework
4. COMMON AND DISTINCT COMPONENTS
4.1 Generic Model for Common and Distinct Components
4.2 DISCO (Distinct and Common Components)
5. EXAMPLES
5.1 Microbial Metabolomics Example
5.2 Medical Biology Example
6. CONCLUSIONS
References
3 - General Framing of Low-, Mid-, and High-Level Data Fusion With Examples in the Life Sciences
1. INTRODUCTION
2. DATA SAMPLING, MEASUREMENTS, AND PREPROCESSING
2.1 Exhaled Breath and Fecal Microbiota (Data Sets One and Two)
2.2 Beer Survey by GC and LC-MS (Data Sets Three and Four)
3. DATA FUSION STRATEGY
3.1 Taxonomy of Data Fusion
3.2 Low-Level Data Fusion
3.2.1 Unsupervised Analysis
3.2.2 Supervised Analysis
3.3 Mid-Level Data Fusion
3.4 High-Level Data Fusion
4. DATA FUSION STRATEGIES WITH EXAMPLES
4.1 Groupings and Outliers Detection: Data Preparation
4.2 Data Fusion Strategy: Data Set One and Two
4.2.1 Performance per Platform
4.2.2 Performance for Fused Data
4.3 Data Fusion Strategies: Data Set Three and Four
4.3.1 Performance per Platform
4.3.2 Performance for Fused Data
5. INTERPRETATION OF THE OUTCOMES
6. CONCLUSIONS
References
4 - Numerical Optimization-Based Algorithms for Data Fusion
1. INTRODUCTION
1.1 Outline
1.2 Notation and Definitions
2. NUMERICAL OPTIMIZATION FOR TENSOR DECOMPOSITIONS
2.1 Line Search and Trust Region
2.2 Determining Step Direction pk
2.3 Solving Hp=−g
3. CANONICAL POLYADIC DECOMPOSITION
3.1 Intermezzo: Multilinear Algebra
3.2 Gauss–Newton-Type Algorithms
3.2.1 Gradient
3.2.2 Gramian-Vector Products
3.2.3 Preconditioner
3.3 Alternating Least Squares
3.4 More General Objective Functions
4. CONSTRAINED DECOMPOSITIONS
4.1 Parametric Constraints
4.2 Projection-Based Bound Constraints
4.3 Regularization and Soft Constraints
4.4 Symmetry
5. COUPLED DECOMPOSITIONS
5.1 Exact Coupling
5.2 Approximate Coupling
6. LARGE-SCALE COMPUTATIONS
6.1 Compression
6.2 Sampling: Incompleteness, Randomization, and Updating
6.3 Exploiting Structure: Sparsity and Implicit Tensorization
6.4 Parallelization
References
5 - Recent Advances in High-Level Fusion Methods to Classify Multiple Analytical Chemical Data
1. INTRODUCTION
1.1 From Single Model Prediction to Data Fusion
1.2 Introducing High-Level Fusion
2. METHODS
2.1 Majority Voting
2.2 Bayesian Consensus
2.3 Dempster-Shafer's Theory of Evidence
3. APPLICATION ON ANALYTICAL DATA
3.1 Datasets
3.2 Classification Methods
3.3 Validation Protocol
3.4 Software
4. RESULTS
4.1 Classification by High-Level Fusion Approaches
4.2 High-Level Protective Approaches
5. CONCLUSIONS
References
6 - The Sequential and Orthogonalized PLS Regression for Multiblock Regression: Theory, Examples, and Extensions
1. INTRODUCTION
2. HOW IT STARTED
3. MODEL AND ALGORITHM
4. SOME MATHEMATICAL FORMULAE AND PROPERTIES
5. HOW TO CHOOSE THE OPTIMAL NUMBER OF COMPONENTS
5.1 The Måge Plot
6. HOW TO INTERPRET THE MODELS
6.1 Interpretation of the Scores Plots
6.2 Interpretation of the Loadings Plots
6.3 Interpretation by the Use of the PCP Plots
7. SOME FURTHER PROPERTIES OF THE SO-PLS METHOD
8. EXAMPLES OF STANDARD SO-PLS REGRESSION
9. EXTENSIONS AND MODIFICATIONS OF SO-PLS
9.1 SO-PLS Can Be Extended to Handle Multiway Data Arrays (Without Unfolding)
9.1.1 Interpretation of Scores Plots in SO-N-PLS Models
9.1.2 Interpretation of Loadings Plots/Weights Plots in SO-N-PLS Models
9.1.3 Interpretation of Regression Coefficients Plots in SO-N-PLS Models
9.1.4 Example
9.2 SO-PLS Can Be Used for Classification
9.2.1 Example Based on Lambrusco Wines
9.3 SO-PLS Can Handle Interactions Between Blocks
9.4 Variable Selection in SO-PLS
10. CONCLUSIONS
References
7 - ComDim Methods for the Analysis of Multiblock Data in a Data Fusion Perspective
1. INTRODUCTION
2. COMDIM ANALYSIS
2.1 Structure and Preprocessing of the Data
2.2 Algorithm
2.3 Global and Block Components
3. P-COMDIM ANALYSIS
3.1 Algorithm
3.2 Global and Block Components
3.3 Prediction Model
4. PATH-COMDIM ANALYSIS
5. SOFTWARE
6. ILLUSTRATION
6.1 Lignin Data
6.2 Potatoes Data
7. CONCLUSION
References
8 - Data Fusion by Multivariate Curve Resolution
1. INTRODUCTION. GENERAL MULTIVARIATE CURVE RESOLUTION FRAMEWORK. WHY TO USE IT IN DATA FUSION?
2. DATA FUSION STRUCTURES IN MCR. MULTISET ANALYSIS
3. CONSTRAINTS IN MCR. VERSATILITY LINKED TO DATA FUSION. HYBRID MODELS (HARD–SOFT, BILINEAR/MULTILINEAR)
4. LIMITATIONS OVERCOME BY MULTISET MCR ANALYSIS. BREAKING RANK DEFICIENCY AND DECREASING AMBIGUITY
5. ADDITIONAL OUTCOMES OF MCR MULTISET ANALYSIS. THE HIDDEN DIMENSIONS
6. DATA FUSION FIELDS
6.1 Complex Analytical Data
6.2 Process Analysis
6.3 Environmental Data Sets
6.4 -Omics Data
7. CONCLUSIONS
References
9 - Dealing With Data Heterogeneity in a Data Fusion Perspective: Models, Methodologies, and Algorithms
1. INTRODUCTION
2. OVERVIEW OF LIFE SCIENCE DATA SOURCES
3. ADDRESSING DATA HETEROGENEITY
3.1 Entity Resolution
3.1.1 Similarity-Based Techniques
3.1.2 Learning-Based Techniques
3.1.3 Similarity Functions
3.2 Data fusion
3.2.1 Dealing With Conflicts
3.2.2 The Data Fusion Process
4. LATEST TRENDS AND CHALLENGES
4.1 Big Data Integration
4.1.1 Entity Resolution With Large Volumes of Data
4.1.2 Entity Resolution With Dynamic Data
4.1.3 Data Fusion and Big Data
4.2 A Crowdsource Approach
5. CONCLUSIONS
References
10 - Data Fusion Strategies in Food Analysis
1. INTRODUCTION
2. CHEMOMETRIC STRATEGIES APPLIED IN DATA FUSION
3. BUILDING, OPTIMIZATION, AND VALIDATION OF DATA-FUSED MODELS
4. APPLICATIONS
4.1 Olive Oil
4.2 Wine
4.2.1 Geographical Traceability and Varietal Characterization of Lambrusco Di Modena Wine
4.2.1.1 Aiding Soil Characterization for Optimal Sampling
4.2.1.2 Discrimination of Varieties and Modeling Authenticity
4.3 Vinegar
4.4 Beer
4.4.1 Classification of Beers According to Their Style
4.4.2 Classification of Beers According to the Factory
4.4.3 Authentication of a Signature Beer From a Craft Brewery
4.5 Dairy Products
4.6 Tea
4.7 Other Food Products
4.7.1 Beverages
4.7.2 Fruits
4.7.3 Seeds and Their Derivatives
4.7.4 Meat and Fish
5. CONCLUSIONS
References
11 - Image Fusion
1. INTRODUCTION
2. IMAGE FUSION BY USING SINGLE FUSED DATA STRUCTURES
3. IMAGE FUSION BY CONNECTING DIFFERENT IMAGES THROUGH REGRESSION MODELS
4. IMAGE FUSION FOR SUPERRESOLUTION PURPOSES
4.1 Stochastic Superresolution Fluorescence Imaging
Sparse Deconvolution of High-Density Images
Improvements by Combining the Results on Successive Image Frames
Results on Live Cell Samples Labeled with Fluorescent Proteins
4.2 Spatially Based Superresolution Approaches
5. CONCLUSIONS
References
12 - Data Fusion of Nonoptimized Models: Applications to Outlier Detection, Classification, and Image Library Searching
1. OUTLIER DETECTION
1.1 Outlier Mathematical Notation
1.2 Outlier Measures
1.2.1 Procrustes Analysis (X-Outlier)
1.2.2 Enhanced Inverted Signal Correction Differences (X-Outlier)
1.2.3 Matrix Matching (Y-Outlier)
1.2.4 Tuning Parameter Windows (X- and Y-Outliers)
1.3 Sum of Ranking Differences
1.3.1 Outlier Detection
1.3.2 Outlier Verification
1.3.3 Sample Swamping Assessment
1.3.4 Sample Masking Assessment
1.4 Outlier Detection Example
1.5 Outlier Detection Summary
2. CLASSIFICATION
2.1 Classification Mathematical Notation
2.2 Classifiers and Fusion Rule
2.2.1 Partial Least Squares-2 Discriminant Analysis
2.2.2 k Nearest Neighbors
2.2.3 Fusion
2.3 Classification Example
2.4 Classification Summary
3. THERMAL IMAGE ANALYSIS
3.1 Thermal Image Summary
Acknowledgments
References
Index
Back Cover