Advances in Big Data Analytics: Theory, Algorithms and Practices


Today, big data affects countless aspects of our daily lives. This book provides a comprehensive and cutting-edge study on big data analytics, based on the research findings and applications developed by the author and his colleagues in related areas. It addresses the concepts of big data analytics and/or data science, multi-criteria optimization for learning, expert and rule-based data analysis, support vector machines for classification, feature selection, data stream analysis, learning analysis, sentiment analysis, link analysis, and evaluation analysis. The book also explores lessons learned in applying big data to business, engineering and healthcare. Lastly, it addresses the advanced topic of intelligence-quotient (IQ) tests for artificial intelligence.

Author(s): Yong Shi
Publisher: Springer
Year: 2022

Language: English
Pages: 742
City: Singapore

ISBN: 978-981-16-3607-3
Preface
Contents
2 Multiple Criteria Optimization Classification
2.1 Multi-criteria Linear Programming for Supervised Learning
2.1.1 Error Correction Method in Classification by Using Multiple-Criteria and Multiple-Constraint Levels Linear Programming
2.1.2 Multi-instance Classification Based on Regularized Multiple Criteria Linear Programming
2.1.3 Supportive Instances for Regularized Multiple Criteria Linear Programming Classification
2.1.4 Kernel Based Simple Regularized Multiple Criteria Linear Programming for Binary Classification and Regression
2.2 Multiple Criteria Linear Programming with Expert and Rule Based Knowledge
2.2.1 A Group of Knowledge-Incorporated Multiple Criteria Linear Programming Classifier
2.2.2 Decision Rule Extraction for Regularized Multiple Criteria Linear Programming Model
2.3 Multiple-Criteria Decision Making Based Data Analysis
2.3.1 A Multicriteria Decision Making Approach for Estimating the Number of Clusters
2.3.1.1 MCDM Methods
2.3.1.2 Clustering Algorithm
2.3.1.3 Clustering Validity Measures
2.3.2 Parallel Regularized Multiple Criteria Linear Programming Classification Algorithm
2.3.3 An Effective Intrusion Detection Framework Based on Multiple Criteria Linear Programming and Support Vector Machine
2.3.3.1 Discrete Binary PSO
References
3 Support Vector Machine Classification
3.1 Support Vector Machine in Data Analytics
3.1.1 Recent Advances on Support Vector Machines Research
3.1.1.1 The Nature of C-Support Vector Machines
3.1.1.2 Optimization Models of Support Vector Machines
3.1.1.3 Universum Support Vector Machine
3.1.1.4 Robust Support Vector Machine
3.1.1.5 Knowledge Based Support Vector Machine
3.1.1.6 Multi-instance Support Vector Machine
3.1.2 Two New Decomposition Algorithms for Training Bound-Constrained Support Vector Machines
3.1.2.1 The Decomposition Algorithm Framework
3.1.2.2 Using First Order Information for Working Set Selection
3.1.2.3 Using Second Order Information for Working Set Selection
3.1.2.4 Global Convergence Analysis
3.2 Twin Support Vector Machine in Classification
3.2.1 Improved Twin Support Vector Machine
3.2.1.1 TBSVM (Twin Bounded Support Vector Machine)
3.2.1.2 Improved TWSVM
3.2.1.3 Fast Solvers for ITSVM
3.2.2 Extending Twin Support Vector Machine Classifier for Multi-category Classification Problems
3.2.2.1 One-Versus-All Twin Support Vector Machines
3.2.3 Robust Twin Support Vector Machine for Pattern Classification
3.2.3.1 Robust Twin Support Vector Machine (R-TWSVM)
3.2.4 Structural Twin Support Vector Machine for Classification
3.2.4.1 Structural Twin Support Vector Machine (S-TWSVM)
3.3 Nonparallel Support Vector Machine Classifiers
3.3.1 A Nonparallel Support Vector Machine for a Classification Problem with Universum Learning
3.3.1.1 Nonparallel SVM for Classification with a Universum: U-NSVM
3.3.2 A Divide-and-Combine Method for Large Scale Nonparallel Support Vector Machines
3.3.2.1 NPSVM
3.3.2.2 A Divide-and-Combine NPSVM Solver with a Single Level
3.3.2.3 Divide and Combine NPSVM with Multiple Levels
3.3.3 Nonparallel Support Vector Machines for Pattern Classification
3.3.3.1 NPSVM
3.3.4 A Multi-instance Learning Algorithm Based on Nonparallel Classifier
3.3.4.1 MI-NSVM
3.4 Laplacian Support Vector Machine Classifiers
3.4.1 Successive Overrelaxation for Laplacian Support Vector Machine
3.4.1.1 Background
3.4.1.2 FLAPSVM
3.4.1.3 Implementation Issues
3.4.1.4 Complexity Analysis
3.4.2 Laplacian Twin Support Vector Machine for Semi-supervised Classification
3.4.2.1 Laplacian Twin Support Vector Machine for Semi-supervised Classification (Lap-TSVM)
3.5 Loss Function of Support Vector Machine Classification
3.5.1 Ramp Loss Least Squares Support Vector Machine
3.5.1.1 Background
3.5.1.2 Ramp Loss LSSVM
3.5.2 Ramp Loss Nonparallel Support Vector Machine for Pattern Classification
3.5.2.1 Ramp Loss NPSVM
3.5.3 A New Classification Model Using Privileged Information and Its Application
3.5.3.1 Fast Twin Support Vector Machine Using Privileged Information (FTSVMPI)
References
Part II Functional Analysis
4 Feature Selection
4.1 Systematic Methods for Feature Selection
4.1.1 An Integrated Feature Selection and Classification Scheme
4.1.1.1 Proposed Feature Selection Methods
4.1.1.2 MCDM Methods
4.1.1.3 Classification Algorithms
4.1.1.4 Performance Measures
4.1.1.5 Experimental Design
4.1.2 Two-Stage Hybrid Feature Selection Algorithms
4.1.2.1 Generalized F-Score
4.1.2.2 The New Classification Accuracy Measure
4.1.2.3 Several Hybrid Feature Selection Algorithms
4.1.3 Feature Selection with Attributes Clustering by Maximal Information Coefficient
4.1.3.1 Maximal Information Coefficient
4.1.3.2 Affinity Propagation Clustering
4.1.3.3 Attributes Clustering by Maximal Information Coefficient
4.2 Regularizations for Feature Selections
4.2.1 Supervised Feature Selection with ℓ2,1−2 Regularization
4.2.1.1 Feature Selection with Sparse Learning
4.2.1.2 ConCave-Convex Procedure
4.2.1.3 Supervised Feature Selection with the ℓ2,1−2 Regularization
4.2.2 Feature Selection with ℓ2,1−2 Regularization
4.2.2.1 Algorithm
4.2.3 Feature Selection with MCP2 Regularization
4.2.3.1 Sparse Regularization for Vectors
4.2.3.2 Sparse Regularization for Matrices
4.2.3.3 The Proposed Model
4.2.3.4 The Optimization Algorithm
4.2.3.5 Computational Complexity
4.3 Distance-Based Feature Selections
4.3.1 Spatial Distance Join Based Feature Selection
4.3.1.1 Fundamental Concepts
4.3.1.2 Feature Selection in the SDJ Framework
4.3.2 Domain Driven Two-Phase Feature Selection Method Based on Bhattacharyya Distance and Kernel Distance Measurements
4.3.2.1 Preliminary Feature Selection Based on Bhattacharyya Distance Measurement
4.3.2.2 Second-Phase Feature Selection Based on Kernel Distance Measurement
References
5 Data Stream Analysis
5.1 Application-Driven Classification of Data Streams
5.1.1 Data Streams in Big Data
5.1.2 Categorization of Training Examples and Learning Cases
5.1.2.1 Categorization of Training Examples
5.1.3 Learning Models of Data Stream
5.1.3.1 Solution to the TS3VM Objective Function
5.2 Robust Ensemble Learning for Mining Noisy Data Streams
5.2.1 Noisy Description for Data Stream
5.2.2 Ensemble Frameworks for Mining Data Stream
5.2.2.1 Horizontal Ensemble and Weighted Ensemble Frameworks
5.2.2.2 Vertical Ensemble Framework
5.2.2.3 Aggregate Ensemble Framework
5.2.3 Theoretical Studies of the Aggregate Ensemble
5.2.3.1 Performance Study of AE Framework
5.2.3.2 Time Complexity Analysis
References
6 Learning Analysis
6.1 Concept of the View of Learning
6.1.1 Concept-Cognitive Learning Model for Incremental Concept Learning
6.1.1.1 Preliminaries
6.1.1.2 Theoretical Foundation
6.1.1.3 Proposed Model
6.1.2 Concurrent Concept-Cognitive Learning Model for Classification
6.1.2.1 Initial Concurrent Concept Learning in C3LM
6.1.2.2 Concurrent Concept-Cognitive Process in C3LM
6.1.2.3 Concept Generalization Process in C3LM
6.1.3 Semi-Supervised Concept Learning by Concept-Cognitive Learning and Conceptual Clustering
6.1.3.1 Concept Space with Structural Information
6.1.3.2 Cognitive Process with Unlabeled Data in Concept Learning
6.1.3.3 Concept Recognition
6.1.3.4 Theoretical Analysis
6.1.3.5 Framework and Computational Complexity Analysis
6.1.4 Fuzzy-Based Concept Learning Method: Exploiting Data with Fuzzy Conceptual Clustering
6.1.4.1 Preliminaries
6.1.4.2 Fuzzy Concept Learning Method
6.1.4.3 Theoretical Analysis
6.2 Label Proportion for Learning
6.2.1 A Fast Algorithm for Multi-Class Learning from Label Proportions
6.2.1.1 Background
6.2.1.2 The LLP-ELM Algorithm
6.2.2 Learning from Label Proportions with Generative Adversarial Networks
6.2.2.1 Preliminaries
6.2.2.2 Adversarial Learning for LLP
6.2.3 Learning from Label Proportions on High-Dimensional Data
6.2.3.1 Background
6.2.3.2 The LLP-RF Algorithm
6.2.4 Learning from Label Proportions with Pinball Loss
6.2.4.1 Preliminary
6.2.4.2 Noise and Pinball Loss
6.2.4.3 Learning from Label Proportions Model with Pinball Loss
6.2.4.4 Dual Problem
6.2.4.5 Overall Optimization Procedure
6.2.4.6 Complexity
6.3 Other Enlarged Learning Models
6.3.1 Classifying with Adaptive Hyper-Spheres: An Incremental Classifier Based on Competitive Learning
6.3.1.1 Basic Theory
6.3.1.2 Proposed Classifier: ADA-HS
6.3.2 A Construction of Robust Representations for Small Data Sets Using Broad Learning System
6.3.2.1 Review of Broad Learning System
6.3.2.2 Proposed BLS Framework and BLS with RLA
References
7 Sentiment Analysis
7.1 Word Embedding
7.1.1 Methods: Single Sense Model vs Multiple Sense Model
7.1.2 Evaluation: Intrinsic vs Extrinsic
7.2 Sentiment Analysis Applications
References
8 Link Analysis
8.1 Recommender System for Marketing Optimization
8.1.1 Terminologies and Related Techniques
8.1.1.1 Score Matrix
8.1.1.2 Weibull Distribution
8.1.1.3 Gradient Descent
8.1.1.4 Loss Function and Measurement
8.1.2 Trigger and Triggered Model
8.1.2.1 Meaningful Trigger and Triggered Pairs
8.1.2.2 Transformation of Trigger and Triggered Pairs
8.1.2.3 Extract Meaningful Pairs
8.1.3 Trigger-Triggered Model for the Anonymous Recommendation
8.1.4 Trigger-Triggered Model for Product Promotion
8.2 Advertisement Clicking Prediction by Using Multiple Criteria Mathematical Programming
8.2.1 Research Background of Behavioral Targeting
8.2.1.1 Concept of Click-Through Rate
8.2.1.2 Concept of Clicking Events Prediction
8.2.2 Feature Creation and Selection
8.2.2.1 Feature Creation Method for T-Set-1
8.2.2.2 Feature Creation Method for T-Set-2
8.2.2.3 Normalization
8.2.2.4 Categorization Method for Positive/Negative Samples
8.2.2.5 Confusion Matrix
8.2.2.6 Receiver Operating Characteristics (ROC) Graph
8.3 Customer Churn Prediction Based on Feature Clustering and Nonparallel Support Vector Machine
8.3.1 Related Work
8.3.1.1 Maximal Information Coefficient
8.3.1.2 Affinity Propagation Clustering
8.3.1.3 Nonparallel Support Vector Machine
8.3.2 Customer Churn Prediction with NPSVM
8.4 Node-Coupling Clustering Approaches for Link Prediction
8.4.1 Preliminaries
8.4.1.1 Clustering Coefficient
8.4.1.2 Evaluation Metrics
8.4.2 Node-Coupling Clustering Approaches
8.4.2.1 Node-Coupling Clustering Coefficient
8.4.2.2 Node-Coupling Clustering Approach Based on Probability Theory (NCCPT)
8.4.3 Node-Coupling Clustering Approach Based on Common Neighbors (NCCCN)
8.4.4 The Extensions of NCCPT and NCCCN
8.4.5 Complexity Analysis of Our Approaches
8.5 Pyramid Scheme Model for Consumption Rebate Frauds
8.5.1 Networks
8.5.1.1 Tree Network
8.5.1.2 Random Network
8.5.1.3 Small-World Network
8.5.1.4 Scale-Free Network
8.5.2 The Model
8.5.2.1 Assumptions
8.5.2.2 Tree Network Case
8.5.2.3 Random Network Case
8.5.2.4 Small World Network Case
8.5.2.5 Scale-Free Network Case
8.5.3 A Pyramid Scheme in Real World
References
9 Evaluation Analysis
9.1 Reviews of Evaluation Formations
9.1.1 Decision-Making Support for the Evaluation of Clustering Algorithms Based on MCDM
9.1.1.1 Clustering Algorithms
9.1.1.2 MCDM Methods
9.1.1.3 PROMETHEE II
9.1.1.4 Performance Measures
9.1.1.5 The Contingency Table
9.1.1.6 Index Weights
9.1.1.7 The Proposed Model
9.1.2 Evaluation of Classification Algorithms Using MCDM and Rank Correlation
9.1.2.1 Two MCDM Methods
9.1.2.2 Spearman's Rank Correlation Coefficient
9.1.3 Public Blockchain Evaluation Using Entropy and TOPSIS
9.1.3.1 Proposed Evaluation Model
9.2 Evaluation Methods for Software
9.2.1 Classifier Evaluation for Software Defect Prediction
9.2.1.1 Research Methodology
9.2.1.2 Experimental Study
9.2.2 Ensemble of Software Defect Predictors: An AHP-Based Evaluation Method
9.2.2.1 Ensemble Methods
9.2.2.2 Selected Classification Models
9.2.2.3 The Analytic Hierarchy Process (AHP)
9.3 Evaluation Methods for Sociology and Economics
9.3.1 Delivery Efficiency and Supplier Performance Evaluation in China's E-Retailing Industry
9.3.1.1 Case, Research Problem and Data
9.3.1.2 Research Methodology
9.3.1.3 Variables Description
9.3.1.4 Empirical Results
9.3.1.5 Operations Model Comparison
9.3.2 Credit Risk Evaluation with Kernel-Based Affine Subspace Nearest Points Learning Method
9.3.2.1 Affine Subspace Nearest Point Algorithm
9.3.2.2 Affine Subspace Nearest Points (ASNP) Algorithm
9.3.2.3 Kernel Affine Subspace Nearest Points (KASNP) Algorithm
9.3.2.4 Two-Spiral Problem Test
9.3.2.5 Credit Evaluation Applications and Experiments
9.3.2.6 Discussion
9.3.3 A Dynamic Assessment Method for Urban Eco-Environmental Quality Evaluation
9.3.3.1 Related Works
9.3.3.2 Selecting Indicators
9.3.3.3 Dynamic Technique for Order Preference by Similarity to Ideal Solution Evaluation Method
9.3.4 An Empirical Study of Classification Algorithm Evaluation for Financial Risk Prediction
9.3.4.1 Evaluation Approach for Classification Algorithms
9.3.4.2 Classification Algorithms
9.3.4.3 Financial Risk Datasets
9.3.4.4 Experimental Design
9.3.4.5 Results and Discussion
9.3.4.6 Knowledge-Rich Financial Risk Management Process
References
Part III Application and Future Analysis
Shi2022_Chapter_BusinessAndEngineeringApplicat
10 Business and Engineering Applications
10.1 Banking and Financial Market Analysis
10.1.1 Domestic Systemically Important Banks: A Quantitative Analysis for the Chinese Banking System
10.1.1.1 Literature Review
10.1.1.2 Methodology and Data
10.1.1.3 Copula Approach
10.1.1.4 Data Description
10.1.1.5 Quantitative Results
10.1.2 How Does Credit Portfolio Diversification Affect Banks' Return and Risk? Evidence from Chinese Listed Commercial Banks
10.1.2.1 Methodology
10.1.2.2 Model Specification
10.1.2.3 Data
10.1.3 A New Approach of Integrating Piecewise Linear Representation and Weighted Support Vector Machine for Forecasting Stock Turning Points
10.1.3.1 Literature Review
10.2 Agriculture Classification
10.2.1 An Alternative Approach for the Classification of Orange Varieties Based on Near Infrared Spectroscopy
10.2.1.1 Materials and Methods
10.2.1.2 Results and Discussions
10.3 Engineering Problems
10.3.1 Automatic Road Crack Detection Using Random Structured Forests
10.3.1.1 Related Work
10.3.1.2 Automatic Road Crack Detection
10.3.2 Efficient Railway Tracks Detection and Turnouts Recognition Method Using HOG Features
10.3.2.1 Railway Tracks Detection Using HOG Features
10.3.2.2 Railway Tracks Detection Based on Region-Growing Algorithm
10.3.2.3 Railway Turnouts Recognition
References
11 Healthcare Applications
11.1 Evaluating Doctor Performance: Ordinal Regression-Based Approach
11.1.1 Methods
11.1.1.1 Preprocessing and Text Representation
11.1.1.2 Model Training
11.1.1.3 Model Prediction
11.1.1.4 Statistical Methods and Evaluation Metrics
11.1.1.5 Mining Predictive Features
11.2 Transmission Patterns of COVID-19 Outbreak
11.2.1 Methods
11.2.1.1 Scope of This Study
11.2.1.2 Data Sources
11.2.1.3 Age-Specific Social Contact Characterization
11.2.1.4 Role of the Funding Source
11.2.2 Results
11.2.2.1 Social Contact-Based Transmission Characterization
11.2.2.2 Retrospective Analysis of the Disease Outbreak
11.2.2.3 Prospective Analysis of Disease Transmission Risks and Economic Impacts
11.2.2.4 Sensitivity to Parameter Variations
11.2.3 Discussion
References
12 Artificial Intelligence IQ Test
12.1 A Basic AI-IQ Test
12.1.1 The Concepts of AI-IQ Test
12.1.1.1 A Small Sample of AI-IQ Test
12.1.2 A Data Mining for Features of AI-IQ Test
12.1.2.1 A Large Sample of AI-IQ Test
12.1.2.2 In-Sample Experiment
12.1.2.3 Out-Sample Experiment
12.1.3 A Standard Intelligence Model
12.1.3.1 Extensions of the von Neumann Architecture
12.2 Laws of Intelligence Based on AI IQ Research
12.2.1 Law of Intelligent Model (M Law)
12.2.2 Absolute 0 Agents (α Point)
12.2.3 Omniscient and Omnipotent Agents (Ω Point)
12.2.4 Conventional Agent (aC)
12.2.5 Relative 0 Agent (aR)
12.2.6 Shared Agent (aG or AG)
12.2.7 Universe Agent (aU)
12.2.8 Law of Intelligence Evolution ( Law)
12.2.8.1 FΩ (Ω Gravity)
12.2.8.2 Fα (α Gravity)
12.2.8.3 Agent of Life and Agent of Engineering (aL and aE)
12.2.8.4 Intelligence
12.2.8.5 Consciousness
12.2.8.6 Law of Intelligence (Zero-Infinity) Duality (A Law)
12.3 A Fuzzy Cognitive Map-Based Approach Finding Characteristics on AI-IQ Test
12.3.1 Research Method
12.3.1.1 Methodology
12.3.1.2 Linguistic Term Analyses
12.3.1.3 Defuzzification Method
12.3.2 Data Analysis
12.3.2.1 Fuzzy Cognitive Map Results
12.3.2.2 FCM Steady-State Analysis
12.3.3 Dynamic Scenario Analysis of the AI System IQ
12.3.3.1 Worst and Best-Case Scenario
12.3.3.2 FCM Inference Simulation
References
Conclusions
References