Advances in Data Science and Analytics: Concepts and Paradigms

Written and edited by a global team of experts, this volume presents the concepts and recent advances of data science and analytics and covers practical applications that can be used across multiple disciplines and industries. Aimed at both engineers and students, it focuses on machine learning, big data, business intelligence, and analytics.

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge and insights from structured and unstructured data. It is related to data mining, deep learning, and big data. Data analytics is a more focused discipline and can be considered part of this larger process: it is devoted to producing actionable insights, based on existing queries, that can be applied immediately. For the purposes of this volume, data science is an umbrella term that encompasses data analytics, data mining, machine learning, and several other related disciplines. Whereas a data scientist is expected to forecast the future based on past patterns, a data analyst extracts meaningful insights from various data sources.

Although data mining and related areas have been around for a few decades, data science and analytics are still evolving quickly, and their processes and technologies change almost daily. This volume provides an overview of some of the most important advances in these areas today, including practical coverage of everyday applications. Valuable both as a learning tool for beginners and as a daily reference for engineers and scientists working in these areas, this is a must-have for any library.

Author(s): M. Niranjanamurthy, Hemant Kumar Gianey, Amir H. Gandomi
Series: Advances in Data Engineering and Machine Learning
Publisher: Wiley-Scrivener
Year: 2022

Language: English
Pages: 351
City: Beverly

Cover
Title Page
Copyright Page
Contents
Preface
Chapter 1 Implementation Tools for Generating Statistical Consequence Using Data Visualization Techniques
1.1 Introduction
1.2 Literature Review
1.3 Tools in Data Visualization
1.4 Methodology
1.4.1 Plotting the Data
1.4.2 Plotting the Model on Data
1.4.3 Quantifying Linear Relationships
1.4.4 Covariance vs. Correlation
1.5 Conclusion
References
Chapter 2 Decision Making and Predictive Analysis for Real Time Data
2.1 Introduction
2.2 Data Analytics
2.2.1 Descriptive Analytics
2.2.2 Diagnostic Analytics
2.2.3 Predictive Analytics
2.2.4 Prescriptive Analytics
2.3 Predictive Modeling
2.4 Categories of Predictive Models
2.5 Process of Predictive Modeling
2.5.1 Requirement Gathering
2.5.2 Data Gathering
2.5.3 Data Analysis and Massaging
2.5.4 Machine Learning Statistics
2.5.5 Predictive Modeling
2.5.6 Prediction and Decision Making
2.6 Predictive Analytics Opportunities
2.6.1 Detecting Fraud
2.6.2 Reduction of Risk
2.6.3 Marketing Campaign Optimization
2.6.4 Operation Improvement
2.6.5 Clinical Decision Support System
2.7 Classification of Predictive Analytics Models
2.7.1 Predictive Models
2.7.2 Descriptive Models
2.7.3 Decision Models
2.8 Predictive Analytics Techniques
2.8.1 Predictive Analytics Software
2.8.2 The Importance of Good Data
2.8.3 Predictive Analytics vs. Business Intelligence
2.8.4 Pricing Information
2.9 Data Analysis Tools
2.9.1 Excel
2.9.2 Tableau
2.9.3 Power BI
2.9.4 Fine Report
2.9.5 R & Python
2.10 Advantages & Disadvantages of Predictive Modeling
2.10.1 Advantages
2.10.2 Disadvantages
2.10.2.1 Data Labeling
2.10.2.2 Obtaining Massive Training Datasets
2.10.2.3 The Explainability Problem
2.10.2.4 Generalizability of Learning
2.10.2.5 Bias in Algorithms and Data
2.11 Predictive Analytics Biggest Impact
2.11.1 Predicting Demand
2.11.2 Transformation Using Technology and Process
2.11.3 Improved Pricing
2.11.4 Predictive Maintenance
2.12 Application of Predictive Analytics
2.12.1 Financial and Banking Services
2.12.2 Retail
2.12.3 Health and Insurance
2.12.4 Oil and Gas Utilities
2.12.5 Public Sector
2.13 Future Scope of Predictive Modeling
2.13.1 Technological Advancements
2.13.2 Changes in Work
2.13.3 Risk Mitigation
2.14 Conclusion
References
Chapter 3 Optimizing Water Quality with Data Analytics and Machine Learning
3.1 Introduction
3.2 Related Work
3.3 Data Sources and Collection
3.4 Water Demand Forecasting
3.4.1 Network Flow and Zone Demand Estimation
3.4.2 Demand Forecasting
3.4.2.1 Feature Importance
3.4.2.2 Forecast Horizon
3.4.3 Performance Characterization
3.5 Re-Chlorination Optimization
3.5.1 Data
3.5.2 Water Age Estimation
3.5.2.1 Travel Time Estimation
3.5.2.2 Residential Time Estimation
3.5.3 Ammonia Prediction
3.5.4 Optimization Model Definition
3.5.5 Improvements in Customer Water Quality
3.5.6 Plant Dosing Optimization
3.6 Conclusion
Acknowledgements
References
Chapter 4 Lip Reading Framework using Deep Learning and Machine Learning
4.1 Introduction
4.1.1 Overview
4.1.2 Motivation
4.1.3 Lip Reading System Outcomes and Deliverables
4.2 The Emergence and Definition of the Lip-Reading System
4.2.1 Background of Domain
4.2.2 Identified Problems
4.2.3 Tools and Technologies Used
4.2.4 Implementation Aspects
4.2.4.1 Data Preparation
4.3 Design and Components of Lip-Reading System
4.4 Lip Reading System Architecture
4.5 Testing
4.6 Problems Encountered During Implementation
4.6.1 Assumptions and Constraints
4.7 Conclusion
4.8 Future Work
References
Chapter 5 New Perspective to Management, Economic Growth and Debt Nexus Analysis: Evidence from Indian Economy
5.1 Introduction
5.2 Literature Review
5.2.1 External Debt and Economic Growth
5.2.2 Trade Openness, FDI, and Economic Growth
5.2.3 FDI and Economic Growth
5.3 Data
5.3.1 Analytical Framework and Data Description
5.3.2 Theoretical Background and Specifications
5.3.2.1 Model Specification
5.4 Methodology and Findings
5.4.1 Unit Root Testing
5.4.2 Cointegration
5.4.3 Vector Error Correction Model
5.4.4 Long-Run Relationship Estimation
5.4.5 Causality Test
5.5 Conclusion and Policy Implications
Declarations
Availability of Data and Materials
Competing Interests
Funding
Authors’ Contributions
Acknowledgments
References
Chapter 6 Data-Driven Delay Analysis with Applications to Railway Networks
6.1 Introduction
6.2 Related Works
6.3 Background Knowledge
6.3.1 Background and Problem Formulation
6.3.1.1 Train Delay
6.3.1.2 Delay Propagation
6.3.2 Preliminaries
6.3.2.1 Bayesian Inference
6.3.2.2 Markov Property
6.4 Delay Propagation Model
6.4.1 Conditional Bayesian Delay Propagation
6.4.1.1 Delay Self-Propagation
6.4.1.2 Incremental Run-Time Delay
6.4.1.3 Incremental Dwell Time Delay
6.4.1.4 Accumulative Departure Delay
6.4.2 Cross-Line Propagation, Backward Propagation and Train Connection Propagation
6.5 Primary Delay Tracing Back
6.5.1 Delay Candidates Selection
6.5.2 Relation Construction
6.5.2.1 Preceding and Following Trains
6.5.2.2 Preceding and Connecting Trains
6.6 Evaluation on Dwell Time Improvement Strategy
6.7 Experiments
6.7.1 Experiment Setting
6.7.2 Temporal Prediction of Delay Propagation
6.7.3 Spatial Prediction of Delay Propagation
6.7.4 Case Study of Primary Delay Tracing Down
6.7.5 Evaluation of Dwell Time Improvement Strategy
6.8 Conclusion
References
Chapter 7 Proposing a Framework to Analyze Breast Cancer in Mammogram Images Using Global Thresholding, Gray Level Co-Occurrence Matrix, and Convolutional Neural Network (CNN)
7.1 Introduction & Purpose of Study
7.1.1 Segmentation
7.1.1.1 Types of Segmentation
7.1.2 Compression
7.2 Literature Review & Motivation
7.3 Proposed Work
7.3.1 Algorithm
7.3.2 Explanation
7.3.3 Flowchart
7.4 Observation Tables and Figures
7.5 Conclusion
7.6 Future Work
References
Chapter 8 IoT Technologies for Smart Healthcare
8.1 Introduction
8.2 Literature Review
8.2.1 IoT-Based Smart Health
8.2.2 Advantages of Applying IoT in Health
8.3 Findings
8.3.1 Significant Features and Applications of IoT in Health
8.3.1.1 Simultaneous Monitoring and Reporting
8.3.1.2 End-to-End Connectivity and Affordability
8.3.1.3 Data Analysis
8.3.1.4 Tracking, Alerts, and Remote Medical Care
8.3.1.5 Research
8.3.1.6 Patient-Generated Health Data (PGHD)
8.3.1.7 Management of Chronic Diseases and Preventative Care
8.3.1.8 Home-Based and Short-Term Care
8.4 Case Study: CyberMed as an IoT-Based Smart Health Model
8.5 Discussions
8.5.1 Limitations of Adopting IoT in Health
8.5.1.1 Data Security and Privacy
8.5.1.2 Connectivity
8.5.1.3 Compatibility and Data Integration
8.5.1.4 Implementation Cost
8.5.1.5 Complexity and Risk of Errors
8.6 Future Insights
8.7 Conclusions
References
Chapter 9 Enhancement of Scalability of SVM Classifiers for Big Data
9.1 Introduction
9.2 Support Vector Machine
9.2.1 Challenges
9.3 Parallel and Distributed Mechanism
9.3.1 Shared-Memory Parallelism
9.4 Distributed Big Data Architecture
9.4.1 Hadoop MapReduce
9.4.2 Spark
9.4.3 AKKA
9.5 Distributed High Performance Computing
9.5.1 GASNet
9.5.2 Charm++
9.6 GPU Based Parallelism
9.6.1 CUDA
9.6.2 OpenCL
9.7 Parallel and Distributed SVM Algorithms
9.7.1 LS-SVM
9.7.2 Cascade SVM
9.7.3 DC SVM
9.7.4 Parallel Distributed Multiclass SVM Algorithms
9.8 Conclusion and Future Research Directions
References
Chapter 10 Electrical Network-Related Incident Prediction Based on Weather Factors
10.1 Introduction
10.2 Related Work
10.3 Methodology
10.3.1 Binary Classification of Incident and Normality
10.3.2 Incident Categorization Using Natural Language Processing
10.3.3 Classification of Multiple Types of Incidents
10.4 Experiments
10.4.1 Data Sets
10.4.2 Evaluation Metrics
10.4.3 Binary Classification
10.4.4 Incident Categorization
10.4.5 Multi-Class Classification
10.5 Conclusion and Future Work
Acknowledgements
References
Chapter 11 Green IoT: Environment-Friendly Approach to IoT
11.1 Introduction
11.2 G-IoT (Green Internet of Things)
11.3 Layered Architecture of G-IoT
11.3.1 Data Center/Cloud
11.3.2 Data Analytics and Control Applications It
11.3.3 Data Aggregation and Storage
11.3.4 Edge Computing
11.3.5 Communication and Processing Unit
11.4 Techniques for Implementation of G-IoT
11.5 Power Saving Methods Based on Components
11.6 Applications of G-IoT
11.7 Challenges and Future Scope
11.8 Case Study
11.9 Conclusion
References
Chapter 12 Big-Data Analytics: A New Paradigm Shift in Micro Finance Industry
12.1 Introduction
12.2 Reality of Area and Transcendent Difficulties
12.2.1 Probable Overlending
12.2.2 Information Imbalance
12.2.3 Retreating Not-for-Profit Sector
12.2.4 Neighbourhood Pressure
12.3 Data Analytics in Microfinance
12.3.1 Types of Data Analytics Used in Microfinance
12.3.2 Use of Big Data in Microfinance Industry
12.3.3 Risk and Data Based Credit Decisions
12.3.4 Product Development and Selection
12.3.5 Product or Service Positioning
12.3.6 M-Commerce and E-Payments
12.3.7 Making Reliable Credit Decisions
12.3.8 Big Data-Driven Model Promises Psychometric Evaluations
12.3.9 Product Build-Up, Service Positioning, and Offering
12.4 Opportunities and Risks in Using Data Analytics
12.5 Risk in Utilizing Big Data
12.6 Conclusion
References
Chapter 13 Big Data Storage and Analysis
13.1 Introduction
13.1.1 6 V’s of Big Data
13.1.2 Types of Data
13.1.3 Issues in Handling Big Data
13.2 Hadoop as a Solution to Challenges of Big Data
13.2.1 The Hadoop Ecosystem
13.2.2 Rack Awareness Policy in HDFS
13.3 In-Memory Storage and NoSQL
13.3.1 Key-Value Data Stores
13.3.2 Document Stores
13.3.3 Wide Column Stores
13.3.4 Graph Stores
13.3.5 Multi-Modal Databases
13.4 Advantages of NoSQL Database
13.5 Conclusion
References
Chapter 14 A Framework for Analysing Social Media and Digital Data by Applying Machine Learning Techniques for Pandemic Management
14.1 Introduction
14.2 Literature Review
14.3 Understanding Pandemic Analogous to a Disaster
14.4 Application of Machine Learning Techniques at Various Phases of Pandemic Management
14.4.1 Mitigation Phase
14.4.2 Preparedness Phase
14.4.3 Response Phase
14.4.4 Recovery Phase
14.5 Generalized Framework to Apply Machine Learning Techniques for Pandemic Management
14.6 Conclusion
References
About the Editors
Index
EULA