With the advent of advanced technologies such as cloud computing, the Internet of Things, the Medical Internet of Things, the Industrial Internet of Things and sensor networks, as well as the exponential growth in the usage of Internet-based and social media platforms, there are enormous oceans of data. These huge volumes of data can be used for effective decision making and improved performance if analyzed properly. Due to its inherent characteristics, big data is very complex and cannot be handled and processed by traditional database management approaches. There is a need for sophisticated approaches, tools and technologies that can be used to store, manage and analyze these enormous amounts of data to make the best use of them.
Big Data Concepts, Technologies, and Applications covers the concepts, technologies, and applications of big data analytics. Presenting the state-of-the-art technologies in use for big data analytics, it provides an in-depth discussion of the important sectors where big data analytics has proven to be very effective in improving performance and helping industries to remain competitive. This book provides insight into the novel areas of big data analytics and the research directions for scholars working in the domain. Highlights include:
The advantages, disadvantages and challenges of big data analytics
State-of-the-art technologies for big data analytics such as Hadoop, NoSQL databases, data lakes, deep learning and blockchain
The application of big data analytics in healthcare, business, social media analytics, fraud detection and prevention, and governance
Exploring the concepts and technologies behind big data analytics, the book is an ideal resource for researchers, students, data scientists, data analysts and business analysts who need insight into big data analytics.
Author(s): Mohammad Shahid Husain, Mohammad Zunnun Khan, Tamanna Siddiqui
Publisher: CRC Press
Year: 2023
Language: English
Pages: 216
Cover
Half Title
Title Page
Copyright Page
Table of Contents
Preface
Acknowledgments
List of Figures
List of Tables
About the Authors
Section A Understanding Big Data
Chapter 1 Overview of Big Data
1.1 Introduction
1.2 Types of Data
1.2.1 Structured Data
1.2.2 Unstructured Data
1.2.3 Semi-Structured Data
1.3 Evolution of Big Data
1.3.1 Big Data Stage-1
1.3.2 Big Data Stage-2
1.3.3 Big Data Stage-3
1.4 Big Data Characteristics
1.4.1 Volume
1.4.2 Velocity
1.4.3 Variety
1.4.4 Veracity
1.4.5 Value
1.5 Difference between Big Data and Data Warehouse
1.6 Advantages and Disadvantages of Big Data
1.6.1 Advantages
1.6.2 Disadvantages of Big Data
1.7 Obstacles in Utilizing Big Data
1.7.1 Lack of Proper Understanding of Big Data
1.7.2 Exponential Data Growth
1.7.3 Confusion in Big Data Tool Selection
1.7.4 Securing Big Data
1.7.5 Data Quality
1.7.6 Lack of Expert Personnel
1.7.7 Applications of Big Data
1.8 Impact of Big Data
References
Chapter 2 Challenges of Big Data
2.1 Introduction
2.2 Big Data Integration
2.2.1 Issues in Data Integration
2.2.2 Approach to Data Integration
2.2.3 Data Integration Methods
2.3 Storing Big Data
2.3.1 Big Data Storage Methods
2.4 Maintaining Data Quality
2.4.1 Data Quality Dimensions
2.4.2 Data Quality Management Steps
2.5 Analysis of Big Data
2.5.1 Working Principle of Big Data Analytics
2.6 Security and Privacy Management
2.6.1 Need for Data Protection
2.6.2 Challenges in Protecting Big Data
2.6.3 Best Practices for Big Data Protection
2.7 Accessing and Sharing Information
References
Chapter 3 Big Data Analytics
3.1 Introduction
3.2 Applications of Big Data Analytics
3.2.1 Traditional Business Applications of Big Data Analytics
3.2.2 Recent Application Trends in Big Data Analytics
3.3 Types of Big Data Analytics
3.3.1 Descriptive Analytics
3.3.2 Diagnostic Analytics
3.3.3 Predictive Analytics
3.3.4 Prescriptive Analytics
3.4 Comparison of Data Analytics Stages
References
Section B Big Data Technologies
Chapter 4 Hadoop Ecosystem
4.1 Introduction
4.2 Components of the Hadoop Ecosystem
4.2.1 Data Storage
4.2.2 Data Processing
4.2.3 Data Access
4.2.4 Data Management
4.3 Data Storage Component
4.3.1 Google File System (GFS)
4.3.2 Hadoop Distributed File System (HDFS)
4.3.3 HBase
4.4 Data Processing Component
4.4.1 MapReduce
4.4.2 YARN
4.5 Data Access Component
4.5.1 Hive
4.5.2 Apache Pig
4.5.3 Apache Drill
4.5.4 Apache Sqoop
4.5.5 Apache Avro
4.5.6 Apache Mahout
4.6 Data Management Component
4.6.1 ZooKeeper
4.6.2 Oozie
4.6.3 Ambari
4.6.4 Apache Flume
4.7 Apache Spark
References
Chapter 5 NoSQL Databases
5.1 Introduction
5.1.1 Features of NoSQL
5.1.2 Difference between NoSQL and SQL
5.2 Types of NoSQL Databases
5.2.1 Types of NoSQL Databases
5.3 Key-Value Pair Based Storage
5.4 Column-Oriented Databases
5.5 Document-Oriented Databases
5.6 Graph-Based Databases
5.7 Summary of NoSQL Databases
5.8 BASE Model of NoSQL
5.8.1 CAP Theorem
5.8.2 BASE Model
5.8.3 ACID vs BASE Model
5.9 Advantages of NoSQL
5.10 Disadvantages of NoSQL
References
Chapter 6 Data Lakes
6.1 Introduction
6.2 Data Lake Architecture
6.2.1 Transient Zone
6.2.2 Raw Zone
6.2.3 Trusted Zone
6.2.4 Refined Zone
6.3 Usage of Data Lakes
6.3.1 Facilitating Data Science and Machine Learning Capabilities
6.3.2 Centralizing, Consolidating and Cataloguing Data
6.3.3 Seamless Integration of Diverse Data Sources and Formats
6.3.4 Offering Various Self-Service Tools
6.4 Data Lake Challenges
6.4.1 Data Swamps
6.4.2 Slow Performance
6.4.3 Lack of Security Features
6.4.4 Reliability Issues
6.5 Data Lake Advantages and Disadvantages
6.6 Lake House
6.6.1 Delta Lake
6.7 Difference between Data Warehouses, Data Lakes and Lake Houses
6.8 Best Practices Regarding Data Lakes
6.8.1 Data Lake as Landing Zone
6.8.2 Data Quality
6.8.3 Reliability
6.8.4 Data Catalog
6.8.5 Security
6.8.6 Privacy
6.8.7 Data Lineage
References
Chapter 7 Deep Learning
7.1 Introduction
7.2 Deep Learning Architecture
7.2.1 Supervised Learning
7.2.2 Unsupervised Learning
7.3 Training Approaches for Deep Learning Models
7.3.1 Training from Scratch
7.3.2 Transfer Learning
7.3.3 Feature Extraction
7.4 Challenges in Deep Learning Implementation
7.4.1 Data Volume Required
7.4.2 Biasness
7.4.3 Explainability
7.5 Applications of Deep Learning
7.5.1 Healthcare Industry
7.5.2 Autonomous Vehicles
7.5.3 E-Commerce
7.5.4 Personal Assistant
7.5.5 Medical Research
7.5.6 Customer Service
7.5.7 Finance Industry
7.5.8 Industrial Automation
7.5.9 Smart Devices
7.5.10 Aerospace and Defense
7.5.11 Weather Predictions
References
Chapter 8 Blockchain
8.1 Introduction
8.2 Structure of the Blockchain
8.3 Security Features of the Blockchain
8.3.1 Block Linking
8.3.2 Consensus Mechanism
8.4 Types of Blockchain
8.4.1 Public Blockchain
8.4.2 Private Blockchain
8.4.3 Consortium Blockchain
8.4.4 Hybrid Blockchain
8.5 Blockchain Evolution
8.5.1 The First Generation (Blockchain 1.0: Cryptocurrency)
8.5.2 The Second Generation (Blockchain 2.0: Smart Contracts)
8.5.3 The Third Generation (Blockchain 3.0: DApps)
8.5.4 The Fourth Generation (Blockchain 4.0: Industry Applications)
8.6 Advantages of Blockchain
8.7 Disadvantages of Blockchain
8.7.1 Security Risk
8.7.2 Speed and Performance
8.7.3 Scalability
8.7.4 Data Modification
8.7.5 High Implementation Cost
8.8 Applications of Blockchain
8.8.1 Banking and Financial Industry
8.8.2 Healthcare Industry
8.8.3 Supply Chain Management
8.8.4 Food Chain Management
8.8.5 Governance
8.8.6 Internet of Things Network Management
References
Section C Big Data Applications
Chapter 9 Big Data for Healthcare
9.1 Introduction
9.2 Benefits of Big Data Analytics in Healthcare
9.2.1 Improved Healthcare
9.2.2 Pervasive Healthcare
9.2.3 Drug Discovery
9.2.4 Reduced Cost
9.2.5 Risk Prediction
9.2.6 Early Detection of the Spread of Diseases
9.2.7 Fraud Detection and Prevention
9.2.8 Clinical Operations
9.3 Challenges in Implementing Big Data in Healthcare
9.3.1 Confidentiality and Data Security
9.3.2 Data Aggregation
9.3.3 Reliability
9.3.4 Access Control
9.3.5 Interoperability
References
Chapter 10 Big Data Analytics for Fraud Detection
10.1 Introduction
10.2 Types of Fraud
10.2.1 Insurance Fraud
10.2.2 Network Intrusion
10.2.3 Credit Card Fraud
10.2.4 Money Laundering
10.2.5 Accounting Fraud
10.2.6 Financial Markets Fraud
10.2.7 Telecommunication Fraud
10.3 Fraud Detection and Prevention
10.3.1 Traditional Fraud Detection Methods
10.3.2 Big Data Analytics for Fraud Detection
10.4 Features Used for Fraud Detection
10.5 Benefits of Big Data Analytics for Fraud Detection
10.6 Applications of Big Data Analytics for Fraud Detection
10.7 Issues in Implementing Big Data Analytics for Fraud Detection
References
Chapter 11 Big Data Analytics in Social Media
11.1 Introduction
11.2 Types of Social Media Platforms
11.3 Social Media Statistics
11.4 Big Data Analytics in Social Media
11.4.1 Analytic Techniques
11.5 Applications of Big Data Analytics in Social Media
11.5.1 Business
11.5.2 Disaster Management
11.5.3 Healthcare
11.5.4 Governance
11.6 Key Challenges in Social Media Analytics
References
Chapter 12 Novel Applications and Research Directions in Big Data Analytics
12.1 Introduction
12.2 Education Sector
12.3 Agriculture Sector
12.4 Entertainment Industry
12.5 Manufacturing
12.6 Renewable Energy
12.7 Business Applications
12.8 Financial Services
12.9 Sport
12.10 Politics
References
Index