Building on the knowledge introduced in The Data Science Framework, this book examines each aspect of Data Analytics in detail, from both a theoretical and a practical standpoint. It explains representative algorithms for the different techniques, from their theoretical foundations to their implementation and use with software tools.
Author(s): Juan J. Cuadrado-Gallego, Yuri Demchenko
Publisher: BPB Publications
Year: 2023
Language: English
Pages: 729
Preface
Contents
List of Figures
Introduction to Data Science and Data Analytics
About Data Science
About the EDISON Project and Data Science Framework
The EDISON Project
The EDISON Data Science Framework (EDSF)
About Data Analytics
Data Analytics Competences
Data Analytics Body of Knowledge
Data Analytics Model Curriculum Approach
Data Analytics Professional Profiles
About This Book
Data
A. Theory
Introduction
Characteristic
Definition of Characteristic
Types of Characteristics
Data
Definition of Data
Types of Data from Their Nature
Types of Data from Their Storage
Available Data
Experiment
Data Population
Data Sample
Data Quality
Frequency
Definition of Frequency
Types of Frequency
Frequency of Grouped Data
Frequency Distribution
Mode
Mean
Definition of Mean
Arithmetic Mean
Variance and Standard Deviation
Median
Range
Median
Quantiles
Quantiles Range
B. Computer-Based Solving
R Project
Website of the R Project
Download
Software
Documentation
R Project
R Foundation
Help with R
Documentation
Links
R Graphical User Interface, RGUI
R Installation
Starting to Work with the RGui
Data Exercises Solved with R
C. Data Exercises Solved
Hand-Made Exercises
Exercises Solved in R
Annex. Data Extended Concepts
Frequency
Absolute Frequency
Relative Frequency
Cumulative Frequency
Frequency Distribution
Mean
Geometric Mean
Harmonic Mean
Potential Mean
Mean Deviation
Probability
A. Theory
Introduction
Event
Sets Theory Axioms and Operations
Laplace or Classic Probability
Bayesian Probability
Probability Distribution of Random Variables
Random Variable
Probability Distributions
Discrete Probability Distributions
Bernoulli Probability Distribution
Binomial Probability Distribution
Geometric Probability Distribution
Poisson Probability Distribution
Continuous Probability Distributions
Normal Distribution
Pearson Chi-Squared Distribution
t-Student Distribution
F of Fisher Distribution
B. Computer-Based Solving
Probability Exercises Solved in R
C. Probability Exercises Solved
Hand-Made Exercises
Exercises Solved in R
Annex: Probability Extended Concepts
Axiomatic Probability of Kolmogorov
Anomaly Detection
A. Theory
Introduction
Anomaly Detection Based on Statistics
Anomaly Detection Based on the Mean and Standard Deviation
Anomaly Detection Based on the Quartiles
Anomaly Detection Based on the Standard Error of the Residuals
Anomaly Detection Based on Proximity
K-Nearest Neighbor Algorithm
Anomaly Detection Based on Density
Simplified Local Outlier Factor Algorithm
B. Computer-Based Solving
R Packages
R Default Packet Loading
Loading Packages from the R Standard Library
Install and Load Other R Packages
Modifying the Default Packet Load of R
Anomaly Detection Exercises Solved in R
Anomaly Detection Based on Statistics: Mean and Standard Deviation
Anomaly Detection Based on Statistics: Quartiles
Anomaly Detection Based on the Standard Error of the Residuals
Anomaly Detection Based on Proximity: K-Nearest Neighbor Algorithm
Anomaly Detection Based on Density: Simplified Local Outlier Factor, LOF
C. Anomaly Detection Exercises Solved
Hand-Made Exercises
Exercises Solved in R
Unsupervised Classification
A. Theory
Introduction
Unsupervised Classification Based on Distances
K-Means Algorithm
Agglomerative Hierarchical Clustering
B. Computer-Based Solving
RStudio
Download of RStudio
Installation of RStudio
Getting Started with RStudio
Unsupervised Classification Exercises Solved in R
Unsupervised Classification with the K-Means Algorithm
Agglomerative Hierarchical Clustering
C. Unsupervised Classification Exercises Solved
Handmade Exercises
Exercises Solved in R
Supervised Classification
A. Theory
Introduction
Decision Trees
Optimizing the Construction of a Decision Tree: ID3 Algorithm
Information Gain
Entropy
Optimizing the Construction of a Decision Tree: CART Algorithm
Gini
Optimizing the Construction of a Decision Tree: Error Algorithm
Optimizing the Construction of a Decision Tree: Other Approaches
Neural Networks
Two-Layer Artificial Neural Network or Perceptron: Rosenblatt Algorithm
Naïve Bayes
Qualitative Characteristics
Quantitative Characteristics
Regression Functions
Linear Regression of Polynomials (or Linear Fit) for Two Events
Linear Regression of Polynomials (or Linear Fit) for Three Events
Linear Regression of Polynomials (or Linear Fit) for K Events
No Linear Regression of Polynomials (or Linear Fit) for 2 Events
Events of Dimension k
No Linear Regression of No Polynomials (or Linear Fit) for 2 Events
Exponential Nonlinear Fit
Geometric Nonlinear Fit
Linear Regression: Validity Analysis
Standard Error of the Residuals
B. Computer-Based Solving
Supervised Classification Exercises Solved in R
C. Supervised Classification Analysis Exercises Solved
Hand-Made Exercises
Exercises Solved in R
Association
A. Theory
Introduction
Analysis of the Association of Events Composed by a Single Elementary Event
Support
Confidence
Contingency
Correlation
Analysis of the Association of Events Composed by More Than One Elementary Event
Apriori Algorithm
Step A
Step A.1
Step A.2
Step A.2.1
Step A.2.2
Step A.2.2.1
Step A.2.2.2
Step A.2.2.3
Step B
B. Computer-Based Solving
Exercises of Association Analysis Solved in R
C. Association Analysis Exercises Solved
Handmade Exercises
Exercises Solved in R
Bibliography