Data Science, Analytics and Machine Learning with R

Data Science, Analytics and Machine Learning with R explains the principles of data mining and machine learning techniques and stresses the importance of applied and multivariate modeling. The book emphasizes the fundamentals of each technique, with step-by-step code and real-world examples that use data from areas such as medicine and health, biology, engineering, technology and related sciences. Examples use the most recent R language syntax and robust, widely used, current packages. Code scripts are exhaustively commented, making it clear to readers what each command does. For data collection, readers are shown how to build their own robots from scratch.

In addition, an entire chapter focuses on the concept of spatial analysis, allowing readers to build their own maps through geo-referenced data (such as in epidemiologic research) and some basic statistical techniques. Other chapters cover ensemble and uplift modeling and GLMM (Generalized Linear Mixed Models) estimations, both linear and nonlinear.

Author(s): Luiz Favero, Patrícia Belfiore, Rafael de Freitas Souza
Edition: 1
Publisher: Academic Press
Year: 2023

Language: English
Pages: 660

Cover
Front Matter
Copyright
Dedication
Epigraph
Overview of data science, analytics, and machine learning
Introduction
Overview of the book
Final remarks
Introduction to R-based language
Introduction
How to use this work
R-based language installation
Installing RStudio
The Script Editor
The console
The Environment, History, Connections, and Tutorial tabs
The Files, Plots, Packages, Help, and Viewer tabs
Objects
Functions and arguments
Packages
Loading datasets
Loading a dataset using the mouse
Loading a dataset using codes
Opening datasets present in R-based language
Brief notion of data manipulation
Final remarks
Supplementary data sets
Types of variables, measurement scales, and accuracy scales
Introduction
Types of variables
Nonmetric or qualitative variables
Metric or quantitative variables
Types of variables and scales of measurement
Nonmetric variables: Nominal scale
Nonmetric variables: Ordinal scale
Quantitative variables: Interval scale
Quantitative variables: Ratio scale
Types of variables based on number of categories and scales of accuracy
Dichotomous or binary variables: Dummy
Polychotomous variables
Discrete quantitative variables
Continuous quantitative variables
Final remarks
Exercises
Univariate descriptive statistics
Introduction
Frequency distribution table
Frequency distribution table for qualitative variables
Frequency distribution table for discrete data
Frequency distribution table for continuous data grouped into classes
Graphical representation of the results
Graphical representation for qualitative variables
Bar chart
Pie chart
Pareto chart
Graphical representation for quantitative variables
Line graph
Scatter plot
Histogram
Stem-and-leaf plot
Boxplot or box-and-whisker diagram
The most common summary measures in univariate descriptive statistics
Measures of dispersion or variability
Coefficient of skewness in R
Coefficient of kurtosis in R
Final remarks
Exercises
Supplementary data sets
Bivariate descriptive statistics
Introduction
Association between two qualitative variables
Joint frequency distribution tables (Fávero and Belfiore, 2019)
Elaborating contingency tables in R
Measures of association (Fávero and Belfiore, 2019)
Chi-square statistic
Solving the chi-square statistic in R
Solution: calculating Phi, contingency, and Cramér's V coefficients in R
Spearman's coefficient (Fávero and Belfiore, 2019)
Calculating Spearman's coefficient in R
Correlation between two quantitative variables
Constructing a scatter plot in R
Solution (calculation of covariance and Pearson's correlation coefficient) in R
Final remarks
Exercises
Supplementary data sets
Hypotheses tests
Introduction
Univariate tests for normality
Solving tests for normality in R
Tests for homogeneity of variance
Solving tests for homogeneity of variance in R
Hypotheses tests regarding a population mean (μ) from one random sample
Solving the z-test and the Student's t-test for a single sample in R
Student's t-test to compare two population means from two independent random samples
Solving the Student's t-test for two independent samples in R
Student's t-test to compare two population means from two paired random samples
Solving the Student's t-test for two paired samples in R
Analysis of variance to compare the means of more than two populations
Factorial ANOVA test
Final remarks
Exercises
Supplementary data sets
Data visualization and multivariate graphs
Introduction
The library ggplot2
Bar chart with ggplot2
Pareto chart with ggplot2
Line graph with ggplot2
Scatter plot with ggplot2
Histogram with ggplot2
Boxplot with ggplot2
Final remarks
Exercises
Appendix
Main colors and color range accepted by R-based language
Pie charts with ggplot2 and an easier solution
Supplementary data sets
Webscraping and handcrafted robots
Introduction
CSS selector and XPATH
The tool SelectorGadget
The library rvest
Example 1: Using the function html_text()
Example 2: Using the function html_table()
The library RSelenium
Requirements necessary for using RSelenium
Creating a robot with RSelenium
Final remarks
Exercises
Using application programming interfaces to collect data
Introduction
Verbs about API
Example 1: Who is in the space stations?
Example 2: Where is the ISS now?
Example 3: When will the ISS fly over a certain point on the globe?
Example 4: Health indicators of the World Health Organization
Final remarks
Exercises
Managing data
Introduction
The operator %>%
The function rename()
The function mutate()
The function filter()
The function arrange()
The function group_by()
The function select()
The function summarise()
The functions separate() and unite()
The functions gather() and spread()
Join functions
The function left_join()
The function right_join()
The function full_join()
The function inner_join()
The functions semi_join() and anti_join()
Final remarks
Exercise
Supplementary data sets
Cluster analysis
Cluster analysis with hierarchical and nonhierarchical agglomeration schedules in R
Elaborating hierarchical agglomeration schedules in R
Elaborating nonhierarchical k-means agglomeration schedules in R
Final remarks
Exercise
Supplementary data sets
Principal component factor analysis
Principal component factor analysis in R
Final remarks
Exercise
Supplementary data sets
Simple and multiple correspondence analysis
Applications in R
Correspondence Analysis
Multiple correspondence analysis
Final remarks
Exercises
Appendix
Supplementary data sets
Simple and multiple regression models
Estimation of regression models in R
Estimation of a simple linear regression model in R
Estimation of a multiple linear regression model in R
Final remarks
Exercises
Supplementary data sets
Binary and multinomial logistic regression models
Estimation of binary and multinomial logistic regression models in R
Estimation of a binary logistic regression model in R
Estimation of a multinomial logistic regression model in R
Final remarks
Exercises
Supplementary data sets
Count-data and zero-inflated regression models
Estimating regression models for count data in R
Poisson regression model in R
Negative binomial regression model in R
Zero-inflated Poisson regression model in R
Zero-inflated negative binomial regression model in R
Final remarks
Exercise
Supplementary data sets
Generalized linear mixed models
Estimation of hierarchical linear models in R
Estimation of a two-level hierarchical linear model (HLM2) with clustered data in R
Final remarks
Exercise
Supplementary data sets
Support vector machines
Introduction
Separating hyperplanes
Maximal margin classifiers
Support vector classifiers
Support vector machines
Support vector machines in R
Construction of a support vector machine classification plot in R
Support vector machines application with a linear kernel in R
Training and validation samples, tuning, and other support vector machine estimations in R
Comparison of SVM models performance to a binary logistic regression model
Final remarks
Exercise
Supplementary data sets
Classification and regression trees
Introduction
CART estimation methods
The entropy of information
The Gini index
Variance
Overfitting
Pruning
Hyperparameters
Estimating CART models in R
Classification trees in R
Regression trees in R
Final remarks
Exercises
Supplementary data sets
Boosting and bagging
Introduction
Boosting
Main hyperparameters for boosting
Number of trees
Learning rate
Tree depth
Minimum number of observations in leaf nodes
Subsampling
Bagging
Main hyperparameters for bagging
Number of trees
Minimum number of observations in leaf nodes
Boosting and bagging applications in R
Boosting in R
Bagging in R
Final remarks
Exercise
Supplementary data sets
Random forests
Introduction
Random forests
Hyperparameters
The number of predictive variables selected at each iterative step
The number of model iterations
Random forests applications in R
Final remarks
Exercise
Supplementary data sets
Artificial neural networks
Introduction
Artificial neural networks
Activation functions and estimations of the output values of each layer
Linear activation function
Sigmoid or logistic activation function
Hyperbolic tangent activation function
Softmax activation function
Softplus activation function
Rectifier linear unit activation function
Demonstration of calculations of layer output values
Method of calculation of estimation errors for iteration feeding
Hyperparameters
Defining an activation function
Choosing a number of hidden layers
Defining the number of neurons in hidden layers
Learning rate
The threshold to evaluate the misclassification rate
The number of iterations
Artificial neural networks applications in R
Estimation of an artificial neural network for a metric-type phenomenon
Estimation of an artificial neural network for a categorical type phenomenon
Final remarks
Exercise
Supplementary data sets
Working on shapefiles
Introduction
Using shapefiles
Loading a shapefile
Incorporating information into a shapefile
Plotting information from a dataset on a map
Dismembering shapefiles
Joining shapefiles
Final considerations
Supplementary data sets
Dealing with simple feature objects
Introduction
Working with simple features
Creating a simple feature object
Using layers in simple feature objects
Combining simple feature objects with shapefiles
Using R like geographic information systems software
Buffering
Buffer union
Kernel densities
Combining simple feature layers and objects in search of insight
Example of using a robot to capture space data
Final considerations
Supplementary data sets
Raster objects
Introduction
Loading a raster file
Plotting the raster file information
Combining a raster object with a shapefile
Loading raster objects entirely into the computer's RAM
Cutting out raster objects
Cutting out raster objects with the aid of a mouse
Cutting raster objects with vector aid
Final considerations
Exploratory spatial analysis
Introduction
Establishing neighborhoods
Contiguity spatial weights matrix W
Geographic proximity spatial weights matrix W
k-Nearest neighbors spatial weights matrix W
Socioeconomic proximity spatial weights matrix W
Standardization of matrices
Row standardization of the matrix W
Double standardization of the matrix W
Variance stabilizing of the matrix W
Techniques for verification of spatial autocorrelation
Global autocorrelation: Moran's I statistic
Moran scatter plot
Local autocorrelation: The local Moran's statistic
Local autocorrelation: The Getis and Ord's G statistic
Final remarks
Exercise
Supplementary data sets
Enhanced and interactive graphs
Introduction
The library plotly
Scatter plot with plotly
Line graph with plotly
Bar chart with plotly
Pareto chart with plotly
Histogram with plotly
Boxplot with plotly
Pie charts with plotly
Final remarks
Exercises
Supplementary data sets
Dashboards with R
Introduction
First steps in the library shiny
Creating the first dashboard in the library shiny
Reactive programming
Construction of a complex dashboard
First step: Preparing the ui.R and server.R Scripts
Second step: Introducing the dataset
Third step: Introducing univariate descriptive statistics and frequency tables of the dataset variables
Fourth step: Variable distributions graphics
Fifth step: Including a predictive model
Final remarks
Exercise
Supplementary data sets
References
Answers
Chapter 3
Chapter 4
Chapter 5
Chapter 6
Chapter 7
Chapter 11
Chapter 12
Chapter 13
Chapter 14
Chapter 15
Chapter 16
Chapter 17
Chapter 18
Chapter 19
Chapter 21
Chapter 22
Chapter 26
Chapter 27
Index