Machine Learning Toolbox for Social Scientists: Applied Predictive Analytics with R

Machine Learning Toolbox for Social Scientists covers predictive methods with complementary statistical “tools” that make it mostly self-contained. Inferential statistics is the traditional framework for most data analytics courses in social science and business fields.

Author(s): Yigit Aydede
Publisher: CRC Press LLC
Year: 2023

Language: English
Pages: 601

Contents

Preface

1 How We Define Machine Learning

2 Preliminaries

2.1 Data and Dataset Types

2.1.1 Cross-Sectional

2.1.2 Time-Series

2.1.3 Panel

2.2 Plots

2.3 Probability Distributions with R

2.4 Regressions

2.4.1 Ordinary Least Squares (OLS)

2.4.2 Maximum Likelihood Estimators

2.4.3 Estimating MLE with R

2.5 BLUE

2.6 Modeling the Data

2.7 Causal vs. Predictive Models

2.7.1 Causal Models

2.7.2 Predictive Models

2.8 Simulation

Part 1 Formal Look at Prediction

3 Bias-Variance Tradeoff

3.1 Estimator and MSE

3.2 Prediction - MSPE

3.3 Biased Estimator as a Predictor

3.4 Dropping a Variable in a Regression

3.5 Uncertainty in Estimations and Predictions

3.6 Prediction Interval for Unbiased OLS Predictor

4 Overfitting

Part 2 Nonparametric Estimations

5 Parametric Estimations

5.1 Linear Probability Models (LPM)

5.2 Logistic Regression

5.2.1 Estimating Logistic Regression

5.2.2 Cost Functions

5.2.3 Deviance

5.2.4 Predictive Accuracy

6 Nonparametric Estimations - Basics

6.1 Density Estimations

6.2 Kernel Regressions

6.3 Regression Splines

6.4 MARS - Multivariate Adaptive Regression Splines

6.5 GAM - Generalized Additive Model

7 Smoothing

7.1 Using Bins

7.2 Kernel Smoothing

7.3 Locally Weighted Regression loess()

7.4 Smooth Spline Regression

7.5 Multivariate Loess

8 Nonparametric Classifier - kNN

8.1 mnist Dataset

8.2 Linear Classifiers (again)

8.3 k-Nearest Neighbors

8.4 kNN with Caret

8.4.1 mnist 27

8.4.2 Adult Dataset

Part 3 Self-Learning

9 Hyperparameter Tuning

9.1 Training, Validation, and Test Datasets

9.2 Splitting the Data Randomly

9.3 k-Fold Cross-Validation

9.4 Cross-Validated Grid Search

9.5 Bootstrapped Grid Search

9.6 When the Data Is Time-Series

9.7 Speed

10 Tuning in Classification

10.1 Confusion Matrix

10.2 Performance Measures

10.3 ROC Curve

10.4 AUC - Area Under the Curve

11 Classification Example

11.1 LPM

11.2 Logistic Regression

11.3 kNN

11.3.1 kNN Ten-Fold CV

11.3.2 kNN with caret

Part 4 Tree-Based Models

12 CART

12.1 CART - Classification Tree

12.2 rpart - Recursive Partitioning

12.3 Pruning

12.4 Classification with Titanic

12.5 Regression Tree

13 Ensemble Learning

13.1 Bagging

13.2 Random Forest

13.3 Boosting

13.3.1 Sequential Ensemble with gbm

13.3.2 AdaBoost

13.3.3 XGBoost

14 Ensemble Applications

14.1 Classification

14.2 Regression

14.3 Exploration

14.4 Boosting Applications

14.4.1 Regression

14.4.2 Random Search with Parallel Processing

14.4.3 Boosting vs. Others

14.4.4 Classification

14.4.5 AdaBoost.M1

14.4.6 Classification with XGBoost

Part 5 SVM & Neural Networks

15 Support Vector Machines

15.1 Optimal Separating Classifier

15.1.1 The Margin

15.1.2 The Non-Separable Case

15.2 Nonlinear Boundary with Kernels

15.3 Application with SVM

16 Artificial Neural Networks

16.1 Neural Network - The Idea

16.2 Backpropagation

16.3 Neural Network - More Inputs

16.4 Deep Learning

Part 6 Penalized Regressions

17 Ridge

18 Lasso

19 Adaptive Lasso

20 Sparsity

20.1 Lasso

20.2 Adaptive Lasso

Part 7 Time Series Forecasting

21 ARIMA Models

21.1 Hyndman–Khandakar Algorithm

21.2 TS Plots

21.3 Box–Cox Transformation

21.4 Stationarity

21.5 Modeling ARIMA

22 Grid Search for ARIMA

23 Time Series Embedding

23.1 VAR for Recursive Forecasting

23.2 Embedding for Direct Forecast

24 Random Forest with Time Series

24.1 Univariate

24.2 Multivariate

24.3 Rolling and Expanding Windows

25 Recurrent Neural Networks

25.1 Keras

25.2 Input Tensors

25.3 Plain RNN

25.4 LSTM

Part 8 Dimension Reduction Methods

26 Eigenvectors and Eigenvalues

27 Singular Value Decomposition

28 Rank(r) Approximations

29 Moore-Penrose Inverse

30 Principal Component Analysis

31 Factor Analysis

Part 9 Network Analysis

32 Fundamentals

32.1 Covariance

32.2 Correlation

32.3 Precision Matrix

32.4 Semi-Partial Correlation

33 Regularized Covariance Matrix

33.1 Multivariate Gaussian Distribution

33.2 High-Dimensional Data

33.3 Ridge (ℓ2) and Glasso (ℓ1)

Part 10 R Labs

34 R Lab 1 Basics

34.1 R, RStudio, and R Packages

34.2 RStudio

34.3 Working Directory

34.4 Data Types and Structures

34.5 Vectors

34.6 Subsetting Vectors

34.7 Vectorization or Vector Operations

34.8 Matrices

34.9 Matrix Operations

34.10 Subsetting Matrix

34.11 R-Style Guide

35 R Lab 2 Basics II

35.1 Data Frames and Lists

35.1.1 Lists

35.1.2 Data Frames

35.1.3 Subsetting Data Frames

35.1.4 Plotting from Data Frame

35.1.5 Some Useful Functions

35.1.6 Categorical Variables in Data Frames

35.2 Programming Basics

35.2.1 If/Else

35.2.2 Loops

35.2.3 The apply() Family

35.2.4 Functions

36 Simulations in R

36.1 Sampling in R: sample()

36.2 Random Number Generating with Probability Distributions

36.3 Simulation for Statistical Inference

36.4 Creating Data with a Data Generating Model (DGM)

36.5 Bootstrapping

36.6 Monty Hall – Fun Example

36.6.1 Here Is the Simple Bayes Rule

36.6.2 Simulation to Prove It

Appendix 1: Algorithmic Optimization

A.1 Brute-Force Optimization

A.2 Derivative-Based Methods

A.3 ML Estimation with Logistic Regression

A.4 Gradient Descent Algorithm

A.4.1 One-Variable

A.4.2 Adjustable lr and SGD

A.4.3 Multivariable

A.5 Optimization with R

Appendix 2: Imbalanced Data

A.1 SMOTE

A.2 Fraud Detection

Bibliography

Index