Applied Linear Regression for Business Analytics with R introduces regression analysis to business students using the R programming language with a focus on illustrating and solving real-time, topical problems. Specifically, this book presents modern and relevant case studies from the business world, along with clear and concise explanations of the theory, intuition, hands-on examples, and the coding required to employ regression modeling. Each chapter includes the mathematical formulation and details of regression analysis and provides in-depth practical analysis using the R programming language.
Author(s): Daniel P. McGibney
Series: International Series in Operations Research & Management Science, 337
Publisher: Springer
Year: 2023
Language: English
Pages: 285
City: Cham
Preface
Acknowledgments
Contents
About the Author
1 Introduction
1.1 Introduction
1.2 History
1.3 Linear Regression, Machine Learning, and Data Science
1.4 Case Studies
1.5 R Versus Python
1.6 R Installation
1.6.1 R (Programming Language)
1.6.2 RStudio IDE
1.7 Book Organization
2 Basic Statistics and Functions Using R
2.1 Introduction
2.2 Basic Statistics
2.3 Sales Calls Application: Basic Statistics
2.4 Data Input and Dataframes in R
2.4.1 Variable Assignment
2.4.2 Basic Operations in R
2.4.3 The c Function
2.4.4 The data.frame Function
2.4.5 The read.csv Function
2.4.6 Indexing Vectors
2.4.7 Indexing Dataframes
2.5 Accessing the Objects of a Dataframe in R
2.5.1 The Head Function
2.5.2 The str Function
2.6 Basic Statistics in R
2.6.1 The Summary Function
2.6.2 The Sum Function
2.6.3 The Mean Function
2.6.4 The sd Function
2.7 Sales Calls Application: Basic Statistics in R
2.8 Plotting in R
2.8.1 Scatterplots
2.8.2 The Plot Function
2.8.3 Histograms
2.8.4 The Hist Function
2.8.5 Boxplots
2.8.6 The Boxplot Function
2.9 Sales Calls Application: Plotting Using R
2.10 Case Study: Top Companies
2.10.1 Problem Statement
2.10.2 Data Description
2.10.3 Scatterplots
2.10.4 Histograms
2.10.5 Case Conclusion
Problems
3 Regression Fundamentals
3.1 Introduction
3.2 Covariance
3.3 Correlation Coefficient
3.4 Coefficient of Determination
3.5 Sales Calls Application: Variable Correlations
3.6 Least Squares Criterion
3.7 Sales Calls Application: Simple Regression
3.8 Interpolation and Extrapolation
3.9 Sales Calls Application: Prediction
3.10 Explained Deviation
3.11 Case Study: Accounting Analytics
3.11.1 Problem Statement
3.11.2 Correlation and Scatterplot
3.11.3 Linear Regression Modeling
3.11.4 Audit Scenarios
3.11.5 Case Conclusion
Problems
4 Simple Linear Regression
4.1 Introduction
4.2 Simple Linear Regression Model
4.3 Model Assumptions
4.4 Model Variance
4.5 Application: Stock Revenues
4.6 Hypothesis Testing
4.6.1 The qt Function
4.6.2 The pt Function
4.7 Application: Using the pt and qt Functions
4.8 Hypothesis Testing: Student's t-Test
4.9 Employee Churn Application: Testing for Significance with t
4.10 Coefficient Confidence Interval
4.11 Employee Churn Application: Confidence Interval Hypothesis Testing
4.12 Hypothesis Testing: F-test
4.13 The qf Function
4.14 The pf Function
4.15 Employee Churn Application: Testing for Significance with F
4.16 Cautions About Statistical Significance
4.17 Case Study: Stock Betas
4.17.1 Problem Statement
4.17.2 Descriptive Statistics
4.17.3 Plots and Graphs
4.17.4 Finding Beta Values
4.17.5 Finding All the Betas
4.17.6 Recommendations and Findings
4.17.7 Case Conclusion
Problems
5 Multiple Regression
5.1 Introduction
5.2 Multiple Regression Model
5.3 Multiple Regression Equation
5.4 Website Marketing Application: Modeling
5.5 Significance Testing: t
5.6 Coefficient Interpretation
5.7 Website Marketing Application: Individual Significance Tests
5.8 Significance Testing: F
5.9 Multiple R^2 and Adjusted R^2
5.10 Website Marketing Application: Multiple R^2 and Adjusted R^2
5.11 Correlations in Multiple Regression
5.12 Case Study: Real Estate
5.12.1 Problem Statement
5.12.2 Data Description
5.12.3 Simple Linear Regression Models
5.12.4 Multiple Regression Model
Model Interpretation
Coefficient Interpretation
Confidence Interval
5.12.5 Case Conclusion
Problems
6 Estimation Intervals and Analysis of Variance
6.1 Introduction
6.2 Expected Value
6.3 Confidence Interval
6.4 House Price Application: Confidence Interval
6.5 Prediction Interval
6.6 House Price Application: Prediction Interval
6.7 Confidence Intervals Versus Prediction Intervals
6.8 Analysis of Variance
6.8.1 Mean of Squares Due to Regression
6.8.2 Mean Squared Error
6.8.3 The F-Statistic
6.9 ANOVA Table
6.10 House Price Application: ANOVA Table
6.11 Generalized F Statistic
6.12 Case Study: Employee Retention Modeling
6.12.1 Problem Statement
6.12.2 Data Description
6.12.3 Multiple Regression Model
Model Interpretation
Coefficient Interpretation
ANOVA Table
6.12.4 Predictions
Prediction Interval
Confidence Intervals
6.12.5 Case Conclusion
Problems
7 Predictor Variable Transformations
7.1 Introduction
7.2 Categorical Variables
7.3 Employee Salary Application: Dummy Variables
7.4 Employee Salary Application: Dummy Variables 2
7.5 Multilevel Categorical Variables
7.6 Employee Salary Application: Dummy Variables with Multiple Levels
7.7 Coding Dummy Variables
7.8 Employee Salary Application: Dummy Variable Coding
7.9 Modeling Curvilinear Relationships
7.10 Sales Performance Application: Quadratic Modeling
7.11 Mean-Centering
7.12 Marketing Toys Application: Mean-Centering
7.13 General Linear Regression Model
7.14 Interactions
7.15 Marketing Toys Application: Interactions
7.16 Case Study: Social Media
7.16.1 Problem Statement
7.16.2 Data Description
7.16.3 Promoter A Model
7.16.4 Promoter B Model
7.16.5 Combined Model
7.16.6 Case Conclusion
Problems
8 Model Diagnostics
8.1 Introduction
8.2 Multiple Regression Model Revisited
8.3 Model Assumptions
8.4 Violations of the Model Assumptions
8.5 Residual Analysis
8.6 Sales Performance Application: Residual Analysis
8.7 Constant Variance
8.8 Twitter Application: Residual Variance
8.9 Response Variable Transformations
8.10 Logarithmic Transformations
8.11 Other Response Variable Transformations
8.12 Box–Cox Transformation
8.13 Twitter Application: Box–Cox
8.14 Assessing Normality
8.15 Assessing Independence
8.16 Outliers and Influential Observations
8.17 Residuals and Leverage
8.17.1 Leverage
8.17.2 Standardized Residuals
8.17.3 Studentized Residuals
8.17.4 Cook's Distance
8.18 Case Study: Lead Generation
8.18.1 Problem Statement
8.18.2 Data Description
8.18.3 Revenue by Lead Generation Method
8.18.4 Revenue by Dealership
8.18.5 Sales Versus Radio Ads
8.18.6 Sales Versus Robocalls
8.18.7 Sales Versus Emails
8.18.8 Sales Versus Cold-Calls
8.18.9 Recommendations and Findings
8.18.10 Case Conclusion
Problems
9 Variable Selection
9.1 Introduction
9.2 Parsimonious Models
9.3 Airbnb Pricing Application
9.4 Assessing Model Performance
9.4.1 Multiple R-Squared and Adjusted R-Squared
9.4.2 Akaike Information Criterion
9.4.3 Bayesian Information Criterion
9.4.4 Mallows's C_p
9.5 Airbnb Pricing Application: Model Comparison
9.6 Backward Elimination
9.7 Airbnb Pricing Application: Backward Elimination
9.8 Forward Selection
9.9 Airbnb Pricing Application: Forward Selection
9.10 Stepwise Regression
9.11 Airbnb Pricing Application: Stepwise Regression
9.12 Best Subsets Regression
9.13 Airbnb Pricing Application: Best Subsets Regression 1
9.14 Airbnb Pricing Application: Best Subsets Regression 2
9.15 Stepwise and Best Subsets Regression
9.16 Case Study: Cancer Treatment Cost Analysis
9.16.1 Problem Statement
9.16.2 Data Description
9.16.3 Preliminary Analysis
9.16.4 Revised Analysis
9.16.5 Regression Modeling
Best Subsets Model
Coefficient Interpretation
Charges Across BMI Categories
Income Across Plans
9.16.6 Recommendations and Findings
9.16.7 Case Conclusion
Problems
A Installing Packages
A.1 Installation of the ggplot2 Package
A.2 Loading in the ggplot2 Package
A.3 Additional Installation Methods
B The quantmod Package
B.1 Data Source
B.2 Downloading a Single Stock or Index
B.3 Plotting Stock Prices
B.4 Multiple Stock Download
B.5 Calculate Stock Returns
B.6 Create a Dataframe
Bibliography