Data Analysis in Medicine and Health using R

In medicine and health, data are analyzed to guide treatment plans, patient care, and control and prevention policies. However, researchers in medicine and health often lack a solid understanding of data and statistical concepts, as well as programming skills. In addition, there is increasing demand for data analyses to be reproducible, and data are becoming more complex and require cutting-edge analysis. This book provides readers with the fundamental concepts of data and statistical analysis and modeling, together with the skills to perform the analysis using the R programming language, which is the lingua franca for statisticians. The topics are sequenced to minimize the time readers need to understand the objectives of data and statistical analysis, learn the concepts of statistical modeling and acquire the skills to perform the analysis. The R code and datasets used in the book will be made available on GitHub for easy access. The book will also be live on the website bookdown.org, a service provided by RStudio, PBC, to host books written using the bookdown package in the R programming language.

Author(s): Kamarul Imran Musa, Wan Nor Arifin Wan Mansor, Tengku Muhammad Hanis
Series: Analytics and AI for Healthcare
Publisher: CRC Press/Chapman & Hall
Year: 2023

Language: English
Pages: 309
City: Boca Raton

Cover
Half Title
Series Page
Title Page
Copyright Page
Dedication
Contents
Preface
1. R, RStudio and RStudio Cloud
1.1. Objectives
1.2. Introduction
1.3. RStudio IDE
1.4. RStudio Cloud
1.4.1. The RStudio Cloud registration
1.4.2. Register and log in
1.5. Point and Click Graphical User Interface (GUI)
1.6. RStudio Server
1.7. Installing R and RStudio on Your Local Machine
1.7.1. Installing R
1.7.2. Installing RStudio IDE
1.7.3. Checking R and RStudio installations
1.7.4. TinyTeX, MiKTeX or MacTeX (for Mac OS) and TeX live
1.8. Starting Your RStudio
1.8.1. Console tab
1.8.2. Files, plots, packages, help and viewer pane
1.8.3. Environment, history, connection and build pane
1.8.4. Source pane
1.9. Summary
2. R Scripts and R Packages
2.1. Objectives
2.2. Introduction
2.3. Open a New Script
2.3.1. Our first R script
2.3.2. Function, argument and parameters
2.3.3. If users require further help
2.4. Packages
2.4.1. Packages on CRAN
2.4.2. Checking availability of R package
2.4.3. Install an R package
2.5. Working Directory
2.5.1. Starting a new R job
2.5.2. Creating a new R project
2.5.3. Location for dataset
2.6. Upload Data to RStudio Cloud
2.7. More Resources on RStudio Cloud
2.8. Guidance and Help
2.9. Bookdown
2.10. Summary
3. RStudio Project
3.1. Objectives
3.2. Introduction
3.3. Dataset Repository on GitHub
3.4. RStudio Project on RStudio or Posit Cloud
3.5. RStudio Project on Local Machine
3.6. Summary
4. Data Visualization
4.1. Objectives
4.2. Introduction
4.3. History and Objectives of Data Visualization
4.4. Ingredients for Good Graphics
4.5. Graphics Packages in R
4.6. The ggplot2 Package
4.7. Preparation
4.7.1. Create a new RStudio project
4.7.2. Important questions before plotting graphs
4.8. Read Data
4.9. Load the Packages
4.10. Read the Dataset
4.11. Basic Plots
4.12. More Complex Plots
4.12.1. Adding another variable
4.12.2. Making subplots
4.12.3. Overlaying plots
4.12.4. Combining different plots
4.12.5. Statistical transformation
4.12.6. Customizing title
4.12.7. Choosing themes
4.12.8. Adjusting axes
4.13. Saving Plots
4.14. Summary
5. Data Wrangling
5.1. Objectives
5.2. Introduction
5.2.1. Definition of data wrangling
5.3. Data Wrangling with dplyr Package
5.3.1. dplyr package
5.3.2. Common data wrangling processes
5.3.3. Some dplyr functions
5.4. Preparation
5.4.1. Create a new project or set the working directory
5.4.2. Load the libraries
5.4.3. Datasets
5.5. Select Variables, Generate New Variable and Rename Variable
5.5.1. Select variables using dplyr::select()
5.5.2. Generate new variable using mutate()
5.5.3. Rename variable using rename()
5.6. Sorting Data and Selecting Observation
5.6.1. Sorting data using arrange()
5.6.2. Select observation using filter()
5.7. Group Data and Get Summary Statistics
5.7.1. Group data using group_by()
5.7.2. Summary statistic using summarize()
5.8. More Complicated dplyr Verbs
5.9. Data Transformation for Categorical Variables
5.9.1. forcats package
5.9.2. Conversion from numeric to factor variables
5.9.3. Recoding variables
5.9.4. Changing the level of categorical variable
5.10. Additional Resources
5.11. Summary
6. Exploratory Data Analysis
6.1. Objectives
6.2. Introduction
6.3. EDA Using ggplot2 Package
6.3.1. Usage of ggplot2
6.4. Preparation
6.4.1. Load the libraries
6.4.2. Read the dataset into R
6.5. EDA in Tables
6.6. EDA with Plots
6.6.1. One variable: Distribution of a categorical variable
6.6.2. One variable: Distribution of a numerical variable
6.6.3. Two variables: Plotting a numerical and a categorical variable
6.6.4. Three variables: Plotting a numerical and two categorical variables
6.6.5. Faceting the plots
6.6.6. Line plot
6.6.7. Plotting means and error bars
6.6.8. Scatterplot with fit line
6.7. Summary
7. Linear Regression
7.1. Objectives
7.2. Introduction
7.3. Linear Regression Models
7.4. Prepare R Environment for Analysis
7.4.1. Libraries
7.4.2. Dataset
7.5. Simple Linear Regression
7.5.1. About simple linear regression
7.5.2. Data exploration
7.5.3. Univariable analysis
7.5.4. Model fit assessment
7.5.5. Presentation and interpretation
7.6. Multiple Linear Regression
7.6.1. About multiple linear regression
7.6.2. Data exploration
7.6.3. Univariable analysis
7.6.4. Multivariable analysis
7.6.5. Interaction
7.6.6. Model fit assessment
7.6.7. Presentation and interpretation
7.7. Prediction
7.8. Summary
8. Binary Logistic Regression
8.1. Objectives
8.2. Introduction
8.3. Logistic Regression Model
8.4. Dataset
8.5. Logit and Logistic Models
8.6. Prepare Environment for Analysis
8.6.1. Creating a RStudio project
8.6.2. Loading libraries
8.7. Read Data
8.8. Explore Data
8.9. Estimate the Regression Parameters
8.10. Simple Binary Logistic Regression
8.11. Multiple Binary Logistic Regression
8.12. Convert the Log Odds to Odds Ratio
8.13. Making Inference
8.14. Models Comparison
8.15. Adding an Interaction Term
8.16. Prediction from Binary Logistic Regression
8.16.1. Predict the log odds
8.16.2. Predict the probabilities
8.17. Model Fitness
8.18. Presentation of Logistic Regression Model
8.19. Summary
9. Multinomial Logistic Regression
9.1. Objectives
9.2. Introduction
9.3. Examples of Multinomial Outcome Variables
9.4. Models for Multinomial Outcome Data
9.5. Estimation for Multinomial Logit Model
9.5.1. Log odds and odds ratios
9.5.2. Conditional probabilities
9.6. Prepare Environment
9.6.1. Load libraries
9.6.2. Dataset
9.6.3. Read data
9.6.4. Data wrangling
9.6.5. Create new categorical variable from fbs
9.6.6. Exploratory data analysis
9.6.7. Confirm the order of cat_fbs
9.7. Estimation
9.7.1. Single independent variable
9.7.2. Multiple independent variables
9.7.3. Model with interaction term between independent variables
9.8. Inferences
9.9. Interpretation
9.10. Prediction
9.11. Presentation of Multinomial Regression Model
9.12. Summary
10. Poisson Regression
10.1. Objectives
10.2. Introduction
10.3. Prepare R Environment for Analysis
10.3.1. Libraries
10.4. Poisson Regression for Count
10.4.1. About Poisson regression for count
10.4.2. Dataset
10.4.3. Data exploration
10.4.4. Univariable analysis
10.4.5. Multivariable analysis
10.4.6. Interaction
10.4.7. Model fit assessment
10.4.8. Presentation and interpretation
10.4.9. Prediction
10.5. Poisson Regression for Rate
10.5.1. About Poisson regression for rate
10.5.2. Dataset
10.5.3. Data exploration
10.5.4. Univariable analysis
10.5.5. Multivariable analysis
10.5.6. Interaction
10.5.7. Model fit assessment
10.5.8. Presentation and interpretation
10.6. Quasi-Poisson Regression for Overdispersed Data
10.7. Summary
11. Survival Analysis: Kaplan–Meier and Cox Proportional Hazard (PH) Regression
11.1. Objectives
11.2. Introduction
11.3. Types of Survival Analysis
11.4. Prepare Environment for Analysis
11.4.1. RStudio project
11.4.2. Packages
11.5. Data
11.6. Explore Data
11.7. Kaplan–Meier Survival Estimates
11.8. Plot the Survival Probability
11.9. Comparing Kaplan–Meier Estimates across Groups
11.9.1. Log-rank test
11.9.2. Peto-Peto test
11.10. Semi-Parametric Models in Survival Analysis
11.10.1. Cox proportional hazards regression
11.10.2. Advantages of the Cox proportional hazards regression
11.11. Estimation from Cox Proportional Hazards Regression
11.11.1. Simple Cox PH regression
11.11.2. Multiple Cox PH regression
11.12. Adding Interaction in the Model
11.13. The Proportional Hazard Assumption
11.13.1. Risk constant over time
11.13.2. Test for PH assumption
11.13.3. Plots to assess PH assumption
11.14. Model Checking
11.14.1. Prediction from Cox PH model
11.14.2. Residuals from Cox PH model
11.14.3. Influential observations
11.15. Plot the Adjusted Survival
11.16. Presentation and Interpretation
11.17. Summary
12. Parametric Survival Analysis
12.1. Objectives
12.2. Introduction
12.2.1. Advantages of parametric survival analysis models
12.3. Parametric Survival Analysis Model
12.3.1. Proportional hazard parametric models
12.3.2. Accelerated failure time (AFT) models
12.4. Analysis
12.4.1. Dataset
12.4.2. Set the environment
12.4.3. Read dataset
12.4.4. Data wrangling
12.4.5. Exploratory data analysis (EDA)
12.4.6. Exponential survival model
12.4.7. Weibull (accelerated failure time)
12.4.8. Weibull (proportional hazard)
12.4.9. Model adequacy for Weibull distribution
12.5. Summary
13. Introduction to Missing Data Analysis
13.1. Objectives
13.2. Introduction
13.3. Types of Missing Data
13.4. Preliminaries
13.4.1. Packages
13.4.2. Dataset
13.5. Exploring Missing Data
13.6. Handling Missing Data
13.6.1. Listwise deletion
13.6.2. Simple imputation
13.6.3. Single imputation
13.6.4. Multiple imputation
13.7. Presentation
13.8. Resources
13.9. Summary
14. Model Building and Variable Selection
14.1. Objectives
14.2. Introduction
14.3. Model Building
14.4. Variable Selection for Prediction
14.4.1. Backward elimination
14.4.2. Forward selection
14.4.3. Stepwise selection
14.4.4. All possible subset selection
14.5. Stopping Rule and Selection Criteria in Automatic Variable Selection
14.6. Problems with Automatic Variable Selections
14.7. Purposeful Variable Selection
14.8. Summary
Bibliography
Index