Statistical Analytics for Health Data Science with SAS and R

This book compiles typical fundamental-to-advanced statistical methods for use in health data science. Although the book promotes applications to health and health-related data, the models it covers can be used to analyze any kind of data. The data are analyzed with the commonly used statistical software R and SAS (with online supplementary material on SPSS and Stata). The data and computer programs will be made available so that readers can replicate the analyses and implement the methods. There has been considerable attention to making statistical methods and analytics available to health data science researchers and students. This book brings these together to provide a concise point of reference for the most commonly used statistical methods, from the fundamental level to the advanced level. We envisage that this book will contribute to the rapid development of health data science. We provide straightforward explanations of the collected statistical theory and models, compilations of a variety of publicly available data, and illustrations of data analytics using SAS and R. The primary readers are applied data scientists and practitioners in any field of data science, applied statistical analysts and scientists in public health, academic researchers, and graduate students in statistics and biostatistics. The secondary readers are R&D professionals and practitioners in industry and governmental agencies. The book can be used for both teaching and applied research.

Author(s): Jeffrey Wilson, Ding-Geng Chen, Karl E. Peace
Series: Chapman & Hall/CRC Biostatistics Series
Publisher: CRC Press/Chapman & Hall
Year: 2023

Language: English
Pages: 279
City: Boca Raton

Cover
Half Title
Series Page
Title Page
Copyright Page
Table of Contents
Preface
Readings
Acknowledgments
Author Biographies
List of Abbreviations
Chapter 1 Survey Sampling and Data Collection
1.1 Research Interest/Question
1.2 Some Basic Terminology
1.3 Sample Properties
1.4 Probability Sampling Design
1.4.1 Simple Random Sampling Design
1.4.2 Stratified Random Sampling Design
1.4.3 Cluster Random Sampling Design
1.4.4 Systematic Random Sampling Design
1.4.5 Complex Sampling Design or Multi-Stage Sampling
1.5 Nonprobability Sampling Schemes
1.6 Surveys
1.6.1 Studies
1.6.2 Cohort Study
1.6.3 Cross-Sectional Studies
1.6.4 Case-Control Studies
1.7 Key Terms in Surveys
1.8 Examples of Public Health Surveys
1.8.1 Los Angeles County Health Survey (2015)
1.8.2 National Crime Victimization Survey
References
Chapter 2 Measures of Tendency, Spread, Relative Standing, Association, and Belief
2.1 Research Interest/Question
2.2 Body Size, Glycohemoglobin, and Demographics from NHANES 2009–2010 Data
2.3 Statistical Measures of Central Tendency and Measures of Spread
2.3.1 Variables
2.3.2 Measures of Central Tendency
2.3.3 Measures of Spread or Variation or Dispersion
2.3.4 Measures of Relative Standing
2.3.4.1 Single Values
2.3.4.2 Interval Values
2.3.5 Measure of Skewness
2.3.6 Measure of Kurtosis
2.4 Results from NHANES Data on BMI
2.4.1 Analysis Results with SAS Program
2.4.2 Analysis Results with R Program
2.5 Measures of Association of Two Continuous Variables
2.6 NHANES Data on Measures of Association for Continuous Variables
2.6.1 Analysis of Data with SAS Program
2.6.2 Analysis of Data with R Program
2.7 Association Between Two Categorical Variables
2.7.1 Contingency Table
2.7.2 Odds Ratio
2.7.3 An Example in Odds Ratio
2.7.4 Relative Risk
2.7.5 Properties of a Diagnostic Test
2.8 Measure of Belief
2.9 Analysis of Data on Association Between Two Categorical Variables
2.9.1 Analysis of Data with SAS Program
2.9.2 Analysis of Data with R Program
Summary
2.10 Exercises
References
Chapter 3 Statistical Modeling of the Mean of Continuous and Binary Outcomes
3.1 Research Interest/Question
3.2 German Breast Cancer Study Data
3.3 Parametric Versus Nonparametric Tests
3.4 Statistical Model for Continuous Response
3.4.1 Parametric Tests
3.4.2 Statistical Model
3.4.3 Statistical Test – One Sample t-Test
3.4.4 Alternative Methods to One-Sample t-Test
3.5 Analysis of the Data
3.5.1 Analysis of the Data Using SAS Program
3.5.2 Analysis of the Data Using R Program
3.6 Continuous Response with No Covariate: Nonparametric Test
3.7 Data Analysis in SAS and R
3.7.1 Analysis of the Data Using SAS Program
3.7.2 Analysis of the Data Using R Program
3.8 Statistical Models for Categorical Responses
3.8.1 Parametric Tests
3.8.2 Test of Proportions [z-statistics]
3.8.3 Analysis of the Data Using SAS Program
3.8.4 Analysis of the Data Using R Program
3.9 Exact Tests
3.9.1 Analysis of the Data Using SAS Program
3.9.2 Analysis of the Data Using R Program
3.10 Summary and Discussion on One-Sample Test
3.11 Model with a Binary Factor (Two Subpopulations): Two-Sample Independent t
3.12 Modeling Continuous Response with a Binary Factor
3.12.1 Parametric Tests for Population Mean Difference
3.12.2 Assumptions
3.12.3 Two-sample Independent t-Statistic
3.12.4 Decision Using Two-Sample t-Test
3.13 Analysis of Data for Two-Sample t-Test in SAS and R
3.13.1 Analysis of Data with SAS Program
3.13.2 Analysis of Data with R Program
3.14 Nonparametric Two-Sample Test
3.14.1 Analysis of Data for Two-Sample Nonparametric Test
3.14.2 Analysis of Data SAS Program
3.14.3 Analysis of Data R Program
3.15 Modeling on Categorical Response with Binary Covariate
3.15.1 Analysis of Data
3.15.2 Analysis of Data with SAS Program
3.15.3 Analysis of Data with R Program
3.16 Summary and Discussion
3.17 Exercises
References
Chapter 4 Modeling of Continuous and Binary Outcomes with Factors: One-Way and Two-Way ANOVA Models
4.1 Research Interest/Question
4.2 Hospital Doctor Patient: Tumor Size
4.3 Statistical Modeling of Continuous Response with One Categorical Factor
4.3.1 Models
4.3.1.1 One-way ANOVA
4.3.1.2 F-Statistic in One-Way ANOVA
4.3.1.3 Assumptions
4.3.1.4 Decision
4.3.1.5 Multiple Comparisons
4.3.2 Analysis of Data in SAS and R
4.3.2.1 Analysis of Data with SAS Program
4.3.2.2 Analysis of Data with R Program
4.3.3 Continuous Response with a Categorical Factor: Nonparametric Tests
4.3.3.1 Analysis of Data with SAS Program
4.3.3.2 Analysis of Data with R Program
4.4 Modeling Binary Response with One Categorical Factor
4.4.1 Models
4.4.1.1 Test of Homogeneity-Hypotheses
4.4.1.2 Test of Independence-Hypotheses
4.4.1.3 Pearson Chi-Square
4.4.2 Analysis of Data using SAS and R
4.4.2.1 Analysis of Data with SAS Program
4.4.2.2 Analysis of Data with R Program
4.5 Modeling of Continuous Response with Two Categorical Factors
4.5.1 Models
4.5.2 Analysis of Data with SAS Program
4.5.3 Analysis of Data with R Program
4.6 Modeling of Continuous Response: Two-Way ANOVA with Interactions
4.6.1 Analysis of Data with SAS Program
4.6.2 Analysis of Data with R Program
4.7 Summary and Discussion
4.8 Exercises
References
Chapter 5 Statistical Modeling of Continuous Outcomes with Continuous Explanatory Factors: Linear Regression Models
5.1 Research Interest/Question
5.2 Continuous Response with One Continuous Covariate: Simple Linear Regression
5.2.1 Correlation Coefficient (a Linear Relation)
5.2.2 Fundamentals of Linear Regression
5.2.3 Estimation – Ordinary Least-Squares (OLS)
5.2.4 The Coefficient of Determination
5.2.5 Hypothesis Testing
5.2.6 F-Test Statistic
5.2.7 Diagnostic Measures (Weisberg 1985)
5.2.8 Analysis of Data Using SAS Program
5.2.9 Analysis of Data Using R Program
5.3 Continuous Responses with Multiple Factors: Multiple Linear Regression
5.3.1 Analysis of Data Using SAS Program
5.3.2 Analysis of Data Using R Program
5.4 Continuous Responses with Continuous and Qualitative Predictors
5.4.1 Qualitative Predictors with More Than Two Levels
5.4.2 Analysis of Data Using SAS Program
5.4.3 Analysis of Data Using R Program
5.5 Continuous Responses with Continuous Predictors: Stepwise Regression
5.5.1 Analysis of Data Using SAS Program
5.5.2 Analysis of Data Using R Program
5.6 Research Questions and Comments
5.7 Exercises
References
Chapter 6 Modeling Continuous Responses with Categorical and Continuous Covariates: One-Way Analysis of Covariance (ANCOVA)
6.1 Research Interest/Question
6.2 Public Health Right Heart Catheterization Data
6.3 Hypothesis
6.3.1 One-Way ANCOVA Model
6.3.2 Assumptions for ANCOVA
6.3.2.1 What If We Do Not Have All These Assumptions Satisfied?
6.3.3 F-Test Statistic in ANCOVA Model
6.3.4 ANCOVA Hypothesis Tests and the Analysis of Variance Table
6.4 Data Analysis
6.4.1 Analysis of Data with SAS Program
6.4.2 Analysis of Data with R Program
6.5 Summary and Discussion
6.6 Exercises
References
Chapter 7 Statistical Modeling of Binary Outcomes with One or More Covariates: Standard Logistic Regression Model
7.1 Research Interest/Question
7.2 Bangladesh Data
7.3 Binary Response with Categorical Predictors and Covariates
7.3.1 Models for Probability, Odds, and Logit
7.3.2 Statistical Model
7.3.2.1 Assumptions for Standard Logistic Regression Model
7.3.2.2 Interpretation of Coefficients on the Logits
7.3.2.3 Interpretation of the Odds Ratio
7.3.3 Model Fit
7.3.3.1 Classification Table
7.3.3.2 Hosmer-Lemeshow Test – Measure of Fit
7.3.3.3 ROC Curves
7.4 Statistical Analysis of Data with SAS and R
7.4.1 Statistical Analysis of Data with SAS Program
7.4.2 Statistical Analysis of Data with R Program
7.5 Research/Questions and Comments
7.6 Questions
Answers
7.7 Exercises
References
Chapter 8 Generalized Linear Models
8.1 Research/Question
8.2 German Breast Cancer Study Data
8.3 Generalized Linear Model
8.3.1 The Model
8.3.2 Assumptions When Fitting GLMs
8.4 Examples of Generalized Linear Models
8.4.1 Multiple Linear Regression
8.4.2 Logistic Regression
8.5 Numerical Poisson Regression Example
Predicted, Leverage, Residual, Standardized Residual, Cook’s Distance
Change the Model: Adjusting for Overdispersion
8.5.1 Statistical Analysis of Data SAS Program
8.5.2 Statistical Analysis of Data R Program
8.6 Summary and Discussions
8.7 Exercises
References
Chapter 9 Modeling Repeated Continuous Observations Using GEE
9.1 Research Interest/Question
9.2 Public Health Data
9.3 Two-Measurements on An Experimental Unit
9.4 Generalized Estimating Equations Models
9.4.1 Generalized Estimating Equations and Covariance Structure
9.4.2 Working Correlation Matrices
9.5 Statistical Data Analysis in SAS and R
9.5.1 Statistical Analysis of Data SAS Program
9.5.2 Statistical Analysis of Data R Program
9.6 Research/Questions and Comments
9.7 Exercises
References
Chapter 10 Modeling for Correlated Continuous Responses with Random-Effects
10.1 Research Interest/Question
10.2 Hospital Doctor Patient Data
10.2.1 Data Source and Previous Analyses
10.2.2 Nesting Structure
10.3 Linear Mixed-Effects Models and Parameter Estimation
10.3.1 Intra-Cluster Correlation
10.3.2 The Two-Level LME Model
Random-Intercept Model with Two Levels
Random-Intercepts and Random-Slopes Model
10.3.3 The Three-Level LME Model
10.3.4 Methods for Parameter Estimation
10.4 Data Analysis Using SAS
10.4.1 Two-Level Random-Intercept LME Model
10.4.2 Three-Level Random-Slope LME Model Adjusting All Other Covariates
10.5 Data Analysis Using R
10.5.1 Two-Level LME Model with Random-Intercept
10.5.2 Two-Level LME Model with Random-Intercept and Random-Slope
10.5.3 Model Selection
10.5.4 Three-Level Random-Slope LME Model Adjusting for All Other Covariates
10.5.5 Three-Level LME Model with Random-Intercept and Random-Slope
10.6 Discussions and Comments
10.7 Exercises
References
Chapter 11 Modeling Correlated Binary Outcomes Through Hierarchical Logistic Regression Models
11.1 Research Interest/Question
11.2 Bangladesh Data
11.3 Hierarchical Models
11.3.1 Two-Level Nested Logistic Regression with Random-Intercept Model
11.3.2 Interpretation of Parameter Estimates in a Subject-Specific Model
11.3.3 Statistical Analysis of Data Using SAS Program
11.3.4 Statistical Analysis of Data Using R Program
11.4 Two-Level Nested Logistic Regression Model with Random-Intercept and Random-Slope
11.4.1 Statistical Analysis of Data with SAS Program
11.4.2 Statistical Analysis of Data Using SAS Program
11.4.3 Statistical Analysis of Data Using R Program
11.5 Three-Level Nested Logistic Regression Model with Random-Intercepts
11.5.1 Statistical Analysis of Data Using SAS Program
11.5.2 Statistical Analysis of Data Using R Program
11.6 Research/Questions and Comments
11.6.1 Research Questions and Answers
11.6.2 Comments
11.7 Exercises
References
Index