Multivariable Model-Building: A Pragmatic Approach to Regression Analysis Based on Fractional Polynomials for Modelling Continuous Variables

Multivariable regression models are of fundamental importance in all areas of science in which empirical data must be analyzed. This book proposes a systematic approach to building such models based on standard principles of statistical modeling. The main emphasis is on the fractional polynomial method for modeling the influence of continuous variables in a multivariable context, a topic for which there is no standard approach. Existing options range from very simple step functions to highly complex adaptive methods such as multivariate splines with many knots and penalisation. This new approach, developed in part by the authors over the last decade, is a compromise which promotes interpretable, comprehensible and transportable models.

Author(s): Patrick Royston, Willi Sauerbrei
Series: Wiley Series in Probability and Statistics
Edition: 1
Publisher: Wiley
Year: 2008

Language: English
Pages: 324
Tags: Mathematics; Probability Theory and Mathematical Statistics; Mathematical Statistics; Applied Mathematical Statistics

Multivariable Model-Building
Contents
Preface
1.1.1 Many Candidate Models
1.1.3 Example 1: Continuous Response
1.1.4 Example 2: Multivariable Model for Survival Data
1.2.1 Effects of Assumptions
1.2.3 Disadvantages of Fractional Polynomial Modelling
1.3.1 Normal-Errors Regression
1.3.3 Cox Regression
1.3.5 Linear and Additive Predictors
1.4.2 Graphical Analysis of Residuals
1.5 Role of Subject-Matter Knowledge in Model Development
1.6 Scope of Model Building in our Book
1.7.2 Criteria for a Good Model
1.7.3 Personal Preferences
1.8 General Notation
2.1 Introduction
2.2 Background
2.3 Preliminaries for a Multivariable Analysis
2.4 Aims of Multivariable Models
2.6 Procedures for Selecting Variables
2.6.1 Strength of Predictors
2.6.2 Stepwise Procedures
2.6.3 All-Subsets Model Selection Using Information Criteria
2.6.4 Further Considerations
2.7.1 Myeloma Study
2.7.2 Educational Body-Fat Data
2.7.3 Glioma Study
2.8.2 Simulation Study
2.8.3 Shrinkage to Correct for Selection Bias
2.8.4 Post-estimation Shrinkage
2.8.5 Reducing Selection Bias
2.8.6 Example
2.9.2 Full, Pre-specified or Selected Model?
2.9.4 Complexity, Stability and Interpretability
2.9.5 Conclusions and Outlook
3.1 Introduction
3.2.2 Nominal
3.3.1 Coding Schemes
3.3.2 Effect of Coding Schemes on Variable Selection
3.4.1 ‘Optimal’ Cutpoints: A Dangerous Analysis
3.4.2 Other Ways of Choosing a Cutpoint
3.5 Example: Issues in Model Building with Categorized Variables
3.5.1 One Ordinal Variable
3.5.2 Several Ordinal Variables
3.6.1 Beyond Linearity
3.6.2 Does Nonlinearity Matter?
3.6.4 Interpretability and Transportability
3.7 Empirical Curve Fitting
3.7.2 Critique of Local and Global Influence Models
3.8.2 Choice of Coding Scheme
3.8.4 Handling Continuous Variables
4 Fractional Polynomials for One Variable
4.2.1 Genesis
4.2.3 Relation to Box–Tidwell and Exponential Functions
4.3.2 First Derivative
4.4.2 Maximum or Minimum of a FP2 Function
4.5 Examples of Curve Shapes with FP1 and FP2 Functions
4.6 Choice of Powers
4.9.1 Hypothesis Testing
4.9.2 Interval Estimation
4.10.2 Closed Test Procedure for Function Selection
4.10.4 Sequential Procedure
4.11.1 Computational Aspects
4.12.2 Example
4.13.1 Graphical
4.13.2 Tabular
4.14.1 Details of all Fractional Polynomial Models
4.14.3 Details of the Fitted Model
4.14.5 Fitted Odds Ratio and its Confidence Interval
4.15 Modelling Covariates with a Spike at Zero
4.16 Power of Fractional Polynomial Analysis
4.16.2 Underlying Function FP1 or FP2
4.16.3 Comment
4.17 Discussion
5.1 Introduction
5.3 A Diagnostic Plot for Influential Points in FP Models
5.3.2 Example 2: Primary Biliary Cirrhosis Data
5.4 Dependence on Choice of Origin
5.5 Improving Robustness by Preliminary Transformation
5.5.1 Example 1: Educational Body-Fat Data
5.5.3 Practical Use of the Pre-transformation g_δ(x)
5.6.2 Negative Exponential Pre-transformation
5.7.1 Example 1: Nerve Conduction Data
5.7.2 Example 2: Triceps Skinfold Thickness
5.8.1 Not all Curves are Fractional Polynomials
5.8.2 Example: Kidney Cancer
5.9 Discussion
6.1 Introduction
6.2 Motivation
6.3 The MFP Algorithm
6.3.2 Example
6.4.1 Parameter Estimates
6.4.3 Effect Estimates
6.5.1 Function Plots
6.5.2 Graphical Analysis of Residuals
6.5.3 Assessing Fit by Adding More Complex Functions
6.6.1 Interval Estimation
6.6.2 Importance of the Nominal Significance Level
6.6.3 The Full MFP Model
6.6.4 A Single Predictor of Interest
6.6.5 Contribution of Individual Variables to the Model Fit
6.6.6 Predictive Value of Additional Variables
6.7.1 Example 1: Oral Cancer
6.7.2 Example 2: Diabetes
6.7.3 Example 3: Whitehall I
6.8.2 Example: GBSG Breast Cancer Data
6.9 Discussion
6.9.1 Philosophy of MFP
6.9.3 Improving Robustness by Preliminary Covariate Transformation
6.9.4 Conclusion and Future
7.1 Introduction
7.3.1 Effect of Type of Predictor
7.3.4 Predefined Hypothesis or Hypothesis Generation
7.3.7 Graphical Checks, Sensitivity and Stability Analyses
7.4 The MFPI Procedure
7.4.2 Check of the Results and Sensitivity Analysis
7.5 Example 1: Advanced Prostate Cancer
7.5.1 The Fitted Model
7.5.2 Check of the Interactions
7.5.3 Final Model
7.5.4 Further Comments and Interpretation
7.6.2 A Predefined Hypothesis: Tamoxifen–Oestrogen Receptor Interaction
7.7.1 Interaction with Categorized Variables
7.7.2 Example: GBSG Study
7.8 STEPP
7.9.2 Stability Investigation
7.10 Comment on Type I Error of MFPI
7.11 Continuous-by-Continuous Interactions
7.11.1 Mismodelling May Induce Interaction
7.11.2 MFPIgen: An FP Procedure to Investigate Interactions
7.11.3 Examples of MFPIgen
7.11.4 Graphical Presentation of Continuous-by-Continuous Interactions
7.11.5 Summary
7.13 Discussion
8.1 Introduction
8.2 Background
8.3.1 Selection of Variables within a Bootstrap Sample
8.4 Example 1: Glioma Data
8.5 Example 2: Educational Body-Fat Data
8.5.1 Effect of Influential Observations on Model Selection
8.6 Example 3: Breast Cancer Diagnosis
8.7.1 Summarizing Variation between Curves
8.7.2 Measures of Curve Instability
8.8.2 Plots of Functions
8.8.3 Instability Measures
8.8.4 Stability of Functions Depending on Other Variables Included
8.9 Discussion
8.9.2 Stability of Functions
9.1 Introduction
9.2 Background
9.3.1 Restricted Cubic Spline Functions
9.4.1 Cubic Smoothing Splines
9.4.3 The MVSS Algorithm
9.5 Example 1: Boston Housing Data
9.5.1 Effect of Reducing the Sample Size
9.5.2 Comparing Predictors
9.6 Example 2: GBSG Breast Cancer Study
9.7 Example 3: Pima Indians
9.8 Example 4: PBC
9.9 Discussion
9.9.1 Splines in General
9.9.4 Reporting of Selected Models
9.9.5 Conclusion
10.2 The Dataset
10.3 Univariate Analyses
10.4 MFP Analysis
10.5.2 Residuals and Lack of Fit
10.5.3 Robustness Transformation and Subject-Matter Knowledge
10.5.4 Diagnostic Plot for Influential Observations
10.5.6 Interactions
10.6 Stability Analysis
10.8.1 Selecting the Main-Effects Model
10.8.2 Further Comments on Stability
10.9 Discussion
11.1 Time-Varying Hazard Ratios in the Cox Model
11.1.1 The Fractional Polynomial Time Procedure
11.1.3 Prognostic Model with Time-Varying Effects for Patients with Breast Cancer
11.1.4 Categorization of Survival Time
11.1.5 Discussion
11.2.1 Example: Fetal Growth
11.2.2 Using FP Functions as Smoothers
11.2.4 Discussion
11.3.1 Quantitative Risk Assessment in Developmental Toxicity Studies
11.3.2 Model Uncertainty for Functions
11.3.3 Relative Survival
11.3.4 Approximating Smooth Functions
11.3.5 Miscellaneous Applications
12.2.1 Variable Selection Procedure
12.2.4 Sensitivity Analysis
12.3.2 Meta-analysis
12.4 Conclusion
A.1 Summaries of Datasets
A.2.2 GBSG Breast Cancer
A.2.3 Educational Body Fat
A.2.5 Prostate Cancer
A.2.7 PBC
A.2.9 Kidney Cancer
A.3 Software
Appendix B: Glossary of Abbreviations
References
Index