Heterogeneity, or mixtures, are ubiquitous in genetics. Even for data as simple as monogenic diseases, populations are a mixture of affected and unaffected individuals. Still, most statistical genetic association analyses, designed to map genes for diseases and other genetic traits, ignore this phenomenon.
In this book, we document methods that incorporate heterogeneity into the design and analysis of genetic and genomic association data. A key quality of the statistics we develop is that they include mixture parameters as part of the statistic itself, a unique component for tests of association. A critical feature of this work is the inclusion of at least one heterogeneity parameter when performing statistical power and sample size calculations for tests of genetic association.
We anticipate that this book will be useful to researchers who want to estimate heterogeneity in their data, develop or apply genetic association statistics where heterogeneity exists, and accurately evaluate statistical power and sample size for genetic association through the application of robust experimental design.
Author(s): Derek Gordon, Stephen J. Finch, Wonkuk Kim
Series: Statistics for Biology and Health
Publisher: Springer
Year: 2021
Language: English
Pages: 352
City: Cham
Foreword
References
Acknowledgements
Initial Comments and Technical Notes
References
Contents
1 Introduction to Heterogeneity in Statistical Genetics
1.1 Different Types of Heterogeneity
1.2 A Note on Definitions and Notation Throughout This Book
1.3 Hardy–Weinberg Equilibrium (HWE) Proportions and Their Importance in Gene-Mapping
1.4 Determination of Conditional Genotype Frequencies
1.4.1 Genetic Model-Free Approaches
1.4.2 Genetic Model-Based Approach Through the Use of Genotype Relative Risks
1.5 The Box (and Whiskers) Plot as a Tool for Visualizing Empirical Data Distributions
1.6 Power and Minimum Sample Size (MSSN) for Different Statistical Tests of Genetic Association
1.6.1 Contingency Table for Organizing Categorical Phenotype and Genomic-Data
1.7 The Expectation–Maximization (EM) Algorithm
1.7.1 Example Application
References
2 Overview of Genomic Heterogeneity in Statistical Genetics
2.1 Heterogeneity Due to SNP Genotype Misclassification
2.2 Examples of How Genotype Misclassification May Arise in Practice
2.3 Mathematical Models of Genotype Misclassification
2.4 Genotype Misclassification for Genomic Data with Three or More Categories
2.5 Effects of Misclassification on Statistical Tests
2.5.1 Non-differential Misclassification Error
2.5.2 Differential Misclassification Error
2.5.3 Non-differential Misclassification in Family-Based Tests of Association
2.6 Errors in Next-Generation Sequencing (NGS)
2.6.1 Definitions and Notation
2.6.2 Mathematical Model for NGS Data
2.6.3 Empirical Type I Error for Test Statistics Applied to NGS Data with Sequence Error—Simulation Results
2.7 Non-misclassification Forms of Heterogeneity
2.7.1 Mathematical Model for Heterogeneity
References
3 Phenotypic Heterogeneity
3.1 Phenotype Misclassification
3.2 How Phenotype Misclassification May Arise in Practice
3.2.1 Lack of Access to Gold-Standard Classification
3.2.2 Variability of Phenotype Expression over Time
3.2.3 Variable Age of Onset
3.2.4 Incomplete Knowledge of Gold-Standard Classifications
3.2.5 Model Misspecification
3.3 Effects of Misclassification on Statistical Tests
3.3.1 Non-differential Misclassification Error Example for Single-Stage Genetic Association
3.3.2 Why Do We Observe Such Large Power Loss/MSSN Increase for Phenotype Misclassification?
3.3.3 Multi-stage Phenotype Classification and Limits of Observed Genotype Frequencies
3.4 Non-misclassification Forms of Heterogeneity
3.5 Summary
References
4 Association Tests Allowing for Heterogeneity
4.1 Introduction
4.2 Statistical Tests that Use Genotype Data
4.2.1 Likelihood Ratio Test that Allows for Random Phenotype and Genotype Misclassification Error (LRTae)
4.2.2 Trend Statistic that Allows for Random Phenotype and Genotype Misclassification Error
4.2.3 Likelihood Ratio Statistic for Family-Based Association that Incorporates Genotype Misclassification Errors (TDTae)
4.3 Statistical Tests that Consider Heterogeneity Other Than Misclassification
4.3.1 Mixture Likelihood Ratio Test (MLRT) for Genetic Association in the Presence of Locus Heterogeneity
4.3.2 Transmission Disequilibrium Test that Allows for Locus Heterogeneity (TDT-HET)
4.3.3 Tests that Incorporate Phenotype Heterogeneity
4.4 Statistical Tests that Use Sequence Data
4.4.1 Single-Variant and Multiple Variant Tests of Trend for Genetic Association that Allows for Random and Differential NGS Error (LTTae,NGS)
4.4.2 Transmission Disequilibrium Test that Allows for Next-Generation Sequence Error (TDT1-NGS)
References
5 Designing Genetic Linkage and Association Studies that Maintain Desired Statistical Power in the Presence of Mixtures
5.1 Parameter Settings for Example Calculations
5.1.1 Example Parameter Settings to Compute Power for a Fixed Sample Size and Significance Level
5.1.2 Example Parameter Settings to Compute MSSN for a Fixed Power and Significance Level
5.2 Statistical Tests that Use Genotype Data
5.2.1 Power and MSSN for Population-Based Data in the Presence of Non-differential Genotype Misclassification
5.2.2 Power and MSSN for Population-Based Data in the Presence of Non-differential Phenotype Misclassification
5.2.3 Likelihood Ratio Test that Allows for Random Phenotype and Genotype Misclassification Error (LRTae)—Empirical Power
5.2.4 Trend Statistic that Allows for Random Phenotype and Genotype Misclassification Error
5.2.5 Family-Based Tests of Association—Analytic Solution to Increase in Rejection Rate for TDT in the Presence of Genotype Misclassification Errors
5.2.6 Family-Based Tests of Association—Analytic Solution to Increase in Rejection Rate for TDT in the Presence of Phenotype Misclassification Errors
5.3 Statistical Tests that Consider Heterogeneity Other Than Misclassification
5.3.1 Sample Size Calculations in the Presence of Locus Heterogeneity—Population-Based Tests of Genetic Association
5.3.2 Power and Sample Size Calculations for Chi-Square Tests of Independence on Allele and Genotype Data for Phenotype Heterogeneity
5.3.3 Family-Based Test of Linkage/Association
5.4 Power Calculations in the Presence of NGS Misclassification
5.4.1 Test of Trend Applied to Multiple NGS Data for SNP Loci
5.4.2 Increase in Interest for NGS Statistics
5.4.3 Empirical Null and Power Simulations for the LTTae,NGS Statistic
5.4.4 Factors that Most Significantly Affect Power for NGS-Based TDT
References
6 Threshold-Selected Quantitative Trait Loci and Pleiotropy
6.1 Quantitative Trait Locus with Single Phenotype
6.1.1 Notation
6.1.2 Conditional Genotype Frequencies for Threshold-Selected Phenotypes
6.1.3 Example Sample Size Calculation for Threshold-Selected Phenotypes
6.1.4 Why Use Threshold-Selected Dichotomous Phenotypes as Compared with Quantitative Phenotypes? Power Comparison with ANOVA
6.2 Quantitative Trait Locus with Multiple Phenotypes
6.2.1 Notation for Multivariate Quantitative Traits
6.2.2 Methods
6.2.3 Thresholds
6.2.4 Example MSSN Calculation
6.2.5 A Final Note on Advantages of the Threshold-Selected Approach
References
Bibliography
Index