Genomic Prediction of Complex Traits

Includes cutting-edge methods and protocols. Provides step-by-step detail essential for reproducible results. Contains key notes and implementation advice from the experts.

Author(s): Nourollah Ahmadi, Jérôme Bartholomé
Publisher: Humana Press
Year: 2022

Language: English

Preface
Contents
Contributors
Chapter 1: Genetic Bases of Complex Traits: From Quantitative Trait Loci to Prediction
Abbreviations
1 Introduction
2 Methods for Mapping Quantitative Trait Loci
2.1 Linkage Mapping Methods
2.2 Linkage Disequilibrium (LD) Mapping
2.3 Combination of LD and Linkage Mapping
3 Genetic Architecture of Complex Traits
3.1 The Number and the Effect Size of QTLs
3.2 Epistatic Interactions
3.3 Genotype by Environment Interactions (G x E)
4 Prediction of Complex Traits Based on Mapped QTLs
4.1 Polygenetic Disease Risk Score for Human Complex Diseases
4.2 Marker Assisted Prediction of Breeding Value for Animal and Plant Breeding
5 General Conclusion
References
Chapter 2: Genomic Prediction of Complex Traits, Principles, Overview of Factors Affecting the Reliability of Genomic Predicti...
1 Introduction
2 Genomic Prediction Methods
2.1 SNPBLUP
2.2 GBLUP
2.3 Equivalence Between GBLUP and SNPBLUP, Definitions of Genomic Matrices
2.4 Definitions of Genomic Matrices
2.5 Conclusion, Implementation, Usefulness of Knowing the Accuracy of Predictions
3 Observed Accuracy of Genomic Predictions
3.1 Accuracy Measurement Methods
3.1.1 A Posteriori Accuracy of GEBV in the Ideal Case
3.1.2 Experimental Procedures for an A Priori Estimate of the Accuracy
3.1.3 Accuracy of Genomic Predictions Obtained by Simulations
3.2 Main Factors Affecting the Accuracy
3.2.1 Relative Accuracies of the Different Genetic and Genomic Predictions
3.2.2 Effect of the Marker Panel Characteristics
3.2.3 Effect of the Reference Population Structure
4 Algebra of the Genomic Predictions Accuracy
4.1 Proportion of Genetic Variability Captured by Markers
4.2 Accuracy of the Estimation of Marker Effects
4.2.1 Basic Eqs
4.2.2 Multilocus Approach
4.2.3 Taking into Account Linkage Disequilibrium
The Approach by the Effective Number of Loci
A Posteriori Calculation of the Effective Number of Loci
4.2.4 Consideration of Kinship Relationships
4.3 Combining the Proportion of Variability Captured by the Marker and the Accuracy of the Estimation of Their Effect
4.4 Testing the Prediction Equations
5 Conclusions
6 Notes
References
Chapter 3: Building a Calibration Set for Genomic Prediction, Characteristics to Be Considered, and Optimization Approaches
1 Introduction
2 Impact of the Composition of the Calibration Set on the Accuracy of Genomic Prediction
2.1 Calibration and Predicted Individuals Ideally Originate from the Same Population
2.1.1 LD Between Markers and QTLs Can Be Different Between Populations
2.1.2 QTL Allele Frequencies Can Be Different Between Populations
2.1.3 QTL Allele Effects Can Be Different Between Populations
2.2 Genetic Relationships Between Calibration and Predicted Individuals Are Needed
2.3 Calibration Set Should Be As Large as Possible
2.4 Genetic Relationships Between CS Individuals Should Be Limited
3 Methods to Optimize the Composition of the Calibration Set
3.1 Model-Free Optimization Criteria Based on Genetic Distances Between Individuals
3.1.1 Optimization Based on Genetic Diversity Within the CS
3.1.2 Optimization Based on Genetic Relatedness Between the CS and the PS
3.1.3 Taking Population Structure into Account
3.2 Optimization Using "Model-Based" Criteria Derived from the Mixed Model Theory (PEV, CD, r)
3.2.1 CS Optimization Using the Prediction Error Variance (PEV) or the Coefficient of Determination (CD)
3.2.2 Multitrait CS Optimization with CDmulti
3.2.3 CS Optimization Using the Expected Predictive Ability or Accuracy (r)
3.3 Search Algorithms for Optimal CS and Corresponding Packages
4 Focus on Some Specific Applications of CS Optimization
4.1 CS Optimization for Predicting Biparental Populations
4.2 CS Optimization or Update When Phenotypes Are Already Available
4.2.1 Updating the CS
4.2.2 Subsampling Historical Phenotypic Records
4.2.3 Optimizing the Choice of Individuals to Be Genotyped
4.3 Optimization of the Calibration Set in the Context of Hybrid Breeding
4.4 Optimization of the Phenotypic Evaluation of the Calibration Set
5 Conclusion and Prospects
References
Chapter 4: Genotyping, the Usefulness of Imputation to Increase SNP Density, and Imputation Methods and Tools
1 Introduction
2 Imputation Methods and Tools: Advantages and Drawbacks
2.1 Population-Based Methods Requiring a Reference Panel
2.2 Pedigree-Based Imputation Methods
2.3 Imputation Methods that Do Not Require a Reference Population
3 Factors Affecting Imputation Accuracy and Subsequent Genomic Prediction Quality
3.1 Choice of the Imputation Method
3.2 Characteristics of the Low-Density Panel and its Optimized Choice
3.2.1 Characteristics of LDP Influencing the Imputation Accuracy
3.2.2 Optimization of the Low-Density Panel
3.3 Characteristics of the Reference Population and its Optimized Choice
3.3.1 Characteristics of RP Influencing the Imputation Accuracy
3.3.2 Optimization of the Reference Population
3.4 Choice of the Genomic Prediction Method
4 Conclusion
References
Chapter 5: Overview of Genomic Prediction Methods and the Associated Assumptions on the Variance of Marker Effect, and on the ...
1 Introduction
2 Methods and Models for GS
2.1 Linear Least Squares Regression
2.2 Penalized and Bayesian Methods
2.2.1 Normal Prior
2.2.2 Bayes A
2.2.3 Bayes B
2.2.4 Bayesian Least Absolute Angle and Selection Operator (LASSO)
2.3 Nonparametric and Semiparametric Methods for GS
2.3.1 Gaussian Kernel (GK) Method
2.4 Deep Learning (DL) Method
2.5 Arc-Cosine Kernel (AK) Method
3 Genomic Models Including GxE
4 Important Factors Affecting Prediction Accuracy
5 Concluding Remarks
References
Chapter 6: Overview of Major Computer Packages for Genomic Prediction of Complex Traits
1 Prediction Models in Breeding
1.1 The Beginnings of Prediction Models in Breeding
1.2 The Genomic Prediction Model for Complex Traits (Many Sides of the Same Dice)
1.3 Linear and Nonlinear Genomic Prediction Models
1.4 REML-Based Estimation Methods Used in Software for Genomic Prediction
1.5 The Algorithm for Genomic Prediction Programs Based on First and Second Derivatives
1.6 A Minimal Example of Estimation of Variance-Covariance Parameters
2 Most Common Software Used for Genomic Prediction Models
2.1 Algorithm Types
2.2 Modeling Strengths and Weaknesses
2.3 Language and Support
2.4 Additional REML-Based Software
3 Bayesian Approach for Genomic Prediction
3.1 The Likelihood
3.2 Maximum Likelihood
3.3 Prior Distributions
3.4 Posterior Distribution
3.5 Markov Chain Monte Carlo (MCMC)
3.6 Metropolis-Hastings Updates
3.7 Gibbs Sampling
4 The Most Common Software Used for Bayesian Genomic Prediction Models
5 Notes
References
Chapter 7: Genome-Enabled Prediction Methods Based on Machine Learning
Abbreviations
1 Introduction
1.1 Overview
1.2 What is Machine Learning?
2 Neural Networks
2.1 Multilayer Perceptron
2.2 Convolutional Neural Network (CNN)
3 Ensemble Methods
3.1 Random Forest
3.2 Boosting
4 Kernel Methods
4.1 Reproducing Kernel Hilbert Spaces
4.2 Support Vector Machines
5 Other Machine Learning Models
6 The Computer Science Behind Machine Learning in GWP
7 Predictive Ability of Machine Learning Method: a Meta Comparison
7.1 Literature Review
7.2 Meta-Analysis
8 Conclusions
References
Chapter 8: Genomic Prediction Methods Accounting for Nonadditive Genetic Effects
1 Introduction
2 Genomic Prediction
2.1 Genomic Prediction Models with Dominance
2.2 Dominance and Inbreeding Depression (or Heterosis)
2.3 Imprinting
2.4 Epistasis
2.5 Epistatic Relationship Matrices
2.6 Machine Learning Approaches
3 Applications of Genomic Prediction with Nonadditive Genetic Effects
3.1 Phenotype Prediction
3.2 Mate Allocation
4 Selection for Crossbreeding
5 Selection in Purebred Populations
6 Final Remarks
References
Chapter 9: Genome and Environment Based Prediction Models and Methods of Complex Traits Incorporating Genotype x Environment I...
1 Introduction
2 Historical Timeline of G x E Modeling in Genomic Prediction
3 Genomic-Enabled Prediction Models Accounting for G x E
3.1 Basic Single-Environment Genomic Model
3.1.1 Kernel Methods to Reproduce Genomic Relatedness Among Individuals
3.2 Basic Marker x Environment Interaction Models
3.3 Basic Genomic x Environment Interaction Models
3.4 Illustrative Examples When Fitting Models 2-5 with Linear and Nonlinear Kernels
4 Genomic-Enabled Reaction-Norm Approaches for G x E Prediction
4.1 Basic Inclusion of Genome-Enabled Reaction Norms
4.2 Modeling Reaction-Norm Effects Using Environmental Covariables (EC)
4.3 Inclusion of Dominance Effects in G x E and Reaction-Norm Modeling
4.4 Nonlinear Kernels and Enviromic Structures for Genomic Prediction
4.5 Genomic Prediction Accounting for G x E Under Uncertain Weather Conditions at Target Locations
5 G x E Genome-Based Prediction Under Ordinal Variables and Big Data
5.1 Genomic-Enabled Prediction Models for Ordinal Data Including G x E Interaction
5.2 Illustrative Application Bayesian Genomic-Enabled Prediction Models Including G x E Interactions, to Ordinal Variables
5.3 Approximate Genomic-Enabled Kernel Models for Big Data
6 Open Source Software for Fitting Genomic Prediction Models Accounting for G x E
7 Practical Examples for Fitting Single Environment and Multi-Environment Modeling G x E Interactions
7.1 Single Environment Models with BGLR
Box 1 Loading Bayesian Generalized Linear Regression (BGLR) and Wheat Data
Box 2 Fitting Bayesian Ridge Regression (BRR)
Box 3 Fitting Genomic Best Linear Unbiased Predictor (GBLUP)
7.2 Multienvironment MDs Model with BGLR
Box 4 Fitting Multienvironment Model with a Block Diagonal for Genotype-by-Environment (GxE) Variation
7.3 Multienvironment Factor Analytic Model Using MTM
Box 5 Multi Trait Model (MTM)
7.4 Multitrait or Multienvironment Factor Analytic Model Using MTM
Box 6 Multi Trait Model (MTM) with Factor Analytic (FA) Model
8 What to Expect for the Future of G x E in Genomic Prediction?
8.1 High-Throughput Phenotyping Opens an Avenue for Modeling Functional Traits Under G x E Scenarios
8.2 Accurate Environmental Data and Optimized Experimental Designs Are Essential for Accurately Predicting G x E
8.3 Deep Learning Is a Promising Way of Combining Genomics, HTP and Enviromics
9 Conclusion
References
Chapter 10: Accounting for Correlation Between Traits in Genomic Prediction
1 Introduction
2 Models for Multitrait Genomic Prediction
2.1 Multitrait Linear Model
2.2 Bayesian Multiple-Trait Multiple-Environment (BMTME) Model
2.3 Multitrait Deep Learning Model
3 Implementation of the Models
3.1 Maize Example
3.2 Wheat Example
3.3 Evaluation of Prediction Performance
4 Implementation of Multitrait Linear Model
5 Implementation of BMTME Model
6 Implementation of Multitrait Deep Learning Model
7 Mixed Multitrait Deep Learning Model
8 Binary Multitrait Deep Learning Model
9 Categorical Multitrait Deep Learning Model
10 Count Multitrait Deep Learning Model
11 Continuous Multitrait Deep Learning Model
12 Discussions
13 Conclusions
Appendix A1: R Code to Compute the Metrics for Continuous Response Variables Denoted as PC_MM.R
Appendix A2: Bayesian GBLUP Multitrait Linear Model
Appendix A3: Bayesian Ridge Regression (BRR) Multitrait Linear Model
Appendix A4: BMTME Model
Appendix A5: Multitrait Deep Learning for Mixed Response Variables
Appendix A6: Guide for Tensorflow and Keras Installation in R
Change PATH Environment Variable
In Windows
Tensorflow and Keras Installation
References
Chapter 11: Incorporation of Trait-Specific Genetic Information into Genomic Prediction Models
1 Introduction
2 Differentially Weighted SNPs to Construct the Genomic Relationship Matrix (GRM)
3 Utilizing Linkage Disequilibrium (LD) Among SNPs
4 Partitioning Genomic Variance in Prediction Models Using Biological Prior Information (Genomic Features)
5 Direct Incorporation of Multi-omics Data into Prediction
6 Utilizing Other Information Relevant with Complex Traits into Prediction
7 Perspectives
References
Chapter 12: Incorporating Omics Data in Genomic Prediction
1 Why Other "Omics"?
2 Pitfalls for Omics-Based Predictions When Used in the Context of Breeding
3 Data Standardization and Adjustment
4 Omics-Based Prediction in Scientific Literature
5 Statistical Approaches for "Omics"-Based Prediction
6 Related Software Packages
7 Summary
References
Chapter 13: Integration of Crop Growth Models and Genomic Prediction
1 Introduction
2 Crop Growth Models
3 Gene-Based Models
4 CGMs and QTL Mapping
5 CGMs and GP
5.1 Overview
5.2 Predictive Ability of GP-Assisted CGMs
5.3 Genotype-Specific Parameters in CGMs
5.4 Parameter Estimation
6 Examples of CGMs Applications
6.1 Overview of Examples
6.2 DVR Model
Box 1 An R Script for the DVR Model (DVRmodel.R)
Box 2 An Rcpp Script for the DVR Model (DVRmodel.cpp)
6.3 Maize Growth Model
Box 3 An R Script for the Maize Growth Model (MaizeGrowthModel.R)
Box 4 An Rcpp Script for the Maize Growth Model (MaizeGrowthModel.cpp)
6.4 Examples of CGM Fitting
Box 5 Fitting and Optimization of the DVR Model
Box 6 Fitting and Optimization of the Maize Growth Model (RunMaizeGrowthModel.R)
6.5 Examples of the Joint Approach
Box 7 An Rcpp Script for the DVR Model Designed for GenomeBasedModel (DVRmodel_GBM.cpp)
Box 8 An Rcpp Script for the Maize Growth Model Designed for GenomeBasedModel (MaizeGrowthModel_GBM.cpp)
Box 9 An R Script for GenomeBasedModel for the DVR Model (RunDVRmodel_GBM.R)
7 Concluding Remarks
Box 10 An R Script to Run GenomeBasedModel for the Maize Growth Model (RunMaizeGrowthModel_GBM.R)
8 Script Availability
References
Chapter 14: Phenomic Selection: A New and Efficient Alternative to Genomic Selection
1 Introduction: The Concept of Phenomic Selection and its Relationship with Other Uses of Spectra in Breeding
2 Literature Review on the Use of Spectra in Selection
2.1 Types of Technology
2.2 Preprocessing NIR Spectra
2.3 Statistical Models for Phenotype Prediction
2.4 Relative Performance of PS Versus GS
2.5 Factors Affecting PS Predictive Ability
3 Prospects
3.1 Prebreeding: Screening Diversity Collections at Low Cost
3.2 Sparse Testing: Experimental Design Optimization in Breeding Programs
3.3 Combining Reduction of Generation Time (Speed Breeding) and Performance Prediction (PS) to Increase Genetic Progress
3.4 GEI Prediction
3.5 Making Use of Historical NIRS Data in Prediction
3.6 The Case of Perennials
3.7 Other Applications
3.7.1 Genotype Inference
3.7.2 Hybrid Prediction
3.7.3 Progeny Sorting
4 Conclusion
References
Chapter 15: From Genotype to Phenotype: Polygenic Prediction of Complex Human Traits
1 Introduction
2 Present Status: 2020
2.1 Quantitative Trait PGS: Height, Bone, and Blood
2.2 Disease Risks: Polygenic Risk Scores (PRS)
2.3 Sibling Validation
3 Methods, Genetic Architecture, and Theoretical Considerations
3.1 Sparse Learning, L1 Penalization, Phase Transition Behavior
3.1.1 Predictor Training
3.1.2 Score/Validation
3.1.3 Evaluation
3.2 Prediction Across Ancestry Groups, Causal Variants
3.3 Coding Regions, Pleiotropy
3.4 Linearity/Additivity
4 The Future
References
Chapter 16: Genomic Prediction of Complex Traits in Animal Breeding with Long Breeding History, the Dairy Cattle Case
1 Introduction
2 Genetic Evaluation of Dairy Cattle Prior to Genomics
3 Detection of Segregating Quantitative Trait Loci (QTL) in Dairy Cattle, the "Granddaughter Design"
4 The Architecture of Quantitative Genetic Variation, Major Genes vs The Infinitesimal Model
5 Computation of Genomic Evaluations in Dairy Cattle Based on Analysis of Sire Evaluations
6 "Single-step" Methods of Genomic Evaluation
7 The Factors that Affect the Accuracy of Genomic Evaluations
8 How Genomics Has Changed the World Dairy Cattle Industry
9 The Future
10 Conclusions
References
Chapter 17: Genomic Selection in Aquaculture Species
1 General Introduction
2 Specificities of Genomic Selection in Aquaculture Species
3 Accuracy of Genomic Prediction for Important Traits in Aquaculture Species
4 Imputation as a Key for Cost-Effective GS in Aquaculture
5 GS to Tackle Genotype-by-Environment Interactions in Aquaculture
6 Genomic Selection and Surrogate Breeders to Reduce Generation Time
7 Prospect of Genomic Selection in Aquaculture
7.1 Is Genomic Selection a Replacement of Conventional Pedigree- and Phenotype-Based Selection?
7.2 Is a Genome Assembly Needed to Apply Genomic Selection?
7.3 Can Genomic Selection be Combined with Other "omics" Approaches?
7.4 Can Genomic Selection be Combined with Genome Editing?
7.5 Will Remote High-Throughput Phenotyping be the Next Revolution for Breeding Programs?
8 Conclusion
References
Chapter 18: Genomic Prediction of Complex Traits in Perennial Plants: A Case for Forest Trees
1 Introduction
2 Some Highlights from Published Empirical Studies
3 Factors that Affect Accuracy of Genomic Prediction
3.1 Prediction Ability and Prediction Accuracy in GS
3.2 Relatedness Between the Training Population and the Prediction Set
3.3 Genotype by Environment and Marker by Environment Interactions
3.4 Genotype by Age Interactions in GS
3.5 Effective Population Size
3.6 Training Population Size
3.7 Impact of Trait Heritability
3.8 Statistical Models in Genomic Selection
3.9 Marker Density
4 Genotyping Technologies
4.1 SNP Arrays
4.2 Sequencing-Based Methods
5 Some Benefits of Genomics in Forest Tree Breeding
6 A Summary of GS Strategy for Pinus taeda in the USA
7 Conclusions and Suggestions for GS Implementation in Forest Trees
References
Chapter 19: Genomic Prediction of Complex Traits in Forage Plants Species: Perennial Grasses Case
1 Introduction
1.1 Biological Specificities
1.2 Typical Breeding Scheme
1.3 Use of Molecular Markers in Forage Crop Breeding
2 Availability of Molecular Markers for GS
3 Examples of Genomic Selection in Forage Grass Species
3.1 Perennial Ryegrass
3.2 Switchgrass
3.3 Timothy
4 Factors Affecting the Accuracy of GS
4.1 Number of Markers and Genetic Architecture of Traits
4.2 Size of the Training Population and Its Link with the Breeding Population
4.3 Genotype by Environment (G x E) Interactions
5 A Possible Practical Implementation of GS in Forage Grasses
6 Beyond Genomic Prediction
7 Conclusion
References
Chapter 20: Genomic Prediction of Complex Traits in an Allogamous Annual Crop: The Case of Maize Single-Cross Hybrids
1 Introduction
2 Single Hybrid Breeding in Maize
2.1 Generating New Parental Lines
2.2 Evaluation of Performances of the Upcoming Hybrids
3 Maize Breeding in the Genomic Era
3.1 Genomic Prediction in the Parental Line Development Pipeline
3.2 Genomic Selection of Crossing Partners to Obtain Superior Hybrids
3.3 Genomic Prediction for Single-Cross Hybrids
4 Factors Affecting the Accuracy of Genomic Predictions in Hybrid Breeding
4.1 Characteristics of Training Population
4.2 Population Structure
4.3 Genetic Architecture of the Target Trait
4.4 Genotype by Environment Interaction
4.5 Marker Density
4.6 Choice of Statistical Model
5 Optimizing GS-Based Single-Cross Hybrid Breeding
6 Maintaining Long-Term Sustainability of Genomic Selection
7 Final Remarks
References
Chapter 21: Genomic Prediction: Progress and Perspectives for Rice Improvement
1 Introduction
2 Genomic Prediction Works in Rice
2.1 General Overview
2.2 Important Findings and Current Limitations for Genomic Prediction in Rice
2.2.1 Important Findings
2.2.2 Current Limitations
3 Integration of Genomic Prediction into Rice Breeding Programs: Key Aspects
3.1 Map the Breeding Strategy
3.2 Reduce the Cycle Time
3.3 Design the Training Set
3.4 Generate and Integrate Good Quality Data
3.5 Take into Account the Costs
4 An Example on IRRI Breeding Program for Irrigated Systems
4.1 The Transition from Pedigree Breeding to Recurrent Genomic Selection
4.2 Description of the Breeding Schemes and Integrating Genomic Prediction
4.3 A Practical Example of the Analytical Pipeline
4.3.1 Selection of the Training Set
4.3.2 Single Trial Analysis
4.3.3 Genomic Predictions
5 Other Applications of Genomic Prediction for Rice Improvement
5.1 Characterization of Genetic Diversity for Pre-breeding
5.2 Definition of Heterotic Groups for Hybrid Breeding
5.3 Integration of High-Throughput Phenotyping and Environmental Information
6 Conclusion: A Point of View of a Rice Breeder
References
Chapter 22: Analyzing the Economic Effectiveness of Genomic Selection Relative to Conventional Breeding Approaches
1 Introduction
2 The Baseline Framework for Comparing the Economic Efficiency of Alternative Breeding Programs
2.1 The Caveats of Comparing Breeding Schemes Representing Different Budgets
2.2 The Microeconomic Framework
2.3 Building the Genetic Gain Function Using Simulation vs Analytical Expression
3 The Basic Principles Illustrated with One Simple Application to Wheat Breeding
3.1 Description of the Two Alternative Schemes
3.1.1 Simplifying the Schemes
3.1.2 Identifying the Basic Operations
3.1.3 Describing the Breeding Schemes
3.1.4 Formulating the Total Cost Equation
3.2 Unit Costs Evaluation
3.2.1 Describing the Sources of Costs Related to Each Basic Operation
3.2.2 Compiling the Unit Costs
3.3 Defining the Dimensions of the Compared Schemes
3.3.1 Strategies for Alternative Schemes
3.3.2 Defining Constraints on Parameters or Relation Among Them
3.3.3 Building Scenarios and Calculating the Number of Progenies for Each Scenario
3.3.4 Cost Distribution Among Operations
3.4 Integration with the Breeding Schemes Models and Results
3.4.1 Modeling the Breeding Schemes
3.4.2 Main Results
3.5 Extensions
3.6 Feedback on the Implementation of the Method
4 Extensions of the Baseline Framework
4.1 Average Cost of Genetic Gain
4.2 Cost Discounting
5 Conclusion
References
Correction to: Genomic Prediction Methods Accounting for Nonadditive Genetic Effects
Index