Skip to main content

Genomic best linear unbiased prediction method including imprinting effects for genomic evaluation

Abstract

Background

Genomic best linear unbiased prediction (GBLUP) is a statistical method used to predict breeding values using single nucleotide polymorphisms for selection in animal and plant breeding. Genetic effects are often modeled as additively acting marker allele effects. However, the actual mode of biological action can differ from this assumption. Many livestock traits exhibit genomic imprinting, which may substantially contribute to the total genetic variation of quantitative traits. Here, we present two statistical models of GBLUP including imprinting effects (GBLUP-I) on the basis of genotypic values (GBLUP-I1) and gametic values (GBLUP-I2). The performance of these models for the estimation of variance components and prediction of genetic values across a range of genetic variations was evaluated in simulations.

Results

Estimates of total genetic variances and residual variances with GBLUP-I1 and GBLUP-I2 were close to the true values and the regression coefficients of total genetic values on their estimates were close to 1. Accuracies of estimated total genetic values in both GBLUP-I methods increased with increasing degree of imprinting and broad-sense heritability. When the imprinting variances were equal to 1.4% to 6.0% of the phenotypic variances, the accuracies of estimated total genetic values with GBLUP-I1 exceeded those with GBLUP by 1.4% to 7.8%. In comparison with GBLUP-I1, the superiority of GBLUP-I2 over GBLUP depended strongly on degree of imprinting and difference in genetic values between paternal and maternal alleles. When paternal and maternal alleles were predicted (phasing accuracy was equal to 0.979), accuracies of the estimated total genetic values in GBLUP-I1 and GBLUP-I2 were 1.7% and 1.2% lower than when paternal and maternal alleles were known.

Conclusions

This simulation study shows that GBLUP-I1 and GBLUP-I2 can accurately estimate total genetic variance and perform well for the prediction of total genetic values. GBLUP-I1 is preferred for genomic evaluation, while GBLUP-I2 is preferred when the imprinting effects are large, and the genetic effects differ substantially between sexes.

Background

Genomic imprinting is an epigenetic process that involves DNA methylation and histone modifications that distinguish the expression of maternal and paternal alleles [1]. The expression of an imprinted gene depends on the parent from which it is inherited. Complete inactivation of an imprinted gene results in functional haploidy, with only one of the two copies of the gene expressed. Well known examples of such imprinted genes are IGF2 (insulin-like growth factor 2) in pigs [2] and the Callipyge gene in sheep [3]. Moreover, imprinting may not entail the complete inactivation of a gene. In a study of peripheral blood leukocytes in humans, four of 38 cases exhibited substantial biallelic expression of IGF2, although the product level of the maternally-derived gene was lower than that of the paternally-derived gene in all cases [4]. Over 70 imprinted genes have been identified in mice [5], and 24 genes with parent-of-origin effects in beef cattle [6]. Furthermore, quantitative traits such as carcass composition, growth, teat number, and litter size have been suggested to exhibit imprinting effects [7-10]. Thus, imprinting effects may substantially contribute to the total genetic variation of quantitative traits.

There are several statistical methods for modeling imprinting effects. Using a mixed model, Schaeffer et al. [11] replaced the numerator relationship matrix with a gametic relationship matrix to calculate the expectation of covariance among relatives with imprinting. Essl and Voith [12] suggested that sire and dam models should be constructed separately to assess differences between paternal and maternal imprinting. Neugebauer et al. [13,14] recently fitted a model with correlated paternal and maternal gametes to simultaneously estimate imprinting variances between sexes in pigs and beef cattle. These methods are based on the traditional best linear unbiased prediction (BLUP) method, which uses only pedigree information. More recently, the genomic BLUP (GBLUP) method was developed by modifying the BLUP method to incorporate single nucleotide polymorphism (SNP) information in the form of a genomic relationship matrix that defines the additive genetic covariance among individuals. GBLUP includes genomic information into breeding value estimation and has been used for genomic selection in dairy cattle [15-18]. Therefore, modeling genetic effects by including imprinting effects is expected to improve the predictive ability of GBLUP. Thus, the objectives of this study were twofold: (1) develop a GBLUP method including imprinting effects (termed GBLUP-I hereafter) and (2) estimate genetic variances and assess the accuracies and unbiasedness of genomic predictions using simulation data with varying degrees of imprinting.

Methods

Genetic model

Spencer [19] first extended the standard two-allele one locus model of quantitative genetics to account for imprinting. Following the approach of Spencer [19], consider an autosomal biallelic locus with alleles A 1 and A 2 at frequencies 1−q and q, respectively, in the population. Allele frequencies of males and females were assumed to be the same and under Hardy-Weinberg equilibrium. By denoting a genotype, A i A j , A i and A j , are the paternally- and maternally-derived alleles, respectively. Following the approach of Spencer [19], the genotypic values for genotypes A 1 A 1, A 1 A 2, A 2 A 1, and A 2 A 2 are then given by a, d 1, d 2, and -a, respectively. In this study, the mean of two heterozygotes and the difference between two heterozygotes were defined as δ and ε:

$$ \delta =\frac{d_1+{d}_2}{2} $$

and

$$ \varepsilon =\frac{d_1-{d}_2}{2} $$

In this model, the heterozygous genotypic values were + ε and −ε deviations from δ, i.e., d 1 and d 2 can be rewritten as δ + ε and δε, respectively (Figure 1). With imprinting, reciprocal heterozygotes differ in their genotypes. For example, in the case of complete inactivation of the maternal allele (i.e., ε = a and δ = 0), the genotypic value of A 2 A 1 is the same as that of A 2 A 2, whereas the genotypic value of A 1 A 2 is the same as that of A 1 A 1. If the paternally-derived A 1 allele randomly combines with maternally-derived alleles from a population, the frequencies of the genotypes produced will be 1−q for A 1 A 1 and q for A 1 A 2. The genotypic values of A 1 A 1 and A 1 A 2 are a and d 1, respectively. Taking in account the proportions at which they occur, the mean value of genotypes produced from the paternally-derived A 1 allele is (1−q)a + qd 1. The mean genotypic value in the entire population (μ) is as follows:

Figure 1
figure 1

Genotypic values for the four genotypes ( A 1 A 1 , A 1 A 2, A 2 A 1, and A 2 A 2 ).

$$ \begin{array}{c}\hfill \mu ={\left(1-q\right)}^2\cdot a+\left(1-q\right)q\cdot {d}_1+\left(1-q\right)q\cdot {d}_2+{q}^2\cdot \left(-a\right)\hfill \\ {}\hfill =\left(1-2q\right)a+2\left(1-q\right)q\delta .\hfill \end{array} $$

Thus, the average effect of the paternally-derived A 1 allele is calculated from the difference between the mean value of the genotypes produced and population mean as follows:

$$ \begin{array}{l}\left(1-q\right)a+q{d}_1-\left\{\left(1-2q\right)a+2\left(1-q\right)q\delta \right\}\\ {}=q\left\{a+\left(2q-1\right)\delta +\varepsilon \right\}=q{\alpha}_m,\end{array} $$

where α m is the average effect of the allele substitution in the paternal gamete and is equivalent to the male breeding value of Spencer [19]. Similarly, the average effect of the maternally-derived A 1 allele is as follows:

$$ q\left\{a+\left(2q-1\right)\delta -\varepsilon \right\}=q{\alpha}_f, $$

where α f is the average effect of the allele substitution in the maternal gamete and is equivalent to the female breeding values of Spencer [19]. The average effects of all alleles are in Table 1.

Table 1 Average effects of paternal and maternal alleles at a QTL with imprinting

The genotypic deviation of a particular genotype can be calculated from the difference between its genotypic value and the population mean. For example, the genotypic deviation of A 1 A 2 is as follows:

$$ \begin{array}{c}\hfill {d}_1-\mu =\left(2q-1\right)a+\left\{1-2\left(1-q\right)q\right\}\delta +\varepsilon \hfill \\ {}\hfill =\left(2q-1\right)\alpha +2\left(1-q\right)q\delta +\varepsilon, \hfill \end{array} $$

where α = a + (2q − 1)δ When there is no imprinting (d 1= d 2 = δ and ε = 0) α is the same as the average effect of the allele substitution [20] and (2q − 1)α and 2(1 − q) are the same as the breeding value and dominance deviation of the traditional genetic model. By using δ and ε, a genotypic deviation can be divided into three terms. Under imprinting, the breeding values and dominance deviations are no longer uncorrelated, which means that the total genetic variance cannot be partitioned into the usual additive and dominance variance [19]. Therefore, in this study, total genetic variance was partitioned into three variances corresponding to α, δ, and ε \( \left({\sigma}_{a\hbox{'}}^2,{\sigma}_{d\hbox{'}}^2,\mathrm{and}\;{\sigma}_{i^{\hbox{'}}}^2\right) \) as follows:

$$ \begin{array}{c}\hfill {\sigma}_g^2=2\left(1-q\right)q{\alpha}^2+{\left\{2q\left(1-q\right)\right\}}^2{\delta}^2+2\left(1-q\right)q{\varepsilon}^2\hfill \\ {}\hfill ={\sigma}_{a^{\hbox{'}}}^2+{\sigma}_{d^{\hbox{'}}}^2+{\sigma}_{i\hbox{'}}^2.\hfill \end{array} $$

When there is no imprinting, \( {\sigma}_{a\hbox{'}}^2 \) and \( {\sigma}_{d\hbox{'}}^2 \) are the same as the additive and dominance genetic variance, respectively. In this case, the covariance between the α and δ terms (σ a '  d ') is equal to 0, as follows:

$$ \begin{array}{l}{\sigma}_{a^{\hbox{'}}{d}^{\hbox{'}}}={\left(1-q\right)}^2\cdot 2q\alpha \cdot \left(-2{q}^2\delta \right)\\ {}+\left(1-q\right)q\cdot \left(2q-1\right)\alpha \cdot 2\left(1-q\right)q\delta \\ {}+\left(1-q\right)q\cdot \left(2q-1\right)\alpha \cdot 2\left(1-q\right)q\delta \\ {}+{q}^2\cdot -2\left(1-q\right)\alpha \cdot \left\{-2{\left(1-q\right)}^2\delta \right\}=0.\end{array} $$

The covariance between the α and ε terms (σ a ' i ') is also equal to 0, as follows:

$$ \begin{array}{l}{\sigma}_{a^{\hbox{'}}{i}^{\hbox{'}}}={\left(1-q\right)}^2\cdot 2q\alpha \cdot 0\\ {}+\left(1-q\right)q\cdot \left(2q-1\right)\alpha \cdot \varepsilon \\ {}+\left(1-q\right)q\cdot \left(2q-1\right)\alpha \cdot \left(-\varepsilon \right)\\ {}+{q}^2\cdot -2\left(1-q\right)\alpha \cdot 0=0.\end{array} $$

Similarly, the covariance between the δ and ε terms is also equal to 0.

Alternatively, paternal and maternal gametic variances (\( {\sigma}_{p_{at}}^2 \) and \( {\sigma}_{m_{at}}^2 \), respectively) can be calculated from the variances of the average effects of paternally- and maternally-derived alleles:

$$ \begin{array}{c}\hfill {\sigma}_{p_{at}}^2=\left(1-q\right)\cdot {\left(q{\alpha}_m\right)}^2+q\cdot {\left\{-\left(1-q\right){\alpha}_m\right\}}^2\hfill \\ {}\hfill =\left(1-q\right)q{\alpha}_m^2,\hfill \end{array} $$

and

$$ \begin{array}{c}\hfill {\sigma}_{m_{at}}^2=\left(1-q\right)\cdot {\left(q{\alpha}_f\right)}^2+q\cdot {\left\{-\left(1-q\right){\alpha}_f\right\}}^2\hfill \\ {}\hfill =\left(1-q\right)q{\alpha}_f^2.\hfill \end{array} $$

The sum of these variances is as follows:

$$ \begin{array}{l}{\sigma}_{p_{at}}^2+{\sigma}_{m_{at}}^2=pq{\alpha}_m^2+pq{\alpha}_f^2\\ {}=2\left(1-q\right)q{\left\{a+\left(2q-1\right)\delta \right\}}^2+2\left(1-q\right)q{\varepsilon}^2\\ {}={\sigma}_{a\hbox{'}}^2 + {\sigma}_{i\hbox{'}}^2.\end{array} $$

Thus, the total genetic variance can be partitioned as follows:

$$ {\sigma}_g^2={\sigma}_{p_{at}}^2+{\sigma}_{m_{at}}^2+{\sigma}_{d\hbox{'}}^2. $$

Statistical model

Two statistical models of GBLUP-I based on genotypic values (GBLUP-I1) and gametic values (GBLUP-I2) are proposed here.

First, GBLUP-I1 is defined as follows:

$$ \mathbf{y}=\mathbf{X}\boldsymbol{\upbeta } +{\mathbf{Z}}_{\mathbf{a}}\mathbf{a}+{\mathbf{Z}}_{\mathbf{d}}\mathbf{d}+{\mathbf{Z}}_{\mathbf{i}}\mathbf{i}+\mathbf{e}, $$

where y is the vector of the phenotypes; β is the vector of the fixed effects; a, d, and i are the vectors of α, δ, and ε terms, respectively; X, Z a , Z d , and Z i are incidence matrices linking the phenotypes to β, a, d, and i, respectively; and e is the vector of errors. The variances of a, d, and i are as follows:

$$ \mathrm{V}\mathrm{a}\mathrm{r}\left(\mathbf{a}\right)={\mathbf{G}}_{\mathbf{a}}{\sigma}_{a\hbox{'}}^2, $$
$$ \mathrm{V}\mathrm{a}\mathrm{r}\left(\mathbf{d}\right)={\mathbf{G}}_{\mathbf{d}}{\sigma}_{d\hbox{'}}^2, $$

and

$$ \mathrm{V}\mathrm{a}\mathrm{r}\left(\mathbf{i}\right)={\mathbf{G}}_{\mathbf{i}}{\sigma}_{i\hbox{'}}^2, $$

where G a , G d , and G i are the genomic relationship matrices relevant to α, δ, and ε terms, respectively. These matrices describe the relationships among genotyped individuals and can be constructed by using the information from genome-wide SNPs. Let A 1j and A 2j be two alleles at the j th SNP and q j be the frequency of A 2j . G a and G d are the same as the genomic relationship matrices for breeding values and dominance deviations without imprinting. Thus, G a and G d can be calculated as described previously [21,22]:

$$ {\mathbf{G}}_{\mathbf{a}}=\frac{{\mathbf{M}}_{\mathbf{a}}{{\mathbf{M}}_{\mathbf{a}}}^{\hbox{'}}}{{\displaystyle {\sum}_j^{N_{snp}}}2{q}_j\left(1-{q}_j\right)}, $$

and

$$ {\mathbf{G}}_{\mathbf{d}}=\frac{{\mathbf{M}}_{\mathbf{d}}{\mathbf{M}}_{\mathbf{d}}^{\hbox{'}}}{{\displaystyle {\sum}_j^{N_{snp}}}{\left\{2{q}_j\left(1-{q}_j\right)\right\}}^2}, $$

where M a and M d are n × N snp matrices (n is the number of genotyped individuals, and N snp is the number of SNPs); the elements of M a and M d for the i th individual at the j th SNP are calculated as follows:

$$ {\mathbf{M}}_{\mathbf{a}\ i,j} = \left\{\begin{array}{l}2{q}_j\ \left({A}_1{A}_1\right)\hfill \\ {}2{q}_j-1\ \left({A}_1{A}_2\ \mathrm{and}\ {A}_2{A}_1\right)\hfill \\ {}2{q}_j-2\ \left({A}_2{A}_2\right)\hfill \end{array}\right., $$

and

$$ {\mathbf{M}}_{\mathbf{d}\ i,j}=\left\{\begin{array}{l}-2{q}_j^2\ \left({A}_1{A}_1\right)\hfill \\ {}2{q}_j\left(1-{q}_j\right)\ \left({A}_1{A}_2\ \mathrm{and}\ {A}_2{A}_1\right)\hfill \\ {}-2{\left(1-{q}_j\right)}^2\ \left({A}_2{A}_2\right)\hfill \end{array}\right.. $$

Similarly, M i is assumed to be a n × N snp matrix, and the element of M i for the i th individual at the j th SNP can be calculated as follows:

$$ {\mathbf{M}}_{\mathbf{i},i,j}=\left\{\begin{array}{r}\hfill 0\ \left({A}_1{A}_1\right)\\ {}\hfill 1\ \left({A}_1{A}_2\right)\\ {}\hfill -1\ \left({A}_2{A}_1\right)\\ {}\hfill 0\ \left({A}_2{A}_2\right)\end{array}\right. $$

The elements of M a , M d , and M i describe the coefficients of the α, δ, and ε. terms in Table 2, respectively. Therefore, i and its variance can be derived as follows:

Table 2 Genotypic values in the two-allele model
$$ \mathbf{i}={\mathbf{M}}_{\mathbf{i}}\boldsymbol{\varepsilon}, $$

where ε is the N snp dimensional vector of which the j th element is ε j . Thus, the variance of i is calculated as follows:

$$ \mathrm{V}\mathrm{a}\mathrm{r}\left(\mathbf{i}\right)={\mathbf{M}}_{\mathbf{i}}{\mathbf{M}}_{\mathbf{i}}^{\hbox{'}}\mathrm{V}\mathrm{a}\mathrm{r}\left(\varepsilon \right), $$
$$ {\sigma}_{i\hbox{'}}^2={\displaystyle \sum_j^{N_{snp}}}2{q}_j\left(1-{q}_j\right)\mathrm{V}\mathrm{a}\mathrm{r}\left(\varepsilon \right). $$

Consequently, G i can be calculated using M i :

$$ {\mathbf{G}}_{\mathbf{i}}=\frac{{\mathbf{M}}_{\mathbf{i}}{\mathbf{M}}_{\mathbf{i}}^{\hbox{'}}}{{\displaystyle {\sum}_j^{N_{snp}}}2{q}_j\left(1-{q}_j\right)}. $$

In general, GBLUP includes only breeding values. The statistical model of GBLUP is as follows:

$$ \mathbf{y}=\mathbf{X}\boldsymbol{\upbeta } +{\mathbf{Z}}_{\mathbf{a}}\mathbf{a}+\mathbf{e}. $$

Therefore, without imprinting and dominance, the GBLUP model is the same as GBLUP-I1.

Second, GBLUP-I2 is defined as:

$$ \mathbf{y} = \mathbf{X}\boldsymbol{\upbeta } +{\mathbf{Z}}_{{\mathbf{p}}_{\mathbf{at}}}{\mathbf{p}}_{\mathbf{at}}+{\mathbf{Z}}_{{\mathbf{m}}_{\mathbf{at}}}{\mathbf{m}}_{\mathbf{at}}+{\mathbf{Z}}_{\mathbf{d}}\mathbf{d}+\mathbf{e}, $$

where p at and m at are the vectors of paternal and maternal gametic effects, respectively; and \( {\mathbf{Z}}_{{\mathbf{p}}_{\mathbf{at}}} \) and \( {\mathbf{Z}}_{{\mathbf{m}}_{\mathbf{at}}} \) are incidence matrices linking phenotypes to p at and m at , respectively. The variances of p at and m at are as follows:

$$ \mathrm{V}\mathrm{a}\mathrm{r}\left({\mathbf{p}}_{\mathbf{at}}\right)={\mathbf{G}}_{{\mathbf{p}}_{\mathbf{at}}}{\sigma}_{p_{at}}^2, $$

and

$$ \mathrm{V}\mathrm{a}\mathrm{r}\left({\mathbf{m}}_{\mathbf{at}}\right)={\mathbf{G}}_{{\mathbf{m}}_{\mathbf{at}}}{\sigma}_{m_{at}}^2, $$

where \( {\mathbf{G}}_{{\mathbf{p}}_{\mathbf{at}}} \) and \( {\mathbf{G}}_{{\mathbf{m}}_{\mathbf{at}}} \) are the genomic relationship matrices of the paternal and maternal gametes, respectively. Let \( {\mathbf{M}}_{{\mathbf{p}}_{\mathbf{at}}} \) and \( {\mathbf{M}}_{{\mathbf{m}}_{\mathbf{at}}} \) be the n × N snp matrices that specify the coefficients of a m and a f in Table 1; then, the elements of \( {\mathbf{M}}_{{\mathbf{p}}_{\mathbf{at}}} \) and \( {\mathbf{M}}_{{\mathbf{m}}_{\mathbf{at}}} \) for the i th individual at the j th SNP are calculated as follows:

$$ {\mathbf{M}}_{{\mathbf{p}}_{\mathbf{at}}\ i,j}\left\{\begin{array}{ll}{q}_j\hfill & \left({A}_1\right)\hfill \\ {}-\left(1-{q}_j\right)\hfill &\ \left({A}_2\right)\hfill \end{array}\right. $$

and

$$ {\mathbf{M}}_{{\mathbf{m}}_{\mathbf{at}}\ i,j}\ \left\{\begin{array}{cc}\hfill {q}_j\hfill & \hfill \left({A}_1\right)\hfill \\ {}\hfill -\left(1-{q}_j\right)\hfill & \hfill \left({A}_2\right)\hfill \end{array}\right.. $$

Therefore, p at and m at are as follows:

$$ {\mathbf{p}}_{\mathbf{at}}={\mathbf{M}}_{{\mathbf{p}}_{\mathbf{at}}}{\boldsymbol{\upalpha}}_{\mathbf{m}} $$

and

$$ {\mathbf{m}}_{\mathbf{at}}={\mathbf{M}}_{{\mathbf{m}}_{\mathbf{at}}}{\boldsymbol{\upalpha}}_{\mathbf{f}}, $$

where α m and α f are the N snp -dimensional vectors of α m and α f , respectively. The variance of p at is equal to:

$$ \mathrm{V}\mathrm{a}\mathrm{r}\left({\mathbf{p}}_{\mathbf{at}}\right)={\mathbf{M}}_{{\mathbf{p}}_{\mathbf{at}}}{\mathbf{M}}_{{\mathbf{p}}_{\mathbf{at}}}^{\hbox{'}}\mathrm{V}\mathrm{a}\mathrm{r}\left({\alpha}_m\right). $$

The variance of the paternal gametic effect \( \left({\sigma}_{p_{at}}^2\right) \) is the sum of the variances of α m at all SNPs as follows:

$$ {\sigma}_{p_{at}}^2={\displaystyle \sum_j^{N_{snp}}}{q}_j\left(1-{q}_j\right)\mathrm{V}\mathrm{a}\mathrm{r}\left({\alpha}_m\right). $$

From this equation, Var(p at ) can be rewritten as follows:

$$ \mathrm{V}\mathrm{a}\mathrm{r}\left({\mathbf{p}}_{\mathbf{at}}\right)=\frac{{\mathbf{M}}_{{\mathbf{p}}_{\mathbf{at}}}{\mathbf{M}}_{{\mathbf{p}}_{\mathbf{at}}}^{\hbox{'}}}{{\displaystyle {\sum}_j^{N_{snp}}}{q}_j\left(1-{q}_j\right)}{\sigma}_{p_{at}}^2. $$

Consequently,

$$ {\mathbf{G}}_{{\mathbf{p}}_{\mathbf{at}}}=\frac{{\mathbf{M}}_{{\mathbf{p}}_{\mathbf{at}}}{\mathbf{M}}_{{\mathbf{p}}_{\mathbf{at}}}^{\hbox{'}}}{{\displaystyle {\sum}_j^{N_{snp}}}{q}_j\left(1-{q}_j\right)}. $$

Similarly,

$$ {\mathbf{G}}_{{\mathbf{m}}_{\mathbf{at}}}=\frac{{\mathbf{M}}_{{\mathbf{m}}_{\mathbf{at}}}{\mathbf{M}}_{{\mathbf{m}}_{\mathbf{at}}}^{\hbox{'}}}{{\displaystyle {\sum}_j^{N_{snp}}}{q}_j\left(1-{q}_j\right)}. $$

Stochastic simulation

A historical population was simulated to establish mutation-drift equilibrium. The simulated genome comprised 10 chromosomes, each 1 Morgan long, containing 100 000 randomly spaced SNPs and 1000 biallelic quantitative trait loci (QTL). In the first generation of the historical population, the initial allele frequencies of all SNPs and QTL were assumed to be 0.5. A recurrent mutation process was applied with a mutation rate for SNPs and QTL of 1.0 × 10−4 per locus per generation. Recombinations were sampled from a Poisson distribution with a mean of 1 per Morgan and then randomly placed along the chromosome. The historical population evolved over 20 000 generations of random selection and random mating, with a population size of 500 (250 males and 250 females) to reach mutation-drift equilibrium [23].

After 20 000 historical generations, the base population (G0) was generated. In G0, the population size decreased to 300 (150 males and 150 females). 10 000 markers and 200 QTL were randomly selected from the segregating SNPs and QTL with minor allele frequencies greater than 0.05. Therefore, N snp was equal to 10 000. Let Q 1 and Q 2 be two alleles at each QTL. The genotypic values of Q 1 Q 1 , Q 1 Q 2 , Q 2 Q 1 and Q 2 Q 2 , are given by a, d 1 , d 2 and -a, respectively. The value of a was drawn from a gamma distribution with a shape parameter of 0.42 and its sign was drawn at random with equal chance. For QTL with imprinting, the values of d 1 and d 2 were determined as the product of a and the degree of imprinting (τ). Let N m and N f be the number of QTL that are silencing the paternal alleles and maternal alleles. The total number of QTL with imprinting (N i ) was 60 (N m  + N f  = N i ), which were randomly chosen from the 200 QTL. The total genetic effect (g j ) of the j th animal was calculated by summing all QTL genotypic values, and its variance \( \left({\sigma}_g^2\right) \) was calculated from the variance of the genotypic deviations:

$$ \begin{array}{l}{\sigma}_g^2={\displaystyle \sum_{j=1}^{N_{QTL}}}{\left(1-{q}_j\right)}^2{\left\{2{q}_j{a}_j-2\left(1-{q}_j\right){q}_j{\delta}_j\right\}}^2\\ {}+{\displaystyle \sum_{j=1}^{N_{QTL}}}{q}_j\left(1-{q}_j\right){\left[\left(2{q}_j-1\right){a}_j+\left\{1-2{q}_j\left(1-{q}_j\right)\right\}{\delta}_j+{\varepsilon}_j\right]}^2\\ {}+{\displaystyle \sum_{j=1}^{N_{QTL}}}{q}_j\left(1-{q}_j\right){\left[\left(2{q}_j-1\right){a}_j+\left\{1-2{q}_j\left(1-{q}_j\right)\right\}{\delta}_j-{\varepsilon}_j\right]}^2\\ {}+{\displaystyle \sum_{j=1}^{N_{QTL}}}{q_j}^2{\left\{-2\left(1-{q}_j\right){a}_j-2\left(1-{q}_j\right){q}_j{\delta}_j\right\}}^2,\end{array} $$

where N QTL is the number of QTL. To obtain phenotypic values, an environmental effect was added to the true genetic value, which was sampled from the normal distribution, \( \mathrm{N}\left(0,\left(1-{H}^2\right){\sigma}_g^2/{H}^2\right), \) where H 2 is broad-sense heritability; narrow-sense heritability was set to 0.3. The phenotypic variance was finally standardized to be equal to 1.

After G0, the subsequent five generations (G1 to G5) were generated. In G1 to G5, 30 males were selected by BLUP on the basis of estimated breeding values and randomly mated to 150 dams to produce 300 offspring (150 males and 150 females). The reference population with both phenotypes and genotypes comprised 1200 individuals from G1 to G4, and the test population with only genotypes comprised 300 individuals from G5.

The range of d 1 and d 2, the number of QTL with imprinting (N i ), and N snp were varied to investigate their effects on the performance of GBLUP-I. In the base simulation scenario, τ = 1.0, N i  = 60, (N m N f ) = (0, 60), N snp  = 10 000, and paternal and maternal alleles were known. In this scenario, only maternal alleles were silenced. Six alternative scenarios were simulated in addition to the base scenario. In scenario 1, τ = 0.5, 0.75, and 1.0 to meet the condition that − a ≤ d 1d 2 ≤ a. In scenario 2, N i = 20, 60, and 100. In scenario 3, (N m , N f ) = (0, 60), (15, 45), and (30, 30). In scenario 4, H 2 = 0.1, 0.3, and 0.5. In scenario 5, N snp = 2000, 10 000, and 50 000. In scenario 6, the paternal and maternal alleles were assumed to be unknown. Parameter settings are outlined in Table 3. Twenty replicates were simulated for each scenario.

Table 3 Parameters for different scenarios

Outline of the analysis

In the base scenario, the paternal and maternal alleles were assumed to be known. However, such information is unknown when using real data, because only genotypes are available. In scenario 6, the maternal and paternal origins of specific alleles (phase) were predicted using genotype and pedigree information processed by AlphaImpute software [24]. The phasing accuracy was measured as the correlation between true and predicted alleles by origin.

Here, we estimated variance components and genetic values using GBLUP and two types of GBLUP-I. Variance components were estimated by average information restricted maximum likelihood (AI-REML) [25]. The reference population dataset was used to predict the genetic effects of the genotyped individuals in the test population. The accuracy of the estimated total genetic value (ρ) was assessed as the correlation between estimated and true values. The regression coefficients of total genetic value on its estimate (b) was calculated to assess unbiasedness.

Results

Tables 4 and 5 show the estimates of variance components and the predictive abilities of total genetic values with varying values of τ and N i in scenarios 1 and 2. Total genetic variance was underestimated by GBLUP when the degree of imprinting was high. With GBLUP, the estimated total genetic variances were equal to 97.6%, 91.3%, and 82.1% of true variances for τ of 0.5, 0.75, and 1.0, respectively, and 99.3%, 82.1%, and 78.2% of true variances for N i of 20, 50, and 100, respectively. The estimated total genetic variances by GBLUP-I1 and GBLUP-I2 were almost the same as the true variances regardless of the degree of imprinting.

Table 4 Variance component estimates and predictive abilities with varying degrees of imprinting ( τ ) in scenario 1
Table 5 Variance component estimates and predictive ability with varying numbers of QTL with imprinting ( N i ) in scenario 2

The prediction accuracies, ρ obtained with GBLUP-I1 exceeded those obtained with GBLUP by 1.4%, 3.1%, and 7.8% for τ of 0.5, 0.75, and 1.0, respectively, and by 0.2%, 7.8%, and 8.2% for N i of 20, 50, and 100, respectively. Compared to GBLUP-I1, the ρ values obtained with GBLUP-I2 were more affected by the degree of imprinting. When N i was equal to 60 and 100, the ρ values obtained with GBLUP-I2 exceeded those obtained with GBLUP by 6.8% and 11.0%; while, when N i was equal to 20, ρ was smaller with GBLUP-I2 than with GBLUP. For all values of τ and N i the b values obtained with GBLUP-I1 and GBLUP-I2 were closer to 1 than with GBLUP. In scenario 3, the predictive abilities of GBLUP-I1 were not affected by the values of N m and N f whereas the ρ values with GBLUP-I2 decreased as the difference between N m and N f decreased (Table 6).

Table 6 Variance component estimates and predictive ability with varying numbers of QTL silencing paternal and maternal alleles ( N m and N f ) in scenario 3

In scenario 4, for all values of H 2 the estimated variance components obtained with GBLUP-I1 and GBLUP-I2 were close to the true values (Table 7). The performance of GBLUP-I1 and GBLUP-I2 increased with increasing values of H 2. With H 2 of 0.1, 0.3, and 0.5, the ρ values obtained with GBLUP exceeded those obtained with GBLUP-I1 by 5.7%, 7.8%, and 9.2% and those obtained with GBLUP-I2 by 4.1%, 6.8%, and 7.1%. In scenario 5, the predictive abilities of GBLUP, GBLUP-I1, and GBLUP-I2 decreased when N snp decreased from 10 000 to 2000 whereas those were unaltered when N snp increased from 10 000 to 50 000 (Table 8).

Table 7 Variance component estimates and predictive ability with varying broad-sense heritability ( H 2 ) in scenario 4
Table 8 Variance component estimates and predictive ability with varying numbers of SNPs ( N snp ) in scenario 5

In scenario 6, the phasing accuracy was equal to 0.979. Prediction accuracies with GBLUP-I1 and GBLUP-I2 were 1.7% and 1.2% lower when paternal and maternal alleles were predicted than when paternal and maternal alleles were known (Table 9).

Table 9 Accuracies of estimated genetic values with predicted paternal and maternal alleles in scenario 6

Discussion

Performance of GBLUP-I

We present a new GBLUP method that includes imprinting effects for the prediction of total genetic value. For all scenarios, the performance of GBLUP-I1 to estimate variance components was always better than that of GBLUP. Prediction accuracies with GBLUP-I1 and GBLUP-I2 increased with increasing degree of imprinting and broad-sense heritability. Prediction accuracies with GBLUP-I2 were strongly affected by the degree of imprinting (Tables 4 and 5) and the difference between the values of N m and N f (Table 6).

Method GBLUP-I2 assumes that paternal and maternal gametic effects are independent. However, when there is no imprinting, sire and dam are genetically correlated, and thus paternal and maternal gametic effects are not independent [26] and the accuracy by GBLUP-I2 should be reduced. The reduction of accuracy by GBLUP-I2 would be small with a high degree of imprinting because of the low correlation between paternal and maternal gametic effects. Thus, the performance of GBLUP-I2 increases as the degree of the imprinting and the difference in genetic values between paternal and maternal gametes increase. Meanwhile, when the degree of imprinting is low and there is little difference in the genetic values between paternal and maternal gametes, GBLUP-I1 is preferred for genomic evaluation.

In a previous study with bovine data, when the number of SNPs was greater than 50 000, reliabilities of genomic evaluations remained almost unaltered as the number of SNPs increased [27]. In this study, with a N snp of 10 000, the average distance between neighboring SNPs was 0.1 cM, which is similar to the distance between SNPs in a bovine dataset that includes 50 000 SNPs. In scenario 5, when N snp was greater than 10 000, the performances of GBLUP and both GBLUP-I were not affected by various values of N snp . This suggests that high-density and costly chips with more (777 000) SNPs may not be necessary for genomic evaluation, even when imprinting effects exist.

In scenario 6, prediction accuracies obtained with GBLUP-I1 and GBLUP-I2 were higher than those obtained with GBLUP when paternal and maternal alleles were predicted. Thus, both GBLUP-I methods can be applied to real livestock data. The phasing accuracy was improved by increasing sample size [28], number of SNPs [29], and number of high-density genotyped relatives of the individuals to be imputed [24,30], which suggests that the performance of GBLUP-I can be further improved in real livestock data.

Degree of imprinting and number of QTL with imprinting

GBLUP-I1 and GBLUP-I2 could accurately capture the total genetic variance, whereas GBLUP underestimated the total genetic variance. The difference in estimated total genetic variance between GBLUP-I and GBLUP is caused by the imprinting effect. Here, we calculated imprinting variance as the difference in the estimated total genetic variance between GBLUP-I1 and GBLUP. When τ varied from 0.5 to 1.0 and N i from 20 to 100, imprinting variances were equal to 1.4% to 6.0% and 0.4% to 6.9% of the phenotypic variances (1.0), respectively.

de Vries et al. [31] were the first to estimate imprinting variance in livestock and found that approximately 5% and 4% of the phenotypic variance in back fat thickness and growth rate, respectively, were due to imprinting. More recently, imprinting variances were found to range from 5 to 19% of the total genetic variance for 19 pig performance traits [13] and, on average, to be equal to 28% of the total genetic variance for ultrasonic measurements of body composition in Australian beef cattle [26]. The degree of imprinting reported in our study is similar to those reported in the literature [13,26,31].

Effects of QTL parameters

Setting QTL parameters may affect the accuracy of genomic predictions. We investigated the effects of the number of QTL, the distribution of their effects, and their location. The values of N QTL ranged from 50 (N i  = 15) to 1000 (N i  = 300) and QTL were evenly spaced throughout the genome. The value of a was drawn from a normal distribution. In these conditions, the accuracies obtained by GBLUP, GBLUP-I1, and GBLUP-I2 were the almost the same as in the base scenario (Table 10).

Table 10 Accuracies of estimated genetic values with varying numbers of QTL, distributions of homozygous genotypic value, and locations of QTL

Significance of genetic effects in GBLUP-I

This study partitioned the total genetic value into three estimated genetic effects (α, δ, and ε terms) in GBLUP-I1. However, there is no biological meaning for these genetic effects. In order to estimate a breeding value and a dominance deviation, the genetic values should be defined by sex, as presented by Spencer [19]. In such a model, the number of variance components would be doubled and the covariance between the breeding value and dominance deviation would not be equal to 0. These factors would collectively reduce the accuracy of genetic evaluations.

Practical use of GBLUP-I

When a dominance effect exists, assortative mating or mate allocation can boost the field performance of livestock [32,33]. Similarly, when an imprinting effect exists, the performance of livestock can be improved by optimizing matings, because the genotypic values of A 1 A 2 and A 2 A 1 can be distinguished and evaluated accurately. Let pr ij (A1A1) pr ij (A 1 A 2 ) pr ij (A 2 A 1 ) and pr ij (A 2 A 2 ) be the probabilities of the genotypes A 1 A 1 , A 1 A 2 , A 2 A 1 , and A 2 A 2 for the i th offspring of future matings and the j th marker. In GBLUP-I1, the elements of coefficient matrices for these offspring (i.e., M a , M d , and M i ) can be calculated from the products of coefficients and the genotype probabilities. For example, the element of M i for offspring is as follows:

$$ {\mathbf{M}}_{\mathbf{i}\ i,j} = \left\{\begin{array}{l}0\ \left({A}_1{A}_1\right)\hfill \\ {}\ 1\times p{r}_{ij}\left({A}_1{A}_2\right)\ \left({A}_1{A}_2\right)\hfill \\ {}-1 \times p{r}_{ij}\left({A}_2{A}_1\right)\ \left({A}_2{A}_1\right)\hfill \\ {}\ 0\ \left({A}_2{A}_2\right)\hfill \end{array}\right.. $$

Likewise, in GBLUP-I2, the elements of \( {\mathbf{M}}_{{\mathbf{p}}_{\mathbf{at}}} \) and \( {\mathbf{M}}_{{\mathbf{m}}_{\mathbf{at}}} \) for the offspring of future matings can be calculated. Thus, the total genetic effects for the offspring of future matings can be predicted and maximized by using an optimum mating plan.

Conclusions

This study proposed two GBLUP methods i.e., GBLUP-I1 and GBLUP-I2, which include imprinting effects at the genotypic and gametic levels, respectively. The GBLUP-I1 and GBLUP-I2 methods accurately estimated the variance components and improved unbiasedness regardless of parameter settings. The accuracies of estimated total genetic values in GBLUP-I1 and GBLUP-I2 increased with increasing degree of imprinting and broad-sense heritability. Compared to GBLUP, the accuracies of estimated total genetic values obtained with GBLUP-I1 were always higher. Thus, in general, GBLUP-I1 should be applied for genetic evaluation. However, GBLUP-I2 is preferred when the imprinting effect is large and the genetic effects differ substantially between paternal and maternal gametes. After predicting the total genetic value by both GBLUP-I methods, assortative mating or mate allocation could be used to boost the field performance of livestock.

References

  1. Reik W, Walter J. Genomic imprinting: parental influence on the genome. Nat Rev Genet. 2001;2:21–32.

    Article  CAS  PubMed  Google Scholar 

  2. O’Neill MJ, Ingram RS, Vrana PB, Tilghman SM. Allelic expression of IGF2 in marsupials and birds. Dev Genes Evol. 2000;210:18–20.

    Article  PubMed  Google Scholar 

  3. Georges M, Charlier C, Cockett N. The callipyge locus: evidence for the trans interaction of reciprocally imprinted genes. Trends Genet. 2003;19:248–52.

    Article  CAS  PubMed  Google Scholar 

  4. Sakatani T, Wei M, Katoh M, Okita C, Wada D, Mitsuya K, et al. Epigenetic heterogeneity at imprinted loci in normal populations. Biochem Biophys Res Commun. 2001;283:1124–30.

    Article  CAS  PubMed  Google Scholar 

  5. Morison IM, Ramsay JP, Spencer HG. A census of mammalian imprinting. Trends Genet. 2005;21:457–65.

    Article  CAS  PubMed  Google Scholar 

  6. Imumorin IG, Kim EH, Lee YM, de Koning DJ, van Arendonk JAM, Donato MD, et al. Genome scan for parent-of-origin QTL effects on bovine growth and carcass traits. Front Genet. 2011;2:44.

    Article  PubMed Central  PubMed  Google Scholar 

  7. de Koning DJ, Rattink AP, Harlizius B, Groenen MAM, Brascamp EW, van Arendonk JAM. Detection and characterization of quantitative trait loci for growth and reproduction traits in pigs. Livest Prod Sci. 2001;72:185–98.

    Article  Google Scholar 

  8. Quintanilla R, Milan D, Bidanel JP. A further look at quantitative trait loci affecting growth and fatness in a cross between Meishan and Large White pig populations. Genet Sel Evol. 2002;34:193–210.

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  9. Hirooka H, de Koning DJ, Harlizius B, van Arendonk JAM, Rattink AP, Groenen MA, et al. A whole-genome scan for quantitative trait loci affecting teat number in pigs. J Anim Sci. 2001;79:2320–6.

    CAS  PubMed  Google Scholar 

  10. Stella A, Stalder KJ, Saxton AM, Boettcher PJ. Estimation of variances for gametic effects on litter size in Yorkshire and Landrace swine. J Anim Sci. 2003;81:2171–8.

    CAS  PubMed  Google Scholar 

  11. Schaeffer LR, Kennedy BW, Gibson JP. The inverse of the gametic relationship matrix. J Dairy Sci. 1989;72:1266–72.

    Article  Google Scholar 

  12. Essl A, Voith K. Genomic imprinting effects on dairy- and fitness-related traits in cattle. J Anim Breed Genet. 2002;119:182–9.

    Article  Google Scholar 

  13. Neugebauer N, Luther H, Reinsch N. Parent-of-origin effects cause genetic variation in pig performance traits. Animal. 2010;4:672–81.

    Article  CAS  PubMed  Google Scholar 

  14. Neugebauer N, Rader I, Schild HJ, Zimmer D, Reinsch N. Evidence for parent-of-origin effects on genetic variability of beef traits. J Anim Sci. 2010;88:523–32.

    Article  CAS  PubMed  Google Scholar 

  15. Dalton R. No bull: genes for better milk. Nature. 2009;457:369.

    Article  CAS  PubMed  Google Scholar 

  16. Lund M, de Ross S, de Vries A, Druet T, Ducrocq V, Fritz S, et al. A common reference population from four European Holstein populations increases reliability of genomic predictions. Genet Sel Evol. 2011;43:43.

    Article  PubMed Central  PubMed  Google Scholar 

  17. Mc Hugh N, Meuwissen TH, Cromie AR, Sonesson AK. Use of female information in dairy cattle genomic breeding programs. J Dairy Sci. 2011;94:4109–18.

    Article  CAS  PubMed  Google Scholar 

  18. Wiggans GR, VanRaden PM, Cooper TA. The genomic evaluation system in the United States: past, present, future. J Dairy Sci. 2011;94:3202–11.

    Article  CAS  PubMed  Google Scholar 

  19. Spencer HG. The correlation between relatives on the supposition of genomic imprinting. Genetics. 2002;161:411–7.

    PubMed Central  CAS  PubMed  Google Scholar 

  20. Falconer DS, Mackay TFC. Introduction to Quantitative Genetics. Addison Wesley Longman: Essex; 1996.

    Google Scholar 

  21. VanRaden PM. Efficient methods to compute genomic predictions. J Dairy Sci. 2008;91:4414–23.

    Article  CAS  PubMed  Google Scholar 

  22. Nishio M, Satoh M. Including dominance effects in the genomic BLUP method for genomic evaluation. PLoS One. 2014;9:e85792.

    Article  PubMed Central  PubMed  Google Scholar 

  23. Nishio M, Satoh M. Parameters affecting genome simulation for evaluating genomic selection method. Anim Sci J. 2014;85:879–87.

    Article  PubMed  Google Scholar 

  24. Hickey JM, Kinghorn BP, Tier B, van der Werf JH, Cleveland MA. A phasing and imputation method for pedigreed populations that results in a single-stage genomic evaluation method. Genet Sel Evol. 2012;44:9.

    Article  PubMed Central  PubMed  Google Scholar 

  25. Johnson DL, Thompson R. Restricted maximum likelihood estimation of variance components for univariate animal models using sparse matrix techniques and average information. J Dairy Sci. 1995;78:449–56.

    Article  CAS  Google Scholar 

  26. Tier B, Meyer K. On the analysis of parent-of-origin effects with examples from ultrasonic measures of carcass traits in Australian beef cattle. J Anim Breed Genet. 2012;129:359–68.

    Article  CAS  PubMed  Google Scholar 

  27. VanRaden PM, O’Connell JR, Wiggans GR, Weigel KA. Genomic evaluations with many more genotypes. Genet Sel Evol. 2011;43:10.

    Article  PubMed Central  PubMed  Google Scholar 

  28. Huang L, Li Y, Singleton AB, Hardy JA, Abecasis G, Rosenberg NA, et al. Genotype-imputation accuracy across worldwide human populations. Am J Hum Genet. 2009;84:235–50.

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  29. Zhang Z, Druet T. Marker imputation with low-density marker panels in Dutch Holstein cattle. J Dairy Sci. 2010;93:5487–94.

    Article  CAS  PubMed  Google Scholar 

  30. Hickey JM, Crossa J, Babu R, de los Campos G. Factors affecting the accuracy of genotype imputation in populations from several maize breeding programs. Crop Sci. 2012;52:654–63.

    Article  Google Scholar 

  31. de Vries AG, Kerr R, Tier B, Long T, Meuwissen TH. Gametic imprinting effects on rate and composition of pig growth. Theor Appl Genet. 1994;88:1037–42.

    Article  PubMed  Google Scholar 

  32. Toro MA, Varona L. A note on mate allocation for dominance handling in genomic selection. Genet Sel Evol. 2010;42:33.

    Article  PubMed Central  PubMed  Google Scholar 

  33. Zeng J, Toosi A, Fernando RL, Dekkers JCM, Garrick DJ. Genomic selection of purebred animals for crossbred performance in the presence of dominant gene action. Genet Sel Evol. 2013;45:11.

    Article  PubMed Central  PubMed  Google Scholar 

Download references

Acknowledgements

The authors thank John Hickey for kindly providing the AlphaImpute software.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Motohide Nishio.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

MN developed the two GBLUP models that include the imprinting effect, wrote all the computer programs, and drafted the manuscript. MS conceived and set up the study, and helped with writing the manuscript. Both authors have read and approved the final manuscript.

Rights and permissions

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Nishio, M., Satoh, M. Genomic best linear unbiased prediction method including imprinting effects for genomic evaluation. Genet Sel Evol 47, 32 (2015). https://doi.org/10.1186/s12711-015-0091-y

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1186/s12711-015-0091-y

Keywords