Abstract
Background
Singlestep methods provide a coherent and conceptually simple approach to incorporate genomic information into genetic evaluations. An issue with singlestep methods is compatibility between the markerbased relationship matrix for genotyped animals and the pedigreebased relationship matrix. Therefore, it is necessary to adjust the markerbased relationship matrix to the pedigreebased relationship matrix. Moreover, with data from routine evaluations, this adjustment should in principle be based on both observed marker genotypes and observed phenotypes, but until now this has been overlooked. In this paper, I propose a new method to address this issue by 1) adjusting the pedigreebased relationship matrix to be compatible with the markerbased relationship matrix instead of the reverse and 2) extending the singlestep genetic evaluation using a joint likelihood of observed phenotypes and observed marker genotypes. The performance of this method is then evaluated using two simulated datasets.
Results
The method derived here is a singlestep method in which the markerbased relationship matrix is constructed assuming all allele frequencies equal to 0.5 and the pedigreebased relationship matrix is constructed using the unusual assumption that animals in the base population are related and inbred with a relationship coefficient γ and an inbreeding coefficient γ / 2. Taken together, this γ parameter and a parameter that scales the markerbased relationship matrix can handle the issue of compatibility between markerbased and pedigreebased relationship matrices. The full loglikelihood function used for parameter inference contains two terms. The first term is the REMLloglikelihood for the phenotypes conditional on the observed marker genotypes, whereas the second term is the loglikelihood for the observed marker genotypes. Analyses of the two simulated datasets with this new method showed that 1) the parameters involved in adjusting markerbased and pedigreebased relationship matrices can depend on both observed phenotypes and observed marker genotypes and 2) a strong association between these two parameters exists. Finally, this method performed at least as well as a method based on adjusting the markerbased relationship matrix.
Conclusions
Using the full loglikelihood and adjusting the pedigreebased relationship matrix to be compatible with the markerbased relationship matrix provides a new and interesting approach to handle the issue of compatibility between the two matrices in singlestep genetic evaluation.
Introduction
Singlestep methods for genetic evaluation [13] have recently become popular because they provide an approach to incorporate genomic information into genetic evaluations that is both coherent and conceptually simple. A singlestep method extends the usual pedigreebased method by replacing the additive relationship matrix constructed from pedigree by an additive relationship matrix that combines the markerbased relationship matrix for genotyped animals with the pedigreebased relationship matrix.
An issue with a singlestep method is compatibility between the markerbased relationship matrix for genotyped animals and the pedigreebased relationship matrix [46]. To handle this problem, it is necessary to determine which allele frequencies should be used in the markerbased relationship matrix and to adjust this matrix to the pedigreebased relationship matrix. In theory, one should use the allele frequencies in the founder population of the pedigree (base animals) for the markerbased and pedigreebased relationships to be compatible, but these allele frequencies are rarely available in practice since base animals are not genotyped. Chen et al. [5] and Forni et al. [7] concluded that using the observed allele frequencies improved accuracy of prediction compared to using allele frequencies equal to 0.5. Studies on how to adjust the markerbased relationship matrix to be compatible with the submatrix of the pedigreebased relationship matrix for genotyped animals have been reported [46]. These adjustments consisted of scaling and adding a number to all elements in the markerbased relationship matrix based on equating means of diagonal and offdiagonal elements in the two matrices, and it was demonstrated that accuracy of prediction increased and bias decreased. In relation to this problem of compatibility between markerbased and pedigreebased relationship matrices, with data from routine evaluations, selection affects the allele frequencies over time, and in principle both observed marker genotypes and observed phenotypes contain information about allele frequencies in the base population. Therefore, studies on the adjustment of the markerbased relationship matrix to the pedigreebased relationship matrix have overlooked the fact that it should in principle incorporate information on observed phenotypes.
This work explores two possibilities to solve the problem, i.e. 1) in which the pedigreebased relationship matrix is adjusted to the markerbased relationship matrix and 2) in which the singlestep genetic evaluation is extended by using a joint likelihood of observed phenotypes and observed marker genotypes. This results in a singlestep method in which the markerbased relationship matrix is constructed assuming all allele frequencies are equal to 0.5 and the pedigreebased relationship matrix is constructed using the unusual assumption that animals in the base population are related and inbreed with a relationship coefficient γ and an inbreeding coefficient γ / 2. The loglikelihood function used for parameter inference contains two terms. The first term is the REMLloglikelihood for the phenotypes conditional on the observed marker genotypes and the second term is the loglikelihood for the observed marker genotypes. The performance of the proposed method was evaluated using two simulated datasets.
Methods
This section presents, first, the statistical model on which the singlestep methods are based along the lines of Christensen and Lund [3], then the proposal on how to adjust the pedigreebased relationship matrix and finally the adjustment of the markerbased relationship matrix as previously used.
Singlestep model
The marker genotypes are summarised into a marker matrix m, where m_{ij} = −1, 0 or 1 if SNP j of individual i is 11, 12, or 22, respectively. In the following, capital M and lowercase m indicate whether marker genotypes are considered as random variables or as nonrandom variables (observed variables or integration variables), respectively.
Let us consider the simple model
where y is the vector of phenotypes, μ is the general mean, 1 is a vector of ones, a is the vector of breeding values, Z is an incidence matrix, and e is the vector of residual errors. The breeding value may be decomposed into a = g + a_{r}, where g is the vector of genomic effects and a_{r} = (a − g) is the vector of residual polygenic effects. The residual polygenic effects , where matrix A is the pedigreebased additive relationship matrix and ω ∈[0;1] is the relative weight on the residual polygenic effect. The genomic values , with , where ρ = (ρ_{1}, … , ρ_{p}) with ρ_{j} being the allele frequency for the jth marker and with v_{j} being defined below. The model for the marker genotypes is that M is a multivariate Gaussian distribution with
where v_{j} = v_{j,j} is a parameter and for j ≠ j^{′}. The crude assumption that the multivariate distribution is Gaussian is crucial in the derivation, whereas the unrealistic assumption that for j ≠ j^{′} is made for simplicity. Dividing marker genotypes m into observed marker genotypes m^{o} and unobserved marker genotypes m^{u}, the joint marginal density of observed phenotypes y and observed marker genotypes m^{o} is
where
By rearranging terms and using that , this becomes
where f(y∣m^{o}) is defined implicitly. The first term f(y∣m^{o}) is the density for the phenotypes given the observed marker genotypes, whereas the second term f(m^{o}) is the multivariate Gaussian density for the observed marker genotypes.
Singlestep methods are based on the density f(y∣m^{o}), which may be written (see [3]) as
where the vector a has a mean zero and a variancecovariance matrix H_{ω} with inverse
with G_{ω} = (1 − ω)G(m^{o}) + ωA_{11}. In the above formula, the subdivision is made according to genotyped and nongenotyped animals. The matrices and A_{11} are the markerbased relationship matrix for the genotyped animals and the submatrix of the pedigreebased relationship matrix corresponding to genotyped animals, respectively. The sparse structure of the matrix in equation (1) is the cornerstone for efficient computing using a singlestep method. Assuming that a∣m^{o} is Gaussian distributed, mixed model equations can be solved for BLUP predictions and AIREML [8] provides REML parameter estimates.
Issues raised by the singlestep method are 1) how are the allele frequencies ρ that are used in G(m^{o}) obtained and 2) how is the required compatibility between G(m^{o}) and A_{11} that is evident in equation (1) reached. To investigate these issues, the joint density of observed phenotypes and marker genotypes, f(y, m^{o}) = f(y∣m^{o})f(m^{o}), is taken as the starting point. From this, the full marginal loglikelihood becomes
where for fixed ω, ρ and is the singlestep REMLloglikelihood for the phenotypes conditional on the observed marker genotypes, and
with n_{1} denoting the number of genotyped animals, is the loglikelihood of the observed marker genotypes. Thus, the parameter ρ enters into both terms of the full loglikelihood in equation (2), and this implies that estimation of allele frequencies ρ should be in principle based on this loglikelihood. In particular, if selection has been performed the phenotypes will contain information about the allele frequencies which may cause bias if ignored. However, estimating ρ and s = s(v) by maximising for all parameters jointly is not feasible in practice since ρ is a very highdimensional parameter. Instead, in general the observed marker genotypes are used to estimate ρ and then they are plugged into , and this loglikelihood is used to estimate the remaining parameters. Estimation of ρ based on observed marker genotypes may consist of simply using the observed allele frequencies [5,7], or may be done by maximising ℓ_{mark}(ρv) as a function of ρ (this is essentially the method of Gengler et al. [9], although, for computational reasons, that method adds a small residual error to the distribution of M). The highdimensional parameter v also enters into both terms of the full loglikelihood, although it only enters into via the scaling parameter s = s(v), and therefore estimation of v should also be in principle based on maximising the full loglikelihood. Usually, estimating is based on the observed marker genotypes, either using , implying as in [2,3], or using s such that the average diagonal of G equals the average diagonal of A_{11} as in [7]. Furthermore, it has been demonstrated that the accuracy of prediction is improved and bias is reduced by adjusting the markerbased relationship matrix as G(m^{o})β + α where α and β are based on the elements in G(m^{o}) and A_{11}; further details are given below. It should be noted that such an adjustment is based on observed marker genotypes only, and lacks a theoretical justification within the framework considered here. To summarise, the allele frequencies in G(m^{o}) and the adjustments necessary for the compatibility of G(m^{o}) and A_{11} should be in principle derived based on both observed marker genotypes and observed phenotypes.
Furthermore, when computing BLUP breeding values with pluggedin parameter estimates, the uncertainty in these parameters estimates is ignored, which is an important issue in the case of the highdimensional parameter ρ. Alternatively, uncertainty in parameter estimates may be incorporated into the predictions by using a Bayesian approach. Demonstration that a Bayesian approach with prior distributions on ρ and v results in an method in which the pedigreebased relationship matrix is adjusted to a markerbased relationship matrix, is presented below.
Adjusting the pedigreebased relationship matrix
The derivation of the proposed adjustment of the pedigreebased relationship matrix is based on assigning priors on the highdimensional parameters ρ and v in the previously described singlestep model, and then considers the first and second order moments of the marginal distribution of M(integrating ρ and v). Appendix A shows that the resulting marker distribution satisfies
with , j = 1, … , p, being a parameter, and when j ≠ j^{′}. Matrix is an additive relationship matrix that satisfies the usual recursions but with the peculiar feature that base animals in the pedigree are related and inbred. Within the population of base animals, the relationship coefficient is γ and the inbreeding coefficient is γ / 2. Table 1 contains data for a small pedigree used to derive matrix as shown in Table 2. The mean and variancecovariance structure of M shown above is of the form in [3], with scaling parameter , and hence the breeding values g have a combined relationship matrix , where the inverse is
with . This construction is a singlestep method for which the individuals in the base population are related and inbred, and the markerbased relationship matrix, , has allele frequencies equal to 0.5.
Based on the derivation above (see also Appendix A), the full marginal loglikelihood function for parameter inference becomes
where for fixed ω, γ and is the singlestep REMLloglikelihood for the phenotypes conditional on the observed marker genotypes, with markerbased relationship matrix and pedigreebased relationship matrix , and
is the loglikelihood of the observed marker genotypes. Using the loglikelihood in equation (4) instead of the loglikelihood in equation (2) makes the estimation of parameters feasible in practice by numeric maximization methods since the highdimensional parameters ρ and v are replaced by two parameters, a parameter γ that determines the relationship and inbreeding of individuals in the base population and a parameter that scales the markerbased relationship matrix.
The computations require algorithms that compute A(γ)^{−1} and A_{11}(γ) efficiently. Computations of inbreeding coefficients with related base animals have been considered by [10,11] in the context of incomplete pedigrees. In Appendix B, algorithms are presented that extend the approaches of Quaas [12] for computing A^{−1} and of Colleau [13] for computing A_{11} to the case where base animals are related and inbred.
Maximisation of the full loglikelihood in equation (4) is done by first specifying a discrete threedimensional grid of values for parameters and then, for each value of , computing the maximum values of the loglikelihood and the loglikelihood of observed marker genotypes . This provides a threedimensional profile loglikelihood , which can then be assessed to find the maximum.
For faster computing, an alternative to using the full loglikelihood is to determine parameters γ and based on observed marker genotypes only, i.e. by maximising , and to estimate the remaining parameter based on for a grid of values for ω, with estimates of γ and plugged in. Setting the derivative of with respect to equal to zero gives
which has the solution
Substituting into equation (5), we obtain
which has to be maximised numerically to estimate γ.
Adjusting the markerbased relationship matrix (Gadjust)
Alternatively, an adjustment of the form G_{a} = Gβ + α is used, where G is the markerbased relationship matrix with allele frequencies equal to the observed ones and scaling parameter , and parameters α and β are determined by fitting G_{a} to A_{11}. This adjustment was used by VanRaden [14], Christensen et al. [6] and Gao et al. [15], with the first paper suggesting that α and β should be estimated by least square estimation, i.e. by minimizing the sum of squares of Gβ + α − A_{11} and the other two papers suggesting that they should be to estimated by equating means of diagonal elements and all elements in the two matrices. Here, the later is applied and α and β are estimated by solving the two equations
for α and β, where and denote means of all elements in the two matrices, and dG and dA denote means of diagonal elements in the two matrices.
Simulated example 1
This example is deliberately simple and very extreme, and constructed for the purpose of showing that parameter estimates of γ and can depend on the observed phenotypes.
The base population consists of two individuals, one sire and one dam with 15 biallelic markers, and their genotypes are simulated assuming independence between markers and with equal allele frequencies. Furthermore, it is assumed that the markers are independently inherited, are all QTL with allele substitution effect equal to 1, and heritability of the phenotype is equal to 1.
The two base individuals produce 100 offspring (generation 1) that all have observed phenotypes. The two individuals in generation 1 with the largest own phenotype value are selected as parents and produce 100 offspring (generation 2) that are all genotyped.
Two different approaches to estimate parameters are compared. In the first approach, all parameters are estimated using the full loglikelihood in equation (4). In the second approach, γ and are estimated based only on the loglikelihood of the observed marker genotypes in equation (5), and the remaining parameters are estimated based on the REMLloglikelihood , with estimates of γand pluggedin.
Simulated example 2
This example is inspired by a pig nucleus breeding scheme and consists of five generations in which all animals have recorded phenotypes. In each generation, 150 boars are mated to 1 500 sows to produce 15 000 offspring (50/50 males/females). For the next generation, boars with a high own phenotype value are chosen and sows are selected at random. The last three generations of selected boars are genotyped and a sixth generation of 300 candidate boars are also genotyped. The breeding value is the sum of 500 independent QTL effects simulated from a Gamma(5.4,0.42) distribution, and the heritability of the phenotype is 0.22. This dataset is described in more detail in Christensen and Lund [3].
Three different approaches are compared. In the first approach, all parameters are estimated using the full loglikelihood in equation (4). In the second approach, parameters γ and are estimated based only on the loglikelihood of the observed marker genotypes in equation (5), and the remaining parameters are estimated based on the REMLloglikelihood , with estimates of γ and plugged in. Finally, the Gadjust approach is used for which α and β are estimated using equation (6) and the remaining parameters are estimated by . For all three approaches, the correlation between predicted breeding value and true breeding value a for the candidate boars is reported as well as the estimated regression coefficient (reg) for the regression of a on , where deviation from one indicates bias.
Results
Simulated example 1
Table 3 shows that both γand were smaller when estimated with the full loglikelihood than with the loglikelihood of the observed marker genotypes.
Table 3. Parameter estimates obtained with simulated dataset 1
Simulated example 2
Table 4 shows that γ and were slightly smaller when estimated with the full loglikelihood than with the loglikelihood of the observed marker genotypes. Parameter ω was about 0.375 whether estimated with the full loglikelihood or with the loglikelihood , with parameter estimates of γ and from pluggedin. When presented in three dimensions, the profile loglikelihood showed a very weak association between and ω. A contour plot of the profile loglikelihood surface for γ and is shown in Figure 1 and the profile loglikelihood function for ω is shown in Figure 2.
Figure 1. Profile loglikelihood for γ and . A contourplot of the profile loglikelihood for parameters γ and based on the full loglikelihood in equation (5); the plot is constructed with values in a discrete grid (explaining the roughness of the plot) that have been standardised such that the maximum value is zero.
Figure 2. Profile loglikelihood for ω. The profile loglikelihood function for parameter ω based on the full loglikelihood in equation (5); the plot is constructed with values in a discrete grid that have been standardised such that the maximum value is zero.
Table 4. Parameter estimates and prediction performance obtained with simulated dataset 2
Estimated parameters obtained with the Gadjust approach are also shown in Table 4, where it should be noted that .
In terms of prediction performance, the correlation between predicted and true breeding value and the regression coefficient were 0.536 and 1.17, respectively, in both cases when the pedigreebased matrix was adjusted, but 0.493 and 1.34, respectively, when the Gadjust approach was used.
Discussion
This paper provides a coherent approach to handle the issue of compatibility between pedigreebased and markerbased relationship matrices in singlestep genetic evaluation. The approach is computationally fast and feasible for large datasets, as is the case for previously developed singlestep methods. Parameter γ can adjust the pedigreebased relationship matrix to the markerbased relationship matrix that is scaled by parameter , and these two parameters should in principle be estimated using the full loglikelihood for observed phenotypes and observed marker genotypes. This is computationally feasible by computing the full loglikelihood values for parameters γ, and ω in a threedimensional grid. However, in practice this can be computationally burdensome and an appealing alternative is to estimate γ and based on the observed marker genotypes only. Analysis of simulated datasets, shows that the estimates of parameters γ and depend on the observed phenotypes as well as on the observed marker genotypes, although this dependence is not too large. A conjecture is that in a scenario with a small number of genotyped animals it is more important to use the full loglikelihood than in a scenario with a large number of genotyped animals. Based on these two simulated datasets only, no general conclusion can be made and further studies are needed to determine in which scenarios it would be safe to base inference on a twostep procedure in which γ and are estimated based on the observed marker genotypes only, and the remaining parameters are then estimated by the loglikelihood of the phenotypes conditional on the observed marker genotypes.
Using the approach developed in this paper may also provide insight on performance of other approaches for making markerbased and pedigreebased relationship matrices compatible in singlestep genetic evaluation. Examples of such approaches are the adjustments of the markerbased relationship matrix reported in [46,14,15] and also investigated here, and the approach by Meuwissen et al. [16]. In [16], first, an average of position specific identical by descent matrices based on linkage information [17] (hereafter denoted G_{FG}) were used instead of the combined relationship matrix (1), and second, the final method consisted of replacing the pedigreebased relationship matrix A by the matrix G_{FG} in matrix (1) and then adjusting the markerbased relationship matrix to the G_{FG} matrix. Based on simple simulated datasets, without selection, the method resulted in high accuracy and low bias. The computation of G_{FG} is computationally burdensome and therefore this approach was not considered here. It should be noted that for these other approaches, the combined relationship matrix is constructed based on observed marker genotypes only, and the inference on remaining parameters is based on the loglikelihood conditional on the observed marker genotypes. The approach in this paper may provide guidelines on when it is safe to base inference on such procedures in which, first the observed marker genotypes are used to construct a combined relationship matrix, and second the remaining inference is based on the loglikelihood for the phenotypes conditional on observed marker genotypes.
Evaluation of accuracy and bias of prediction with the simulated dataset 2 shows that the approaches based on adjusting the pedigreebased relationship matrix, where and reg = 1.17, seem to perform better than the Gadjust approach, where and reg = 1.34. However, is also larger in the second case (0.60) than in the first case (0.375). If ω is set at 0.375 in the second case then , which is close to the value in the first case, but reg is somewhat larger, i.e. 1.30. If ω is set at 0.1, then and reg = 1.00 in the first case, and and reg = 1.12 in the second case, and if ω is set at 0.01, then and reg = 1.05 in the second case. Thus, the first two approaches seem to perform better in terms of estimating a more proper ω and have somewhat less bias, while accuracy is similar for all three approaches.
This paper suggests that the pedigreebased relationship matrix should be adjusted instead of the markerbased relationship matrix. On the one hand, from a practical point of view this would be rather inconvenient since standard software used for REML estimation and BLUP would have to be modified, but on the other hand, it is conceptually simpler and may be easier to extend. For example, when the population consists of a mixture of breeds, it may be simpler to extend the approach in this paper and specify a parametric structure on the relationships of the animals in the base population and estimate those parameters, instead of developing an appropriate way of adjusting the markerbased relationship matrix of the genotyped animals across breeds. An additional issue is the interpretation of the genetic variance parameter . The parameter estimates in Table 1 are when , when , and for Gadjust, where base animals are unrelated, i.e. γ = 0. This may seem counterintuitive since the variance of breeding values for a base animal is . However, when studying the inverse relationship matrix , then the averages of diagonal elements are 3.807, 3.809 and 2.93 for the three cases, respectively, and since 3.807/6.507, 3.809/6.512 and 2.93/5.084 are roughly the same, the parameter estimates actually make good sense, although the interpretation of is unclear. Finally, it should be noted that parameter γ could influence to some extent accuracy and bias of prediction, and neither γ = 0 nor γ estimated to make markerbased and pedigreebased relationships compatible, would be optimal for that purpose. Adjusting the pedigreebased relationship instead of the markedbased relationship matrix to make the two matrices compatible in a singlestep method is an interesting alternative.
For the simulated dataset 1, the estimate of the relative polygenic weight ω was about 0, as was expected since in the simulation the markers were assumed to capture all the genetic variation. However, for the simulated dataset 2, was large when adjusting the pedigreebased relationship matrix, i.e. 0.375, and even larger, i.e. 0.60, when the Gadjust approach was used. For the second dataset, the prediction biases were reduced when ω was set at 0.1. This poses the question whether it is actually possible to estimate ω at a reasonable value from data, or whether ω should be determined manually to control the bias. It should be noted that confidence intervals of this parameter are large in these examples and further studies are needed.
Conclusions
Using the full loglikelihood and adjusting the pedigreebased relationship matrix to be compatible with the markerbased relationship matrix provides a new and interesting approach to handle the issue of compatibility between the two matrices in singlestep genetic evaluation.
Appendix A
The derivation of the proposed adjustment of the pedigreebased relationship matrix is based on assigning priors on the highdimensional parameters ρ and v in the previously described singlestep model. This derivation is presented here.
The model for the marker genotypes, M, given the allele frequencies ρ and variances v, is as in [3],
where v_{j} = v_{j,j} is a parameter and for j ≠ j^{′}. Prior distributions for ρ and v are not specified explicitly, only the necessary assumptions are stated. Assuming that the alleles are randomly labeled 1 and 2, then a priori the expected allele frequency is E[ρ_{j}] = 0.5. Additional assumptions are that η_{1} = Var[ρ_{j}] and η_{2} = E[v_{j}] do not depend on j, and that for j ≠ j^{′}.
With these prior distributions, a marginal distribution of M may be obtained by integrating ρ and v. Using wellknown formulas for conditional expectations and covariances, the mean and covariances become
Defining , and γ = 4η_{1} / (2η_{1} + η_{2}), we obtain
with . It is not difficult to see that satisfies the usual recursions for an additive relationship matrix. Requiring that 0 ≤ γ < 1 is equivalent to making a further assumption that 2η_{1} < η_{2} (this is for example satisfied when ρ_{j}∼U] 0;1 and [v_{j} = 2ρ_{j}(1 − ρ_{j}), where η_{1} = 1/12 and η_{2} = 1/3). The first and second order moments of M are therefore of the form considered in [3], with pedigreebased relationship matrix , allele frequencies , scaling parameter with not depending on j, and markerbased relationship matrix .
Note that the random labelling of alleles as 1 or 2 is not important because the resulting expression does not depend on how alleles are labelled.
Using , the loglikelihood for the observed marker genotypes becomes
This loglikelihood may also be viewed as the loglikelihood for being Wishart distributed .
Appendix B
The aim here is to present algorithms for computing and by recursions. Computations of inbreeding coefficients with related based animals were considered by [10,11]. For simplicity, let’s assume that, either both parents are known or both parents are unknown. This is not an important restriction in practice, since the case with one unknown parent can be handled by assigning an artificial animalid to this unknown parent and by letting both its parents be unknown.
Let us partition matrix (skipping the dependence on γ in the notation from now and onwards) according to whether the animals are base animals (both parents unknown) or not
Similar to the usual A matrix (see [18]) a decomposition exists
where T is a lower triangular matrix with entries T_{ii} = 1, T_{ik} = (T_{f(i)k} + T_{m(i)k}) / 2 when i has known parents f(i) and m(i) and i > k, and T_{ik} = 0 otherwise (k > i or i has unknown parents), and is a diagonal matrix containing the variance of the Mendelian sampling part for matrix , i.e. . Comparing this decomposition to the decomposition for the usual A matrix, then matrix has replaced an identity matrix and has replaced a diagonal matrix containing the Mendelian sampling terms related to A for the animals with known parents.
The inverse matrix becomes
Here T^{−1} is a lower triangular matrix with ones on the diagonal and the only nonzero elements being −1 / 2 for offspringparent entries, and matrix has diagonal elements equal to (1 + (n_{bas}−3/2)γ) / ((1−γ/2)(1 + (n_{bas}−1/2)γ)) and offdiagonal elements equal to − γ / ((1 − γ / 2)(1 + (n_{bas} − 1 / 2)γ)), where n_{bas} is the dimension of .
A procedure to obtain the diagonal of and the inverse can be constructed similar to the algorithm by Quaas [12], which utilises the form where
Matrix is a lower triangular matrix such that . First, matrix is computed and is set to 1 + γ / 2 for the base animals. Second, the remaining part of the algorithm is the same as in Quaas [12], where recursively, rows in L are computed using and . Obtaining at the same time is done by first setting elements in for base animals equal to the elements in , and then adding elements to in the usual way (assuming a halfstored format, add to the (i,i) element, add to the (i,f(i)) and (i,m(i)) elements, and add to the (f(i),f(i)), (m(i),m(i)) and (m(i),f(i)) elements).
Matrix can be obtained by a modification of the Colleau algorithm [13,19,20] by using an algorithm to compute , where x is a vector. The product of matrix and vector x is
Therefore, an algorithm (with notation similar to that in Appendix A in Misztal et al. [19]) consists of first computing r = T^{T}x by solving the sparse system for r, then computing
Competing interests
The author declares that he has no competing interests.
Authors’ contributions
The author has read and approved the final manuscript.
Acknowledgements
The work was performed in a project funded through the Green Development and Demonstration Programme (grant no. 3405110279) by the Danish Ministry of Food, Agriculture and Fisheries, the Pig Research Centre and Aarhus University. The author thanks two anonymous reviewers, an associate editor and editor Jack Dekkers for their comments, and in particular editor Hélène Hayes for her careful and competent language revisions.
References

Legarra A, Aguilar I, Misztal I: A relationship matrix including full pedigree and genomic information.
J Dairy Sci 2009, 92:46564663. PubMed Abstract  Publisher Full Text

Aguilar I, Misztal I, Johnson DL, Legarra A, Tsuruta S, Lawlor TJ: Hot topic: A unified approach to utilize phenotypic, full pedigree, and genomic information for genetic evaluations of Holstein final score.
J Dairy Sci 2010, 93:743752. PubMed Abstract  Publisher Full Text

Christensen OF, Lund MS: Genomic prediction when some animals are not genotyped.
Genet Sel Evol 2010, 42:2. PubMed Abstract  BioMed Central Full Text  PubMed Central Full Text

Vitezica ZG, Aguilar I, Misztal I, Legarra A: Bias in genomic predictions for populations under selection.
Genet Res 2011, 93:357366. Publisher Full Text

Chen CY, Misztal I, Aguilar I, Legarra A, Muir WM: Effect of different genomic relationship matrices on accuracy and scale.
J Anim Sci 2011, 89:26732679. PubMed Abstract  Publisher Full Text

Christensen OF, Madsen P, Nielsen B, Ostersen T, Su G: Singlestep methods for genomic evaluation in pigs.
Animal 2012, 6:15651571. PubMed Abstract  Publisher Full Text

Forni S, Aguilar I, Misztal I: Different genomic relationship matrices for singlestep analysis using phenotypic, pedigree and genomic information.
Genet Sel Evol 2011, 43:1. PubMed Abstract  BioMed Central Full Text  PubMed Central Full Text

Gilmour AR, Thompson R, Cullis BR: Average information REML: an efficient algorithm for parameter estimation in linear mixed models.
Biometrics 1995, 51:14401450. Publisher Full Text

Gengler N, Mayeres P, Szydlowski M: A simple method to approximate gene content in large pedigree populations: application to the myostation gene in dualpurpose Belgian Blue cattle.
Animal 2007, 1:2128. PubMed Abstract  Publisher Full Text

VanRaden PM: Accounting for inbreeding and crossbreeding in genetic evaluation of large populations.
J Dairy Sci 1992, 75:31363144. Publisher Full Text

Colleau JJ, Sargolzaei M: MIM: an indirect method to assess inbreeding and coancestry in large incomplete pedigrees of selected dairy cattle.
J Anim Breed Genet 2011, 128:163173. PubMed Abstract  Publisher Full Text

Quaas RL: Computing the diagonal elements and inverse of a large numerator relationship matrix.
Biometrics 1976, 32:949953. Publisher Full Text

Colleau JJ: An indirect approach to the extensive calculation of relationship coefficients.
Genet Sel Evol 2002, 34:409421. PubMed Abstract  BioMed Central Full Text  PubMed Central Full Text

VanRaden PM: Efficient methods to compute genomic predictions.
J Dairy Sci 2008, 91:44144423. PubMed Abstract  Publisher Full Text

Gao H, Christensen OF, Madsen P, Nielsen US, Zhang Y, Lund MS, Su G: Comparison on genomic predictions using GBLUP models and two singlestep blending methods with different relationship matrices in the Nordic Holstein population.
Genet Sel Evol 2012, 44:8. PubMed Abstract  BioMed Central Full Text  PubMed Central Full Text

Meuwissen THE, Luan T, Woolliams JA: The unified approach to the use of genomic and pedigree information in genomic evaluations revisited.
J Anim Breed Genet 2011, 128:429439. PubMed Abstract  Publisher Full Text

Fernando RL, Grossman M: Marker assisted selection using best linear unbiased prediction.
Genet Sel Evol 1989, 21:467477. BioMed Central Full Text

Mrode RA: Linear models for the prediction of animal breeding values. Wallingford: CABI Publishing; 2005.

Misztal I, Legarra A, Aguilar I: Computing procedures for genetic evaluation including phenotypic, full pedigree and genomic information.
J Dairy Sci 2009, 92:46484655. PubMed Abstract  Publisher Full Text

Aguilar I, Misztal I, Legarra A, Tsuruta S: Efficient computation of the genomic relationship matrix and other matrices used in singlestep evaluation.
J Anim Breed Genet 2011, 128:422428. PubMed Abstract  Publisher Full Text