Statistical distributions of test statistics used for quantitative trait association mapping in structured populations
1 INRA, UR 631 Station d’Amélioration Génétique des Animaux, Castanet-Tolosan, F-31326, France
2 INRA, UMR 1313 Génétique Animale et Biologie Intégrative, Jouy-en-Josas F-78352, France
Genetics Selection Evolution 2012, 44:32 doi:10.1186/1297-9686-44-32Published: 12 November 2012
Spurious associations between single nucleotide polymorphisms and phenotypes are a major issue in genome-wide association studies and have led to underestimation of type 1 error rate and overestimation of the number of quantitative trait loci found. Many authors have investigated the influence of population structure on the robustness of methods by simulation. This paper is aimed at developing further the algebraic formalization of power and type 1 error rate for some of the classical statistical methods used: simple regression, two approximate methods of mixed models involving the effect of a single nucleotide polymorphism (SNP) and a random polygenic effect (GRAMMAR and FASTA) and the transmission/disequilibrium test for quantitative traits and nuclear families. Analytical formulae were derived using matrix algebra for the first and second moments of the statistical tests, assuming a true mixed model with a polygenic effect and SNP effects.
The expectation and variance of the test statistics and their marginal expectations and variances according to the distribution of genotypes and estimators of variance components are given as a function of the relationship matrix and of the heritability of the polygenic effect. These formulae were used to compute type 1 error rate and power for any kind of relationship matrix between phenotyped and genotyped individuals for any level of heritability. For the regression method, type 1 error rate increased with the variability of relationships and with heritability, but decreased with the GRAMMAR method and was not affected with the FASTA and quantitative transmission/disequilibrium test methods.
The formulae can be easily used to provide the correct threshold of type 1 error rate and to calculate the power when designing experiments or data collection protocols. The results concerning the efficacy of each method agree with simulation results in the literature but were generalized in this work. The power of the GRAMMAR method was equal to the power of the FASTA method at the same type 1 error rate. The power of the quantitative transmission/disequilibrium test was low. In conclusion, the FASTA method, which is very close to the full mixed model, is recommended in association mapping studies.