Accuracies of genomic breeding values in American Angus beef cattle using K-means clustering for cross-validation

Mahdi Saatchi1, Mathew C McClure2,3, Stephanie D McKay2, Megan M Rolf2, JaeWoo Kim2, Jared E Decker2, Tasia M Taxis2, Richard H Chapple2, Holly R Ramey2, Sally L Northcutt4, Stewart Bauck5, Brent Woodward5, Jack CM Dekkers1, Rohan L Fernando1, Robert D Schnabel2, Dorian J Garrick1,6* and Jeremy F Taylor2*

Author Affiliations

1 Department of Animal Science, Iowa State University, Ames, 50011, USA

2 Division of Animal Sciences, University of Missouri, Columbia, 65211, USA

3 Bovine Functional Genomics Laboratory, ARS, USDA, Beltsville, MD 20705, USA

4 American Angus Association, 3201 Frederick Avenue, Saint Joseph, 64506, USA

5 Igenity Livestock Business Unit, Merial Limited, Duluth, 30096, USA

6 Institute of Veterinary, Animal and Biomedical Sciences, Massey University, Palmerston North, New Zealand

Genetics Selection Evolution 2011, 43:40  doi:10.1186/1297-9686-43-40

Published: 28 November 2011



Genomic selection is a recently developed technology that is beginning to revolutionize animal breeding. The objective of this study was to estimate marker effects in order to derive prediction equations for direct genomic values for 16 routinely recorded traits of American Angus beef cattle, and to quantify the corresponding accuracies of prediction.
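In its simplest form, such a prediction equation scores an animal as the sum, over SNPs, of its genotype multiplied by the estimated effect of that marker. The sketch below assumes genotypes coded as 0, 1 or 2 copies of a reference allele and takes the marker effects as already estimated; it does not show how those effects are obtained. The function name and the toy numbers are hypothetical.

```python
from typing import Sequence


def direct_genomic_value(genotypes: Sequence[int], marker_effects: Sequence[float]) -> float:
    """Sum over SNPs of genotype (0, 1 or 2 allele copies) times estimated marker effect."""
    if len(genotypes) != len(marker_effects):
        raise ValueError("need exactly one marker effect per genotyped SNP")
    if any(g not in (0, 1, 2) for g in genotypes):
        raise ValueError("genotypes must be coded as 0, 1 or 2")
    return float(sum(g * b for g, b in zip(genotypes, marker_effects)))


# Hypothetical marker effects for 4 SNPs and the genotypes of one young, unphenotyped bull.
effects = [0.5, -0.25, 0.0, 1.0]
bull = [2, 1, 0, 1]
print(direct_genomic_value(bull, effects))  # 1.75
```

Because the equation needs only genotypes, it can be applied to animals that are too young to have their own or progeny records.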


Deregressed estimated breeding values were used as observations in a weighted analysis to derive direct genomic values for 3570 sires genotyped with the Illumina BovineSNP50 BeadChip. These bulls were clustered into five groups by K-means clustering on pedigree estimates of additive genetic relationships between animals, with the aim of increasing within-group and decreasing between-group relationships. Each of the five combinations of four groups was used in turn for model training, with cross-validation performed in the remaining group. For each trait, a bivariate animal model was used to estimate the genetic correlation between deregressed estimated breeding values and direct genomic values.
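The grouping and cross-validation scheme described above can be sketched in a few functions. This is a minimal illustration under stated assumptions, not the authors' code: it builds the pedigree numerator relationship matrix A by the tabular method, treats each animal's row of A as its coordinates (an assumption; the abstract does not state the distance used), runs plain K-means (Lloyd's algorithm) from a deterministic farthest-first start, and then forms one training/validation split per cluster. The function names and the toy two-family pedigree are hypothetical, and the toy uses two clusters rather than five.

```python
from typing import List, Optional, Sequence, Tuple

# (sire, dam) row indices per animal; None marks an unknown parent.
Pedigree = Sequence[Tuple[Optional[int], Optional[int]]]


def relationship_matrix(pedigree: Pedigree) -> List[List[float]]:
    """Numerator relationship matrix A by the tabular method (parents must precede offspring)."""
    n = len(pedigree)
    a = [[0.0] * n for _ in range(n)]
    for i, (s, d) in enumerate(pedigree):
        for j in range(i):
            # a_ij is the mean of j's relationships with i's parents (0 for an unknown parent).
            a_js = a[j][s] if s is not None else 0.0
            a_jd = a[j][d] if d is not None else 0.0
            a[i][j] = a[j][i] = 0.5 * (a_js + a_jd)
        # a_ii = 1 + F_i, where the inbreeding coefficient F_i is half the sire-dam relationship.
        a[i][i] = 1.0 + (0.5 * a[s][d] if s is not None and d is not None else 0.0)
    return a


def kmeans(points: Sequence[Sequence[float]], k: int, max_iter: int = 100) -> List[int]:
    """Lloyd's K-means with squared Euclidean distance; returns one cluster label per point."""

    def dist2(p: Sequence[float], q: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(p, q))

    # Farthest-first start: point 0, then repeatedly the point farthest from all chosen centroids.
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(dist2(p, c) for c in centroids))
        centroids.append(list(far))

    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: dist2(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def leave_one_group_out(labels: Sequence[int]) -> List[Tuple[List[int], List[int]]]:
    """One (training, validation) split per cluster: validate in it, train on all the others."""
    return [
        ([i for i, lab in enumerate(labels) if lab != g],
         [i for i, lab in enumerate(labels) if lab == g])
        for g in sorted(set(labels))
    ]


# Hypothetical toy pedigree: two unrelated families, each two founders plus two full sibs.
pedigree = [(None, None), (None, None), (0, 1), (0, 1),
            (None, None), (None, None), (4, 5), (4, 5)]
A = relationship_matrix(pedigree)
groups = kmeans(A, k=2)
print(groups)  # [0, 0, 0, 0, 1, 1, 1, 1]
for train, valid in leave_one_group_out(groups):
    print("train", train, "validate", valid)
```

K-means only finds a local optimum, so the grouping can depend on the starting centroids; with real data one would typically compare several starts. In the toy example the two unrelated families fall into separate clusters, so each validation group has no pedigree relatives in its training set.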


Accuracies of direct genomic values ranged from 0.22 to 0.69 across the studied traits, with an average of 0.44. Predictions were more accurate when animals in the validation group were more closely related to animals in the training set. When training and validation sets were formed by random allocation, accuracies of direct genomic values ranged from 0.38 to 0.85, with an average of 0.65, reflecting the closer relationships between training and validation animals. For most traits, the accuracies obtained by training on older animals and validating on younger animals were intermediate between those obtained with K-means clustering and with random allocation. The genetic correlation between deregressed estimated breeding values and direct genomic values ranged from 0.15 to 0.80 for the traits studied.
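The three validation designs compared above differ mainly in how closely validation animals are related to training animals. The sketch below assumes a pedigree relationship matrix is already available as a nested list and shows a random allocation into groups, an age-based split (older animals train, younger animals validate), and one possible summary of training/validation relatedness: the mean, over validation animals, of each animal's largest relationship with any training animal. That summary, the cutoff year, the function names, and the toy data are illustrative assumptions, not quantities reported in the abstract.

```python
import random
from typing import List, Sequence, Tuple


def random_groups(n_animals: int, k: int, seed: int = 1) -> List[int]:
    """Random allocation of animals to k groups of (nearly) equal size."""
    labels = [i % k for i in range(n_animals)]
    random.Random(seed).shuffle(labels)
    return labels


def age_split(birth_years: Sequence[int], cutoff: int) -> Tuple[List[int], List[int]]:
    """Train on animals born before `cutoff`; validate on animals born in or after it."""
    train = [i for i, y in enumerate(birth_years) if y < cutoff]
    valid = [i for i, y in enumerate(birth_years) if y >= cutoff]
    return train, valid


def mean_max_relationship(a: Sequence[Sequence[float]], train: Sequence[int],
                          valid: Sequence[int]) -> float:
    """Assumed summary: mean over validation animals of their largest relationship to any training animal."""
    if not train or not valid:
        raise ValueError("need at least one training and one validation animal")
    return sum(max(a[v][t] for t in train) for v in valid) / len(valid)


# Hypothetical pedigree relationship matrix: animals 0 and 1 are unrelated parents of
# full sibs 2 and 3; animal 4 is unrelated to everyone.
A = [[1.0, 0.0, 0.5, 0.5, 0.0],
     [0.0, 1.0, 0.5, 0.5, 0.0],
     [0.5, 0.5, 1.0, 0.5, 0.0],
     [0.5, 0.5, 0.5, 1.0, 0.0],
     [0.0, 0.0, 0.0, 0.0, 1.0]]
birth_years = [2000, 2000, 2005, 2008, 2008]  # hypothetical

train, valid = age_split(birth_years, cutoff=2006)
print(train, valid)                            # [0, 1, 2] [3, 4]
print(mean_max_relationship(A, train, valid))  # 0.25
print(random_groups(len(A), k=5))              # one animal per group, in shuffled order
```

Animal 3 has both parents and a full sib in the training set, whereas animal 4 has no pedigree relatives there; a higher value of this summary corresponds to the closer training/validation relationships that the abstract associates with higher accuracy.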


These results suggest that genomic estimates of genetic merit can be produced in beef cattle at a young age, but that genotyped sires will need to be recurrently included in retraining analyses if the industry is to be routinely provided with direct genomic values of the highest accuracy.