The genetic diversity and structure of horses raised in France were investigated using 11 microsatellite markers and 1679 animals belonging to 34 breeds. Between-breed differences explained about ten per cent of the total genetic diversity (Fst = 0.099). Values of expected heterozygosity ranged from 0.43 to 0.79 depending on the breed. According to genetic relationships, multivariate and structure analyses, breeds could be classified into four genetic differentiated groups: warm-blooded, draught, Nordic and pony breeds. Using complementary maximisation of diversity and aggregate diversity approaches, we conclude that particular efforts should be made to conserve five local breeds, namely the Boulonnais, Landais, Merens, Poitevin and Pottok breeds.
During the twentieth century, horse breeding has undergone large changes in Europe. Previously considered as an agricultural, industrial and war tool, horse is now essentially bred for hobby riding. Draught horses, in particular, have been less and less used as utility horses, and many draught breeds have undergone a dramatic decrease in population size: according to the Haras Nationaux, out of the nine French draught breeds, six have annual births below 1000. Measures for in situ conservation have been applied in France for several years but such measures are in general expensive. Therefore, it would be useful to identify priorities among conservation purposes and this requires characterising diversity and genetic relations between breeds .
During the last fifteen years, microsatellite markers have frequently been used to evaluate genetic distances and to characterise local breeds, [2-10]. Some methods have recently been developed to evaluate the genetic contribution of populations to within-breed and between-breed diversities [11,12].
With about 800 000 animals belonging to 50 different breeds (source: Haras Nationaux), France shows a large diversity of horse populations. Among these breeds, 21 have a French origin or have been bred in France for at least a century. According to the FAO, at least 15 populations have disappeared during the last 50 years, and eight indigenous breeds are still considered as endangered or endangered-maintained. Among those breeds, the majority are draught breeds, namely the Ardennais, Auxois, Boulonnais, Poitevin and Trait du Nord breeds, the other ones being the Merens warm-blooded breed and the Landais and Pottock pony breeds. Information on the genetic diversity of French endangered breeds could help breeders and providers, decide where they should place more emphasis.
In the present study, we first analysed the genetic diversity of 39 horse populations reared in France: within-breed diversity, breed relationship and population structure were investigated, using microsatellite data. Then, we focussed on 19 breeds of French origin or having been raised in France for at least a century, and evaluated the conservation priorities between these populations, using different approaches to evaluate within, between and total diversity.
Populations sampled and microsatellite analysis
French nomenclature divides horse breeds into three groups: warm-blooded, draught horses and ponies. In this study, 39 populations were considered (Table 1). These 39 populations comprised 31 recognised breeds (including 13 warm-blooded breeds, nine draught breeds, and nine pony breeds), the primitive Przewalski horse (used as an outgroup), and seven populations originating from the splitting of two recognised breeds, namely the Anglo-Arab (AA) and Selle Français (SF) breeds (divided into four and three groups, respectively). The 2005 studbook rules define those groups according to the proportion of foreign genes that can be found from genealogical analysis: AA6 and AA9 are considered as pure AA, whereas AA5 and AA10 can have ancestors from another origin, the proportion of Arab origin being higher for AA5 and AA6 than the others. SF8 has a large proportion of PS origin and can therefore be used to produce AA, SFA97 constitutes a group closed to direct foreign influences, whereas SFB98 individuals can have a parent from another breed (under some conditions).
Table 1. Basic information on the 39 populations studied
For each of the 39 populations, 23 to 50 animals born between 1996 and 2005, were sampled amounting to 1679 animals. Except for the Przewalski horse, where no pedigree data was available, the sampled animals were known to have no common parents. For the conservation approach, the study focussed on 19 populations, either of French origin, or having been bred in France for at least 100 years (PS, AA and AR breeds). In this approach, 50 animals were randomly sampled among the four and three AA and SF subpopulations, respectively, to constitute two populations.
Eleven microsatellite markers were used to perform the analysis (AHT4, AHT5, ASB2, HMS1, HMS3, HMS6, HMS7, HTG4, HTG6, HTG10, VHL20), with all but two (HMS1 and HTG6) being recommended by the International Society of Animal Genetics for parentage testing and used in similar studies (except HMS1) [7,9,10]. For the entire sample, amplifications and analyses were performed by the same laboratory, using a capillary sequencer (ABI PRISM 3100 Genetic Analyzer, Applied Biosystems).
Allele frequencies, mean number of alleles (MNA), observed (Ho) and non-biased expected heterozygosity (He), were calculated using GENETIX . Wright Fis, Fit and Fst coefficients were also computed using the same software. GENEPOP  was used to evaluate pairwise genetic differentiation between breeds  and departure from Hardy-Weinberg equilibrium, using exact tests and sequential Bonferonni correction  on loci. Global tests on Hardy-Weinberg equilibrium were also performed using GENEPOP. Allelic richness was computed using FSTAT .
The matrix of Reynolds unweighted distances DR  was computed using POPULATION (Olivier Langella; http://bioinformatics.org/~tryphon/populations/ webcite). Regarding the DR distance, a NeighborNet tree was drawn using SPLITSTREE 4.8 . A factorial correspondence analysis (without the Przewalsky horse) was also performed using GENETIX. Finally, the genetic structure of the populations was assessed using Bayesian clustering methods developed by Pritchard (STRUCTURE, ): using a model with admixture and correlated allele frequencies, we made 20 independent runs for each value of the putative number of sub-populations (K) between 1 and 22, with a burn-in period of 20 000 followed by 100 000 MCMC repetitions. Pairwise similarities (G) between runs were computed using CLUMPP .
To evaluate the conservation priorities in a set of populations, taking into account contributions to within-population and between-population genetic diversity, Ollivier and Foulley  have proposed the following method. First, the between-breed contribution (CB) is evaluated, based on the Weitzman  loss Vk of diversity when the population k is removed from the whole set of breeds (in this study we used DR distance). Then, the within-breed contribution (CW) is defined as:
where H(S) is the average internal heterozygosity of the whole set S and H(S/k) the average internal heterozygosity of the set when k is removed. Finally, the aggregate diversity D of a population is defined as:
The cryopreservation potential (CP) could be computed as the product between the breed contribution (CB) and the probability of extinction (Pex) of the breed, assumed to be directly proportional to the inbreeding rate (ΔF). Following Simianer et al. , Pex can be approximated as
where Ne is the effective population size, M and F are the numbers of breeding males and females, respectively, used inside the breed in 2005, and c is a constant, to be chosen. Considering that the effective population size of a breed should not be lower than 50 to avoid extinction in the short term , we considered that Pex = 1 for Ne = 50. Therefore, c was set to 100 (see equation 3).
Caballero and Toro  have developed a parallel approach. The total diversity GDT can be considered as the exact sum of the gene diversity within population GDWS and the gene diversity between populations GDBS considering the following equations:
where n is the number of populations, fij is the average coancestry between populations I and j, and Dij is the Nei minimum distance between populations I and j. The contribution of a population to the diversity is evaluated by computing the loss or gain of diversity ΔGD when the population is removed.
The authors have also proposed to evaluate the contributions (ci) of each population, which can maximise the total diversity at the next generation, using the following equation:
The contributions can be computed by maximising GDTN in equation (7), with the following restrictions: for each population i, ci ≥ 0 and Σi ci = 1.
One hundred and nine alleles were found over all populations and all markers. The average number of alleles per locus was 9.8 ranging from seven (locus HTG4 and HMS1) to 15 (locus ASB2). Some rare alleles in the whole data set were found with a high frequency in the PRW population: for instance, with the HTG6 loci, the two most frequent alleles in the PRW population (70%) were seldom found in other breeds (less than 1%). Heterozygosities, mean number of alleles (MNA) and allelic richness (AR) are presented in Table 2. MNA and AR were highly correlated, (r = 0.98, P < 0.0001). He ranged from 0.43 in the FRI breed to 0.79 in the PFS breed, while Fis per breed ranged from -0.08 (TDN breed) to 0.11 (PRE breed).
Table 2. Values for parameters of polymorphism within the 39 populations studied
Some significant heterozygote deficits after corrections were found, for different loci and populations (see Table 2). Only one test exhibited significant excess (AA5 with HMS1). Using global tests, five populations (AB, AR, AUX, CAM, PRE) and two markers (HMS3 and HTG10) showed significant deficit in heterozygotes (P < 0.01). Other studies have shown similar results for these two markers .
Testing population differentiation, 11 pairs of populations were found non significantly differentiated out of the 741 tests performed: AA5 with AA6, AA9 with AA10, SF8 and PS, PS and SF8, AA10 with SF8 and PS, AB with BA, APPAL with QH, AUX with TDN, SFA97 with SFB98.
The Fis, Fit, and Fst values were 0.019, 0.116 and 0.099, respectively. We found a gene differentiation coefficient GST  of 0.0989.
Breed relationships and clustering
The NeighborNet network (Figure 1) clearly separated draught horses (also including MER, HAF breeds) and warm-blooded horses, whereas most pony breeds were placed between these two groups. Nordic (IS, SHE, FJ) breeds formed a separate group. FRI and PRW populations were isolated from the other breeds, the closest groups being draught horses and Nordic breeds, for the FRI breed and PRW population, respectively.
Figure 1. Neighbour-Net for the 39 horse populations, based on Reynolds DR distance.
In Figure 2, the 38 populations (PRW being excluded) were placed according to the two main axes of the correspondence analysis (accounting for 27.4% and 11.5% of the inertia, respectively). Axis 1 clearly differentiates warm-blooded horses, ponies and draught horses, whereas axis 2 separates Nordic horses (IS, SHE, FJ) from the other ones. The FRI breed seems to be isolated from the other populations, the closest populations being the draught breeds.
Figure 2. Correspondence analysis of allele frequencies for 38 of the populations studied (PRW is not included). The projection is shown on the first two axes.
Neighbornet and FCA approaches were also used on 34 and 33 breeds, respectively (the four samples of AA breed and three samples of the SF breeds being aggregated into two samples of 50 animals each), showing similar results to previous figures (see Additional files 1 and 2).
Format: TIFF Size: 13MB Download file
Format: TIFF Size: 19.6MB Download file
Breed assignment to clusters provides complementary information on genetic relationships between populations. As K increases from 2 to 7, mean similarity coefficients among runs are respectively equal to 0.997, 0.993, 0.993, 0.773, 0.562, and 0.658, respectively. Likelihood increased until K reached 15–18 values (see additional file 3), indicating that the most significant subdivisions were obtained for such values. Since mean similarity coefficients were slightly lower for K = 16 (0.78) or 17 (0.81) than for K = 15 (0.83), the results are shown for this last value. Figure 3 shows the assignment of populations to clusters for each K, using runs having the highest pair-wise similarity coefficients.
Format: TIFF Size: 540KB Download file
Figure 3. Cluster assignment of each of the 39 populations to the K cluster. Among 20 runs, solutions having the most similar pair-wise similarity coefficients are presented here. Breeds not classified in their group according to French nomenclature are in italic.
For K = 2, there was a clear separation between draught and warm-blooded horses, with other populations showing intermediate results. When K reached 3, Nordic/primitive breeds, ponies, and some warm-blooded horses segregated more or less clearly from the two other clusters. As K increases to 4 and 5, the five clusters were constituted of Nordic/primitive breeds, draught horses, ponies, warm-blooded populations close to the AR breed and warm-blooded populations close to the PS breed. Some breeds were shared among the last three clusters, such as LAND between ponies and AR groups, and APPAL among the three clusters. When K reached 6, depending on the runs, FRI or PRW populations were alternately isolated, which led to a decrease of similarity across runs and explains the low similarity coefficient (0.562) in comparison with other K. When K = 7, these two populations were isolated. The different runs highlight some differences among sub-populations of AA and SF breeds, underlining a more important proportion of AR genes in AA6, AA5 and respectively SFA97 and SF98 groups. Some warm-blooded (FRI until K = 6, MER) and pony breeds (HAF) were classified with draught horses, while the CAM warm-blooded breed was clustered with ponies. As K reached 15, most breeds were shared among different clusters. The ARD, AUX and TDN breed constituted a single cluster while FJ/IS and LUS/PRE constituted two others. In a few cases, a single cluster was essentially associated to a single breed (BOUL, FRI, SHE, PRW).
Partition of diversity
In the set of the 19 French breeds, we found a gene diversity within population GDWS of 0.685, a gene diversity between populations GDBS of 0.073, and a total gene diversity GDT of 0.758. Table 3 shows between-breed, within-breed, and total contribution/variation of diversity according to Ollivier and Foulley  and Caballero and Toro  approaches. For within-breed diversity, CW and ΔGDWS ranged from -0.48 to 0.50 and from -0.0055 to 0.0069 respectively. In both cases, the POIT breed showed a particularly low within-breed diversity. CW and ΔGDWS were negatively correlated (r = -0.715, P = 0.001). For between-breed diversity, CB and ΔGDBS ranged from 0.85 to 12.60 and from -0.0041 to 0.0024, respectively. Here, the POIT breed showed a particularly high contribution to the between-breed diversity. The correlation between CB and ΔGDBS was not significant. D and ΔGDT, accounting for total diversity, were negatively correlated (r = -0.53, P < 0.019). They ranged from -0.32 to 1.25 and from -0.0042 to 0.0039, respectively. In both cases, the ARD and PS breeds showed a particularly low and high diversity, respectively.
Table 3. Contributions of the different breeds to genetic diversity according to different approaches
Considering contributions to the between-breed diversity and probabilities of extinction, the BOUL, LAND and POIT breeds showed the highest cryopreservation potentials (2.95, 2.95 and 4.83, respectively).
Contributions of each population for an optimal GDT are given in Table 3: the composite PFS breed should contribute to 70% of the pool, for a total GDT of 0.79. Besides, to maximise the total gene diversity, seven of the 19 breeds should be maintained, namely the BOUL, COBND, LAND, PFS, POT, PS and SF breeds.
Gene diversity and genetic relations among breeds
Differences between breeds explained 10% of the total genetic variation, which is quite similar to other analyses, where values ranged from 8% to 15% [2-4,9]. According to previous studies using microsatellites, expected heterozygosities ranged from 0.47 for the FRI breed  to 0.80 for the Sicilian Indigenous breed . In our study, only one result was found outside this range of values: 0.43 for the FRI breed, i.e. close to the value found by Luis et al. . Plante et al.  recently analysed 22 Canadian and Spanish populations. Our estimated values of He were slightly lower (0.71 on average vs. 0.75, P = 0.048) for the eight breeds shared between their study and the present one. Differences on the within-breed diversity among studies using microsatellites can be explained, on the one hand, by the loci used and, on the other hand, by the populations analysed, incidentally belonging to similar breeds but having different recent histories. In the AR breed, we found a He value of (0.72) with a significant deficit of heterozygotes, which can be explained by the fact that this is an international breed in which mating between close relatives is common . Plante et al.  and Luis et al.  have found similar results for the same breed, but not Aberle et al.  who observed a lower heterozygosity (0.57) without a heterozygote deficit. The PER population seemed to have a particularly high genetic diversity in the Plante study (He = 0.78), in comparison with the French PER population (He = 0.68). Because PER populations have been bred in America since the end of the 19th century, such results should be interpreted bearing in mind that the French PER population has probably suffered from recent bottlenecks due to several modifications of the selection aims.
The three approaches based on genetic relationships (genetic distances, FCA and clustering methods) gave similar results. The populations considered in the present study can be classified into four more or less differentiated clusters: warm-blooded, draught, Nordic and pony breeds. Similar patterns of clustering have been found in other studies [2,3,9,10]. The draught horses constitute a quite homogenous group, including the nine French draught horse breeds and three breeds presently classified as pony (HAF) or warm-blooded (MER and FRI in a lesser extent) breeds. These three breeds were historically used as draught horse breeds and could therefore have been subject to crossbreeding with other draught horse populations in their past history. Pony breeds formed a group in an intermediate position in comparison to the other clusters. It also included the CAM breed, today recognised as a warm-blooded breed, but morphologically considered as a pony . According to our analysis, FRI and PRW populations were found to be genetically isolated, which can be, to some extent, linked to a low genetic variability  due to historical bottlenecks within these breeds [2,29]. Moreover, another parameter explaining isolation of the PRW breed is the presence of rare alleles, which was in agreement with other studies  and expected for a population considered as a primitive wild horse.
Population differentiation tests and Bayesian approaches indicate clear differences between sub-populations of AA and SF. Such results may be largely explained by differences in the proportion of thoroughbred (PS) origins in the gene pool of these sub-populations. Within the AA breed, AA5 and AA6 populations appeared distinct from AA9 and AA10 populations and close to the PS breed. This was in agreement with the studbook rules: on the basis of pedigree data, AA5, AA6, AA9 and AA10 populations were indeed found to have respectively 94%, 89%, 44% and 59% of genes from PS origin (Sophie Danvy, personal communication). Within the SF breed, the SF8 (not differentiated from the PS breed) was distinct from SFA97 and SFB98 populations. This result was in agreement with previous results from pedigree data : the SF8 was found to have 98% of genes from PS origin. The three draught breeds ARD, AUX and TDN, were found to be quite similar, which is linked to a common historical and geographical origin (north of France) . Iberic breeds (LUS and PRE) were also found to be genetically quite close. These results and the fact that according to Bayesian approaches, the likelihood became stable before K reached the number of breeds, indicate that the most relevant division is situated at a level superior to that of the breeds . Such a subdivision of the whole set can be explained by the existing crossbreeding management system in several horse populations.
In the present study, an almost comprehensive sampling of French breeds was achieved. The different approaches used gave an estimation of the contribution of each breed to the whole French horse stock. Petit  has proposed allelic richness as a good parameter to evaluate the genetic diversity of a population, useful as an indicator of past bottlenecks . In our study, the POIT breed was found to have the lowest allelic richness and also one of the lowest within-breed contributions to diversity according to the two other methods used in the study. Because of the strong correlation with the mean number of alleles, the concept of allelic richness interest seemed to be of limited value in our study.
The results given by the aggregate diversity and gene diversity approaches were slightly correlated. By definition, breeds with low contributions to aggregate and total diversities should have related breeds in the data set. Thus, ARD, TDN, and AUX breeds, which were genetically highly related, illustrate quite well such a hypothesis.
According to the approaches of Ollivier and Foulley  and Cabalero and Toro , populations that contributed a lot to the total diversity were mostly non-endangered breeds (AR, PS, SF, TF). There were, however, some differences between the two methods when considering the eight breeds classified as endangered or endangered/maintained by the FAO (ARD, AUX, BOUL, LAND, MER, POIT, POT, TDN). Using the approach of Ollivier and Foulley , contributions to aggregate diversity D of BOUL, MER and POIT breeds were quite high, and taking into account population size, CP was the highest for BOUL, LAND and POIT breeds. Using the approach of Caballero and Toro , GDT decreased only when LAND and POT breeds were removed, and those two breeds plus the BOUL breed should have been kept to optimise GDT. The differences can be explained by the methods used in the two approaches, particularly considering the evaluation of the contributions to between-diversity. Using the approach of Caballero and Toro , some Weitzman criteria, such as the twin property , were not applied: for instance, assuming that two populations are genetically identical but very different from the whole set, removing one of them will largely decrease GDBS, which will not be the case when using the Weitzman approach. However, one advantage of the approach of Caballero and Toro  is the fact that there is no need to give weight to within- and between-diversities to compute total diversity, since by definition GDT is the sum of GDWS and GDBS. In fact, our results outline that both approaches should be considered as complementary to identify which breeds have to be taken into account in a context of genetic resource management. Therefore, conservation priorities should concern particularly BOUL, LAND, MER, POIT and POT breeds.
Another advantage of the method of Caballero and Toro  is the possibility of computing the contribution of each population to optimise total diversity. Such an approach was designed to conserve a large diversity of alleles. Therefore, it is not surprising to notice that the three breeds (PFS, SF, BOUL) that should have the highest contribution to optimise genetic diversity represent the three identified genetic differentiated groups. The importance of the PFS breed is due to the fact that this synthetic pony breed has the largest number of alleles. SF, another composite breed, has a smaller variability but carries alleles representative of the warm-blooded breed group, while the BOUL breed carries alleles seldom present in the two other breeds but frequent in draught horses.
Finally, several considerations have to be taken into account before taking final conservation decisions , such as the special range of performances for given traits, current production systems associated to the breed, socio-cultural value, or dynamics of the group of breeders. Between 1998 and 2003, births remained more or less stable for BOUL, LAND, POIT and POT breeds, but decreased for the MER breed . In the endangered breeds, specific uses should be supported to maintain a demand for such horses (production of mules for the POIT breed, ecotourism for local breeds, draught activities, meat production). Genetic variability should also be managed, especially since some of these breeds constitute a pool of original genes (BOUL, MER and POIT) (see Figure 3). For instance, sires with different origins should be used . When populations of the same breed are raised in other countries (such as the POT breed in Spain ), regular exchanges should be organised between both countries to maintain a relatively large variety of reproducers.
Based on this study, horse breeds raised in France can be clustered into four groups. These groups were found to be meaningful according to the use of breeds, morphological characteristics and/or geographical origins. The combined use of different methods allowed us to identify breeds for which conservation efforts should be a priority, in order to preserve the maximum genetic variability. Since several horse studies have used similar panels of markers [7,9,10], it would be interesting to merge the corresponding data.
The authors declare that they have no competing interests.
JCM carried out the genotyping. AR contributed to the description of the populations and carried out the sampling collection. LC performed the preliminary analysis. GL carried out the computational analysis and prepared the manuscript. XR participated in the computational analysis and preparation of the manuscript. CDB participated in the preparation and the revision of the manuscript. EV participated in the design of the study and the revision of the manuscript. All authors read and approved the final manuscript.
The authors thank the Haras Nationaux for the data provided and Wendy Brand-Williams for linguistic revision.
J Anim Breed Genet 2000, 117:307-317. Publisher Full Text
Marletta D, Tupac-Yupanqui I, Bordonaro S, Garcia D, Guastella AM, Criscione A, Cañon J, Dunner S: Analysis of genetic diversity and the determination of relationships among western Mediterranean horse breeds using microsatellite markers.
Solis A, Jugo BM, Mériaux JC, Iriondo M, Mazon LI, Aguirre AI, Vicario A, Estomba A: Genetic diversity within and among four south European native horse breeds based on microsatellite DNA analysis: implications for conservation.
Conserv Genet 2002, 3:289-299. Publisher Full Text
Livest Prod Sci 2005, 95:247-254. Publisher Full Text
Evolution 1989, 43:223-225. Publisher Full Text
Université de Lausanne, Lausanne, Suisse; 2001.
University of Chicago; 2003.
Quart J Econom 1992, 107:363-405. Publisher Full Text
Ecol Econ 2003, 45:377-392. Publisher Full Text
Genet Sel Evol 1996, 28:83-102. Publisher Full Text
Ducro BJ, Bovenhuis H, Neuteboom M, Hellinga I: Genetic diversity in the Dutch Friesian horse. In Proceedings of the 8th World Congress on Genetics Applied to Livestock Production: 13–18 August 2006. Belo Horizonte, Brazil; 2006.
Dubois C, Ricard A: Etat des lieux de la sélection Selle-Français. Quelle marge de manœuvre possible pour d'autres plans de sélection? In 31ème journée de la recherche équine: 2 March 2005. Les Haras Nationaux, Le Pin-au-Haras; 2005:173-185.
Conserv Biol 1998, 12:844-855. Publisher Full Text
Livest Sci 2006, 101:150-158. Publisher Full Text
Conserv Biol 2003, 17:1299-1311. Publisher Full Text
Verrier E, Loywick V, Donvez J, Blouin C, Joffrin C, Heyman G, Cottrant JF: La gestion génétique des races d'effectifs limités: principes et application aux cas du cheval de trait Boulonnais et de l'âne grand noir du Berry. In 31ème journée de la recherche équine: 2 March 2005. Les Haras Nationaux, Le Pin-au-Haras; 2005:161-171.