Analysis of genetic diversity in Brown Swiss, Jersey and Holstein populations using genome-wide single nucleotide polymorphism markers

Background: Studies of genetic diversity are essential in understanding the extent of differentiation between breeds, and in designing successful diversity conservation strategies. The objective of this study was to evaluate the level of genetic diversity within and between North American Brown Swiss (BS, n = 900), Jersey (JE, n = 2,922) and Holstein (HO, n = 3,535) cattle, using genotyped bulls. GENEPOP and FSTAT software were used to evaluate the level of genetic diversity within each breed and between each pair of the three breeds based on genome-wide SNP markers (n = 50,972).
Results: Hardy-Weinberg equilibrium (HWE) exact test within breeds showed a significant deviation from equilibrium within each population (P < 0.01), which could be a result of selection, genetic drift and inbreeding within each breed. Hardy-Weinberg test also confirmed significant heterozygote deficit in each breed over several loci. Moreover, results from population differentiation tests showed that the majority of loci have alleles or genotypes drawn from different distributions in each breed. Average gene diversity, expressed in terms of observed heterozygosity, over all loci in BS, JE and HO was 0.27, 0.26 and 0.31, respectively. The proportion of genetic diversity due to allele frequency differences among breeds (Fst) indicated that the combination of BS and HO in an ideally amalgamated population had higher genetic diversity than the other pairs of breeds.
Conclusion: Results suggest that the three bull populations have substantially different gene pools. BS and HO show the largest gene differentiation and jointly the highest total expected gene diversity compared to when JE is considered. If the loss of genetic diversity within breeds worsens in the future, the use of crossbreeding might be an option to recover genetic diversity, especially for the breeds with small population size.


Background
The importance of genetic diversity in livestock is directly related to the need for genetic improvement of economically important traits and for rapid adaptation to potential changes in breeding goals [1]. Estimates of effective population size in commercial dairy populations, including Brown Swiss, Holstein and Jersey, are decreasing at rates alarming enough to be of serious concern to the livestock industry [2]. Recent pedigree-based studies revealed increasing rates of inbreeding and coancestry in Canadian Jersey and Holstein populations [3]. Studies of genetic diversity help in understanding the evolution of breeds, gene pool development and the level of differentiation among breeds [1,4,5]. Such studies are important for prioritizing the conservation of breeds with critically low levels of diversity.
Hardy-Weinberg Equilibrium (HWE) states that in a large random mating population with no selection, mutation, or migration, the allele frequencies and the genotype frequencies are constant from generation to generation, and, hence, a simple relationship between the allele frequencies and the genotype frequencies exists [6]. The theory of HWE has played an important role in the development of population genetics, and has frequently been used as a basis for genetic inferences [7].
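The relationship between allele and genotype frequencies described above can be illustrated with a minimal Python sketch for a single biallelic locus. The function names and the allele frequency of 0.3 are illustrative assumptions and are not taken from the study.

```python
def hardy_weinberg_proportions(p):
    """Expected genotype frequencies (AA, AB, BB) for allele-A frequency p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    q = 1.0 - p
    return p * p, 2.0 * p * q, q * q


def allele_frequency_from_genotypes(f_aa, f_ab):
    """Frequency of allele A among the gametes produced by these genotypes."""
    return f_aa + 0.5 * f_ab


# Illustrative allele frequency (an assumption, not a value from the study).
p = 0.3
f_aa, f_ab, f_bb = hardy_weinberg_proportions(p)

# Under random mating with no selection, mutation or migration, the
# allele frequency is unchanged in the next generation.
p_next = allele_frequency_from_genotypes(f_aa, f_ab)
print(f_aa, f_ab, f_bb, p_next)
```

Because the allele frequency is unchanged after one round of random mating, the genotype frequencies are also unchanged, which is the constancy that HWE describes.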
Tests for departures from Hardy-Weinberg proportions are often used to check on random mating in populations, and the deviations from the expected frequency of homozygotes are used to estimate inbreeding coefficients [8]. The same approach was used for estimating the inbreeding coefficient of a population by calculating the excess of homozygotes with respect to Hardy-Weinberg equilibrium expectations [9]. The role that the variance due to differences in gene frequencies among subpopulations plays in the total genotypic frequencies from amalgamating subpopulations has been demonstrated by several studies [10,11]. Fixation indices (F is , F it and F st ) are the most widely used parameters for studying the genetic differentiation of populations. These indices were originally defined in terms of the correlations of two uniting gametes [12-14]. Accordingly, F it is the correlation between uniting gametes that generate an individual relative to the gametes of the total population. F is is the average over all subpopulations of the correlation between uniting gametes that generate an individual relative to the gametes of their own subpopulation. F st is the correlation between random gametes within subpopulations, relative to gametes of the total population. For example, in this study, an ideally amalgamated population of Brown Swiss and Jersey bulls would have each breed as a subpopulation. Furthermore, the relationship between fixation indices and measures of identity by descent has been illustrated in previous studies [15,16].
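As a hedged illustration of how a deficit of heterozygotes relative to Hardy-Weinberg expectations translates into an inbreeding coefficient, the sketch below computes F = 1 - H o / H e for one biallelic locus. The function name and the genotype counts are hypothetical and are not data from this study.

```python
def inbreeding_coefficient(n_aa, n_ab, n_bb):
    """Estimate F = 1 - Ho/He from genotype counts at one biallelic locus."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotyped individuals")
    p = (2 * n_aa + n_ab) / (2 * n)      # frequency of allele A
    h_exp = 2.0 * p * (1.0 - p)          # heterozygosity expected under HWE
    if h_exp == 0.0:
        return float("nan")              # monomorphic locus: F is undefined
    h_obs = n_ab / n                     # observed heterozygosity
    return 1.0 - h_obs / h_exp


# Hypothetical counts: 30 AA, 40 AB, 30 BB (p = 0.5, He = 0.5, Ho = 0.4).
f_hat = inbreeding_coefficient(30, 40, 30)
print(f_hat)
```

A positive value indicates fewer heterozygotes (more homozygotes) than expected under random mating, and a negative value indicates a heterozygote excess.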
Fixation indices can also be formulated entirely in terms of the allelic and genotypic frequencies in the population [11,17,18]. In this case the fixation indices can be expressed in terms of ratios of heterozygosities. The F st is equal to 0 when the same allele is fixed in all populations [11]. Allelic and genotypic frequencies may fluctuate because of finite subpopulation sizes or random variation in evolutionary forces [6]. In view of different factors affecting probabilities of gene identity in subdivided populations, the fixation indices were redefined in terms of the observed and expected heterozygosity based on allelic and genotypic frequencies in a population [17]. In addition, measures of inter-population gene differences and coefficients of gene differentiation (D st and G st , respectively) have been extensively used to describe the level of genetic diversity [11,17].
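The expression of fixation indices as ratios of heterozygosities can be sketched as follows, assuming one biallelic locus, equally weighted subpopulations, and the commonly used forms F is = 1 - H i / H s , F st = 1 - H s / H t and F it = 1 - H i / H t . The symbols H i , H s and H t , the function name and the input values are assumptions introduced for illustration only.

```python
def fixation_indices(allele_freqs, observed_het):
    """F_is, F_st, F_it from per-subpopulation allele frequencies and
    observed heterozygosities (one biallelic locus, equal weights)."""
    k = len(allele_freqs)
    if k == 0 or k != len(observed_het):
        raise ValueError("inputs must be non-empty and of equal length")
    h_i = sum(observed_het) / k                              # mean observed heterozygosity
    h_s = sum(2.0 * p * (1.0 - p) for p in allele_freqs) / k  # mean expected within subpopulations
    p_bar = sum(allele_freqs) / k
    h_t = 2.0 * p_bar * (1.0 - p_bar)                        # expected in the pooled population
    f_is = 1.0 - h_i / h_s
    f_st = 1.0 - h_s / h_t
    f_it = 1.0 - h_i / h_t
    return f_is, f_st, f_it


# Hypothetical subpopulations with allele frequencies 0.8 and 0.2,
# each with observed heterozygosity 0.30.
f_is, f_st, f_it = fixation_indices([0.8, 0.2], [0.30, 0.30])
print(f_is, f_st, f_it)
```

With these definitions the three indices satisfy (1 - F it ) = (1 - F is )(1 - F st ), so any one of them can be recovered from the other two.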
The objective of this study was to assess the status of genetic diversity within and between BS, JE and HO breeds, using bulls genotyped with a dense SNP marker map through detailed analyses carried out via GENEPOP and FSTAT software.

Methods
Genome-wide SNP data for the three breeds were received from the Animal Improvement Programs Laboratory, USDA (Beltsville, MD, USA) in November 2009. The data consisted of 900, 2,922 and 3,535 Brown Swiss (BS), Jersey (JE) and Holstein (HO) bulls, respectively, all genotyped with the Illumina BovineSNP50K BeadChip (Illumina Inc., San Diego, CA) as part of the North American collaboration in genomic prediction in dairy cattle [19]. Genotypes for a total of 50,972 SNPs were available for the analyses, which included all the SNPs with useable calls, without any exclusion due to minor allele frequency or correlation between SNPs. The bulls included in the analyses represented a sample of most BS and JE proven/sampled bulls in North America and a large sample of proven HO bulls in North America.

Genetic diversity analysis
Estimates of genetic diversity and statistical analyses were performed using the software GENEPOP, version 4.0 [20]. The exact tests for deviations from HWE [9] were also performed using the GENEPOP package. GENEPOP uses a Markov Chain (MC) algorithm (dememorization = 10,000, batches = 100, and iterations per batch = 5,000) to estimate the P-value of the exact HWE tests [20]. Significance levels were calculated per locus, per breed, and over all loci and pairs of breeds combined. Genetic diversity within breeds was also measured as the frequency of private alleles (PA, breed-specific alleles), the observed heterozygosity (H o ), and the expected heterozygosity (H e ) under HWE. The significance of breed differences was tested using the exact test of population differentiation in GENEPOP software based on allele frequencies.
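To make the idea of an exact HWE test concrete, the following sketch evaluates the test for one biallelic locus by complete enumeration of all possible heterozygote counts given the observed allele counts. This is a simplified stand-in for illustration and not the Markov chain procedure of GENEPOP described above; the function names and the example counts are hypothetical.

```python
from math import exp, lgamma, log


def _log_prob(n_ab, n_a, n_b):
    """Log probability of n_ab heterozygotes given allele counts n_a and n_b under HWE."""
    n_aa = (n_a - n_ab) // 2
    n_bb = (n_b - n_ab) // 2
    n = n_aa + n_ab + n_bb
    return (lgamma(n + 1) - lgamma(n_aa + 1) - lgamma(n_ab + 1) - lgamma(n_bb + 1)
            + n_ab * log(2.0)
            + lgamma(n_a + 1) + lgamma(n_b + 1) - lgamma(2 * n + 1))


def hwe_exact_test(n_aa, n_ab, n_bb):
    """Return (two-sided P-value, heterozygote-deficit P-value)."""
    n_a = 2 * n_aa + n_ab
    n_b = 2 * n_bb + n_ab
    # The heterozygote count must have the same parity as the allele counts.
    possible = range(n_a % 2, min(n_a, n_b) + 1, 2)
    probs = {h: exp(_log_prob(h, n_a, n_b)) for h in possible}
    p_obs = probs[n_ab]
    # Two-sided: sum over outcomes no more probable than the observed one.
    p_two_sided = sum(p for p in probs.values() if p <= p_obs * (1.0 + 1e-9))
    # Deficit: sum over outcomes with as few or fewer heterozygotes.
    p_deficit = sum(p for h, p in probs.items() if h <= n_ab)
    return min(1.0, p_two_sided), min(1.0, p_deficit)


# Hypothetical locus with a strong heterozygote deficit.
p_two_sided, p_deficit = hwe_exact_test(40, 20, 40)
print(p_two_sided, p_deficit)
```

Complete enumeration is feasible for biallelic SNP data because the only free quantity, once allele counts are fixed, is the number of heterozygotes.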
Genetic differentiation between breeds was also estimated using the F st coefficient proposed by Wright [18] and computed by GENEPOP.
The software FSTAT [21] was used to compute F-statistics [12] and to test them using randomisation methods. The F st was estimated by a "weighted" analysis of variance [21]. The most common computational formula for F st is:
$$F_{st} = \frac{\delta_p^2}{\bar{p}\,(1 - \bar{p})}$$
where $\delta_p^2$ is the sample variance of allele frequencies over populations and $\bar{p}$ is the mean allele frequency over populations [11]. F st can therefore be described as the amount of allele frequency variance in a sample relative to the maximum possible variance. F st can also be defined as follows [14]:
$$F_{st} = \frac{F_{it} - F_{is}}{1 - F_{is}}$$
where F it is the correlation between uniting gametes that generate an individual, relative to the gametes of the total population, and F is is the average, over all subpopulations, of the correlation between uniting gametes that generate an individual relative to those of their own subpopulation.
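The two descriptions of F st given above can be checked numerically with a small sketch. It assumes one biallelic locus, equally weighted subpopulations and the variance taken over subpopulations without a sample-size correction, so it is not the weighted analysis of variance used by FSTAT. The function names and input values are hypothetical.

```python
def fst_from_allele_frequencies(freqs):
    """Variance of allele frequencies over subpopulations divided by its
    maximum possible value, p_bar * (1 - p_bar)."""
    k = len(freqs)
    if k == 0:
        raise ValueError("at least one subpopulation is required")
    p_bar = sum(freqs) / k
    var_p = sum((p - p_bar) ** 2 for p in freqs) / k  # unweighted, divisor k
    max_var = p_bar * (1.0 - p_bar)
    if max_var == 0.0:
        return float("nan")  # same allele fixed everywhere: ratio undefined
    return var_p / max_var


def fst_from_fit_fis(f_it, f_is):
    """F_st recovered from F_it and F_is."""
    return (f_it - f_is) / (1.0 - f_is)


# Hypothetical allele frequencies in two subpopulations.
fst_var = fst_from_allele_frequencies([0.8, 0.2])
# Hypothetical F_it and F_is values chosen to match the same example.
fst_corr = fst_from_fit_fis(0.4, 0.0625)
print(fst_var, fst_corr)
```

In this example both routes give the same value, which reflects the identity (1 - F it ) = (1 - F is )(1 - F st ).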
The amount of heterozygosis (Y t ) in the total population was also defined, regardless of the structure of the population, in terms of the total population gene frequency (q t ) [14]:
$$Y_t = 2\,q_t\,(1 - q_t)\,(1 - F_{it})$$
Indirect estimates of gene flow were implemented in FSTAT [21] according to the method demonstrated by [22]. The effective number of migrants (N m ) was estimated from a relationship that assumes the n-island model of population structure.
Furthermore, FSTAT was used to calculate inter-population gene differences and coefficients of gene differentiation that are either dependent (D st and G st ) or independent (D st ' and G st ') of the number of subpopulations [11]. D st is the average gene diversity between subpopulations. The gene diversity in the total population is the sum of the average gene diversity within subpopulations and D st . The coefficient of gene differentiation (G st ) was computed as the ratio of D st to the total population diversity.
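The gene diversity measures and the indirect gene flow estimate can be sketched under explicit assumptions. The decomposition below uses one biallelic locus with equally weighted subpopulations; the corrections D st ' = k D st / (k - 1) and G st ' = D st ' / (H s + D st ') are the forms usually attributed to Nei and are assumed here. For gene flow, the sketch assumes Wright's island-model approximation F st = 1 / (4 N m + 1), which may differ from the exact relationship used in this study. Symbols H s and H t , all function names and all input values are hypothetical.

```python
def gene_diversity(freqs):
    """Return H_s, H_t, D_st, G_st, D_st', G_st' for one biallelic locus,
    given the allele frequency in each of k >= 2 subpopulations."""
    k = len(freqs)
    if k < 2:
        raise ValueError("at least two subpopulations are required")
    h_s = sum(2.0 * p * (1.0 - p) for p in freqs) / k  # mean within-subpopulation diversity
    p_bar = sum(freqs) / k
    h_t = 2.0 * p_bar * (1.0 - p_bar)                  # total diversity
    d_st = h_t - h_s                                   # between-subpopulation diversity
    g_st = d_st / h_t if h_t > 0.0 else float("nan")
    d_st_prime = k * d_st / (k - 1)                    # independent of k (assumed form)
    h_t_prime = h_s + d_st_prime
    g_st_prime = d_st_prime / h_t_prime if h_t_prime > 0.0 else float("nan")
    return h_s, h_t, d_st, g_st, d_st_prime, g_st_prime


def effective_migrants(f_st):
    """N_m from F_st, assuming F_st = 1 / (4 * N_m + 1)."""
    if not 0.0 < f_st <= 1.0:
        raise ValueError("F_st must lie in (0, 1]")
    return (1.0 - f_st) / (4.0 * f_st)


# Hypothetical allele frequencies in two subpopulations.
h_s, h_t, d_st, g_st, d_st_prime, g_st_prime = gene_diversity([0.6, 0.4])
n_m = effective_migrants(0.2)
print(h_s, h_t, d_st, g_st, d_st_prime, g_st_prime, n_m)
```

Under this approximation N m decreases as F st increases, so the most differentiated pair of subpopulations has the lowest estimated number of migrants.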

Results
The exact test for Hardy-Weinberg Equilibrium (HWE) within breeds showed a significant deviation in each breed (P < 0.01). Moreover, results of the exact test for HWE showed lower observed heterozygosity (H o ) than expected heterozygosity (H e ) in each breed (Figure 1). The Holstein bull population showed the highest average marker diversity between individuals within breeds in terms of H e compared to BS and JE breeds (0.31, 0.27, and 0.26, respectively). Jersey showed the highest percentage of loci with fixed alleles, followed by BS and HO (Table 1). The HWE test also confirmed a significant heterozygote deficit (≥90%) in each population over several loci.
Average gene diversity over all loci, per chromosome, in BS, JE and HO, expressed in terms of H o , is shown in Figure 2. Holsteins showed consistently higher H o than JE and BS across all chromosomes. BS and JE had similar overall H o ; however, depending on the chromosome, one or the other of the two breeds had higher H o . The higher H o for HO than for BS and JE, and the similar overall H o for BS and JE, are consistent with the effective population sizes of these three breeds, which is higher for HO and lower and similar for BS and JE [3]. Moreover, the average heterozygosity of Holsteins showed a declining trend over the last four generations, assuming a generation interval of 5 years (Figure 3). Accordingly, H o in HO declined from 0.361, when four generations were traced back in the pedigree, to 0.3534, when one generation was traced back.
Population genetic differentiation of BS, JE and HO, as measured by F st (Figure 4), showed that the breeds are genetically differentiated at each chromosome. For example, in an ideally amalgamated population of BS, JE and HO, the average F st on Chromosome 18 was 0.16. A higher value of F st indicates higher genetic differentiation between subpopulations, which implies that pairs of genes from individuals within subpopulations are more related than those from individuals in different subpopulations. BS and HO showed higher differentiation than the other pairs (Table 2). However, the F st values among the last four generations in HO were below 0.1, suggesting that there was no considerable genetic differentiation in the HO bull population over the last four generations (data not shown). This may indicate that no new outbred genetic material has been introduced into the bull population over the last four generations, apart from the repeated use of popular sires of good genetic merit. The relatedness between individuals within breed in BS vs. HO populations relative to the total population was higher (0.28) than that in BS vs. JE (0.23) and JE vs. HO (0.22), which also implies that BS and HO gene pools are more differentiated compared to the other pairs of breeds.
A summary of allelic richness, average fixation indices and the frequency of private alleles per breed pair is presented in Table 2. The highest frequency of private alleles (alleles that are present in one of the breeds, but not in the other) was observed in BS vs. HO, followed by JE vs. HO and BS vs. JE populations. This result is also in agreement with the population differentiation results, as measured by F st values. In addition, indirect estimates of gene flow indicated a higher effective number of migrants (N m ) between populations of JE and HO, followed by BS and JE, while BS and HO populations showed the lowest N m . This might be one explanation for the higher values of population differentiation measures, in particular the higher F st between BS and HO.
The exact test for population differentiation of each breed pair across all loci showed highly significant differences among breeds regarding the distributions from which the alleles and genotypes were drawn. Accordingly, the majority of loci have alleles or genotypes drawn from different distributions in the three breeds. However, there are some loci with alleles or genotypes drawn from the same distribution in all the breeds. For example, loci with alleles drawn from the same distribution in BS, JE and HO are shown for Chromosomes 14 and 18 (Figure 5). This implies that alleles of those loci may not have been differentiated by selection, drift and inbreeding in the three bull populations. Moreover, the comparison of each pair of the three breeds with respect to the origin of their alleles is also presented. Accordingly, on average, alleles of 7.8, 5.5 and 3.1% of loci could be drawn from the same distribution in JE vs. HO, BS vs. JE and BS vs. HO populations, respectively (Table 3). Similar results were obtained for the percentage of loci with genotypes drawn from the same distribution (Table 4).
The amalgamated BS vs. HO population showed the highest average inter-population gene differentiation both dependent (D st = 0.03) and independent (D st ' = 0.07) on the number of subpopulations, and also the highest expected total heterozygosity (Y t = 0.33) compared with the ideally amalgamated populations of BS vs. JE, or JE vs. HO. Similarly, the highest G st and G st ' (12.5 and 19.7%, respectively) were also observed in an ideally amalgamated population of BS vs. HO. Therefore, the

Discussion
The recent decline in diversity is sufficiently rapid that loss of diversity should be of concern to animal breeders [23]. Several authors (e.g. [24]) have proposed different models to describe deviations from Hardy-Weinberg proportions. The exact test for Hardy-Weinberg disequilibrium [9,25] within breeds showed a significant deviation in each breed in this study (P < 0.01). The populations also showed several loci with a significant heterozygote deficit (P < 0.01) but no loci with significant heterozygote excess, which is consistent with the application of genetic selection and, inevitably, with the effects of random genetic drift and inbreeding in each breed. Generally, the results have shown that there are some loci (from 3.1 to 7.8%) with alleles drawn from the same distribution in all the populations. This may suggest that, over time and through forces like selection and random genetic drift, the allele frequencies have changed substantially in the breeds, so that very little of the original genomes is preserved. Each breed showed a considerable difference between the observed and expected number of heterozygous individuals across loci. However, in the ideally amalgamated pairs of the populations, the difference between the observed and expected number of heterozygous individuals appears to be smaller, suggesting that crossbreeding could be carefully considered for increasing diversity in the future if needed. In livestock species, heterozygote deficiencies can be interpreted as the consequence of many factors, such as selection, population subdivision, or inbreeding [26].
Populations are said to be undifferentiated if F st [27]. In this study each pair of breeds showed higher values, which implies that the populations have different gene pools. However, F st among the last four generations in HO was below 0.1, suggesting that there has been no significant introduction of outbred genetic material into the HO population over the last four generations. Measures of population differentiation based on F st have been described as reliable. For example, pair-wise F st values were significantly correlated between bi-allelic loci and microsatellite datasets in Atlantic salmon, and a similar result was found with regard to the overall heterozygosity [27].
The highest proportion of total genetic variation attributed to between-breed differentiation was observed in BS vs. HO. The proportion of between-breed genetic variation observed in this study was comparable to the average between-breed variation (7.03%) reported in nine populations of Argentinean Creole cattle [28]. Studies in the past demonstrated that Wright's F st results were reliable and most consistent with Reynolds' distances, Nei's minimum distance measures and eight other genetic distance measures for ordering populations, which are widely used and well-established measures of genetic differentiation [29]. In this study, the mean F st indicated that the BS vs. HO population has higher genetic diversity than the BS vs. JE and JE vs. HO ideally amalgamated populations. The average microsatellite-based estimate of F st in 20 Northern European cattle breeds was 0.11 ± 0.01 [30], which is comparable with the findings in this study.
To summarize, Brown Swiss, Jersey and Holstein bull populations have substantially different gene pools. An interesting result was the heterozygote deficit observed in each of the populations in this study. In livestock species, heterozygote deficiencies can be explained by several factors, such as selection, population subdivision, drift and inbreeding. Each breed showed a considerable difference between the observed and expected number of heterozygous individuals across loci. However, in the ideally amalgamated pairs of the populations, the difference between the observed and expected number of heterozygous individuals across loci appears to be smaller, suggesting that crossbreeding could be carefully considered for increasing diversity if needed in the future. At the present level of genetic diversity, crossbreeding is not a necessity; however, if the loss of genetic diversity within each breed worsens in the future, crossing can be considered as an option to increase total genetic diversity within breeds.

Conclusions
The results suggested that the within-population genetic diversity accounts for a higher proportion of the total genetic diversity in ideally amalgamated populations than the diversity between populations. The results of private allele frequencies in this study indicated that each breed might contain unique genes or gene combinations that are absent in another breed. The study demonstrates that even with a much smaller population size, BS showed similar gene diversity to the Jersey breed, while Holstein showed higher gene diversity than both breeds in agreement with their reported effective population sizes. BS and HO seem to have higher population differentiation (F st ) compared to the other pairs (BS vs. JE and JE vs. HO). If BS and HO were to be amalgamated, higher total expected gene diversity would be obtained as compared to the other pairs of breeds (BS vs. JE and JE vs. HO). If the loss of genetic diversity within breeds worsens in the future, the use of crossbreeding might be an option to recover genetic diversity, especially for the breeds with small population size.
F is is the average over all subpopulations of the correlation between uniting gametes relative to those of their own subpopulation. F st is the correlation between random gametes within subpopulations relative to gametes of the total population, which is a measure of subpopulation differentiation. F it is the correlation between uniting gametes that generated the individual relative to gametes of the total population. The subscripts is, st and it stand for individual relative to subpopulation, subpopulation relative to total population, and individual relative to total population, respectively.