Genetic diversity of Chamaecrista fasciculata (Fabaceae) from the USDA germplasm collection

Objective: Chamaecrista fasciculata is a widespread annual legume across Eastern North America, with potential as a restoration planting, biofuel crop, and genetic model for non-papilionoid legumes. As a non-papilionoid, C. fasciculata belongs to the Caesalpinioid group, in which nodulation likely arose independently of the nodulation in Papilionoid and Mimosoid legumes. Thus, C. fasciculata is an attractive model system for legume evolution. In this study, we describe population structure and genetic diversity among 32 USDA germplasm accessions of C. fasciculata using 317 AFLP markers developed from 12 primer pairs, to assess where geographically the most genetic variation occurs.

Results: We found that the C. fasciculata germplasm collection falls into four clusters with admixture among them. After correcting for outliers, our analysis shows two primary groups across Eastern and Central North America. To better understand the population biology of this species, further sampling across its full North American range is needed, as well as the development of a larger set of markers providing denser coverage of the genome. Further sampling will help clarify geographical relationships in this widespread temperate species.

Electronic supplementary material: The online version of this article (10.1186/s13104-019-4152-0) contains supplementary material, which is available to authorized users.


Introduction
Genetic diversity of germplasm collections serves as an important resource for the conservation and maintenance of both wild and cultivated plants and can be particularly useful for the development of new potential crops. One such species is Chamaecrista fasciculata, or partridge pea, which is a member of the economically important Leguminosae family. The species belongs to the subfamily Caesalpinioideae, whose lineage diverged from that of the Papilionoid legumes (soybean, Medicago, and Lotus) approximately 60 million years ago (Legume Phylogeny Group [20, 21]). There is growing interest in implementing Chamaecrista as a complementary model for legume evolution due to its relatively small genome size, phylogenetic position, ability to form nodules, and flower development, all of which would provide fundamental knowledge on the evolutionary origins of legume traits [29]. A genome sequencing project is currently underway for C. fasciculata (Steve Cannon, pers. comm.), which is one of the only annual temperate species with a compact growth form in a large genus of ~ 330 mostly long-lived tropical tree and shrub species.
The partridge pea (C. fasciculata) is a North American annual legume with a widespread distribution that ranges from the Northern Great Plains to Central Mexico. In the U.S., C. fasciculata can be found growing from southern New England to Florida and westward into New Mexico and Oklahoma [15]. It is self-compatible and has a high outcrossing rate of 80% [10, 12]. The plant produces large yellow flowers that are exclusively pollinated by carpenter bees and bumblebees [1]. Seeds are dispersed short distances from parents (< 2.5 m) via explosive dehiscence [10]. Below ground, C. fasciculata forms nodules in response to nitrogen-fixing bacteria known as rhizobia [22]. Unlike other legume crops, the genus Chamaecrista has not undergone any whole genome duplications [2] since its divergence from the Papilionoideae and generally has a smaller genome (ca. 650 Mb in C. fasciculata). Working with fewer copies of genes in a model system such as C. fasciculata makes genetic approaches substantially easier, potentially enhancing the rate of discovery in legume crops. Because C. fasciculata is the only temperate annual in a large tropical tree genus, a wealth of information exists on its ecology, including the characterization of locally adaptive traits in response to climate change, key pollinators, and gene flow and genetic structure among naturally occurring populations [4-6, 10, 11, 13, 14, 30, 31]. Additionally, the genus Chamaecrista has independently evolved the ability to form nodules, thereby creating a unique opportunity to investigate the origins of nodulation and mutualistic interactions in Leguminosae [3]. Therefore, expanding on the genomics of C. fasciculata as a non-papilionoid model legume is a key step toward understanding the evolution of legume traits.

BMC Research Notes
*Correspondence: ebishopv@uvm.edu. 1 Plant and Soil Science, University of Vermont, Burlington, VT, USA. Full list of author information is available at the end of the article.
Here, we characterize genetic variation in the USDA collection of C. fasciculata, comprising 32 accessions originating from a range of populations in the U.S. that span its geographic distribution. Using Amplified Fragment Length Polymorphism (AFLP) markers [33], we show that there are four clusters in the germplasm collection with minimal genetic differentiation among groups.

Germplasm collection
Accessions were selected from the USDA GRIN repository. In total, we assembled 32 accessions, which are representative of all available accessions in the repository. Because the samples were donated to USDA prior to 1992, they lack precise location information. Thus, we were only able to determine the U.S. state from which they originated. All samples were of C. fasciculata var. fasciculata, as C. fasciculata var. macrospermum is restricted to Virginia, a state with no samples in this dataset.

AFLP marker development
Freeze-dried leaf tissue samples from 32 accessions were pulverized in a SPEX SamplePrep 2000 Geno/Grinder®, and DNA was extracted using the Wizard® Magnetic 96 DNA Plant System (Promega). Amplified Fragment Length Polymorphism (AFLP) markers were generated using locally developed procedures based on the technology of Vos et al. [33] and following modifications in Johnson et al. [18] and Greene et al. [16]. We performed a restriction double digest in 25 µl reactions containing 250 ng of DNA, 1X Purified BSA, 5.0 U each of EcoRI and MseI restriction enzymes (New England BioLabs) and 1X NE Buffer 4. To verify complete digestion, we ran 15 µl of the restriction digest reaction on a 1.5% agarose gel.
Following previous procedures in Johnson et al. [18] and Greene et al. [16], we performed a ligation step at 20 °C for 2 h in a 20 µl reaction containing 10 µl of the remaining restriction digest, 5 pmol EcoRI adapter, 50 pmol MseI adapter, 0.5 mM ATP, 80 cohesive end units of T4 ligase, and 1X T4 Ligase Buffer (New England BioLabs). We diluted the completed ligation reaction to 10:1 for pre-amplification. Both pre-amplification and selective amplification were done on an ABI 9700 thermocycler using the cycling programs described by Vos et al. [33] in 10 µl reactions. Two microlitres of the diluted pre-amplification product (10:1) were used for selective amplification. We used twelve separate primer pairs for selective amplification. We scored marker loci as either present or absent based on printed images.

Data analysis
We created a graphical display of accession relationships with NTSys-pc software [27] using Jaccard's coefficient. The tree was constructed using Q-values that were output from a STRUCTURE analysis (see below) at K = 4 and Prevosti's distance coefficient [25], which substitutes Q-value fractions for allele frequencies at a single AFLP locus.
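As an illustration of the similarity measure named above, the following is a minimal sketch of Jaccard's coefficient for dominant presence/absence scores, where similarity is the number of bands shared by two accessions divided by the number of bands present in at least one of them. It is not the NTSys-pc implementation; the function name `jaccard_similarity` and the toy matrix are hypothetical and are not data from this study.

```python
import numpy as np

def jaccard_similarity(bands: np.ndarray) -> np.ndarray:
    """Pairwise Jaccard similarity for a 0/1 matrix (rows = accessions, columns = loci)."""
    bands = np.asarray(bands, dtype=int)
    shared = bands @ bands.T                  # bands present in both accessions
    counts = bands.sum(axis=1)                # bands present in each accession
    union = counts[:, None] + counts[None, :] - shared  # bands present in either
    # Two accessions with no bands at all have an undefined coefficient; report NaN.
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(union > 0, shared / union, np.nan)

# Hypothetical toy scores for three accessions at five loci (illustration only).
toy = np.array([
    [1, 1, 0, 1, 0],
    [1, 0, 0, 1, 1],
    [0, 1, 1, 0, 0],
])
sim = jaccard_similarity(toy)
```

Shared absences do not enter the coefficient, which is one reason it is often preferred for dominant markers such as AFLPs, where the absence of a band can have more than one cause.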
To examine population structure, we used STRUCTURE v2.3.3 [8, 9, 26] and the widely applied technique developed by Evanno et al. [7]. Ten replications with a burn-in of 20,000 iterations followed by 20,000 additional iterations were used at each K level until results indicated lowered and less erratic values for P(X|K). The parameter set included the ADMIXTURE model with correlated allele frequencies, and the RECESSIVE ALLELES model, which is essential for dominant loci like AFLPs. Average Q-plots over the ten replications were calculated using the associated software CLUMPP [17], and graphic displays of population structure were developed from the mean Q-values of the 10 runs using DISTRUCT software [28]. We analyzed genetic diversity in GenAlEx 6.5 [23, 24] and checked the results in AFLP-SURV 1.1 [32] (not shown). Lastly, we performed a Principal Components Analysis (PCA) for clustering using binary assignments in GenAlEx.
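The Evanno et al. [7] technique mentioned above can be sketched as follows. The statistic ΔK divides the absolute second-order change in lnP(X|K) by the standard deviation of lnP(X|K) across replicate runs at that K. This sketch takes the second difference of the replicate means, which is one common way the statistic is computed and is an assumption here rather than a description of the software used in this study. The function name `delta_k` and the lnP(X|K) values are hypothetical.

```python
import statistics

def delta_k(lnp_by_k: dict[int, list[float]]) -> dict[int, float]:
    """Evanno-style delta K from replicate lnP(X|K) values, keyed by K."""
    ks = sorted(lnp_by_k)
    mean = {k: statistics.fmean(lnp_by_k[k]) for k in ks}
    sd = {k: statistics.stdev(lnp_by_k[k]) for k in ks}
    result = {}
    for k in ks:
        # Delta K is undefined at the smallest and largest K tested,
        # and when replicates at K show no variation.
        if k - 1 in mean and k + 1 in mean and sd[k] > 0:
            second_diff = abs(mean[k + 1] - 2 * mean[k] + mean[k - 1])
            result[k] = second_diff / sd[k]
    return result

# Hypothetical replicate lnP(X|K) values (illustration only, not study results).
runs = {
    1: [-1000.0, -1002.0, -998.0],
    2: [-900.0, -902.0, -898.0],
    3: [-890.0, -892.0, -888.0],
    4: [-885.0, -887.0, -883.0],
}
dk = delta_k(runs)
best_k = max(dk, key=dk.get)
```

In this toy input the likelihood rises sharply from K = 1 to K = 2 and then levels off, so ΔK peaks at K = 2. With real STRUCTURE output, the ten replicate values per K would replace the hypothetical lists.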

Analyses of population structure
AFLP analysis resulted in a total of 317 polymorphic loci. STRUCTURE analysis combined with the technique of Evanno et al. [7] indicated the most probable number of distinct populations at K = 4 (Figs. 1 and 2, Table 1, Additional file 1: Figure S1a, b). Separation was, for the most part, based on latitude, with some anomalies. Consequently, we named these groups Central (US), South, AR/MS, and Texas. While the accessions from Kansas, Nebraska, New Jersey, and Minnesota (Central US group) were mostly separated from those of Arkansas and Mississippi, two accessions from Arkansas and one from Mississippi were grouped apart from the others and placed into our AR/MS cluster. A sample from Texas also formed a separate group, although some samples from other states, such as Minnesota, showed some admixture with this group.
We identified seven individuals as considerably admixed among at least two of the groups. A Principal Component Analysis (PCA, Fig. 2) showed the three individuals from the AR/MS group differentiated on the first axis, and differentiation along a latitudinal gradient on the second axis. Although STRUCTURE combined the more Northern accessions into the first two groups (our Central and South groups), the PCA suggests a subtle latitudinal cline in diversity, overwhelmed by differentiation among multiple groups in the Southern US. This pattern of greater Southern diversity and differentiation is consistent with glacial refugia in the Southern U.S. during the Last Glacial Maximum, and with admixture as populations migrated back to deglaciated areas in the more Northern US.

Genetic diversity analysis
Overall, we found some genetic differentiation among the four groups in the USDA Chamaecrista fasciculata germplasm collection. In total, we analyzed the genetic variability of 317 loci from 32 C. fasciculata accessions (Table 2). The overall pairwise genetic distance (PhiPT) was 0.207 (P = 0.001). The Analysis of Molecular Variance (AMOVA) based on PhiPT values indicated that 79% of the variance comes from within populations (estimated variance = 11.84), while 21% of the variance comes from among populations (estimated variance = 3.11). Mean Shannon's diversity index across all populations was 0.24 (± 0.11).
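The relationship between the reported variance components and PhiPT can be checked with a short calculation. The sketch below assumes the standard AMOVA definition, in which PhiPT is the among-population variance divided by the total (among plus within) variance. The function name `phi_pt` is hypothetical, and the inputs are the rounded estimates reported above.

```python
def phi_pt(var_among: float, var_within: float) -> float:
    """PhiPT as the among-population share of total estimated variance."""
    total = var_among + var_within
    if total <= 0:
        raise ValueError("total variance must be positive")
    return var_among / total

# Rounded variance components as reported in the text.
phi = phi_pt(var_among=3.11, var_within=11.84)
percent_among = round(100 * phi)       # share of variance among populations
percent_within = 100 - percent_among   # share of variance within populations
```

From the rounded components this gives approximately 0.208, in line with the reported value of 0.207 and with the 21% and 79% partition. The small difference presumably reflects rounding of the published variance estimates.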

Fig. 1 Phenogram of 32 Chamaecrista fasciculata accessions from 317 AFLP loci using Jaccard's coefficient. Results of STRUCTURE analysis at K = 4 are superimposed on the phylogenetic tree using DISTRUCT software. Each STRUCTURE group is represented by a different color, with mixed colors for individuals indicating admixture. We define the groups as Central (1, yellow), South (2, orange), AR/MS (for Arkansas and Mississippi, pink) and Texas (blue). The two letters after each accession indicate the US state from which it originates.

Discussion
AFLP markers were used to estimate genetic diversity among 32 C. fasciculata accessions sampled across its geographical distribution. The patterns of differentiation we observed in C. fasciculata likely result in part from migration in response to repeated patterns of glacial activity. The differentiation found in the more Southern US states is likely a result of differentiation in glacial refugia, such as on different sides of the Appalachian mountain chain or Ozark mountains, with more Northern populations resulting from post-glacial advances northward and possible admixture from different glacial refugia. A similar AFLP analysis of Phaseolus polystachios, the North American wild kidney bean and the only Phaseolus species native to temperate North America, set apart an accession from Texas that was later given species status as Phaseolus texensis ([19], and unpublished).
Chamaecrista fasciculata is a very widespread plant in eastern and central North America, occurring in a variety of habitats from mixed prairies to disturbed habitats, to unique local ecosystems such as mid-Atlantic serpentine barrens and South Florida karstic pine rocklands. Such widespread occurrence and broad adaptation could make it useful as a component of mixed biofuel plantings as well as in habitat restoration plantings and ecological and evolutionary studies. Based on our findings, the current collection, although diverse, likely does not capture the full range of variation present in this ecologically diverse species. In particular, more precise sampling from particular habitats may show unique patterns of differentiation. Similarly, more thorough sampling at the edge of the geographic range of the species may find outlying populations, or uncover introgression with more tropical Chamaecrista species, such as C. nictitans or C. lineata var. keyensis, which is endangered in the Florida Keys. The outlying Texas group may be consistent with range-edge differentiation of populations. Thus, we recommend further collecting to improve the value of this collection for a variety of uses, from research to restoration to biofuels.

Limitations
The AFLP markers that were used in this study have several limitations, such as being dominant rather than co-dominant, occurring at random locations in the genome that are difficult to tie to a genomic region, and being limited to a few hundred total loci. New technologies, such as genotyping-by-sequencing and next generation sequencing based approaches that develop single nucleotide polymorphisms, overcome these challenges. Second, the set of lines examined is small in total number, with 32 being marginal for inference about population genetic patterns. Third, the USDA collection was assembled before 1992, when GPS units became available. Consequently, the passport data for the accessions we assessed is limited to U.S. state, rather than more precise locations. Our work suggests that efforts to expand the USDA germplasm collection for Chamaecrista and improve the associated passport data would be quite useful for a number of research applications.

Table 1 Group assignments, based on STRUCTURE output analyzed in DISTRUCT
Our STRUCTURE analysis detected four groups, or populations, which we have named Central (group 1, yellow in Fig. 1), South (group 2, orange in Fig. 1), AR/MS (group 3, Arkansas/Mississippi, pink), and Texas (TX, group 4, blue). We give the percent membership of each accession to each STRUCTURE group to show the extent of admixture.

Additional file
Additional file 1: Figure S1. Plots from the software STRUCTURE of (A) lnP(X|K), indicating the highest probability at K = 4, and (B) dK vs K from the technique of Evanno et al. [7], indicating most probable population subdivisions at K = 2 and K = 4. Based on the Evanno et al. [7] technique, we find 4 to be the best number of populations.