Microsatellite marker development from next-generation sequencing in the New England cottontail (Sylvilagus transitionalis) and cross-amplification in the eastern cottontail (S. floridanus)

The New England cottontail (Sylvilagus transitionalis) is a species of high conservation priority in the Northeastern United States, and was a candidate for federal listing under the Endangered Species Act until a recent decision determined that conservation actions were sufficient to preclude listing. The aim of this study was to develop a suite of microsatellite loci to guide future research efforts such as the analysis of population genetic structure, genetic variation, dispersal, and genetic mark-recapture population estimation. Thirty-five microsatellite markers containing tri- and tetranucleotide sequences were developed from shotgun genomic sequencing of tissue from S. transitionalis, S. obscurus, and S. floridanus. These loci were screened in n = 33 wild S. transitionalis sampled from a population in eastern Massachusetts, USA. Thirty-two of the 35 loci were polymorphic with 2–6 alleles, and observed heterozygosities of 0.06–0.82. All loci conformed to Hardy–Weinberg Equilibrium proportions and there was no evidence of linkage disequilibrium or null alleles. Primers for 33 of the 35 loci amplified DNA extracted from n = 6 eastern cottontail (S. floridanus) samples, of which nine revealed putative species-diagnostic alleles. These loci will provide a useful tool for conservation genetics investigations of S. transitionalis and a potential diagnostic species assay for differentiating sympatric eastern and New England cottontails.


Introduction
The New England cottontail (Sylvilagus transitionalis) is a narrow niche specialist that relies on the dense understory vegetation of early successional or shrubland habitats [1]. These ephemeral habitats have declined steeply in recent decades in the Northeastern United States [2][3][4]. As a result, many shrubland species, including the New England cottontail, face severe population declines [5]. Remnant New England cottontail populations are found today in < 14% of the species' historical range and occur in five geographically and genetically distinct populations located in southern Maine and southeastern New Hampshire, central New Hampshire, eastern Massachusetts on Cape Cod, eastern Connecticut and Rhode Island, and western Connecticut and New York [5]. Within these areas, cottontails face the consequences of population isolation and loss of genetic diversity [6,7].
As a result of uncertainty in long-term species' viability, the New England cottontail is a high priority for conservation in the Northeastern United States. To help recover Open Access BMC Research Notes *Correspondence: aaunins@usgs.gov 2 Natural Systems Analysts, Leetown Science Center, 11649 Leetown Road, Kearneysville, WV 25430, USA Full list of author information is available at the end of the article the species, state and Federal natural resource managers collaborated with academicians and other stakeholders to develop and implement a conservation strategy to improve the outlook for the species [8]. As a result, the United States Fish and Wildlife Service determined listing of the species under the Endangered Species Act was no longer warranted [9]. Despite this decision, much uncertainty remains about the species' viability and population status [7,10]. To evaluate the effectiveness of the conservation strategy and inform adaptive modifications to improve outcomes, extensive population monitoring and research into population structure and genetic diversity are needed [8]. To this end, our goal was to develop a suite of microsatellite markers to aid future conservation genetics research on the New England cottontail as well as other Sylvilagus species.

Methods
For microsatellite marker development, total genomic DNA was extracted from tissue samples obtained from 13 New England cottontail, one eastern cottontail (S. floridanus), and two Appalachian cottontail (S. obscurus) using the DNEasy Blood and Tissue kit (Qiagen, Germantown, MD). We chose to sequence multiple species to maximize the number of microsatellites available for screening in the target species S. transitionalis. Sequencing libraries were created from these genomic DNA samples for sequencing on the Ion Torrent PGM and Ion Proton (ThermoFisher Scientific, Frederick, MD). For each sequencing library, the extracted DNA was quantified with a Nanodrop spectrophotometer (ThermoFisher Scientific), and 100 ng of DNA from each individual was used for construction of multiple 200 base pair libraries following the manufacturer's protocol. For libraries with multiple individuals in one run, equimolar proportions of each library were pooled prior to chip loading.
Sequence reads from each completed run were imported into Qiagen CLC Genomics Workbench (ver. 7) for processing. Reads were length trimmed to a minimum size of 30 bp, and the quality trimming threshold was set to 0.01 corresponding to a Phred score of 20. Sequence reads from each of the 5 separate runs were then screened individually for all possible di-, tri-, tetra, penta-, and hexanucleotide microsatellite repeat motifs with a minimum repeat length of five with the program QDD (ver. 3) [11]. Primers were designed for each putative microsatellite locus within QDD using the integrated PRIMER 3 code [12]. Seventy-five tri-and tetra nucleotide loci identified by QDD with predicted amplified lengths between 100 and 250 bp were selected for screening within a collection of n = 8 wild-caught New England cottontails sampled from across the species range.
A universal M13 sequence was added to the 5′ end of either the forward or reverse primer of each primer pair enabling an M13 FAM labeled fluorescent dye complementary to the universal tail to be incorporated into the polymerase chain reaction (PCR) product [13]. Polymerase chain reactions were performed in 25 μl volumes, consisting of 10 ng of DNA, 1 X PCR Buffer (Promega, Madison, WI), 0.25 μM of labeled forward primer, 0.5 μM of unlabeled reverse primer, 0.1 μM of labeled M13, 2.0 mM MgCl 2 , 0.2 mM of each dNTP, 0.25 units/μl Bovine Serum Albumin (New England Biolabs, Ipswich, MA), and 0.06 units/μl of Taq polymerase (Promega), using the following cycling conditions: 94 °C for 15 min, 29 cycles of 94 °C for 1 min, 58 °C for 45 s, and 72 °C for 45 s, 5 cycles of 94 °C for 1 min, 52 °C for 45 s, and 72 °C for 45 s, all followed by 72 °C for 10 min. PCR products for each locus were electrophoresed separately on an ABI 3130 Genetic Analyzer (ThermoFisher Scientific) automated DNA sequencer. Alleles were called using Gen-eMapper (ver. 4) (ThermoFisher Scientific) following the protocols described in [14].
Loci that amplified consistently and were easily scoreable on the sample of n = 8 wild New England cottontails were then tested on DNA extracted from an additional sample of n = 33 New England cottontails from a wild population in eastern Massachusetts, on Cape Cod. In addition, DNA extracted from a sample of n = 6 sympatric eastern cottontails, also collected from eastern Massachusetts were tested for cross-species amplification.

Data analyses
Genotype data from the Cape Cod New England cottontail collection (n = 33) were analyzed for null alleles, large allele dropout, and scoring errors using MICRO-CHECKER (ver 2.2.3) [15]. Genetic diversity and heterozygosity were quantified using GenAlEx 6.5 [16,17]. Exact tests in GENEPOP [18] were used to determine if genotypes at each locus conformed to Hardy-Weinberg equilibrium (HWE). Multi-locus tests of conformance to HWE were completed using Fisher's method in GENEPOP. Linkage disequilibrium (LD) was tested for all pairs of loci using contingency tables in GENEPOP. All tests of HWE and LD tests in GENEPOP used the default Markov chain parameters. Significance levels for HWE and LD tests were adjusted using the sequential Bonferroni correction [19].
To evaluate the utility of these loci for population genetic studies, multiple analyses were performed with the n = 33 New England cottontail collection. All multilocus genotypes were subjected to analysis via GENECAP [20] to identify matching samples, calculate match probabilities, and estimate the sibling probability of identity (PI sibs ) [21]. To assess the randomness of the collection (e.g., to ensure the collection did not consist of a small number of families), we analyzed for the presence of fullsibling families using the program COLONY v2.0 [22]. Settings for COLONY analyses included the assumption of male and female polygamy, no per locus genotyping error information, no inbreeding, long run length with full likelihood analysis, high likelihood precision, no allele frequency updates, and no sibship prior. Individual New England cottontails were analyzed as offspring without assignment of individuals as candidate males (fathers) or females (mothers), as these data were not available. While the inference of family relationships is weakened in this situation with no sex, age, relationship information, and the assumption of polygamy for both sexes, COLONY is predicted to be more accurate than pairwise estimates of relationships [22]. Statistical significance of any full-sib relationships was assessed through the P values reported by COLONY. Within a sample of individuals taken at random (with respect to kin) from a population, the frequencies of full and half sib dyads can be used to estimate the current effective size (N e ) of the population. Therefore, COLONY was also used to estimate N e and associated 95% confidence interval utilizing the estimates from the sibship assignment full likelihood method. To estimate whether the N e has remained constant (i.e., achieved mutation-drift equilibrium; see [23]), we conducted analyses with BOTTLENECK [24], implementing a two-phased model of mutation (5% IAM; 95% SMM; [25]). Statistical significance of the BOTTLE-NECK analyses was determined with a Wilcoxon signedrank test performed by the software.

Results and discussion
Within the 44,084,987 million sequence reads analyzed across five runs of the Ion Torrent PGM and Ion Proton platforms, 1,420,351 million were identified by QDD software as containing microsatellites using our chosen filtering criteria. Of the 75 loci selected from among these sequences for screening in the sample of n = 8 New England cottontails, all but three amplified consistently. An additional 24 loci were monomorphic. Of the remaining 48 loci, 17 had two alleles, 16 had three alleles, 10 had four alleles, three had five alleles, and two had seven alleles. From this set of loci, we selected the 35 most polymorphic, including all with three to seven alleles and a few with two alleles for additional testing in the wild collection of n = 33 New England Cottontails and n = 6 eastern cottontails from the same population to test for cross-specific amplification. The primer sequences of these 35 loci are provided in Table 1.
Microchecker reported no instances of large allele dropout or scoring errors. Analyses of family structure in COLONY indicated the presence of no full sibling dyads, and therefore all individuals from the n = 33 collection were retained for subsequent analyses. Three loci were monomorphic in the Massachusetts wild New England cottontail population, while the remaining 32 loci averaged 3.1 alleles per locus (range 2-6; Table 1). Unique multilocus genotypes were generated for each individual with the probability of 6.4 × 10 −8 that two siblings would share identical genotypes (PI sibs ; [21]). Expected heterozygosity (H e ) ranged widely across loci from 6.0% (StrQ23, StrQ48) to 75.0% (StrQ30) and averaged 47.0% for the Massachusetts wild New England cottontail population. Tests for conformance to HWE revealed that locus StrQ19 was not in HWE (P = 0.016), though this result did not remain significant after sequential Bonferroni correction. In addition, no statistically significant linkage disequilibrium (GENEPOP) was detected after sequential Bonferroni correction.
COLONY estimated the N e (and 95% confidence limits) for the Massachusetts collection to be 48 (30-80), and BOTTLENECK indicated a statistically significant heterozygote excess for the Massachusetts collection (P < 0.001; Wilcoxon signed-rank test), suggesting a recent reduction in the effective population size. These results are consistent with previous work, showing that New England cottontails in eastern Massachusetts have low effective population sizes and have undergone a recent population decline [6,26].
All but two loci amplified in the eastern cottontail samples, and an additional three loci were monomorphic (including two that were also monomorphic in New England cottontails). The remaining 30 loci had two to six alleles in eastern cottontail. Eight polymorphic loci (StrQ23, Str25, Str41, Str42, Str43, Str44, Str48, and Str63) had species-specific amplification patterns with no overlapping alleles across the screened individuals. In additional, StrQ72 was monomorphic for a different allele in each species. These results suggest a subset of these loci will be useful as a diagnostic screening tool for samples collected during noninvasive genetic monitoring surveys [10,27]. However, more extensive testing of samples throughout the range where New England cottontail and eastern cottontail are sympatric is needed to confirm these results.
In conclusion, the markers developed in this study could prove useful for future population monitoring and conservation genetics research of New England cottontail. The high discriminatory power of these loci indicate that they will be particularly useful for noninvasive genetic surveys, where robust individual identification from a small panel of markers is required. Further, the cross amplification in eastern cottontails will facilitate use of these markers in this species as well as possibly others in the Sylvilagus genus.