Characterization of 29 polymorphic microsatellite markers developed by genomic screening of Sumatran rhinoceros (Dicerorhinus sumatrensis)

Objective The Sumatran rhinoceros is critically endangered, with fewer than 100 individuals surviving across its current range. Accurate census estimates of the remaining populations are essential for development and implementation of conservation plans. In order to enable molecular censusing, we here develop microsatellite markers with amplicon sizes of short length, appropriate for non-invasive fecal sampling. Results Due to limited sample quantity and potential lack of genome-wide diversity, Illumina sequence reads were generated from two Sumatran rhinoceros samples. Genomic screening identified reads with short tandem repeats and loci that were polymorphic within the dataset. Twenty-nine novel polymorphic microsatellite markers were characterized (A = 2.4; HO = 0.30). These were sufficient to distinguish among individuals (PID < 0.0001), and to distinguish among siblings (PID(sib) < 0.0001). Among rhinos in Indonesia, almost all markers were established as polymorphic and effective for genotyping DNA from fecal samples. Notably, the markers amplified and displayed microsatellite polymorphisms using DNA extracted from 11 fecal samples collected non-invasively from wild Sumatran rhinoceros. These microsatellite markers provide an important resource for a census and genetic studies of wild Sumatran rhinos. Supplementary Information The online version contains supplementary material available at 10.1186/s13104-021-05522-x.

barcode and sample pooling, and DNA sequencing on the Illumina MiSeq v3 platform.
Resulting paired-end reads from Illumina MiSeq v3 sequencing of Dsu-33 and Dsu-35 with overlapping sequence (within each individual) were merged using FLASH 1.2.8. A program was written in C to take large genome-scale, unassembled high-throughput sequences and return microsatellite-containing reads for subsequent analysis. In this program, relaxed criteria for extraction of reads containing microsatellite motifs were used, thus filtering uninformative reads out of the working databases. The program selected reads with di-, triand tetra-nucleotide motifs containing at least four repeats. Reads with the tandem repeats in the first or last 50 nucleotides were removed to allow sufficient flanking region for primer design. The full sequence read was required to be a minimum of 120 bp in length.
A total of 30,556,224 sequencing reads were obtained, with 16,813,030 reads from Dsu-33 (average length of 410 bp) and 13,743,194 reads from Dsu-35 (average length of 440 bp). After paired-end sequences were joined, databases were created of 7,399,098 reads for Dsu-33 and 5,993,320 reads for Dsu-35. A total of 176,357 reads (2.4%) from Dsu-33 and 167,849 reads (2.8%) from Dsu-35 were found to contain microsatellite motifs with four or more tandem repeats.
To search microsatellite containing reads for potential polymorphisms, a Pythonbased script was written to combine all reads containing microsatellites from both rhinos into one database and subsequently omit the microsatellite motif from each read, leaving a set of flank-pairs (i.e., a pair of flanks from the same original read, one from either side of the microsatellite motif). A MegaBLAST pair-wise analysis, requiring 99% sequence identity and an ungapped alignment, was completed to identify matching flank-pair sequences. The sequences of matching flank-pairs were aligned and those containing microsatellite motifs with a differing number of tandem repeats were retained. For each alignment, a stitched sequence maximizing the length of the combined flanks was used as the representative sequence for the locus.
The representative sequences of potentially polymorphic loci were further analyzed in MSATCOMMANDER 1.0.8 [2]. Sequences were again screened for di-, tri-, and tetra-, microsatellite motifs, this time with a minimum of six tandem repeats. Primers were designed in MSATCOMMANDER through an interface with PRIMER3 software [3] to meet the following criteria: amplification of a target product in the 75 to 150 bp size range (inclusive of the two primer lengths), optimal primer length of 20 base pair (range 18 to 22 base pair), optimal melting temperature of 60.0°C (range of 58.0°C to 62.0°C), optimal GC content of 50%, inclusion of at least 1 bp GC clamp, low self or pair complementarity and a maximum end stability of 8.0 [2].
Once the set of potential loci was identified, a number of quality checks and screening criteria were implemented before selection of primer pairs for testing in the laboratory. To determine if the designed primer pairs would produce amplicons of varying size (as expected at a polymorphic locus), the IPCRESS program [4] was used to run in silico PCR. Each primer pair was computationally "amplified" against the joined paired-end sequencing databases from Dsu-33 and Dsu-35. IPCRESS identified "amplicons" that would potentially be produced from each individual during PCR with no priming mismatches, one priming mismatch, and two priming mismatches. Primer sets that were expected to produce amplicons of a single size, or of more than four varying lengths in the in silico PCR step were not further considered, to avoid monomorphic loci or repetitive elements, respectively.
Additionally, loci that exhibited broad size ranges (more than 20 bp difference between alleles) were excluded, to preclude potential non-specific amplification or amplification of loci containing indels or many null alleles. Remaining primer sequences and full amplicon sequences were searched against the non-redundant BLAST database (https://blast.ncbi.nlm.nih.gov/Blast.cgi). Any locus showing evidence of being part of a repetitive element (e.g., LINEs or SINEs), or that closely matched sequences of human DNA (a conceivable contaminating factor) were screened out. After screening the reads containing microsatellite motifs, 861 potentially usable polymorphic loci were identified. Suitable priming regions were present for 229 of the loci; after the quality checks noted above were conducted, a set of 55 potentially polymorphic loci was identified as suitable for laboratory testing. For each of the 29 loci, sequences were queried using NCBI nucleotide blast (https://blast.ncbi.nlm.nih.gov/Blast.cgi?PROGRAM=blastn&PAGE_TYPE=BlastSearch&L INK_LOC=blasthome). Each locus sequence ( Figure S2) was used as query for blastn, selecting the "Whole-genome shotgun contigs" database, with "Rhinocerotidae (taxid:9803)" selected as "organism." In Table S1, the highest scoring matches to genome scaffolds are listed for two subspecies of the Sumatran rhinoceros (Dicerorhinus sumatransis sumatransis and D. s. harrissoni) for which genome sequences are available (one individual of each subspecies).

Fecal DNA Extraction and Genotyping at the University of Illinois
Fecal samples were provided by the Cincinnati Zoo from the two Sumatran rhinoceros in North America at the time of this study; these samples were used for preliminary evaluation of genotyping success at UIUC. At UIUC, a number of modifications were tested to improve both fecal DNA extraction protocols and PCR amplification results.
The QIAmp DNA Stool Kit protocol [1] was used, with the following modifications initially tested at UIUC: each fecal sample was thoroughly homogenized in 20% DMSO salt saturated buffer by vortexing for 5 minutes; the initial sample volume was increased to 800 µl; samples were digested overnight in 1 mL of ASL buffer and 1 mg of proteinase K at 56°C; vortex times throughout were increased (especially for the InhibitEx step which was vortexed for 5 minutes); and final elution was done twice with 50 µl of elution buffer each time and a minimum 30 minute incubation at room temperature. DNA concentrations were measured using the Qubit 2.0 Fluorometer for the modified extraction protocol; resulting values were compared to those obtained with the standard protocol. Subsequent PCR results suggested that these modifications to the extraction methods increased the amplification success, and subsequent optimization steps used DNA isolated following the modified protocol.
Various PCR conditions were also tested, and those that appeared to be more effective at improving amplification success were adopted. These tests included additional treatments to remove PCR inhibitors, various dilutions of the DNA extract, use of different DNA polymerases, variations in concentration of MgCl2, varying the annealing temperature range of touchdown PCRs, the number of cycles at each temperature, and the lengths of the elongation step, and the amount of DNA extract used. With the caveat that the number of Sumatran rhino fecal samples was too limited to report results quantitatively, the following UIUC protocol incorporates the factors that appeared to improve success.
Genotyping was carried out using the following PCR conditions with the same primer Genotype data from fecal samples were compared to those generated using the higher quality DNA from blood; instances of allelic dropout and false allele rate, which produce incorrect genotypes, were recorded.
Because the markers were intended for use in populations of Sumatran rhinoceros in their home range, additional tests of the markers were conducted in Indonesia at the Eijkman Institute for Molecular Biology.
Detailed Methods at the Eijkman Institute for Molecular Biology (EIMB)

Overview of PCR modifications
In Indonesia, additional troubleshooting and modifications were conducted, to account for the use of different platforms, and given that plants eaten by rhinoceros (which may contain inhibitors to PCR) would be different from those eaten by rhinoceros in North American zoos. Protocols were tested at the Eijkman Institute for Molecular Biology, for use with DNA from blood and fecal samples from the Sumatran Rhino Sanctuary (SRS), Way Kambas National Park, Indonesia and fecal samples from Burkit Barisan Selatan National Park (BBS), Indonesia. One modification was testing the use of reagents from the Qiagen Mutiplex PCR Kit. The PCR cycling used for the multiplex kit followed the manufacturer's recommendation and was different from the touchdown PCR algorithm applied when using AmpliTaq Gold Polymerase. The changes are incorporated into the protocol shown below.
For two of the markers (Disu100 and Disu593), PCR on the same samples was conducted separately using either AmpliTaq Gold DNA Polymerase or using the Qiagen Multiplex PCR Kit, with results for each shown in Figure S1, below. For both markers, using the Qiagen Multiplex PCR Kit yielded a better or stronger signal than the AmpliTaq Gold polymerase ( Figure S1, below). Results for other markers are available upon request.

Sample Collection and DNA Extraction
Blood samples of captive individuals at SRS were collected during regular veterinary care. Whole blood was stored in EDTA tubes and refrigerated until DNA extraction (ca. 1 month). DNA was extracted using a salting out procedure [5]: Non-nucleated cell contamination was minimized by washing with red blood cell lysis solution three times.

DNA amplification and genotyping
PCR amplification was performed in a DNA-free laminar hood to prevent contamination. PCR that used Amplitaq Gold Polymerase followed the UIUC procedure described above, but this procedure performed poorly. Thus, the Qiagen® Multiplex PCR kit was employed, which generated improved results for genotyping. PCR products were fluorescently labeled using M13 forward (M13F)-tailed forward primer and the same primer mix as stated above for UIUC. The sample was amplified in 10 µL reaction consisting of PCR-grade water, 1x Master Mix, Q-Solution, primer mix and 100-200 ng DNA template for fecal samples or 5 ng DNA template for blood samples. The difference in the amount of DNA used for dung and blood samples was intended to adjust the endogenous DNA amount and also to yield similar heights of peaks in genotyping results, because peaks from the blood sample DNA would tend to be higher. Amplification was carried out in the GeneAmp® CR

Primer testing using human or tapir DNA
Tests using human and using tapir DNA were conducted for each primer pair using the Qiagen Multiplex PCR Kit. These were conducted because it is difficult to distinguish fecal samples from rhinoceros and tapir (Tapirus indicus) in the field. Also, human DNA may pose a risk of contamination during sampling or extraction. All the primer pairs were tested using DNA from one human and from one tapir sample. The methods used for this assessment are the same as for the singleplex rhinoceros PCR protocol mentioned immediately above.

Testing of paired blood-fecal samples at EIMB and UIUC
While not required for a preliminary examination of amplification success as a pilot study, the genotyping of DNA from dung samples for a census of rhinos would benefit if each sample for each marker was genotyped multiple times, e.g., as described by Okello and colleagues [1].
To evaluate the precision and accuracy of genotypes generated with the new markers on fecal samples, paired blood and fecal samples were available at EIMB for one SRS rhino (Andalas) and at UIUC for two Cincinnati Zoo rhinos (Dsu-28 and -44) ( Table 1). At EIMB, the fecal and blood DNA samples from Andalas were included in the testing of 13 randomly chosen loci using the SRS samples. For these 13 markers, genotypes were completely consistent in the paired blood-fecal samples, with no allelic dropout or false alleles in the fecal genotypes.
While the number of available paired blood-fecal samples was low, the results suggested that accurate genotypes were obtained from the fecal samples.

Detailed protocol
A step-by-step PCR protocol using the Qiagen Multiplex PCR Kit is included below.

PCR protocol for Sumatran rhino microsatellite analysis (using Multiplex PCR Kit)
1. All the microsatellite primers were provided by UIUC. The primers were eluted into 100 µM home stock and 20 µM working stock. 2. Markers Disu100 and Disu733 were employed to test this protocol. The primer mix was prepared using the same procedure described above for UIUC. 3. This protocol is for singleplex PCR using the Qiagen® Multiplex PCR kit. 4. Mixing of PCR reagents (components are listed in step 6) and applying the cocktail to the reaction tubes or plates were conducted in a Biosafety Cabinet exclusively for PCR setup. 5. DNA template was added to the tubes or plates prepared in step 4 in a separate laminar hood used for DNA only 6. Singleplex PCR mix is listed here:  11. Finally, the result from electrophoresis visualization was interpreted, using GeneMapper v. 4. For each microsatellite locus sequence, the best scoring match is given for two subspecies of Sumatran rhinoceros: Dicerorhinus sumatrensis sumatransis (PEKH01) and D. s. harrissoni (JABWHU01). In no case did a locus show the large number of matches indicative of a repetitive element. The few not listed had short and repetitive sequences: Disu076 did not have a Blastn match to the Sumatran rhinoceros, but did produce a match to the white rhinoceros, Ceratotherium simum. Figure S1 (below). Results for two Sumatran rhino microsatellite makers, each amplified using two methods. The chromatograms are shown for markers Disu100 (first page) and Disu593 (second page). DNA samples were amplified for both markers using both AmpliTaq Gold polymerase (top of each panel) and the Qiagen Multiplex PCR Kit (bottom of each panel). The blood sample and fecal sample from SRS were both from Sumatran rhinoceros Andalas (Table 1). The fecal samples from the wild were from free-ranging rhinoceros samples BBS-3-019 (Disu100) and BBS-3-005 (Disu593). For both markers, the Qiagen Multiplex PCR Kit yielded a better or stronger signal than the AmpliTaq Gold polymerase. The AmpliTaqGold DNA polymerase produced lower quality signals (with no signal in one case using fecal DNA). The scale used may differ across the traces shown.