Evolution and homoplasy at the Bem6 microsatellite locus in three sweetpotato whitefly (Bemisia tabaci) cryptic species

Background: The evolution of individual microsatellite loci is often complex, and homoplasy is common but often goes undetected. Sequencing alleles at a microsatellite locus can provide a more complete picture of the predominant evolutionary mechanisms at that locus and can reveal cases of homoplasy. Within-species homoplasy can lead to underestimates of differentiation among populations, and among-species homoplasy can produce misleading interpretations of shared alleles and hybridization. This is especially problematic with cryptic species. Results: By sequencing alleles from three cryptic species of the sweetpotato whitefly (Bemisia tabaci), designated MEAM1, MED, and NW, the evolution of the putatively dinucleotide Bem6 (CA)8 imp microsatellite locus is inferred to proceed primarily by stepwise mutation at four distinct heptanucleotide tandem repeats. In two of the species this pattern yields a compound tandem repeat. Homoplasy was detected both among species and within species. Conclusions: In the absence of sequencing, size-homoplasious alleles at the Bem6 locus lead to overestimates of allele sharing and hybridization among cryptic species of Bemisia tabaci. Furthermore, the compound heptanucleotide motif structure of a putatively dinucleotide microsatellite has implications for the nomenclature of heptanucleotide tandem repeats that evolve in a stepwise manner.


Background
Satellite DNA was originally described as the bands produced by genomic DNA in a CsCl buoyant density gradient that fell outside the principal band [1]. These bands were found to be common in eukaryotes, to have a GC content different from that of the principal band (bulk single-copy DNA), and to consist of tandemly repeated sequences [2]. Tandem sequence repetition has since become synonymous with the term satellite DNA [3]. Tandemly repetitive sequences have two parameters: the sequence motif and the copy number (e.g. ATTATTATT contains three copies of the trinucleotide motif ATT). Variability in each parameter has led to the creation of multiple classification schemes [3][4][5][6][7] and synonyms [8][9][10][11][12] within tandem-repeat nomenclature, particularly when copy number is less than 10³.
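The two parameters, motif and copy number, can be made concrete with a short Python sketch. The function name `tandem_copies` is hypothetical, and it assumes a perfect repeat that begins at the first base; real repeat-finding tools must also locate the repeat within a longer sequence and tolerate interruptions.

```python
def tandem_copies(seq: str, motif: str) -> int:
    """Count perfect copies of `motif` repeated back to back from the start of `seq`."""
    if not motif:
        raise ValueError("motif must be non-empty")
    k = len(motif)
    n = 0
    # Advance one motif length at a time while the next window still matches the motif.
    while seq[n * k:(n + 1) * k] == motif:
        n += 1
    return n


seq = "ATTATTATT"
n = tandem_copies(seq, "ATT")
print(f"(ATT){n}")  # (ATT)3
```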
Tandem repeat marker loci contain the repeat region together with conserved flanking sequence and are usually non-coding. Evolution inside the repeat region is relatively fast because its mutation rate is 10³ to 10⁵ times higher than that of the genome as a whole [12][13][14]. Tandem repeats evolve primarily by DNA slippage during replication (microsatellites) [15,16] or by gene conversion and crossover during meiosis (minisatellites) [17], but see Richard and Paques [18]. In most cases, slippage changes allele size by one repeat motif at a time in a stepwise fashion, whereas gene conversion can change copy number in larger multiples [18]. Tandem repeats tend to mutate faster with increasing copy number [19,20], and tend to expand when copy number is low and contract when copy number is high [21][22][23]. This pattern is consistent with upper size constraints on copy number [24].
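A minimal sketch of the stepwise change described above, assuming a symmetric stepwise mutation model in which each slippage event adds or removes exactly one motif copy with equal probability and the flanking sequence never changes. The function name, flank length, and starting copy number are hypothetical (a (CA)8 start is used only for familiarity). The copy-number-dependent expansion and contraction biases and the larger multi-copy changes attributed to gene conversion are not modelled here.

```python
import random


def simulate_smm(start_copies: int, motif_len: int, flank_len: int,
                 n_mutations: int, seed: int = 1) -> list[int]:
    """Allele lengths (bp) along one lineage under a symmetric stepwise mutation model."""
    rng = random.Random(seed)
    copies = start_copies
    lengths = [flank_len + motif_len * copies]
    for _ in range(n_mutations):
        # Slippage adds or removes exactly one motif copy; never drop below one copy.
        copies = max(1, copies + rng.choice((-1, 1)))
        lengths.append(flank_len + motif_len * copies)
    return lengths


lengths = simulate_smm(start_copies=8, motif_len=2, flank_len=100, n_mutations=10)
print(lengths)
```

Every successive length differs by one motif length (or by zero if the one-copy floor is hit), which is the signature of stepwise evolution that is checked against the Bem6 data later in the text.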
Mutations can also occur in the region flanking a tandem repeat [25] or within the tandem repeat itself, producing an imperfect or interrupted repeat [26]. Either kind of mutation can cause size homoplasy: alleles in different individuals have fragments of the same size (identical in state) but arose in different lineages and are therefore not identical by descent.
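The toy example below, with invented flank sequences, shows how a 2-bp flanking deletion combined with a one-copy expansion of a dinucleotide repeat can yield two alleles of identical length but different sequence. Fragment sizing alone would score them as the same allele.

```python
# Hypothetical flanks; only the lengths and the CA repeat matter for the illustration.
flank5 = "GGATCCTT"
flank3 = "TTAGCA"

allele_a = flank5 + "CA" * 6 + flank3       # intact 5' flank, six CA copies
allele_b = flank5[:-2] + "CA" * 7 + flank3  # 2-bp flank deletion, seven CA copies

print(len(allele_a), len(allele_b))  # 26 26
print(allele_a == allele_b)          # False
```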
Bemisia tabaci Gennadius is a cryptic species complex composed of at least 24 morphologically indistinguishable species [27,28]. Most of these species are regionally endemic, but two are globally invasive agricultural pests that infest and feed on hundreds of crop plants in many diverse agroecosystems [29]. These two species originated in the region bordering the Mediterranean Basin (MED) and in the Middle East/Asia Minor region (MEAM1), and have also been referred to extensively in the literature as biotype Q and biotype B, respectively [28]. The recent arrival of MED in the United States in 2004 [30] raised concerns over possible hybridization with MEAM1, which has been established in the United States since the mid-1980s [31]. At present this outcome seems unlikely, given that MEAM1 and MED are almost completely reproductively isolated [32]. Laboratory hybrids can occur but are both rare and sterile [33], while field hybrids are also rare and do not persist [34]. Monitoring the spread of MED within the U.S. and molecularly distinguishing MED from MEAM1 have been principal aims of the biotype Q taskforce [35]. During this effort, a partial sequence of the mtCO1 gene [36,37] has been the gold standard for molecular identification, but two microsatellite loci, Bem6 and Bem23 [38], have also been used as diagnostic markers [31,34]. Because of the importance of the Bem6 nuclear locus for determining B. tabaci cryptic species in North America, its diversity among economically important species and the origin of shared alleles are of interest.
The Bem6 microsatellite was described as an imperfect (CA)8 imp tandem repeat, where CA is the dinucleotide motif and 8 is the copy number [38] (Table 1). However, it has been noted that among U.S. samples, alleles at this locus differ in size by multiples of 7 base pairs rather than the expected multiples of 2 base pairs [31]. In addition, apparent hybrids and shared alleles appeared to be present among MED, MEAM1, and endemic New World (NW) whitefly samples in the dataset [31,34]. Alleles at this locus were therefore sequenced in order to 1) better characterize the tandem repeat nature of this marker, and 2) evaluate the possibility of hybrids and shared alleles at this locus among cryptic species.
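One way to see the 7-bp pattern is to take the greatest common divisor of the size differences among alleles within a species. This sketch uses the MED allele names reported in this study (203, 210, 217). Because allele names are rounded size estimates, real data would need a tolerance of about 1 bp rather than the exact integer arithmetic used here, and pooling species (e.g. MEAM1 216 with MED 217) would break the calculation. The function name is hypothetical.

```python
from functools import reduce
from math import gcd


def repeat_step(sizes: list[int]) -> int:
    """Greatest common divisor of all size differences among a set of alleles."""
    base = min(sizes)
    # gcd of differences from the smallest allele equals gcd of all pairwise differences.
    return reduce(gcd, (s - base for s in sizes))


med_sizes = [203, 210, 217]  # MED allele names from this study
step = repeat_step(med_sizes)
print(step)           # 7
print(step % 2 == 0)  # False: not consistent with a 2-bp (dinucleotide) step
```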

Methods
Between 2006 and 2011, 63 Bem6 fragments representing 11 genotyped alleles were sequenced from 60 whiteflies (Table 2) as a supplementary part of large state- and continent-wide surveys of Bemisia tabaci whiteflies [31,34]. The sequencing effort included samples from lab colonies as well as samples from Spain, Colombia, Israel, and Morocco. This represented 1 to 2% of the whiteflies genotyped at this locus. Alleles that were not sequenced were very rare. Allele names are based on the average of estimated microsatellite fragment lengths calculated by GeneMapper 4.0 (Applied Biosystems, Foster City, CA), rounded to the nearest base pair (Figure 1). Because alleles are rarely sequenced in practice [39], these genotyping-based size estimates are what would normally be compared among samples, and they are therefore the basis for potential homoplasy. The actual allele size determined by sequencing may differ. Individuals were identified to species using MEAM1-, MED-, and NW-specific primers based on the mitochondrial CO1 gene [36], with reference to the species-level groups identified by Dinsdale et al. [28].
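The naming rule, together with the ≤1 bp criterion for potential homoplasy applied to these names later in the text, can be sketched as follows. The raw size estimates in the example are invented, the per-species allele names are those mentioned elsewhere in this article, and the function names are hypothetical. Rounding is done half up explicitly, because Python's built-in `round` rounds exact halves to the nearest even integer.

```python
import math
from statistics import mean


def allele_name(size_estimates: list[float]) -> int:
    """Name an allele by its mean estimated fragment length, rounded half up to the nearest bp."""
    return math.floor(mean(size_estimates) + 0.5)


def potential_homoplasy(names_by_species: dict[str, set[int]], tol: int = 1) -> list[tuple]:
    """Pairs of allele names from different species whose estimated lengths differ by <= tol bp."""
    hits = []
    species = sorted(names_by_species)
    for i, sp1 in enumerate(species):
        for sp2 in species[i + 1:]:
            for a in sorted(names_by_species[sp1]):
                for b in sorted(names_by_species[sp2]):
                    # Such pairs cannot be told apart by sizing; sequencing is needed.
                    if abs(a - b) <= tol:
                        hits.append((sp1, a, sp2, b))
    return hits


print(allele_name([215.6, 216.2, 216.4]))  # 216
names = {"MEAM1": {210, 216, 223}, "MED": {203, 210, 217}, "NW": {189, 195, 216}}
for hit in potential_homoplasy(names):
    print(hit)
```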
Alleles were PCR amplified using the unlabeled primers described by De Barro et al. [38]. Amplified alleles were directly sequenced in most cases, but were cloned using an Invitrogen TOPO TA® Cloning Kit (Life Technologies, Carlsbad, CA) when it was necessary to sequence both alleles from a heterozygote. Sequencing reactions were run on an Applied Biosystems 3730XL DNA analyzer using a BigDye® Terminator v3.1 Cycle Sequencing Kit (Applied Biosystems). To ensure that both alleles were sequenced from heterozygotes, a minimum of 16 colonies were sampled. Sequences were aligned first using the large gap setting in Sequencher 4.7 (Gene Codes, Ann Arbor, MI) and then manually in Sequencher 4.7 and in Mesquite 2.74 [40].
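The 16-colony minimum can be motivated with a simple binomial argument, assuming each colony independently carries either allele of a heterozygote with probability 1/2 (no amplification or cloning bias). The chance that all n colonies carry the same allele, so that one allele is missed, is then 2 × (1/2)^n. The function name is hypothetical.

```python
def p_miss_allele(n_colonies: int) -> float:
    """Probability that every sampled colony carries the same allele of a heterozygote,
    assuming each colony independently carries either allele with probability 1/2."""
    return 2 * 0.5 ** n_colonies


print(p_miss_allele(16))  # 3.0517578125e-05, i.e. about 1 in 32,768
```

Any bias toward one allele, such as preferential amplification of the shorter fragment, would make the real risk higher than this idealized figure.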

Homoplasy among whitefly species
In the three species, four perfect heptanucleotide tandem repeats, designated H1-H4 (Figure 2a), were found preceding the 3′ flanking region of the Bem6 microsatellite described by De Barro et al. [38]. From 63 fragments representing 11 genotyped alleles (based on estimated size), 16 unique alleles (based on sequencing) were found (Figure 3). The sequence of MEAM1 allele 216 was identical in 17 whiteflies collected from five U.S. states and contained a compound heptanucleotide repeat of motifs H1-H3 (Figure 2). The sequence of NW allele 216 was identical in two whiteflies, one each from Texas and Mexico, and is characterized by the presence of the H4 motif (Figure 2). The sequence of MED allele 217 was identical in two whiteflies from Florida and one from Spain and contained a compound heptanucleotide repeat of motifs H1 and H2 (Figure 2). Potential homoplasy among MEAM1 allele 216, NW allele 216, and MED allele 217 arose from overlapping distributions of allele size estimates (Figure 1). Additional cases of potential homoplasy among cryptic species (estimated lengths differing by ≤1 base pair) were found at alleles 209/210 and 195/196 (Figure 3). In North American MED whiteflies, allele 210 is the most common, with a frequency of 91%. Only two MEAM1 individuals carried this allele, so the incidence of MEAM1 individuals incorrectly identified on this basis would be very low (~1 in 2500). As reported previously [34], there was no evidence of allele sharing or hybridization among cryptic species at the Bem6 locus (Figure 3).

Table 1 Nucleotide sequence of the Bem6 (CA)8 imperfect microsatellite and flanking region [38]

Position   DNA sequence
           CCTTTTCATTAGTATAGCATTACTCCTAACACA
133        AGTTCATAATTAGTATTATAACATAAGCCATC

The repeat region is in bold and imperfections in the CA motif are italicized.

Homoplasy within whitefly species
Within-species homoplasy was found for MED at allele 210 and for MEAM1 at alleles 210 and 223 (Figure 3). In MED, two different 210 alleles were sequenced from 20 individuals from four countries and six U.S. states. The alleles differ from MED allele 217 by a single deletion or from MED allele 203 by a single insertion. This was the most frequent case of intraspecific homoplasy detected in the data. The allele designated "Q2" in Figure 3 was found in seven individuals collected in Guatemala and the U.S. states of Georgia and Oregon. The mitochondrial haplotypes of all seven individuals clustered with haplotype Q2 described in McKenzie et al. [31], indicating maternal ancestry in the western Mediterranean region [41][42][43] (data not shown). The second MED allele 210 was found in 13 individuals collected from Israel, Morocco, and the U.S. states of California, Pennsylvania, Florida, and Michigan. Of these 13, 11 had mitochondrial haplotypes that clustered with haplotype Q1 described in McKenzie et al. [31], indicating maternal ancestry in the eastern Mediterranean region [41][42][43]. The other two individuals, a whitefly from Morocco and a whitefly from a California lab colony founded from individuals collected in Spain, had mitochondrial haplotypes consistent with western Mediterranean maternal ancestry (data not shown).

Two different MEAM1 210 alleles were sequenced, each from a single MEAM1 individual. Both differ from the common allele 216 by a single heptanucleotide deletion. Two different alleles with an estimated size of 223 base pairs were sequenced from three MEAM1 individuals. Both differ from the common allele 216 by a single heptanucleotide insertion. The allele labeled "Arizona" in Figure 3 was sequenced from two individuals collected in Arizona, while the other allele 223 was sequenced from an individual collected in Oregon.
Allele 223 is only common in MEAM1 individuals from Arizona and New York greenhouse populations [43].
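The within-species case can be illustrated by treating an allele as an ordered array of heptanucleotide motifs. The motif arrays and the 175-bp flank length below are hypothetical, chosen only so that sizes match the MED allele names, and the helper functions are likewise hypothetical. Under these assumptions, two 210-bp alleles with different motif composition each sit one heptanucleotide step from both 203 and 217, so their shared size says nothing about the path by which either arose.

```python
MOTIF_LEN = 7    # heptanucleotide repeat unit
FLANK_LEN = 175  # hypothetical: chosen so that four motif copies give a 203-bp allele


def allele_size(motifs: list[str]) -> int:
    """Fragment length implied by a motif array plus constant flanks."""
    return FLANK_LEN + MOTIF_LEN * len(motifs)


def delete_copy(motifs: list[str], motif: str) -> list[str]:
    """Remove one copy of `motif` (a single-unit slippage deletion)."""
    i = motifs.index(motif)
    return motifs[:i] + motifs[i + 1:]


def insert_copy(motifs: list[str], motif: str) -> list[str]:
    """Add one copy of `motif` next to an existing copy (a tandem duplication)."""
    i = motifs.index(motif)
    return motifs[:i] + [motif] + motifs[i:]


# Hypothetical MED motif arrays; only the single H3 copy reflects the text.
med_203 = ["H1", "H2", "H2", "H3"]
med_217 = ["H1", "H1", "H2", "H2", "H2", "H3"]

med_210_a = delete_copy(med_217, "H1")  # one step down from 217
med_210_b = insert_copy(med_203, "H1")  # one step up from 203

print(allele_size(med_210_a), allele_size(med_210_b))  # 210 210
print(med_210_a == med_210_b)                          # False
```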

Tandem repeat evolution
Bem6 in all three cryptic species of whiteflies appears to evolve via insertions and/or deletions (indels) of four different heptanucleotide motifs, which are generally repeated in tandem (Figure 3). Within each species, only a single indel is required to change from one allele to the next longer or shorter allele, suggesting that stepwise mutation occurred in the tandem repeat region within each cryptic species. The number of copies of heptanucleotide H3 is constant among individuals within each of the two invasive species, MED (1 copy) and MEAM1 (3 copies), and evolution is inferred to proceed by indels of the heptanucleotides designated H1 and H2 (Figure 3). Within NW, evolution is best explained by tandem indels of the heptanucleotide designated H4 (Figure 3). However, a non-tandem indel of H3 separates North American NW (alleles 195 and higher) from Colombian NW (allele 189). This is consistent with the divergent phylogenetic placement of North American and Colombian mtCO1 sequences [44,45]. A single male collected in Florida carrying allele 182 could not be identified to species because of insufficient sequence length in the mitochondrial barcode, but its sequence at the Bem6 locus is most similar to other NW alleles. Bem6 alleles in MEAM1 and MED contain compound tandem repeats of H1-H3, while NW features only H4 in tandem (Figure 3). Within each species, the H3 motif appears to be the most stable, never showing copy number variation in tandem. This motif, AACACAC, is the most similar to the (CA)8 imp repeat originally described by De Barro et al. [38].

Figure caption (partial): The corresponding region of the originally described microsatellite [38] for the Australia species is shown for comparison. Four nucleotides absent from the originally described microsatellite precede the repeat region in MEAM1, MED, and NW whiteflies.
Singleton copies of this motif are also found in both flanking regions of the originally described microsatellite (residues 45-51 and 94-100 in Table 1), and these persist in all three species studied here. The flanking region itself also appears relatively stable, although a few species-specific polymorphisms are present (Table 3).
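Because Bem6 alleles appear to differ by gains or losses of whole motifs, each allele can be summarized as a vector of copy numbers for H1-H4, and the stepwise inference becomes a check that size-adjacent alleles differ by exactly one copy of one motif. In this sketch only the single H3 copy in MED follows the text; the H1 and H2 counts are hypothetical placeholders, and `one_step_apart` is a hypothetical helper.

```python
def one_step_apart(a: tuple[int, ...], b: tuple[int, ...]) -> bool:
    """True if two copy-number vectors differ by exactly one copy of exactly one motif."""
    return sum(abs(x - y) for x, y in zip(a, b)) == 1


# Copy numbers of (H1, H2, H3, H4) per MED allele. H3 = 1 follows the text;
# the H1/H2 counts are hypothetical placeholders.
med = {
    203: (1, 2, 1, 0),
    210: (2, 2, 1, 0),
    217: (2, 3, 1, 0),
}

sizes = sorted(med)
stepwise = all(one_step_apart(med[s], med[t]) for s, t in zip(sizes, sizes[1:]))
h3_constant = len({v[2] for v in med.values()}) == 1
print(stepwise, h3_constant)  # True True
```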

Discussion
Designing new microsatellites separately for each species is generally recommended [39], but this is not always possible when cryptic species are present or species-level taxonomy is in flux. The Bem6 microsatellite was originally isolated from the Australia species (P. De Barro, personal communication; see Dinsdale et al. [28]), so it would be interesting to determine whether a similar pattern of heptanucleotide tandem repeat evolution is also present in that cryptic species. Determining the ancestral state of the Bem6 locus within each cryptic species is not possible because of the low number of alleles and sparse sampling in the home ranges. Colombian NW are apparently ancestral to North and Central American NW [44], so allele 189 may be the ancestral state for NW. This should be investigated further with increased sampling throughout the home range of NW. Likewise, the ancestral state of the Bem6 locus within MED and MEAM1 cannot be inferred because few alleles were sampled in their native ranges. Further exploration of MED diversity in particular, including its apparently ancestral sub-Saharan subgroup [42], could help resolve this.

Heptanucleotide repeats
Heptanucleotide tandem repeats have received little attention, possibly in part because they are excluded from some classical definitions of both microsatellites and minisatellites (e.g. [3]). While many accept some general rules for classifying tandemly repetitive DNA sequences [16,39], several authors have argued that some of these rules may be arbitrary [7,46]. Heptanucleotide repeats have been shown to be more common than tetra- and pentanucleotide repeats in several plant taxa [47], suggesting that defining microsatellites as motifs of 1 to 6 nucleotides may be unnecessarily exclusive. The data presented here are consistent with stepwise evolution, indicating that the four heptanucleotide motifs probably evolve like microsatellites [48] rather than minisatellites [17]. In addition, the alleles described may also have arisen via length-independent slippage [49,50], which is expected when copy number is low.

Compound microsatellites
For the species in this study, a microsatellite originally described as an imperfect dinucleotide repeat is better characterized as a series of perfect heptanucleotide repeats. This difference appears to be specific to the species examined here, as the motifs described here never occur in tandem in the original microsatellite (Figure 2). 'Imperfections' or 'interruptions' are commonly invoked in the tandem repeat literature [16,51,52] and are so called because they break a long perfect microsatellite into two shorter microsatellites [53]. However, an interruption can also lead to the creation of a new 'proto' microsatellite [54,55]. If the newly created sequence motif is duplicated adjacent to the original repeat, it will form a compound microsatellite. This may be a common occurrence, as compound microsatellites are 15 times more abundant in genomic DNA than expected from a random distribution of microsatellites [55].
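A compound repeat can be written in the usual (motif)n notation by scanning a sequence for runs of known motifs. The greedy left-to-right parser below is a simple sketch and not the alignment procedure used in this study; its name is hypothetical, the example sequence is invented, and the second motif (AATACAC) is a hypothetical placeholder. Only AACACAC (H3) comes from the text.

```python
def compress(seq: str, motifs: list[str]) -> str:
    """Greedily rewrite `seq` as runs of known motifs, e.g. 'TT(M1)2(M2)1GG'.
    Motifs are tried in the given order; unmatched bases are emitted as-is."""
    out = []
    i = 0
    while i < len(seq):
        for m in motifs:
            n = 0
            # Count back-to-back copies of m starting at position i.
            while seq.startswith(m, i + n * len(m)):
                n += 1
            if n:
                out.append(f"({m}){n}")
                i += n * len(m)
                break
        else:
            # No motif starts here: keep the base and move on.
            out.append(seq[i])
            i += 1
    return "".join(out)


H3 = "AACACAC"
motifs = [H3, "AATACAC"]  # second motif is hypothetical
seq = "TT" + "AATACAC" * 2 + H3 + "GG"
print(compress(seq, motifs))  # TT(AATACAC)2(AACACAC)1GG
```

A greedy scan is adequate for clean, non-overlapping motifs like these; with closely related motifs and interruptions, as in the Bem6 alignments, several decompositions may fit and an alignment-based choice is needed.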
To our knowledge, this is the first report of a compound heptanucleotide tandem repeat. While the pattern both within and among species is consistent with stepwise evolution of tandem repeats, choosing among alternative alignments was difficult, probably exacerbated by the absence of guanine in the repeat area. In the end, the alignment selected was the only one found in which within-species, size-sequential alleles were separated by only one mutation step. The next best alignment also featured four different heptanucleotide motifs, but had two additional point mutations separating NW alleles 189 and 195. One possible mutation pathway among the four identified motifs is given in Figure 4, with either one or two point mutations separating each motif. Kofler et al. [55] found that the motifs comprising most compound microsatellite pairs differed by a single mutation and suggested that such mutations are the dominant mechanism underlying the origin of compound microsatellites. The evolutionary fate of a given microsatellite 'interruption' will probably depend on the structural properties of the new motif [56], but may ultimately be a compound microsatellite. In the nucleotide substitution scenario shown in Figure 4, H3 is hypothesized to be the ancestral motif because it is present in all three cryptic species studied here as well as in the reference Australia species. However, the H1 motif is also common to MEAM1, MED, and NW, suggesting that it either arose independently in each species or was present in the ancestor of the three species and gave rise to the H4 motif in NW and the H2 motif in the common ancestor of MEAM1 and MED. Untangling the phylogenetic history and demonstrating the phylogenetic utility of this compound microsatellite will require sequencing alleles from additional members of the cryptic species complex.
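The pathway in Figure 4 and the single-mutation pattern reported by Kofler et al. [55] are statements about the substitution (Hamming) distance between equal-length motifs, which is straightforward to compute. Apart from H3, the motifs in this sketch are hypothetical, so the printed distances describe the example rather than the actual H1, H2, and H4 sequences; the function names are also hypothetical.

```python
def hamming(a: str, b: str) -> int:
    """Number of substitutions separating two equal-length motifs."""
    if len(a) != len(b):
        raise ValueError("motifs must have equal length")
    return sum(x != y for x, y in zip(a, b))


def plausible_pathway(path: list[str], max_subs: int = 2) -> bool:
    """True if each consecutive pair of motifs differs by 1 to max_subs substitutions."""
    return all(1 <= hamming(a, b) <= max_subs for a, b in zip(path, path[1:]))


H3 = "AACACAC"  # H3 motif as given in the text
path = [H3, "AATACAC", "ATTACAT"]  # derived motifs are hypothetical
print([hamming(a, b) for a, b in zip(path, path[1:])])  # [1, 2]
print(plausible_pathway(path))                           # True
```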

Homoplasy and cryptic species
Mutations occurring in either the flanking region or the repeat region can lead to detectable size homoplasy in tandem repeat markers [57,58]. In addition, back mutations can lead to undetectable size homoplasy; for example, a 210 allele can arise from either a 203 allele or a 217 allele, and distinguishing between these alternatives is impossible without parentage information. The false signal given by apparently shared but size-homoplasious alleles is not generally considered a problem when many highly variable microsatellite loci are used in population genetics [59], but see Balloux et al. [60]. Homoplasy can, however, overestimate the frequency of shared alleles among species [51], and this has the potential to be misleading when those species are cryptic [61]. In this study, homoplasious alleles would have led to overestimates of allele sharing and hybridization among cryptic species had they not been sequenced and analyzed together with mitochondrial DNA sequences. After sequencing, no evidence of interspecific hybridization was found at this locus.
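The inflation of apparent allele sharing can be shown by counting shared alleles twice, once keyed on estimated size and once on sequence. The species-to-allele table below is a hypothetical toy data set loosely modelled on the size classes discussed above, with placeholder labels standing in for real sequences; the function name is hypothetical.

```python
def shared(alleles_by_species: dict, key) -> set:
    """Return the allele keys that occur in more than one species."""
    species_for_key: dict = {}
    for species, alleles in alleles_by_species.items():
        for allele in alleles:
            species_for_key.setdefault(key(allele), set()).add(species)
    return {k for k, spp in species_for_key.items() if len(spp) > 1}


# (estimated size in bp, sequence label); labels are placeholders for distinct sequences.
samples = {
    "MEAM1": [(216, "meam1_216"), (210, "meam1_210")],
    "MED": [(210, "med_210"), (217, "med_217")],
    "NW": [(216, "nw_216")],
}

print(sorted(shared(samples, key=lambda a: a[0])))  # [210, 216]  (shared by size)
print(shared(samples, key=lambda a: a[1]))          # set()       (shared by sequence)
```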
Relative to the number of mitochondrial haplotypes found, very few B. tabaci haplotypes have spread globally [42]. Low genetic diversity is also apparent at the Bem6 nuclear locus in North American samples of invasive MEAM1 and MED. This locus initially appeared diagnostic for the cryptic species MEAM1 and MED: even after several thousand samples had been genotyped, 99% of MEAM1 whiteflies had the 216 and/or 223 alleles and 98% of MED whiteflies had the 203 and/or 210 alleles (Table 2). However, rare alleles and putative hybrids surfaced as sample size increased and, without sequencing, would have led to misinterpretation of allele sharing among both native and invasive cryptic species and to an overestimate of hybrid frequency. These pitfalls might be even greater in the native ranges of MEAM1 and MED because of their higher genetic diversity. Researchers should therefore use caution when using microsatellites for diagnostic purposes.
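The diagnostic use of Bem6, and how it can mislead, can be sketched as a size-only classifier built on the common alleles reported above (216/223 for MEAM1, 203/210 for MED). This is a hypothetical illustration, not a protocol used in the surveys; the names `DIAGNOSTIC` and `classify` are hypothetical, and each whitefly is treated as a two-allele genotype (a haploid male can be passed as the same allele twice).

```python
# Common Bem6 alleles per invasive species, as reported in the text.
DIAGNOSTIC = {"MEAM1": {216, 223}, "MED": {203, 210}}


def classify(genotype: tuple[int, int]) -> str:
    """Size-only species call from estimated Bem6 allele names."""
    matches = {sp for sp, alleles in DIAGNOSTIC.items() if set(genotype) & alleles}
    if len(matches) == 1:
        return matches.pop()
    if len(matches) > 1:
        return "putative hybrid: sequence alleles and check mtCO1"
    return "unassigned: rare allele, sequence before interpreting"


print(classify((216, 223)))  # MEAM1
print(classify((203, 210)))  # MED
print(classify((210, 216)))  # putative hybrid: sequence alleles and check mtCO1
print(classify((195, 216)))  # MEAM1 (an NW whitefly carrying 216 would be misassigned)
```

A MEAM1 whitefly carrying one of its rare 210 alleles, or an NW whitefly carrying the size-homoplasious 216 allele, is miscalled by size alone, which is why sequencing and mtCO1 data were needed here.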

Conclusions
Sequencing fragments representing 11 genotyped alleles at the putatively dinucleotide Bem6 locus from three members of a cryptic species complex revealed four different heptanucleotide tandem repeat motifs. The sequencing data are consistent with stepwise evolution, suggesting that the locus evolves like a microsatellite rather than a minisatellite. In addition, the alleles described may also have arisen via length-independent slippage, which is expected when copy number is low. In two of the species, a compound heptanucleotide repeat is formed; to our knowledge, this is the first such report.
Homoplasious alleles at the Bem6 locus would have led to overestimates of allele sharing and hybridization among cryptic species had they not been sequenced. After sequencing, no evidence of interspecific hybridization remained. These results highlight the need for caution when using microsatellites for cryptic species discrimination and diagnostics.

Competing interest
Authors declare that they have no competing interests and are responsible for the content of this paper.
Authors' contributions
AMD conceived and conducted experiments, analyzed data, and drafted the manuscript. PMH conceived and conducted experiments and analyzed data. RGS conceived experiments, analyzed data, and edited the manuscript. CLM initiated and conceived experiments. All authors read and approved the manuscript.

Acknowledgements
The study was funded by Lance S Osborne and CLM with funding from the nursery and floriculture initiative. We thank Paul De Barro, Greg O'Corry-Crowe, and one anonymous reviewer for their suggestions on an earlier draft of this manuscript. Mention of trade names or commercial products in this publication is solely for the purpose of providing specific information and does not imply recommendation or endorsement by the USDA. The USDA is an equal opportunity provider and employer.