Sequence characterization, molecular phylogeny reconstruction and recombination analysis of the large RNA of Tomato spotted wilt virus (Tospovirus: Bunyaviridae) from the United States

Tomato spotted wilt virus (TSWV; Tospovirus: Bunyaviridae) has been an economically important virus in the USA for over 30 years. However, the complete sequence of only one TSWV isolate, PA01, characterized from pepper in Pennsylvania, is available. The large (L) RNA of a TSWV WA-USA isolate was cloned and sequenced. It consisted of 8914 nucleotides (nt) encoding a single open reading frame (ORF) of 8640 nt in the viral-complementary sense. The ORF potentially codes for an RNA-dependent RNA polymerase (RdRp) of 330.9 kDa. Two untranslated regions of 241 and 33 nt, which shared conserved tospoviral sequences, were present at the 5′ and 3′ termini, respectively. Phylogenetic analysis using nucleotide sequences of the complete L RNA showed that the TSWV WA-USA isolate clustered with the American and Asian TSWV isolates, which formed a clade distinct from the Euro-Asiatic Tospoviruses. Phylogeny of the amino acid sequences of all tospoviral RdRps used in this study showed that all known TSWV isolates, including the USA isolate described here, formed a distinct cluster close to that of Impatiens necrotic spot virus. Multiple sequence alignment revealed conserved motifs in the RdRp of TSWV. Recombination analysis identified two recombinants, one of which was the TSWV WA-USA isolate. Among the events involving them, three were detected in the conserved motifs of the RdRp. Sequence and phylogenetic analyses of the L RNA showed distinct clustering with selected TSWV isolates reported from elsewhere. Conserved motifs in the core polymerase region of the RdRp and recombination events were identified.


Background
Tospoviruses (family Bunyaviridae, genus Tospovirus) are important pathogens that cause considerable economic losses to field and horticultural crops worldwide [1,2]. Tospoviruses are transmitted by thrips (Thysanoptera, Thripidae) in a persistent and circulative manner [3]. Tomato spotted wilt virus (TSWV) is one of more than 29 known Tospoviruses and is considered one of the ten most important plant viruses worldwide [4]. TSWV virions are quasi-spherical and 80-100 nm in diameter. The TSWV genome comprises three single-stranded RNAs that are individually encapsidated by a nucleoprotein and collectively packaged inside a glycoprotein envelope. The three segments of the tripartite genome are designated Large (L), Medium (M) and Small (S) RNAs. The L RNA segment encodes the RNA-dependent RNA polymerase (RdRp) in the viral-complementary sense [5]. The distinctive feature of Tospoviruses is the ambisense coding strategy of the S and M genomic segments. The S and M genomic RNAs code for the nonstructural proteins NSs and NSm, respectively, in the viral sense, whereas the nucleoprotein (N) and the glycoprotein precursor (GN/GC) are encoded in the viral-complementary sense [6][7][8].
Despite the importance of TSWV as one of the most widely prevalent and persistent viral pathogens in the US, sequence information for the complete L RNA segment from the US is available for only one isolate, TSWV PA01, characterized from pepper in Pennsylvania [9]. The virion packages a few molecules of RdRp, which ensure transcription of virion RNA into positive-sense (translatable) RNA during the early stages of virus infection. The tospoviral RdRp, also referred to as the L protein, performs many conserved functions in virus genome replication inside the host cell. Moreover, the RdRp-encoding regions of plant pathogenic viruses have been shown to display considerable genetic variability and recombination events [10][11][12]. A recent report of broad-spectrum transgenic resistance against several Tospoviruses made use of conserved tospoviral RdRp gene sequences [13]. Here we report the complete nucleotide sequence of the L RNA of a TSWV isolate from the USA, together with a search for conserved amino acid sequences in the core polymerase region of the RdRp and for potential recombination events. Results of the phylogenetic analysis of the TSWV L RNA and its RdRp, the recombination events identified, and their implications are also discussed.

Virus source, RNA extraction, RT-PCR and sequencing
The TSWV WA-USA isolate was maintained on tomato plants under controlled greenhouse conditions. Total RNA was extracted from infected leaves using TRIzol reagent (Invitrogen, USA) following the manufacturer's protocol and was used for cDNA synthesis with the primer pairs listed in Table 1. Oligonucleotides were designed based on the sequences available in GenBank and were used to amplify the complete L RNA segment of TSWV as overlapping fragments (Fig. 1). Following RT-PCR, the resulting overlapping amplicons were cloned into pGEM-T Easy (Promega, Madison, USA). Recombinant clones were selected, and the plasmid DNA was prepared and sequenced (ELIM Biopharma, Hayward, USA). At least three clones were sequenced for each of the overlapping genome fragments.

Sequence annotation and analysis
The complete L RNA genomic component of the TSWV WA-USA isolate was assembled from the overlapping clones using the BioEdit sequence alignment editor [14]. Pairwise and multiple alignments were done using CLUSTAL-W in MEGA 6 [15]. Phylogenetic trees were constructed using the maximum-likelihood method based on the Tamura-Nei model [16]. Tree topology was tested by the bootstrap method with 1000 replications.
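The bootstrap test mentioned above rests on resampling alignment columns with replacement. The following Python sketch illustrates only that resampling step on a toy alignment; it is not the MEGA 6 implementation, the function names and the toy sequences are hypothetical, and the tree inference that would be run on each replicate is omitted.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Build one bootstrap replicate by resampling columns with replacement.

    alignment: dict mapping sequence name -> aligned string (equal lengths).
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must have equal length")
    n_cols = lengths.pop()
    # Draw as many column indices as the alignment has columns.
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}

def bootstrap_replicates(alignment, n_replicates=1000, seed=0):
    """Return a list of n_replicates resampled alignments (seeded for repeatability)."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]

# Hypothetical toy alignment, not real tospovirus data.
toy_alignment = {
    "isolate_1": "ACGTACGT",
    "isolate_2": "ACGTTCGA",
    "isolate_3": "ACCTACGA",
}
replicates = bootstrap_replicates(toy_alignment, n_replicates=1000)
```

In a full analysis, a tree would be inferred from each replicate, and the bootstrap support of a clade would be the fraction of replicate trees that contain it.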

Detection of recombination
Recombination Detection Program-4 (RDP 4 Beta 4.27) [17] was used to identify potential recombination events among the L RNA sequences of the Tospoviruses used in this study. The analysis was performed with default settings for all the methods available in RDP 4 [17], except that the highest acceptable p value in the recombination detection options was set to 0.01.
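As an illustration of how a p value ceiling of 0.01 screens the support for an event, the Python sketch below keeps only the methods whose p value does not exceed the cutoff. It is a hedged illustration rather than RDP 4 code: the function name is ours and the p values are invented purely for demonstration.

```python
P_CUTOFF = 0.01  # highest acceptable p value, as stated in the text

def methods_supporting(p_values, cutoff=P_CUTOFF):
    """Return the sorted names of methods whose p value is at or below the cutoff.

    p_values: dict mapping method name -> p value, or None if the method
    reported no signal for the event.
    """
    return sorted(
        name for name, p in p_values.items()
        if p is not None and p <= cutoff
    )

# Hypothetical p values for a single event (NOT results from this study).
hypothetical_event = {"RDP": 1e-5, "MaxChi": 0.004, "SiScan": 0.03, "LARD": None}
supported_by = methods_supporting(hypothetical_event)
```

With these made-up values, only RDP and MaxChi pass the cutoff; SiScan exceeds it and LARD reports no signal.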

Complete L RNA segment of TSWV WA-USA isolate
The complete L RNA of the TSWV WA-USA isolate was 8914 nt in length [GenBank Accession no. KP827649] [2].
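The segment layout given in the abstract (an 8640-nt ORF flanked by untranslated regions of 241 and 33 nt) can be checked against the 8914-nt total with simple arithmetic. The Python sketch below is illustrative only; the names are ours, and the residue count assumes that the 8640-nt figure includes the stop codon, which the text does not state.

```python
# Lengths reported in the text for the TSWV WA-USA L RNA (nucleotides).
SEGMENT_NT = 8914
ORF_NT = 8640
UTR_5_NT = 241
UTR_3_NT = 33

def check_layout(segment, orf, utr5, utr3):
    """Return True if the two UTRs plus the ORF account for the whole segment."""
    return utr5 + orf + utr3 == segment

def codon_count(orf):
    """Number of codons in the ORF; raises if the length is not a multiple of 3."""
    if orf % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    return orf // 3

layout_ok = check_layout(SEGMENT_NT, ORF_NT, UTR_5_NT, UTR_3_NT)
codons = codon_count(ORF_NT)
# Assumption: the stop codon is counted within the 8640 nt.
residues = codons - 1
```

Under that assumption the ORF corresponds to 2880 codons, that is, 2879 amino acid residues.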

Molecular phylogeny
The complete L RNA sequences of known Tospoviruses were analyzed to study their molecular phylogeny. The evolutionary history was inferred by using the maximum-likelihood method based on the Tamura-Nei model (Fig. 2). The phylogenetic tree showed that all known L RNA sequences of Tospoviruses formed two distinct clades, each consisting of sub-clades (Fig. 2). Table S1 and another TSWV isolate (PA01) characterized from pepper reported from USA [9]. Molecular phylogenetic relationships of the RdRp amino acid sequences were inferred using the maximum-likelihood method. Phylogeny reconstruction revealed that the RdRp encoded by the TSWV WA-USA isolate was part of a group consisting of other TSWV isolates (Fig. 3). The phylogenetic study of RdRp sequences showed three distinct clades, consistent with the distinct evolutionary lineages proposed for Tospoviruses, SVNV and BeNMV [18]. In contrast, phylogenetic analysis of the complete L RNA nucleotide sequences showed only two distinct lineages, in which BeNMV clustered with the TSWV isolates and SVNV clustered with Tospoviruses belonging to the Euro-Asiatic group (Fig. 2).

Conserved RdRp motifs
Sequence analysis of the core polymerase region of the RdRp of the TSWV WA-USA isolate showed the presence of all five conserved motifs (Fig. 4) characteristic of Tospovirus RdRps. These include motif A (DxxKWS), motif B (QGxxxYxSS), motif C (SDD), motif D (TxxxKK), and motif E (EFxSE) [13,19]. These conserved regions have been used in transgenic research to provide broad-spectrum resistance against Tospoviruses [13].
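The five consensus motifs listed above can be located in a protein sequence with simple pattern matching, reading each "x" as any residue. The Python sketch below is a minimal illustration under that assumption; the function names are ours, and the test peptide is a made-up string, not a real RdRp sequence.

```python
import re

# Motif consensus strings as written in the text; 'x' stands for any residue.
MOTIFS = {
    "A": "DxxKWS",
    "B": "QGxxxYxSS",
    "C": "SDD",
    "D": "TxxxKK",
    "E": "EFxSE",
}

def motif_to_regex(consensus):
    """Translate a consensus such as 'DxxKWS' into a compiled regular expression."""
    return re.compile(consensus.replace("x", "."))

def find_motifs(protein):
    """Map each motif name to the 0-based start positions where it matches."""
    hits = {}
    for name, consensus in MOTIFS.items():
        pattern = motif_to_regex(consensus)
        hits[name] = [m.start() for m in pattern.finditer(protein)]
    return hits

# Hypothetical toy peptide built to contain each motif once (NOT a real RdRp).
toy = "MMDAAKWSLLQGNLNYTSSPPSDDGGTAAAKKLLEFASEVV"
toy_hits = find_motifs(toy)
```

Short patterns such as SDD are expected to occur by chance in a protein of this size, so in practice matches would be restricted to the core polymerase region and confirmed against a multiple sequence alignment.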

Recombination detection
Analysis of the 23 complete L RNA sequences (including those of the TSWV WA-USA and TSWV PA01 isolates) revealed ten recombination events. Details of the recombination analysis, including the events, the major and minor parents involved, and the beginning and end of the breakpoints, are provided in Additional file 1: Table S4. Among the ten recombination events, one identified isolate TSWV WA-USA as a recombinant arising from the major parent HM581937, a TSWV-Pepper isolate from South Korea, and the minor parent KC261971, another South Korean TSWV isolate (TSWV-17). The breakpoints of this event were located at positions 4534 and 5536 in the alignment (Fig. 5). Seven of the nine algorithms used (RDP, Chimaera, BootScan, 3Seq, GENECONV, MaxChi and SiScan) detected this event; LARD and PhylPro did not. Moreover, the recently described Korean isolate KM076651 was found to be a recombinant that evolved through nine different recombination events (Fig. 5). Among these nine recombination events, three events (Event numbers 1, 7 and 9) that resulted in two putative recombinants (TSWV WA-USA and KM076651) were found to involve conserved RdRp motifs (Fig. 4 and Additional file 1: Table S4).
Despite recombination events within the conserved RdRp motifs, the consensus amino acid sequences in the core polymerase region of the RdRp are maintained among various TSWV isolates. This suggests that the functional importance of these conserved amino acid sequences constrains the evolution of the viral genome.

Conclusions
TSWV continues to be a production constraint to several crops in the US and elsewhere in the world, and with this study, the complete sequence of the L RNA of an additional isolate from the US is now available. Besides providing insights into the molecular phylogeny of the TSWV L RNA genomic segment, we analyzed the role of genetic recombination in the evolution of the L RNA. The results of the phylogenetic and recombination analyses suggest TSWV isolates from the Euro-Asiatic region as a potential origin of the TSWV USA isolate. Molecular characterization of the complete genomes of other TSWV isolates prevalent in the USA, together with their phylogenetic analysis, would help clarify the evolution of USA isolates. Identification of recombination events within the conserved region of the RdRp in this study further supports the role of genetic recombination in driving the evolution of TSWV. The conserved motifs found in the core polymerase region of the RdRp would be useful not only for further studies on genetic diversity and genome evolution, but also for structure-function studies of the L RNA and the RdRp.

Availability of data and materials
Tospovirus sequence alignments and the corresponding phylogenetic trees were submitted to TreeBASE (Accession number S18894) and can be accessed at [http://purl.org/phylo/treebase/phylows/study/TB2:S18894]. The data sets supporting the results of this article are included within the article and its additional files. The nucleotide sequence described in this study has been deposited in GenBank, NCBI [Accession no. KP827649].