Open Access

Phylogenetic prediction of cis-acting elements: a cre-like sequence in Norovirus genome?

  • Matías Victoria1,
  • Rodney Colina2,
  • Marize Pereira Miagostovich1,
  • José P Leite1 and
  • Juan Cristina2
BMC Research Notes 2009, 2:176

DOI: 10.1186/1756-0500-2-176

Received: 29 May 2009

Accepted: 7 September 2009

Published: 7 September 2009

Abstract

Background

Discrete RNA structures such as cis-acting replication elements (cre) in the coding region of RNA virus genomes create characteristic suppression of synonymous site variability (SSSV). Different phylogenetic methods have been developed to predict secondary structures in RNA viruses, for high-resolution thermodynamic scanning and for detecting SSSV. These approaches have been successful in predicting cis-acting signals in different members of the families Picornaviridae and Caliciviridae. In order to gain insight into the identification of cis-acting signals in viruses whose mechanisms of replication are currently unknown, we performed a phylogenetic analysis of complete genome sequences from 49 Human Norovirus (NoV) strains.

Findings

The complete coding sequences of NoV ORF1 were obtained from the DDBJ database and aligned. Shannon entropy calculations and RNAalifold consensus RNA structure prediction identified a discrete, conserved, invariant sequence region with a characteristic AAACG cre motif at positions 240 through 291 of the RNA-dependent RNA polymerase (RdRp) sequence (relative to strain [EMBL:EU794713]). This sequence region has a high probability of forming a stem-loop.

Conclusion

A new predicted stem-loop has been identified near the 5' end of the RdRp coding region of the Human NoV genome. This is the same location recently reported for the Hepatovirus cre stem-loop.

Findings

Internal base pairing that creates stem-loops and other RNA structures places constraints on sequence variability at the bases required for structure formation in the genomes of RNA viruses. For instance, the Hepatitis C virus (HCV) genome shows a marked suppression of synonymous codon variability within several evolutionarily conserved stem-loops in the core and NS5B coding regions, consistent with their role in virus replication [1-3]. Discrete RNA structures such as cis-acting replication elements (cre) in the coding region of human enteroviruses (HEVs) [4] and other viruses also create characteristic suppression of synonymous site variability (SSSV), similar to that observed in HCV [5, 6]. Different phylogenetic methods have been developed to predict secondary structures in RNA viruses, like PFOLD [2, 7] or Alifold [8], for high-resolution thermodynamic scanning, and like UNAFold [9] for detecting SSSV [2]. These methods have made it possible to identify genome regions suitable for in-depth experimental analysis, which can then establish the role of the identified secondary RNA structures in translation or replication. This approach has led to the hypothesis that when SSSV (i.e. highly conserved synonymous sites in an RNA virus genome sequence alignment) occurs in a sequence region with a high probability of forming a secondary structure (i.e. a high probability of base pairing to generate a stable stem-loop), a cis-acting signal can be identified. This hypothesis has been successfully tested in different members of the family Picornaviridae, like Hepatitis A virus (HAV), Avian Encephalitis virus (AEV) and Rhinovirus [10, 11], and in members of the family Caliciviridae, like Norovirus, Sapovirus, Vesivirus and Lagovirus [12].

In order to gain insight into the identification of cis-acting signals in viruses whose mechanisms of replication are currently unknown, we tested the above hypothesis on a group of 49 Human Noroviruses (NoV) for which complete genome sequences have recently been obtained.

The complete coding sequences of ORF1 of 49 Human NoV were obtained from the DDBJ database and aligned using the MUSCLE program [13] (for strain names and accession numbers see Additional File 1). Once aligned, the Shannon entropy at each position of the sequence dataset was calculated [14]. This allowed us to measure the relative variation at each site of the sequence alignment. The results of these studies are shown in Fig. 1.
https://static-content.springer.com/image/art%3A10.1186%2F1756-0500-2-176/MediaObjects/13104_2009_Article_314_Fig1_HTML.jpg
Figure 1

Shannon entropy in ORF1 of Human NoV strains. The Shannon entropy at each position of the alignment is shown. The location of positions 3922 through 3973 of the alignment (which include the AAACG motif), in a sequence region with zero entropy, is shown by a red circle. A scheme showing the position of each gene in the NoV genome is shown below the graph. The location of the stem-loop is indicated by sl.

Only a few discrete genome regions in the ORF1 of Human NoV have a Shannon entropy of zero, indicating that they are invariant among all NoV sequences included in this analysis (Fig. 1). Interestingly, one of these discrete regions has an AAACG cre sequence motif [10] at positions 3948 to 3952 of the alignment. The surrounding zero-entropy region (positions 3922 through 3973 of the alignment) corresponds to positions 240 through 291 of the RNA-dependent RNA polymerase (RdRp) sequence (relative to strain [EU794713]) (Fig. 1).
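The entropy scan described above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in the study: the function names, the toy alignment, the choice of log base 2, and the treatment of gap characters as ordinary symbols are all assumptions made for the example. A column with entropy zero is invariant regardless of the log base.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits, an assumed unit) of one alignment column."""
    counts = Counter(column)
    total = len(column)
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


def alignment_entropy(seqs):
    """Entropy at every column of an alignment given as equal-length strings."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("aligned sequences must have equal length")
    return [column_entropy(col) for col in zip(*seqs)]


def zero_entropy_runs(entropies, min_len=1):
    """Return 1-based inclusive (start, end) runs of invariant columns."""
    runs, start = [], None
    for i, h in enumerate(entropies, start=1):
        if h == 0:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                runs.append((start, i - 1))
            start = None
    if start is not None and len(entropies) - start + 1 >= min_len:
        runs.append((start, len(entropies)))
    return runs


def motif_in_runs(reference, motif, runs):
    """1-based start positions of motif hits lying entirely inside a run."""
    hits = []
    pos = reference.find(motif)
    while pos != -1:
        first, last = pos + 1, pos + len(motif)
        if any(a <= first and last <= b for a, b in runs):
            hits.append(first)
        pos = reference.find(motif, pos + 1)
    return hits


# Toy alignment (invented for illustration, not NoV data).
toy = ["GCAAACGTA", "GCAAACGTC", "GTAAACGTA", "GCAAACGTG"]
entropies = alignment_entropy(toy)
runs = zero_entropy_runs(entropies, min_len=5)
print(runs, motif_in_runs(toy[0], "AAACG", runs))
```

On a real alignment, the same three calls would report which invariant stretches contain the AAACG motif; the minimum run length is a free parameter that the text does not specify.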

In order to determine whether this cre motif is situated in a sequence region that has a high probability of forming a secondary structure [8, 15], we used the RNAalifold Web Server [16]. First, we obtained a graphical representation of the secondary structure in a plot of height versus position, where the height m(k) is given by the number of base pairs enclosing the base at position k in the structure (i.e. loops correspond to plateaus, hairpin loops to peaks and helices to slopes). The results of these studies are shown in Fig. 2. The AAACG cre sequence motif is embedded in a sequence region with a very high probability of forming a stem-loop structure (see Fig. 2).
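The height function m(k) defined above can be computed directly from a structure written in dot-bracket notation. The following is a minimal sketch under stated assumptions: the function names and the example structure are hypothetical, pseudoknots are not handled, and a paired base is taken not to enclose itself. It is not the RNAalifold implementation.

```python
import re


def mountain_heights(dot_bracket):
    """m(k) for each position: number of base pairs enclosing position k."""
    heights, depth = [], 0
    for ch in dot_bracket:
        if ch == "(":
            heights.append(depth)  # the opening base is not enclosed by its own pair
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced structure: extra ')'")
            heights.append(depth)
        elif ch == ".":
            heights.append(depth)  # unpaired bases sit on a plateau
        else:
            raise ValueError(f"unexpected character {ch!r}")
    if depth != 0:
        raise ValueError("unbalanced structure: unclosed '('")
    return heights


def hairpin_loops(dot_bracket):
    """1-based inclusive (start, end) of unpaired runs closed by a single pair."""
    return [(m.start(1) + 1, m.end(1)) for m in re.finditer(r"\((\.+)\)", dot_bracket)]


example = "((..((...))..))"  # invented structure, for illustration only
print(mountain_heights(example))
print(hairpin_loops(example))
```

In the output, rising and falling values mark the two strands of a helix, and the highest plateau marks the hairpin loop, which is where the text places the AAACG motif.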
https://static-content.springer.com/image/art%3A10.1186%2F1756-0500-2-176/MediaObjects/13104_2009_Article_314_Fig2_HTML.jpg
Figure 2

"Mountain plot" of a NoV ORF1 sequence region with an entropy equal to zero. A mountain plot representing a secondary structure in a plot of height versus position is shown. Sequences from positions 3922 through 3973 of the alignment are shown at the bottom of the figure. Numbers at the top of the figure show site position in the alignment. Colors correspond to the Vienna RNA conservation coloring schema [16] (see also Fig. 3). Note that the AAACG motif (underlined in red) is predicted to be located in a loop of the secondary structure.

Then, the consensus RNA structures of the selected alignment region were folded using the RNAalifold program [16]. The resulting structures as well as the alignments were color-coded according to a coloring scheme for highlighting the mutational pattern with respect to the structure (Vienna RNA conservation coloring schema) [16]. The results of these studies are shown in Fig. 3.
https://static-content.springer.com/image/art%3A10.1186%2F1756-0500-2-176/MediaObjects/13104_2009_Article_314_Fig3_HTML.jpg
Figure 3

Conservation of a predicted secondary structure in NoV ORF1 sequences. Multiple sequence alignment across 49 Human NoV ORF1 sequences shows a consensus secondary structure from positions 3922 through 3973 of the NoV genome (positions 240 through 291 of the RdRp sequence, relative to strain [EMBL:EU794713]). The structure is shown in the dot-bracket format [16] above the alignment. Each pair of matching brackets represents a consensus base pair between the alignment columns beneath. Sequences are color-coded according to consistent and compensatory mutations in the aligned sequences with respect to the conserved structure (see figure text box). Pale colors indicate that a base pair cannot be formed in some sequences of the alignment. The sequence conservation profile is shown in gray bars below the alignment. The conserved predicted secondary structure, color-coded according to the different types of base pairs in the corresponding alignment columns, is shown on the right side of the figure.
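The two quantities behind such a coloring (how many different base-pair types occur at a consensus pair, and how many sequences cannot form the pair at all) can be tallied as in the sketch below. It is only an illustration: the function names and the toy alignment are hypothetical, counting G-U wobble pairs as valid is an assumption, and no attempt is made to reproduce the actual color mapping of the Vienna RNA tools.

```python
# Pairs treated as able to form; including G-U wobble is an assumption.
CANONICAL = {"AU", "UA", "GC", "CG", "GU", "UG"}


def base_pairs(dot_bracket):
    """0-based (i, j) index pairs for matching brackets, sorted by i."""
    stack, pairs = [], []
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure: extra ')'")
            pairs.append((stack.pop(), i))
    if stack:
        raise ValueError("unbalanced structure: unclosed '('")
    return sorted(pairs)


def pair_summary(seqs, dot_bracket):
    """For each consensus pair: (i, j, pair types seen, incompatible sequences).

    Positions are reported 1-based. More than one pair type at a column pair
    indicates consistent or compensatory substitutions.
    """
    summary = []
    for i, j in base_pairs(dot_bracket):
        observed = [(s[i] + s[j]).upper().replace("T", "U") for s in seqs]
        types = {p for p in observed if p in CANONICAL}
        incompatible = sum(p not in CANONICAL for p in observed)
        summary.append((i + 1, j + 1, len(types), incompatible))
    return summary


# Toy alignment and consensus structure (invented, not NoV data).
toy = ["GCAAAGC", "GUAAAAC", "GCAAAGU", "GAAAAGC"]
for row in pair_summary(toy, "((...))"):
    print(row)
```

A pair with several types and no incompatible sequences supports the structure through covariation; a pair with incompatible sequences would correspond to the pale columns described in the caption.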

Using the RNAalifold program [16], we identified a unique and conserved stem-loop near the 5' end of the RdRp coding region in the 49 Human NoV genomes analyzed (Figs. 1, 2 and 3). This predicted structure contains a cre sequence motif (Fig. 3). Interestingly, this is the same location recently reported for the Hepatovirus cre stem-loop [10] (Fig. 1).

RNA structure predictions are consistent with previous analyses based on the thermodynamic folding of individual sequences [12, 17, 18]. Although RNA structure is clearly not the only cause of SSSV, occurring for example also in overlapping gene sequences [19], there is an impressive co-localization of the major sites of SSSV and thermodynamically predicted secondary structures [4, 11, 12].

NoV belong to the family Caliciviridae, and they are non-enveloped viruses with positive, single-stranded RNA genomes. They also share other important features with picornaviruses, like having a VPg protein covalently linked to the 5' end of the genomic RNA [20]. Nevertheless, in contrast with picornaviruses, NoV express a downstream sub-genomic (sg) transcript encoding structural genes [20]. NoV are the leading cause of outbreaks of acute gastroenteritis in humans worldwide [21].

Despite the importance of these outbreaks, our understanding of the RNA structures or sequences required for NoV replication has been limited. Previous reports have shown that the poly-pyrimidine tract-binding protein (PTB), poly-A binding protein (PAB) and La autoantigen interact with the 3' untranslated region of the Norwalk virus genome [17]. Very recent studies have identified cis-acting signals in the 5' and 3' regions as well as at the start of the sg RNA transcript of NoV [12].

As members of the family Caliciviridae, NoV are thought to replicate in a manner typical of positive-stranded RNA viruses, through the synthesis of a full-length anti-genomic strand (reverse complement copy) using the viral RdRp translated initially from the RNA genome entering the cell [22]. The minus strand then acts as a template for the synthesis of full-length genomic RNA from which non-structural proteins are translated, including the RdRp. Features of the RdRp common to all positive-sense RNA viruses support this idea [23].

Although the putative cis-acting signal predicted in this study has not yet been investigated in vitro, owing to the lack of a standard cell culture system in which to grow these viruses, the possibility that this predicted structure acts as a functional element may open new avenues to our understanding of the molecular mechanisms of NoV replication.

Extensive mutagenesis studies performed in members of the family Picornaviridae, like Poliovirus (PV) and Human Rhinovirus 14 (HRV-14), revealed a critical conserved AAACA/G cre sequence motif in the 5' half of the loop sequence that is essential for its function [10]. Similar conserved motifs are present within the loops of the cre elements of other picornaviruses and are important for RNA replication [10, 24, 25].
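As an illustration of how the motif criterion above could be checked against a predicted structure, the sketch below looks for AAACA/G inside hairpin loops and reports whether each hit falls in the 5' half of its loop. Everything here is an assumption made for the example: the function name, the toy sequences, and in particular the rule used for "5' half" (the centre of the motif lies at or before the centre of the loop), which the text does not define.

```python
import re

# Lookahead keeps overlapping hits such as AAACAAACA.
CRE_MOTIF = re.compile(r"(?=(AAAC[AG]))")
# A hairpin loop: a run of unpaired bases closed directly by one base pair.
HAIRPIN = re.compile(r"\((\.+)\)")


def motifs_in_hairpin_loops(seq, dot_bracket):
    """Return (1-based start, motif, in_5prime_half) for hits fully inside a loop."""
    if len(seq) != len(dot_bracket):
        raise ValueError("sequence and structure must have equal length")
    seq = seq.upper()
    results = []
    for loop in HAIRPIN.finditer(dot_bracket):
        lo, hi = loop.start(1), loop.end(1)  # 0-based, hi exclusive
        loop_centre = (lo + hi - 1) / 2
        # endpos=hi stops the lookahead from reading past the loop.
        for hit in CRE_MOTIF.finditer(seq, lo, hi):
            motif_centre = hit.start() + 2  # middle base of the 5-mer
            results.append((hit.start() + 1, hit.group(1), motif_centre <= loop_centre))
    return results


# Toy stem-loops (invented sequences, not the NoV or poliovirus cre).
print(motifs_in_hairpin_loops("GGGCAAACGUUUUGCCC", "((((.........))))"))
print(motifs_in_hairpin_loops("GGGCUUUUAAACAGCCC", "((((.........))))"))
```

The first toy hairpin carries the motif at the start of the loop and the second at its end, so only the first satisfies the assumed 5'-half rule.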

Paul and colleagues (2003) have shown that the PV cre acts as the template for VPg uridylylation through a "slide-back" mechanism catalyzed by the 3Dpol (RdRp) [24, 26, 27]. The uridylylation of VPg leads to the production of VPg-pUpU, which serves as the protein primer for new RNA synthesis [28].

Interestingly, recent studies have shown that incubation of VPg with NoV 3Dpol (RdRp) generates VPg-poly(U) and that this uridylylated VPg can prime the replication of polyadenylated RNA [29]. In contrast, replication of antigenomic RNA was not primer dependent. Moreover, on nonpolyadenylated RNA, NoV RdRp initiated RNA synthesis de novo [29]. These findings indicate that replication of the NoV genome by the RdRp involves VPg-protein-primed initiation on polyadenylated genomic RNA and de novo initiation on antigenomic RNA [29]. In addition, very recent studies revealed that the NoV RdRp is a typical template-dependent RNA polymerase [30].

It is possible that the predicted stem-loop identified near the 5' end of the NoV RdRp coding region, which shares a cre-like sequence motif with members of the family Picornaviridae [10], could serve as the template for the uridylylation of VPg. If that is the case, it would permit VPg to act as a primer for the synthesis of the minus-strand RNA, in agreement with the results outlined above [30].

Declarations

Acknowledgements

We acknowledge support by CAPES (Brazil) and Universidad de la República (Uruguay) through project No. 018/08.

Authors’ Affiliations

(1)
Laboratório de Virologia Comparada e Ambiental, Instituto Oswaldo Cruz
(2)
Laboratorio de Virología Molecular. Centro de Investigaciones Nucleares, Facultad de Ciencias

References

  1. McMullan LK, Grakoui A, Evans MJ, Mihalik M, Puig AD, Branch AD, Feinstone SM, Rice CM: Evidence for a functional RNA element in the hepatitis C virus core gene. Proc Natl Acad Sci USA. 2007, 104: 2879-2884. 10.1073/pnas.0611267104.
  2. Tuplin A, Evans DJ, Simmonds P: Detailed mapping of RNA secondary structures in core and NS5B-encoding sequences of hepatitis C virus by RNase cleavage and novel bioinformatic prediction methods. J Gen Virol. 2004, 85: 3037-3047. 10.1099/vir.0.80141-0.
  3. You S, Stump DD, Branch AD, Rice CM: A cis-acting replication element in the sequence encoding the NS5B RNA-dependent RNA polymerase is required for hepatitis C virus RNA replication. J Virol. 2004, 78: 1352-1366. 10.1128/JVI.78.3.1352-1366.2004.
  4. Gerber K, Wimmer E, Paul AV: Biochemical and genetic studies of the initiation of Human Rhinovirus 2 RNA replication: identification of a cis-replicating element in the coding sequence of 2Apro. J Virol. 2001, 75: 10979-10990. 10.1128/JVI.75.22.10979-10990.2001.
  5. Simmonds P, Tuplin A, Evans DJ: Detection of genome-scale ordered RNA structure (GORS) in genomes of positive-stranded RNA viruses: implications for virus evolution and host persistence. RNA. 2004, 10: 1337-1351. 10.1261/rna.7640104.
  6. McKnight KL, Lemon SM: The rhinovirus type 14 genome contains an internally located RNA structure that is required for viral replication. RNA. 1998, 4: 1569-1584. 10.1017/S1355838298981006.
  7. Knudsen B, Hein J: RNA secondary structure prediction using stochastic context-free grammars and evolutionary history. Bioinformatics. 1999, 15: 446-454. 10.1093/bioinformatics/15.6.446.
  8. Hofacker IL, Fekete M, Stadler PF: Secondary structure prediction for aligned RNA sequences. J Mol Biol. 2002, 319: 1059-1066. 10.1016/S0022-2836(02)00308-X.
  9. Zuker M: Calculating nucleic acid secondary structure. Curr Opin Struct Biol. 2000, 10: 303-310. 10.1016/S0959-440X(00)00088-9.
  10. Yang Y, Yi M, Evans DJ, Simmonds P, Lemon SM: Identification of a Conserved RNA Replication Element (cre) within the 3Dpol-coding sequence of Hepatoviruses. J Virol. 2008, 82: 10118-10128. 10.1128/JVI.00787-08.
  11. Tapparel C, Junier T, Gerlach D, Cordey S, Van Belle S, Perrin L, Zdobnov EM, Kaiser L: New complete genome sequences of human rhinoviruses shed light on their phylogeny and genomic features. BMC Genomics. 2007, 8: 224. 10.1186/1471-2164-8-224.
  12. Simmonds P, Karakasiliotis I, Bailey D, Chaudhry Y, Evans DJ, Goodfellow IG: Bioinformatic and functional analysis of RNA secondary structure elements among different genera of human and animal caliciviruses. Nucleic Acids Res. 2008, 36: 2530-2546. 10.1093/nar/gkn096.
  13. Edgar RC: MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004, 32: 1792-1797. 10.1093/nar/gkh340.
  14. Korber BT, Kuntsman K, Patterson B, Furtado M, McEvilly M, Levy R, Wolinsky S: Genetic differences between blood- and brain-derived viral sequences from human immunodeficiency virus type 1-infected patients: evidence of conserved elements in the V3 region of the envelope protein of brain derived sequences. J Virol. 1994, 68: 7467-7481.
  15. Mathews DH, Sabina J, Zuker M, Turner DH: Expanded sequence dependence of thermodynamic parameters provides robust prediction of RNA secondary structure. J Mol Biol. 1999, 288: 911-940. 10.1006/jmbi.1999.2700.
  16. Gruber AR, Lorenz R, Bernhart SH, Neubock R, Hofacker IL: The Vienna RNA websuite. Nucleic Acids Res. 2008, 36 (Web Server issue): W70-W74. 10.1093/nar/gkn188.
  17. Gutierrez-Escolano AL, Vazquez-Ochoa M, Escobar-Herrera J, Hernandez-Acosta J: La, PTB and PAB proteins bind to the 3' untranslated region of Norwalk virus genomic RNA. Biochem Biophys Res Commun. 2003, 311: 759-766. 10.1016/j.bbrc.2003.10.066.
  18. Seal BS, Neill JD, Ridpath JF: Predicted stem-loop structures and variation in nucleotide sequence of 3' noncoding regions among animal calicivirus genomes. Virus Genes. 1994, 8: 243-247. 10.1007/BF01704518.
  19. Mizokami M, Orito E, Ohba K, Ikeo K, Lau JY, Gojobori T: Constrained evolution with respect to gene overlap of hepatitis B virus. J Mol Evol. 1997, 44: S83-S90. 10.1007/PL00000061.
  20. Jiang X, Wang M, Wang K, Estes MK: Sequence and genomic organization of Norwalk virus. Virology. 1993, 195: 51-61. 10.1006/viro.1993.1345.
  21. De Wit MA, Widdowson MA, Vennema H, de Bruin E, Fernandes T, Koopmans M: Large outbreak of Norovirus: the baker who should have known better. J Infect. 2007, 55: 188-193. 10.1016/j.jinf.2007.04.005.
  22. Clarke IN, Lambden PR: The molecular biology of caliciviruses. J Gen Virol. 1997, 78: 291-301.
  23. Koonin EV: The phylogeny of RNA-dependent RNA polymerases of positive-strand RNA viruses. J Gen Virol. 1991, 72: 2197-2206. 10.1099/0022-1317-72-9-2197.
  24. Rieder E, Paul AV, Kim DW, Van Boom JH, Wimmer E: Genetic and biochemical studies of poliovirus cis-acting replication element cre in relation to VPg uridylylation. J Virol. 2000, 74: 10371-10380. 10.1128/JVI.74.22.10371-10380.2000.
  25. Yin J, Paul AV, Wimmer E, Rieder E: Functional dissection of a poliovirus cis-acting replication element [PV-cre(2C)]: analysis of single and dual-cre viral genomes and proteins that bind specifically to PV-cre RNA. J Virol. 2003, 77: 5152-5166. 10.1128/JVI.77.9.5152-5166.2003.
  26. Paul AV, Yin J, Mugavero J, Rieder E, Liu Y, Wimmer E: A "slide-back" mechanism for the initiation of protein-primer RNA synthesis by the RNA polymerase of poliovirus. J Biol Chem. 2003, 278: 43951-43960. 10.1074/jbc.M307441200.
  27. Paul AV, Rieder E, Kim DW, Van Boom JH, Wimmer E: Identification of an RNA hairpin in poliovirus RNA that served as the primary template in the in vitro uridylylation of VPg. J Virol. 2000, 74: 10359-10370. 10.1128/JVI.74.22.10359-10370.2000.
  28. Paul AV, Van Boom JH, Filippov D, Wimmer E: Protein primed RNA synthesis by purified poliovirus RNA polymerase. Nature. 1998, 393: 280-284. 10.1038/30529.
  29. Rohayem J, Robel I, Jager K, Scheffler U, Rudolph W: Protein-Primed and De Novo initiation of RNA synthesis by Norovirus 3Dpol. J Virol. 2006, 80: 7060-7069. 10.1128/JVI.02195-05.
  30. Hogbom M, Jager K, Robel I, Unge T, Rohayem J: The active form of the norovirus RNA-dependent RNA polymerase is a homodimer with cooperative activity. J Gen Virol. 2009, 90: 281-291. 10.1099/vir.0.005629-0.

Copyright

© Cristina et al; licensee BioMed Central Ltd. 2009

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
