- Short Report
- Open Access
Influenza B virus has global ordered RNA structure in (+) and (−) strands but relatively less stable predicted RNA folding free energy than allowed by the encoded protein sequence
BMC Research Notes volume 6, Article number: 330 (2013)
Influenza A virus contributes to seasonal epidemics and pandemics and contains global ordered RNA structure (GORS) in the nucleoprotein (NP), non-structural (NS), PB2, and M segments. A related virus, influenza B, is also a major annual public health threat but, unlike influenza A, is largely restricted to human hosts. This study extends the search for GORS to influenza B.
A survey of all available influenza B sequences reveals GORS in the (+) and (−)RNAs of the NP, NS, PB2, and PB1 gene segments. The results are similar to those for influenza A, except that in influenza A GORS is observed for the M1 segment but not for PB1. In general, the folding free energies of human-specific influenza B RNA segments are less stable than allowed by the encoded amino acid sequence. This is consistent with findings in influenza A, where RNA folds of human-specific strains are less stable than those of avian and swine strains.
These results reveal fundamental molecular similarities and differences between influenza A and B and suggest a rational basis for choosing segments to target with therapeutics and for attenuating virus for live vaccines by altering RNA folding stability.
In contrast to influenza A, a zoonotic pathogen that infects multiple host species, influenza B primarily infects humans and, rarely, seals [1, 2]. Influenza B also differs from influenza A by having a lower mutation rate and fewer antigenic serotypes. Though its lack of antigenic diversity bars pandemic outbreaks, influenza B contributes to seasonal occurrences of influenza, which can result in serious infections costing thousands of lives and billions of dollars [4, 5]. Influenza B has been of increasing concern lately because of the rise in circulation of two distinct lineages of the virus, Victoria and Yamagata, which stimulated the recent switch from a trivalent vaccine (against one influenza B and two influenza A serotypes) to a quadrivalent vaccine including both influenza B serotypes [6, 7]. The viral genome comprises eight negative sense, or (−)RNA, segments. Segments NS, M1/BM2, and NA encode multiple protein products via splicing, termination-reinitiation, and alternative initiation, respectively.
RNA secondary structure plays important roles in the biology of many viruses: for example, in gene expression, splicing, molecular stability/life-time, and control of host gene expression. Some RNAs, such as compact viral genomes, can encode both protein information and functional RNA secondary structures. The importance of RNA structure in influenza virus protein coding regions, or (+)RNA, is now being revealed. For influenza A, structures have been described towards the 5′ end and at the 3′ splice site [15, 16] of segment NS (+)RNA. Both structures may have a role in the regulation of splicing. When many sequences are available, predicted folding stabilities can identify RNA regions likely to have structure. A survey of all influenza A coding sequences found evidence for multiple sites with probable locally conserved RNA structure in the (+)RNA. Similar to segment NS, structures were discovered in the 5′ region and 3′ splice site of segment M. The structure at the 3′ splice site can switch between pseudoknot and hairpin conformations, respectively burying or revealing the splice site and other splicing signals. Thus, this structure may have a role in regulation of segment M splicing.
In addition to locally conserved RNA structure, a survey of all influenza A sequences revealed global ordered RNA structure (GORS) that extends throughout the (+) and (−)RNA of the NP, NS, PB2, and M1 genes. (An error in our previous calculations of GORS in influenza A (−)RNA gave the incorrect result that this orientation lacked conserved structure. Correction of this mistake revealed that genes with GORS in the (+)RNA also possessed GORS in the (−)RNA.) GORS is revealed by predicting “excess” thermodynamic stability of wild-type RNA sequences versus random RNA of the same composition, as represented by a z-score:

$$z = \frac{\Delta G^{\circ}_{37,\,\text{wild-type}} - \mu}{\sigma}$$
Here ΔG°37, wild-type is the predicted folding free energy of the wild-type sequence, μ is the average predicted folding free energy of the dinucleotide randomizations, and σ is the standard deviation of the randomized population. GORS is defined as a significant negative shift in the median z-score away from an ideal non-structured RNA population (i.e. a normal distribution centered at zero). Thus, segments with a median z-score below −0.67 are considered to have GORS.
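The z-score and the median cutoff described here can be illustrated with a short sketch. It assumes the folding free energies (in kcal/mol) have already been predicted elsewhere; the function names are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, median, stdev

# Median z-score cutoff quoted in the text for calling GORS.
GORS_MEDIAN_CUTOFF = -0.67


def z_score(dg_wild_type, dg_randomized):
    """z = (wild-type free energy - mean of randomized) / std of randomized."""
    mu = mean(dg_randomized)
    sigma = stdev(dg_randomized)  # sample standard deviation (assumption)
    if sigma == 0:
        raise ValueError("randomized free energies have zero spread")
    return (dg_wild_type - mu) / sigma


def has_gors(z_scores):
    """A population is called GORS-positive if its median z-score is below the cutoff."""
    return median(z_scores) < GORS_MEDIAN_CUTOFF
```

A more negative z-score means the wild-type sequence is predicted to fold more stably than its composition-matched controls; the call is made on the median of the population rather than on any single sequence.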
While free energy minimization has limited accuracy and, in most algorithms, forbids pseudoknots, it can on average correctly predict roughly 73% of base pairs. Estimating free energies is an easier problem. For example, structures with greater than 86% of correctly predicted base pairs typically differ from the minimum free energy structure by an average of only 5% in their ΔG°37 values. Thus, good estimates of the relative thermodynamic stability within the same segment, and between wild-type and matched randomized controls, are achievable.
Many RNA viruses have negative shifts in z-scores for (+)RNAs relative to unstructured sequences [25, 26], implying widespread RNA structure. Studies in bacterial mRNAs found similar patterns. Influenza A has GORS in both orientations of the NP, NS, PB2, and M gene segments. Generally in influenza A, avian strains are the most stable, followed by swine and then human. A similar trend was found for the z-scores of NP, NS, and PB2 gene segments. The exact role of GORS is unclear, but it may be a mechanism for evasion of the host innate immune system or for controlling mRNA life-time/stability. Identification of segments with and without GORS could help guide discovery of targets for small molecules and oligonucleotide therapeutics against influenza virus, since these approaches require structured and unstructured RNA targets, respectively.
This study extends to influenza B the search for global trends in RNA structure. Because only human influenza B strains are available, the folding free energies and z-scores of influenza B sequences are compared to folding free energies and z-scores of synonymous codon mutations (i.e. sequences that code for the same protein as wild-type influenza B sequences) generated in silico. Additional comparisons are made between results for influenza A and B. Similarities and differences are observed, which imply that influenza B has a distinctly different biology from influenza A.
Materials and methods
The research in our lab, including the content of this manuscript, has been performed with the approval of the University of Rochester’s research ethics committee.
Coding regions for all unique influenza B mRNAs were downloaded from the NCBI Influenza Virus Resource Page. Truncated sequences or those with ambiguous nucleotides were removed, leaving 4110 sequences: 370 in NP, 519 in NS, 363 in PB2, 339 in PB1, 350 in M1, 832 in HA, 354 in PA, and 983 in NA. RNA folding free energies for the entire coding regions were predicted by minimizing the ΔG°37 with the program RNAfold. Z-scores were calculated for all sequences by comparing the free energy of each wild-type sequence to a set of ten randomized sequences, which preserved dinucleotide content and were generated with the Simmonics Sequence Editor [31, 32]. A negative z-score implies GORS. In this work, a population of single sequences with a median z-score below −0.67 is considered to possess GORS. We will apply the same definition to a reanalysis of our previous results for influenza A.
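To make the idea of a dinucleotide-preserving randomization concrete, the following is a minimal sketch of an Euler-path shuffle in the style of Altschul and Erickson. It is not the Simmonics Sequence Editor implementation, whose internals are not described here; the function names are hypothetical, and the only property the sketch is meant to show is that every dinucleotide count of the input is kept exactly.

```python
import random
from collections import Counter, defaultdict


def dinucleotide_counts(seq):
    """Count overlapping dinucleotides in a sequence."""
    return Counter(seq[i:i + 2] for i in range(len(seq) - 1))


def _all_reach_end(last_edge, end):
    """True if following the chosen last edges from every vertex leads to `end`."""
    for start in last_edge:
        seen, cur = set(), start
        while cur != end:
            if cur in seen or cur not in last_edge:
                return False
            seen.add(cur)
            cur = last_edge[cur]
    return True


def dinucleotide_shuffle(seq, rng):
    """Return a random sequence with the same dinucleotide counts as `seq`."""
    if len(seq) < 3:
        return seq
    # Each nucleotide is a vertex; each dinucleotide is a directed edge.
    successors = defaultdict(list)
    for a, b in zip(seq, seq[1:]):
        successors[a].append(b)
    end = seq[-1]
    # Pick one "last exit" edge per vertex until those edges all lead to `end`;
    # this guarantees the walk below can use every edge.
    while True:
        last_edge = {v: rng.choice(s) for v, s in successors.items() if v != end}
        if _all_reach_end(last_edge, end):
            break
    order = {}
    for v, succ in successors.items():
        rest = list(succ)
        if v in last_edge:
            rest.remove(last_edge[v])
        rng.shuffle(rest)
        if v in last_edge:
            rest.append(last_edge[v])  # the chosen edge is always used last
        order[v] = rest
    # Walk the edges in the chosen order, starting from the original first base.
    out, cur = [seq[0]], seq[0]
    used = defaultdict(int)
    for _ in range(len(seq) - 1):
        nxt = order[cur][used[cur]]
        used[cur] += 1
        out.append(nxt)
        cur = nxt
    return "".join(out)
```

In a workflow like the one described, ten such shuffles per wild-type sequence would each be folded, and their free energies would supply μ and σ for the z-score.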
To generate sets of synonymous codon mutants for comparison with folding free energies and z-scores of wild-type sequences, one coding region for each of the eight segments was mutated in silico to produce eight sets of 500 synonymous mutant sequences. Five hundred randomizations of one sequence from each segment were considered sufficient because the protein sequences are ~100% conserved in the available influenza B sequences. Synonymous codon mutations were made with a Perl script that randomly selected codons and made synonymous substitutions at those sites, including substitution of the same codon (no change). Folding free energies and z-scores were calculated as described above for wild-type. Specifically, ten dinucleotide randomizations of each of the 500 synonymous codon mutants were used for calculating 500 z-scores for each influenza B segment.
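The synonymous mutation step can be sketched as follows. This is a Python illustration, not the Perl script used in the study; the function names and the `n_sites` parameter (how many codons are picked per mutant) are hypothetical, because the text does not state how many sites were selected. As in the description, a selected codon may be replaced by itself.

```python
import random
from itertools import product

# Standard genetic code, codons enumerated in UCAG order.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Group codons by the amino acid (or stop) they encode.
SYNONYMS = {}
for _codon, _aa in CODON_TO_AA.items():
    SYNONYMS.setdefault(_aa, []).append(_codon)


def translate(rna):
    """Translate an in-frame RNA coding sequence ('*' marks a stop codon)."""
    return "".join(CODON_TO_AA[rna[i:i + 3]] for i in range(0, len(rna) - 2, 3))


def synonymous_mutant(rna, rng, n_sites):
    """Replace `n_sites` randomly chosen codons with random synonymous codons."""
    rna = rna.upper().replace("T", "U")
    if len(rna) % 3 != 0:
        raise ValueError("coding region length must be a multiple of 3")
    codons = [rna[i:i + 3] for i in range(0, len(rna), 3)]
    for idx in rng.sample(range(len(codons)), min(n_sites, len(codons))):
        # The original codon is among the choices, so "no change" is possible.
        codons[idx] = rng.choice(SYNONYMS[CODON_TO_AA[codons[idx]]])
    return "".join(codons)
```

Calling `synonymous_mutant` 500 times on one coding region would give a set comparable in size to those described, with the encoded protein unchanged in every member.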
Box plots were constructed for each population of predicted free energies and z-scores. The box on each plot represents the interquartile range (IQR), which is defined as the difference between the 75th percentile (Q3) and 25th percentile (Q1) of each population. Upper and lower bounds for each plot (bars extending from the box) represent the largest and smallest data values within 1.5 × IQR of Q3 and Q1, respectively. Values outside of this range are considered outliers for that population.
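The box plot quantities defined above can be computed as in this small sketch. The percentile convention (linear interpolation between sorted values) is an assumption, since plotting programs differ on this point, and the function names are hypothetical.

```python
def quantile(sorted_vals, q):
    """Quantile by linear interpolation between neighbouring sorted values."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def box_stats(values):
    """Return Q1, median, Q3, IQR, whisker ends, and points beyond 1.5 x IQR."""
    data = sorted(values)
    if not data:
        raise ValueError("no data")
    q1, q2, q3 = (quantile(data, q) for q in (0.25, 0.5, 0.75))
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": q2,
        "q3": q3,
        "iqr": iqr,
        "lower_whisker": inside[0],   # smallest value within 1.5 x IQR of Q1
        "upper_whisker": inside[-1],  # largest value within 1.5 x IQR of Q3
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }
```

Note that the whiskers end at actual data values, not at the fences themselves, which matches the definition of the bars given in the text.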
Clear evidence for influenza B GORS is found in the (+) and (−) strands of segments NP, NS, PB2, and PB1, with NP having the most favorable median z-score (Table 1). Distributions of z-scores for these sequences were almost entirely in the negative region (Figure 1 and Additional file 1: Figure S1). The remaining coding regions have average z-scores close to zero or positive (Table 1). The z-score distributions for the sequences that did not show GORS generally centered near zero or trended towards the positive (Figure 2 and Additional file 2: Figure S2).
With the exception of HA, distributions of predicted free energies for influenza B are shifted towards more stability in the (+)RNA versus the (−)RNA (Figure 3), so (+)RNAs have more favorable predicted average folding free energies than (−)RNAs (Table 1). Free energy of folding also favored the (+)RNA for all segments in influenza A.
Unlike influenza A, there are no avian or swine sequences available to compare the relative predicted stabilities of folding in other species for each segment of influenza B. To simulate this comparison, sets of synonymous codon mutants were generated. The in silico synonymous codon mutant sets provide distributions of free energies for each influenza B coding region where the only constraint is to maintain the encoded protein product. They thus represent the potential RNA folding free energy landscape allowed by the encoded amino acid sequence. Predicted ΔG°37 indicates that wild-type sequences in the (+)RNA sense generally have less stable secondary structure than sequences with codon mutants (Table 1). Only NP breaks this trend, where the in silico (+)RNA mutants are on average less stable by 1.0 kcal/mol at 37°C. Distributions of free energies for the mutant sequences have greater spread than wild-type sequences and are also generally shifted towards more favorable thermodynamic stability versus the wild-type sequences (Figure 3). Evidently, the average thermodynamic stability of wild-type sequences is less favorable than allowed by protein coding constraints, even though global RNA structure is present in at least four coding regions. The wild-type sequences occupy a small part of the range of free energies allowed by the amino acid sequence and are distributed towards less favorable stability (Figure 3). An examination of nucleotide frequencies reveals that synonymous codon mutants have at least 2% higher GC content than wild-type sequences (Table 2).
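The GC content comparison mentioned at the end of this paragraph amounts to a simple count. The sketch below uses hypothetical function names and reports the difference in percentage points, which is one reading of "2% higher"; the text does not say whether an absolute or relative difference is meant.

```python
def gc_content(seq):
    """Fraction of G and C nucleotides in a sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_difference_points(mutant, wild_type):
    """GC content of the mutant minus that of wild-type, in percentage points."""
    return 100.0 * (gc_content(mutant) - gc_content(wild_type))
```

Because G-C pairs contribute more to predicted stability than A-U or G-U pairs, a shift of this kind in the mutant sets would be expected to accompany more favorable folding free energies.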
Z-scores were also calculated for the synonymous codon mutant sets. Of the four segments whose wild-type sequences show evidence of GORS, all but the NS segment mutants still possess GORS. In these three cases, however, the median z-scores for mutants were more positive than for wild-type sequences (Table 1, Figures 1 and 2).
Predictions of GORS can partition RNA sequences into regions with or without strong secondary structure. Such partitioning should be helpful in identifying regions that are easier to target with therapeutics. For example, small molecules will bind specifically to structured regions, whereas oligonucleotide-based therapeutics will bind more tightly to unstructured regions. Prediction of regions with GORS may also facilitate genome-wide probing of secondary structure [33–35] by focusing searches on regions likely to have conserved structure.
For influenza B, three of the four gene segments with GORS have homologs in influenza A that also show GORS: NP, NS, and PB2. Unlike influenza A, there is no evidence for GORS in the influenza B M1/BM2 gene. A possible explanation for this lack of GORS is that in influenza A, segment M encodes both the M1 (matrix protein) and M2 (ion channel) proteins, which are alternatively spliced, whereas in influenza B the BM2 open reading frame directly follows M1 and is translated via termination-reinitiation [36, 37]. In influenza A, local RNA structures have been described that have implications for splicing [15, 18, 19]. Perhaps GORS is absent in influenza B M1/BM2 because there is no need for RNA structures important for splicing.
In influenza B, the PB1 coding region shows strong evidence of GORS (median z-score of −1.5), in contrast to influenza A, where the average z-scores are equal to or more positive than −0.5. This suggests that PB1 of influenza B may need to maintain structure, perhaps to stabilize the mRNA, for some as yet unknown reason that does not apply to influenza A PB1. Interestingly, the (−)RNA z-score for this region is more favorable than that of the (+)RNA. This suggests an important role for structure in the genomic RNA for this segment, with structure in the (+)RNA representing a structural “echo”.
The less favorable relative thermodynamic stability of influenza B sequences, when compared with a set of randomly generated synonymous codon sequences, is consistent with the human host specificity of influenza B. For influenza A, sequences specific to humans have less favorable thermodynamic stability than those from swine and avian species, even though protein sequence is largely conserved. However, any changes in thermodynamic stability in synonymous codon mutants for all segments appear to be independent of GORS because the average z-score for the mutants was close to zero. A decrease of CpG dinucleotide frequencies in human influenza viruses has been established. As seen in Table 2, synonymous codon mutants acquired increased GC content, which increased their predicted thermodynamic stability, compared to wild-type sequences. This is consistent with the increased GC content of avian influenza A strains compared to human influenza A strains. It appears that evolution, acting to reduce CpG frequency or in response to other factors related to the human host, restricts the thermodynamic stability of influenza B sequences to a small portion of the available folding landscape. Thus, this thermodynamic difference may distinguish human-adapted influenza strains from strains that replicate in other host species.
This work elucidates some of the thermodynamic and structural constraints that may be acting on influenza B RNA sequences and human influenza viruses in general. Some characteristics are shared between influenza B and A: GORS is seen in NS, NP, and PB2 RNAs of both viral species. With the exception of influenza B HA, ΔG°37 favors folding in the (+)RNA over the (−)RNA, and the human-specific wild-type influenza B sequences have less favorable thermodynamic stability than allowed by the amino acid sequence. This latter trend was also seen in human influenza A viruses when compared to swine and avian strains. Differences with influenza A are also apparent: for influenza B, the PB1 RNA shows GORS, while influenza A has GORS in the M gene segment. These results imply differences in the role of RNA folding in the two viral groups. A better understanding of the constraints acting on influenza B sequences may aid in the rational attenuation of viral strains for use in vaccines, as has been recently shown with the influenza B NP segment.
Availability of supporting data
The data supporting the results of this article are included within the article (and its additional files).
GORS: Global ordered RNA structure.
Osterhaus AD, Rimmelzwaan GF, Martina BE, Bestebroer TM, Fouchier RA: Influenza B virus in seals. Science. 2000, 288: 1051-1053. 10.1126/science.288.5468.1051.
Geraci JR, St Aubin DJ, Barker IK, Webster RG, Hinshaw VS, Bean WJ, Ruhnke HL, Prescott JH, Early G, Baker AS, et al: Mass mortality of harbor seals: pneumonia associated with influenza A virus. Science. 1982, 215: 1129-1131. 10.1126/science.7063847.
Hay AJ, Gregory V, Douglas AR, Lin YP: The evolution of human influenza viruses. Phil Trans R Soc Lond B Biol Sci. 2001, 356: 1861-1870.
Molinari NA, Ortega-Sanchez IR, Messonnier ML, Thompson WW, Wortley PM, Weintraub E, Bridges CB: The annual impact of seasonal influenza in the US: measuring disease burden and costs. Vaccine. 2007, 25: 5086-5096. 10.1016/j.vaccine.2007.03.046.
Thompson WW, Shay DK, Weintraub E, Brammer L, Cox N, Anderson LJ, Fukuda K: Mortality associated with influenza and respiratory syncytial virus in the United States. JAMA. 2003, 289: 179-186. 10.1001/jama.289.2.179.
Lee BY, Bartsch SM, Willig AM: The economic value of a quadrivalent versus trivalent influenza vaccine. Vaccine. 2012, 30: 7443-7446. 10.1016/j.vaccine.2012.10.025.
Ambrose CS, Levin MJ: The rationale for quadrivalent influenza vaccines. Hum Vaccin Immunother. 2012, 8: 81-88.
Bouvier NM, Palese P: The biology of influenza viruses. Vaccine. 2008, 26 (Suppl 4): D49-D53.
Pfingsten JS, Kieft JS: RNA structure-based ribosome recruitment: lessons from the Dicistroviridae intergenic region IRESes. RNA. 2008, 14: 1255-1263. 10.1261/rna.987808.
Warf MB, Berglund JA: Role of RNA structure in regulating pre-mRNA splicing. Trends Biochem Sci. 2010, 35: 169-178. 10.1016/j.tibs.2009.10.004.
Mitton-Fry RM, DeGregorio SJ, Wang J, Steitz TA, Steitz JA: Poly(A) tail recognition by a viral RNA element through assembly of a triple helix. Science. 2010, 330: 1244-1247. 10.1126/science.1195858.
Steitz JA, Borah S, Cazalla D, Fok V, Lytle R, Mitton-Fry R, Riley K, Samji T: Noncoding RNPs of viral origin. Cold Spring Harb Perspect Biol. 2011, 3: a005165. 10.1101/cshperspect.a005165.
Pedersen JS, Meyer IM, Forsberg R, Simmonds P, Hein J: A comparative method for finding and folding RNA secondary structures within protein-coding regions. Nucleic Acids Res. 2004, 32: 4925-4936. 10.1093/nar/gkh839.
Ilyinskii PO, Schmidt T, Lukashev D, Meriin AB, Thoidis G, Frishman D, Shneider AM: Importance of mRNA secondary structural elements for the expression of influenza virus genes. OMICS. 2009, 13: 421-430. 10.1089/omi.2009.0036.
Gultyaev AP, Heus HA, Olsthoorn RC: An RNA conformational shift in recent H5N1 influenza A viruses. Bioinformatics. 2007, 23: 272-276. 10.1093/bioinformatics/btl559.
Gultyaev AP, Olsthoorn RC: A family of non-classical pseudoknots in influenza A and B viruses. RNA Biol. 2010, 7: 125-129. 10.4161/rna.7.2.11287.
Washietl S, Hofacker IL, Stadler PF: Fast and reliable prediction of noncoding RNAs. Proc Natl Acad Sci U S A. 2005, 102: 2454-2459. 10.1073/pnas.0409169102.
Moss WN, Priore SF, Turner DH: Identification of potential conserved RNA secondary structure throughout influenza A coding regions. RNA. 2011, 17: 991-1011. 10.1261/rna.2619511.
Moss WN, Dela-Moss LI, Kierzek E, Kierzek R, Priore SF, Turner DH: The 3' splice site of influenza A segment 7 mRNA can exist in two conformations: a pseudoknot and a hairpin. PLoS One. 2012, 7: e38323. 10.1371/journal.pone.0038323.
Priore SF, Moss WN, Turner DH: Influenza A virus coding regions exhibit host-specific global ordered RNA structure. PLoS One. 2012, 7: e35989. 10.1371/journal.pone.0035989.
Clote P, Ferre F, Kranakis E, Krizanc D: Structural RNA has lower folding energy than random RNA of the same dinucleotide frequency. RNA. 2005, 11: 578-591. 10.1261/rna.7220505.
Sperschneider J, Datta A: An introduction to RNA structure and pseudoknot prediction. In Algorithms in Computational Molecular Biology. 2011, John Wiley & Sons, Inc, 521-546. 10.1002/9780470892107.ch24.
Mathews DH, Disney MD, Childs JL, Schroeder SJ, Zuker M, Turner DH: Incorporating chemical modification constraints into a dynamic programming algorithm for prediction of RNA secondary structure. Proc Natl Acad Sci U S A. 2004, 101: 7287-7292. 10.1073/pnas.0401799101.
Mathews DH, Sabina J, Zuker M, Turner DH: Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure. J Mol Biol. 1999, 288: 911-940. 10.1006/jmbi.1999.2700.
Simmonds P, Tuplin A, Evans DJ: Detection of genome-scale ordered RNA structure (GORS) in genomes of positive-stranded RNA viruses: implications for virus evolution and host persistence. RNA. 2004, 10: 1337-1351. 10.1261/rna.7640104.
Davis M, Sagan SM, Pezacki JP, Evans DJ, Simmonds P: Bioinformatic and physical characterizations of genome-scale ordered RNA structure in mammalian RNA viruses. J Virol. 2008, 82: 11824-11836. 10.1128/JVI.01078-08.
Katz L, Burge CB: Widespread selection for local RNA secondary structure in coding regions of bacterial genes. Genome Res. 2003, 13: 2042-2051. 10.1101/gr.1257503.
Deutscher MP: Degradation of RNA in bacteria: comparison of mRNA and stable RNA. Nucleic Acids Res. 2006, 34: 659-666. 10.1093/nar/gkj472.
Bao Y, Bolotov P, Dernovoy D, Kiryutin B, Zaslavsky L, Tatusova T, Ostell J, Lipman D: The influenza virus resource at the national center for biotechnology information. J Virol. 2008, 82: 596-601. 10.1128/JVI.02005-07.
Hofacker IL, Fontana W, Stadler PF, Bonhoeffer LS, Tacker M, Schuster P: Fast folding and comparison of RNA secondary structures. Monatsh Chem. 1994, 125: 167-188. 10.1007/BF00818163.
Simmonds P, Smith DB: Structural constraints on RNA virus evolution. J Virol. 1999, 73: 5787-5794.
Simmonds P: SSE: a nucleotide and amino acid sequence analysis platform. BMC Res Notes. 2012, 5: 50. 10.1186/1756-0500-5-50.
Lucks JB, Mortimer SA, Trapnell C, Luo S, Aviran S, Schroth GP, Pachter L, Doudna JA, Arkin AP: Multiplexed RNA structure characterization with selective 2'-hydroxyl acylation analyzed by primer extension sequencing (SHAPE-Seq). Proc Natl Acad Sci U S A. 2011, 108: 11063-11068. 10.1073/pnas.1106501108.
Kertesz M, Wan Y, Mazor E, Rinn JL, Nutter RC, Chang HY, Segal E: Genome-wide measurement of RNA secondary structure in yeast. Nature. 2010, 467: 103-107. 10.1038/nature09322.
Underwood JG, Uzilov AV, Katzman S, Onodera CS, Mainzer JE, Mathews DH, Lowe TM, Salama SR, Haussler D: FragSeq: transcriptome-wide RNA structure probing using high-throughput sequencing. Nat Meth. 2010, 7: 995-1001. 10.1038/nmeth.1529.
Powell ML, Napthine S, Jackson RJ, Brierley I, Brown TD: Characterization of the termination-reinitiation strategy employed in the expression of influenza B virus BM2 protein. RNA. 2008, 14: 2394-2406. 10.1261/rna.1231008.
Horvath CM, Williams MA, Lamb RA: Eukaryotic coupled translation of tandem cistrons: identification of the influenza B virus BM2 polypeptide. EMBO J. 1990, 9: 2639-2647.
Greenbaum BD, Levine AJ, Bhanot G, Rabadan R: Patterns of evolution and host gene mimicry in influenza and other RNA viruses. PLoS Pathog. 2008, 4: e1000079. 10.1371/journal.ppat.1000079.
Rabadan R, Levine AJ, Robins H: Comparison of avian and human influenza A viruses reveals a mutational bias on the viral genomes. J Virol. 2006, 80: 11887-11891. 10.1128/JVI.01414-06.
Wanitchang A, Narkpuk J, Jaru-ampornpan P, Jengarn J, Jongkaewwattana A: Inhibition of influenza A virus replication by influenza B virus nucleoprotein: an insight into interference between influenza A and B viruses. Virology. 2012, 432: 194-203. 10.1016/j.virol.2012.06.016.
The authors thank Prof. David H. Mathews for helpful discussions. This work was supported by NIH RO1 GM22939. SFP is a trainee in the Medical Scientist Training Program funded by NIH T32 GM07356. The contents of this work are solely the responsibility of the authors and do not necessarily represent the official views of the NIH.
The authors declare that they have no competing interests.
WNM and SFP jointly conceived the experiments, collected and analyzed the data, and wrote the manuscript. DHT analyzed the results and helped write and revise the manuscript. All authors read and approved the final manuscript.
Salvatore F Priore, Walter N Moss contributed equally to this work.
Electronic supplementary material
Additional file 1: Figure S1: Frequency distributions (in percent) of z-scores for influenza coding regions with evidence of global ordered RNA structure: top, middle, and bottom rows are for the (+)RNA, (−)RNA, and synonymous codon mutant (+)RNA, respectively. (PDF 373 KB)
Additional file 2: Figure S2: Frequency distributions (in percent) of z-scores for influenza coding regions with no evidence of global ordered RNA structure: top, middle, and bottom rows are for the (+)RNA, (−)RNA, and synonymous codon mutant (+)RNA, respectively. (PDF 401 KB)
Priore, S.F., Moss, W.N. & Turner, D.H. Influenza B virus has global ordered RNA structure in (+) and (−) strands but relatively less stable predicted RNA folding free energy than allowed by the encoded protein sequence. BMC Res Notes 6, 330 (2013). https://doi.org/10.1186/1756-0500-6-330
- RNA secondary structure
- Influenza A
- Influenza B
- Structural bioinformatics