Sequence dependent variations in RNA duplex are related to non-canonical hydrogen bond interactions in dinucleotide steps

Background Sequence determines the three-dimensional structure of RNAs, and thereby plays an important role in carrying out various biological functions. RNA duplexes containing Watson-Crick (WC) basepairs, interspersed with non-Watson-Crick basepairs, are the dominant structural unit and form the scaffold for the 3-dimensional structure of RNA. It is therefore crucial to understand the geometric variation in the dinucleotide steps that form the helices. We have carried out a detailed analysis of the dinucleotide steps formed by AU and GC Watson-Crick basepairs in RNA structures (both free and protein bound) and compared the results to that seen in DNA. Further, the effect of protein binding on these steps was examined by comparing steps in free RNA structures with protein bound RNA structures. Results Characteristic sequence dependent geometries are observed for the RR, RY and YR type of dinucleotide steps in RNA. Their geometric parameters show correlated variations that are different from those observed in B-DNA helices. Subtle, but statistically significant differences are seen in roll, slide and average propeller-twist values, between the dinucleotide steps of free RNA and protein bound RNA structures. Many non-canonical cross-strand and intra-strand hydrogen bonds were identified that can stabilise the RNA dinucleotide steps, among which YR steps show presence of many new unreported interactions. Conclusions Our work provides for the first time a detailed analysis of the conformational preferences exhibited by Watson-Crick basepair containing steps in RNA double helices. Overall, the WC dinucleotide steps show considerable conformational variability. Furthermore, we have identified hydrogen bond interactions in several of the dinucleotide steps that could play a role in determining the preferred geometry, in addition to the intra-basepair hydrogen bonds and stacking interactions. 
Protein binding affects the conformation of the steps that are in direct contact and also allosterically affects the steps that are not in direct physical contact.


Background
The double helical structure of nucleic acids exists in various polymorphic sub-states, of which DNA prefers the B-form conformation and RNA prefers the A-form. The A-form in RNA is a right-handed helix formed by stacking of Watson-Crick (WC) basepairs along with a few non-Watson-Crick (NWC) basepairs. The overall conformation of the helices is dictated by the geometry of successive dinucleotide steps, which in turn is dictated by the chemical nature of the bases involved in forming these steps. The A-form helix is characterized by a large roll angle and negative slide values compared to the B-form, and is accompanied by a narrow but deep major groove and a wide, shallow minor groove [1]. Significant progress has been made in understanding the sequence dependent conformational preferences of dinucleotide steps in DNA [2][3][4][5][6], while the geometric preferences in RNA helices remain largely unexplored. Information on the geometric preferences at the step level can contribute to the overall understanding of the structural organization of RNA. A database of all possible dinucleotide steps with their step parameter values is available in the public domain [7], but there is very little explicit discussion of the conformational features of the various dinucleotide steps observed in RNA.
The helical regions in DNA are composed almost exclusively of canonical Watson-Crick (AT and GC) basepairs, which form 10 unique dinucleotide steps. In contrast, an RNA duplex consists of canonical WC basepairs (AU and GC), which are paired along the Watson-Crick edge in cis orientation (cisWW family), as well as basepairs involving the Hoogsteen and Sugar edges. In addition, other non-canonical basepairs from the cisWW family, e.g. GU, GA and UU, also occur frequently in the RNA duplex [8][9][10][11]. Thus, the number of dinucleotide step types that can occur in RNA is much larger, which complicates any analysis of their sequence dependent conformations. In the present work, we focus only on the dinucleotide steps formed by the two canonical WC basepairs (AU and GC) of the cisWW family, which constitute a major proportion of the helical steps, and compare them to the corresponding A-like and B-like DNA steps.
The diverse RNA structural motifs and equally diverse proteins that interact with RNA suggest that the conformational changes that occur during RNA-protein interaction can be characterized by changes in the protein, the RNA, or both [12]. An analysis of crystal structures of free and RNA bound proteins suggested that the proteins do not show any significant change on binding to RNA [13]. A number of studies on RNA-protein interaction have focused on the RNA-protein interface and recognition mechanism [14][15][16], but none have analyzed the protein induced conformational change, if any, in RNA. On the other hand, the effect of protein binding on dinucleotide steps of DNA is well documented [4,17,18,19]. Specific RNA-binding domains recognize the sequence and shape of the interacting region [20], while others interact in a non-specific manner. The wide and shallow minor groove present in the RNA helix allows easy access for interaction with protein. However, the narrow and deep major groove can sometimes also interact with proteins, owing to the presence of mismatch basepairs and bulges along the helix that lead to widening of the major groove [21]. Understanding the conformational changes induced by protein interaction in the dinucleotide steps of the RNA helix can help deduce the general mechanisms involved in RNA-protein recognition.
The geometry of a dinucleotide step is mainly influenced by basepair hydrogen bonds and stacking interactions [22][23][24]. Apart from the standard hydrogen bonds involved in basepairing, additional interactions involving base, ribose sugar (especially the O2′ group) and phosphate atoms in the RNA backbone have been reported in RNA structures [25][26][27]. These interactions mainly occur in hairpin, internal or junction loops or as part of tertiary interactions. In the light of developments in our understanding of 'weak hydrogen bonds' [28,29], the importance of such interactions in the structure formation, folding and stability of various macromolecules is also being investigated [30][31][32][33]. The presence of potentially weak cross-strand and intra-strand hydrogen bond interactions in dinucleotide steps of B-DNA crystal structures has been reported and analyzed [30,34,35]. In RNA crystal structures, such interactions between bases in a dinucleotide step have not been reported. Moreover, the characteristic A-like geometry seen in RNA helices can possibly prevent their formation or favor some novel interactions. Recent molecular dynamics (MD) simulation studies on modeled RNA duplexes suggest that potential interactions in a dinucleotide step are present between exo-cyclic atoms [36]. Hence, identification of these additional stabilizing forces in various dinucleotide steps in RNA crystal structures can help in understanding their preferred geometry and in ab initio modeling of RNA structures.
In this work, a non-redundant RNA crystal structure dataset has been created and we have examined the intrinsic geometries of all WC basepairs and the dinucleotide steps formed by them in the helical regions. To understand the extent of conformational variability that can occur in the dinucleotide steps, the effect of protein binding on these steps was examined by comparing the helices in free RNA with protein bound RNA dataset. Further, we have carried out a systematic analysis of dinucleotide steps to identify potential hydrogen bond interactions between all four bases and correlated the occurrence of such bonds with the dinucleotide step geometry.

Preparation of dinucleotide dataset
The x-ray crystal structure dataset was created by extracting structures with resolution better than 3.0 Å from the Protein Data Bank [37]. The dataset was made non-redundant using the web servers HD-RNAS [38] and FR3D [39]. Non-standard bases and other chemically modified bases were not included in this study. RNA structures that are not bound to proteins were grouped as the 'free-RNA' dataset and those that were in complex with a protein were grouped as the 'bound-RNA' dataset (Table 1). A non-redundant free DNA dataset containing structures with resolution better than 2.0 Å was created for comparison with the RNA datasets. The 10 dinucleotide steps in DNA helices were grouped into A-like and B-like steps, based on their Zp value [4]. They are referred to as the 'ADNA' and 'BDNA' datasets respectively. The parameters obtained from crystal datasets are also compared with those of fibre diffraction models. A standard B-DNA fibre model (fibre-BDNA) was generated using NUCGEN [40]. Unlike RNA crystal structures, the basepairs in RNA fibre models in the literature have small negative propeller-twist (-2.1°). 3DNA v2.1 has the option to generate a uniform A-RNA double helix with large negative propeller-twist (-10.5°) [41,42]. We have used this RNA model (referred to henceforth as 'ModelRNA') in our analysis for a more realistic comparison with crystal RNA datasets.

Intra-basepair and dinucleotide step parameters
The helical structures in each of the datasets were subjected to geometry based identification and classification of basepairs using the BPFind program with default criteria [8]. Only helical stems containing 4 or more basepairs were included in the datasets. We analyzed steps formed by AU and GC basepair combinations of the cisWW family [43]. The dinucleotide steps were grouped based on their sequence. The six intra-basepair and six dinucleotide step parameters were calculated using the NUPARM program [40,44], with the default option of taking the line joining the C6 − C8 atoms as the y-axis. Additionally, Zp and cup parameters were calculated for each step. 'Zp' relates the basepairs of the step to their backbone geometries [45]. In a dinucleotide step, it gives the displacement of phosphate atoms in each strand from the midplane between the two stacked basepairs and is the best discriminator between A-form and B-form conformation [4,19]. 'Cup' is the difference in buckle parameter between the two basepairs that form the dinucleotide step. Correlations between parameters were analysed using the Pearson correlation coefficient (r), and their statistical significance (P < 0.01) was checked. The distribution of the data was shown using a Mahalanobis ellipse fitted to the datapoints, centred on the mean and covering 90% of the datapoints in each group. The stacking area overlap between the basepairs forming a dinucleotide step was calculated using the 3DNA program [42]. Stacking area overlap was calculated between bases by including all atoms, as well as the ring atoms alone (excluding exo-cyclic atoms). The total stacking area overlap is the sum of the two intra-strand and two cross-strand overlaps between the 4 bases involved in the step.
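The Mahalanobis ellipse described above can be illustrated with a short sketch. This is one possible reading of the procedure, not the original MATLAB code: the ellipse is centred on the group mean, shaped by the sample covariance of two step parameters, and scaled so that 90% of the points fall inside. Taking the scale from the empirical distribution of Mahalanobis distances is an assumption (a chi-square quantile would be an alternative), and the function names are hypothetical.

```python
import math

def mean_and_covariance(points):
    """Return the mean (mx, my) and sample covariance (sxx, sxy, syy) of (x, y) pairs."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)

def squared_mahalanobis(point, mean, cov):
    """Squared Mahalanobis distance of one point from the group mean."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    # Quadratic form with the inverse of the 2x2 covariance matrix.
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det

def coverage_ellipse(points, coverage=0.90):
    """Ellipse centred on the mean that encloses `coverage` of the points.

    Returns (mean, semi_axes, angle_rad, cutoff), where `cutoff` is the
    squared Mahalanobis distance that defines the ellipse boundary.
    """
    mean, cov = mean_and_covariance(points)
    d2 = sorted(squared_mahalanobis(p, mean, cov) for p in points)
    # Assumption: empirical cutoff, i.e. the smallest squared distance
    # that includes the requested fraction of the data points.
    k = max(1, math.ceil(coverage * len(d2)))
    cutoff = d2[k - 1]
    sxx, sxy, syy = cov
    # Eigenvalues of the symmetric 2x2 covariance matrix.
    half_trace = 0.5 * (sxx + syy)
    root = math.sqrt((0.5 * (sxx - syy)) ** 2 + sxy ** 2)
    lam_major, lam_minor = half_trace + root, half_trace - root
    # Orientation of the major axis relative to the x-axis.
    angle = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    semi_axes = (math.sqrt(cutoff * lam_major), math.sqrt(cutoff * lam_minor))
    return mean, semi_axes, angle, cutoff
```

A point lies inside the ellipse when its squared Mahalanobis distance does not exceed `cutoff`, so the same function can be used both to draw the ellipse and to flag outlying steps.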

Hydrogen bond analysis
Hydrogen atom coordinates were added to all the crystal structures using the REDUCE program [46]. The following criteria were used to identify hydrogen bonds: (i) donor to acceptor distance (D..A) ≤ 3.8 Å and (ii) angle D − H..A ≥ 90°. Only bonds that are observed in more than 50% of the cases in each of the 10 dinucleotide steps are discussed.
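The two geometric criteria and the 50% occurrence filter can be written down compactly. The following is a minimal sketch, assuming that atoms are given as (x, y, z) coordinates in Å and that the D − H..A angle is measured at the hydrogen atom; the function names are hypothetical and not taken from any program used in this work.

```python
import math

def angle_deg(donor, hydrogen, acceptor):
    """Angle D-H..A in degrees, measured at the hydrogen atom."""
    v1 = [donor[i] - hydrogen[i] for i in range(3)]
    v2 = [acceptor[i] - hydrogen[i] for i in range(3)]
    dot = sum(a * b for a, b in zip(v1, v2))
    norm = math.hypot(*v1) * math.hypot(*v2)
    # Clamp to [-1, 1] to guard against rounding error before acos.
    cos_angle = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_angle))

def is_hydrogen_bond(donor, hydrogen, acceptor, max_da=3.8, min_angle=90.0):
    """Apply criteria (i) D..A <= 3.8 Å and (ii) D-H..A >= 90°."""
    return (math.dist(donor, acceptor) <= max_da
            and angle_deg(donor, hydrogen, acceptor) >= min_angle)

def occurrence_fraction(instances):
    """Fraction of step instances in which the candidate bond is present.

    `instances` is a list of (donor, hydrogen, acceptor) coordinate triples,
    one per occurrence of a given dinucleotide step.
    """
    flags = [is_hydrogen_bond(d, h, a) for d, h, a in instances]
    return sum(flags) / len(flags)

def is_reported(instances, threshold=0.5):
    """A bond is discussed only if it occurs in more than 50% of the cases."""
    return occurrence_fraction(instances) > threshold
```

The angle cutoff of 90° is permissive, so the distance criterion does most of the filtering; tightening either value would reduce the number of weak interactions retained.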

Dinucleotide steps interacting with protein
In the bound-RNA dataset, the interactions between the atoms in a dinucleotide step and the protein atoms were identified using the CONTACT program in the CCP4 program suite [47]. Any pair of amino acid and RNA atoms within a distance of ≤ 4 Å was considered to be interacting. Thus, the steps in the bound-RNA dataset were sub-classified into two datasets, those that are in contact with protein (cont) and those that are not in contact (non-cont). In order to assess whether the difference in step parameter values between the various datasets was significant, an unpaired Student's t-test was carried out.
MATLAB was used for all statistical analysis and for plotting graphs [48].
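As an illustration of the classification and the test statistic, a minimal sketch follows. It is not the CONTACT or MATLAB procedure itself. The pooled-variance (equal variance) form of the unpaired t statistic is an assumption, since the variant used is not stated, and the function names are hypothetical.

```python
import math
from statistics import mean, variance

def in_contact(step_atoms, protein_atoms, cutoff=4.0):
    """True if any RNA atom of the step lies within `cutoff` Å of a protein atom."""
    return any(math.dist(r, p) <= cutoff
               for r in step_atoms for p in protein_atoms)

def classify_step(step_atoms, protein_atoms):
    """Label a dinucleotide step as 'cont' or 'non-cont'."""
    return "cont" if in_contact(step_atoms, protein_atoms) else "non-cont"

def student_t(sample_a, sample_b):
    """Unpaired Student t statistic with pooled variance.

    Returns (t, degrees_of_freedom). Converting t to a P value requires
    the t distribution, which is left to a statistics package.
    """
    na, nb = len(sample_a), len(sample_b)
    df = na + nb - 2
    pooled = ((na - 1) * variance(sample_a)
              + (nb - 1) * variance(sample_b)) / df
    se = math.sqrt(pooled * (1.0 / na + 1.0 / nb))
    return (mean(sample_a) - mean(sample_b)) / se, df
```

In use, the roll (or slide, or average propeller-twist) values of one step type in two datasets would be passed as `sample_a` and `sample_b`, and the resulting t compared against the critical value for the chosen significance level.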

Results
The free-RNA dataset consists of 88 protein-free x-ray crystal structures, while the bound-RNA dataset includes 127 structures (Table 1). Canonical WC basepairs (AU and GC) constitute more than 83% of the total basepairs in these structures and ~74% of dinucleotide steps are composed of these basepairs (Table 2). This work focuses on the sequence dependent conformational preferences of the 10 dinucleotide steps formed by WC basepairs, which are compared with those observed in A-DNA and B-DNA helices.
Table footnote: The number of PDB structures included in each dataset is given within parentheses. The steps in DNA duplex were sub-grouped into ADNA and BDNA based on their Zp value [4].

Dinucleotide step geometries
The intra-basepair parameters of WC basepairs present in the RNA datasets are comparable to those of the model structure but are characterized by large variations. The only noticeable features are that GC basepairs have higher negative buckle compared to AU basepairs, while AU basepairs show a higher open angle value compared to GC (Table 3). The parameter values for the ModelRNA are also listed. A comparison of the crystal structure geometries with the model structure values indicates that these steps show some characteristic sequence dependent preferences in their geometries. In general, the average propeller-twist value is higher for dinucleotide steps containing only AU basepairs (AA/UU, AU/AU and UA/UA) and lower for steps with only GC basepairs (GG/CC, GC/GC and CG/CG). The roll value differs between RR, RY and YR steps (YR > RR > RY), though the overall average roll value for WC basepair containing steps is lower than the ModelRNA value (Table 4). It is interesting to note that among the RR sequences, the GG/CC step shows slightly larger negative slide and positive Zp values, while the AA/UU step has the smallest negative slide value, which is also reflected in its smaller Zp value. All other parameters have similar values.
Among RY steps, AC/GU and GC/GC steps have the smallest positive roll angles and negative slide values among all dinucleotide steps (Table 5). However, AU/AU steps have high roll, large negative average propeller-twist and low slide values, with correspondingly small Zp value, as compared to AC/GU and GC/GC steps. This finding was specific to RNA, since the equivalent AT/AT steps in both ADNA and BDNA did not show any such difference when compared to other RY steps. All three YR steps have larger roll and negative slide values when compared to the RR and RY steps, as well as the ModelRNA (Table 6). The UA/UA steps show a particularly high mean roll angle (14.1°) compared to CA/UG (11.0°) and CG/CG (11.5°). In addition, the slide and cup values for CG/CG steps have larger negative values than those for CA/UG and UA/UA steps. Most of these sequence dependent features are unique to RNA helices; however, the trends observed for them seem to be similar to those reported earlier for DNA, particularly the trend observed for roll angle values (YR > RR > RY). Hence, an analysis of these trends and of the correlation between various parameters in the free-RNA, ADNA and BDNA datasets has been carried out to identify any specific structure based features.

Correlation between dinucleotide step parameters
We have carried out a pair-wise correlation analysis between the various step parameters in the free-RNA dataset and compared the correlation coefficient values (r) with those of the ADNA and BDNA datasets in order to identify correlations that are specific to A-form helices. The parameters that show statistically significant correlations at confidence level > 99.9% are discussed further (Figure 1, Additional file 1). The well-characterized strong correlation between roll and twist that is observed for BDNA [3] is not seen for either the ADNA or the free-RNA dataset. On the other hand, correlations between shift and tilt, and between slide and twist, are present in both RNA and DNA datasets. Interestingly, the major differences between A and B-form structures are seen for correlations between basepair geometry dependent step parameters and other dinucleotide step parameters. For instance, average propeller-twist is positively correlated with roll, slide and rise in BDNA. However, in the ADNA and RNA datasets it is negatively correlated with roll, twist and slide. Twist shows a significant negative correlation with cup value in BDNA, which is absent in A-form structures. Instead, roll and rise show negative correlations with cup in A-form structures, while slide shows a positive correlation with cup. Thus, overall, dinucleotide step parameters in the RNA and ADNA datasets show similar correlations that are distinct from those in the BDNA dataset.
We have also carried out a correlation analysis for the free-RNA dataset, considering the dinucleotide steps in each of the three sub-groups, RR, RY and YR, separately (Figure 2). No significant correlation is seen between roll and twist for any of the three sub-groups, confirming that this correlation is absent in RNA. Several correlations between step parameter values show the same trend for RR, RY and YR types of steps, e.g. shift with tilt and slide with twist. Similarly, average propeller-twist shows significant negative correlation with slide and roll for all three dinucleotide step types. Cup also shows a negative correlation with roll in all three sub-groups. Thus, a comparison of correlations between parameters of the three step types suggests that the correlations seen in the majority of the steps are similar to those seen in the pooled dataset and are characteristic of A-form structure.
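The pair-wise analysis can be sketched as follows. This is a minimal illustration, assuming the step parameters of one dataset are held as equal-length lists keyed by parameter name; the function names and the layout of `table` are hypothetical, and the original analysis was carried out in MATLAB.

```python
import math
from itertools import combinations

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def pairwise_correlations(table):
    """Correlation coefficient for every pair of parameters.

    `table` maps a parameter name (e.g. 'roll', 'slide') to a list of
    values, one per dinucleotide step, in the same step order.
    """
    return {(p, q): pearson_r(table[p], table[q])
            for p, q in combinations(table, 2)}

def t_from_r(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))
```

Running `pairwise_correlations` separately on the RR, RY and YR subsets, and on the pooled data, gives the group-wise and pooled r values that are compared in the text.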

Effect of protein binding on the dinucleotide step geometry
The mean and standard deviation values of intra-basepair parameters for canonical AU and GC basepairs in the bound-RNA dataset were compared with those of the free-RNA dataset (Table 3). Approximately 62% of the total dinucleotide steps in the bound-RNA dataset are in direct contact with protein (Table 2). To examine the direct and indirect effect of protein binding, the dataset was divided into two sub-datasets: those steps that are in contact with protein (cont) and those that do not contact the protein (non-cont). The mean and standard deviation values of the step parameters and the corresponding average propeller-twist, cup and Zp values, for each of the 10 dinucleotide steps in the non-cont and cont datasets, are tabulated separately in Tables 4, 5 and 6. The mean values of step parameters of the non-cont and cont datasets are quite similar to each other, but differ slightly from free-RNA values. Interestingly, both cont and non-cont data show large standard deviation values. In addition, the correlation analysis for dinucleotide steps in the bound-RNA dataset does not show any significant difference from that of the free-RNA and ADNA datasets (Additional file 1).
To check for statistical significance of the differences between the step parameters of the three datasets (free-RNA, non-cont and cont), an unpaired Student's t-test was carried out (Figure 3). No significant difference was found between the cont and non-cont datasets. However, roll, slide and average propeller-twist values for several of the dinucleotide steps in the cont dataset show significant difference (P value < 0.05) from the free-RNA dataset. Some of the steps also showed significant difference between the free-RNA and non-cont datasets.

Table caption: Mean values and SD of dinucleotide step parameters for steps along with average propeller-twist (Prop.av), Cup and Zp in free-RNA, non-cont and cont datasets. Other details are same as in Table 4.

Figure 2
Correlation between some dinucleotide step parameters for RR, RY and YR type steps in free-RNA dataset. Panels (a-f) show the correlations between the same pairs of dinucleotide step parameters as shown in Figure 1. A Mahalanobis ellipse is fitted with the mean as centre. Correlation coefficient (r) and best-fit line calculated for each group are also shown. An 'r' value ≥ 0.14, ≥ 0.18, ≥ 0.18 is significant at 99.9% confidence level for RR, RY and YR steps respectively. The data points as well as 'r' values for RR, RY and YR steps are shown in red: RR (n = 347), blue: RY (n = 210), green: YR (n = 240).

Figure 1
Correlation between dinucleotide step parameters for Watson-Crick basepair containing steps in RNA and DNA helices. The dinucleotide step parameters in each dataset are plotted along with a Mahalanobis ellipse that is fitted with the mean as centre. Correlation coefficient (r) value and best-fit line for each group are also shown. The data are colour coded as red: free-RNA, blue: ADNA, green: BDNA. For the sake of clarity, bound-RNA dataset is not included here, but shows similar trends as free-RNA. Prop.av: corresponds to average propeller-twist of both basepairs constituting a step.
Correlations between a few selected parameters are shown here (a-f). See Additional file 1 for the complete data on correlation between all parameters.

Base overlap and formation of non-canonical hydrogen bonds in dinucleotide steps
Since the parameters for RR, RY and YR steps show some significant differences from ModelRNA, we calculated base stacking overlap for these steps and compared it for the various crystal datasets and the corresponding model structures (Table 7). Figure 4 illustrates the nomenclature used to refer to the bases involved in basepair overlap area calculation and the different stacking patterns for RR, RY and YR steps, in A and B-form structures. In the case of the BDNA dataset, all three types of steps show only intra-strand base overlap and negligible cross-strand overlap, with RR > RY > YR and the major contribution coming from exo-cyclic atoms in all cases. In general, the overlap increases in the crystal structure steps as compared to the fibre-BDNA model. In A-form helices, the RR, RY and YR steps show distinctly different features, with RY >> RR ≈ YR. In RR steps, high intra-strand base overlap is seen in strand I (Pur-Pur stacking) while, unlike in BDNA, there is very little overlap in strand II (Pyr-Pyr stacking) and no overlap between the cross-strand bases. RY steps show high intra-strand base overlap along both strands I and II and no cross-strand overlap, with the exo-cyclic atoms making substantial contribution. YR steps in RNA and ADNA datasets are characterized by very small intra-strand contribution and stacking arises mainly due to cross-strand overlap of purine bases, with contributions from both ring and exo-cyclic atoms. Interestingly, the overlap in free-RNA crystal structure steps is smaller than in the ModelRNA, for RR and RY steps. Our findings suggest that the combined effect of large negative slide and lower twist value contributes towards these overlap patterns, indicating that the interactions that determine the base stacking preferences of dinucleotide steps in RNA helices are different from those seen in DNA helices. We have therefore analyzed various dinucleotide steps to see whether the base overlap patterns are related to formation of some potential non-canonical, intra-strand or cross-strand hydrogen bonds. Some of the geometric preferences seen in dinucleotide steps of DNA have been attributed to the presence of additional hydrogen bond interactions between the bases, particularly in oligo-A tracts [30,34,35]. Similarly, non-canonical hydrogen bonds between RNA bases involved in forming a dinucleotide step can arise due to favourable intra-strand or cross-strand interactions on both the major groove and minor groove side. Many such potential hydrogen bonds are possible in the RNA model structure and are found to occur in crystal structures, but only those interactions that occur in more than 50% of each of the steps are discussed here. A list of such cross-strand and intra-strand interactions, along with the mean values of donor-acceptor (DA) distance, hydrogen-acceptor distance (HA) and hydrogen bond angle (DHA) in each dinucleotide step in the free-RNA dataset, is given in Table 8. Stick drawings of dinucleotide steps with hydrogen bonds marked, for selected examples (Additional file 2) from crystal structures for RR, RY and YR steps, are shown in Figures 5, 6, and 7 respectively. A complete list of hydrogen bonds identified in RNA and DNA crystal datasets and fibre model structures is given in Additional file 3. It is observed that the number of non-canonical hydrogen bonds is larger in ModelRNA than in the fibre-BDNA model. Many of these are retained, with improved hydrogen bond parameters, in the RNA crystal structures, while some potential interactions are found to occur in specific dinucleotide steps.

Figure 3
Comparison of WC dinucleotide step parameters between free-RNA, cont and non-cont RNA datasets. The mean values and standard deviation (±1σ) of all the parameters are plotted. The means of step parameters are connected by a line in all three datasets and are colour coded as Red: free-RNA; Blue: non-cont; Green: cont. Parameters that differ significantly between two datasets (with P < 0.05) are marked by '*' in Red for non-cont and cont, in Blue for free-RNA and non-cont and in Green for free-RNA and cont datasets.

Figure 4
The mutual overlap between bases i1-i2 and j1-j2 represents intra-strand overlap, while that between i1-j2 and j1-i2 corresponds to cross-strand overlap. The blocks are drawn with the minor groove facing edge of each base shaded grey and the large blocks representing purines. The glycosidic bond attachment point is marked in black. The distinct stacking pattern of bases in RR, RY and YR steps is shown for RNA (Row 1) and BDNA (Row 2). A thick dashed line is drawn connecting the bases that show significant overlap. The base coordinates are taken from representative crystal structures (PDB_ID: 1RNA and 1BNA) and block diagrams drawn using 3DNA program [42].

Among RR steps, cross-strand C-H..O hydrogen bonds are found in 85% and 59% of AA/UU and GA/UC steps respectively in free-RNA, on the minor groove side (Figure 5 and Additional file 3). A similar interaction is also observed, though in smaller numbers, for bound-RNA steps (non-cont and cont), as well as AA/TT and GA/TC steps in BDNA. The cross-strand N-H..O interaction between the 6-amino group of Adenine and the O4 atom of Uracil, which is commonly seen in AA/TT steps of BDNA (89%), is favoured in only 40% of the AA/UU steps in the free-RNA dataset. In GG/CC steps, the intra-strand N-H..N interaction between the two Cytosine 4-amino groups shows significant presence in all A-form structures. It is not present in the fibre-BDNA model but is seen in BDNA crystal structures with a slightly longer donor-acceptor (DA) distance, as compared to RNA structures. Though a similar pair of 6-amino groups of Adenine is present in AA/UU, they do not have favourable hydrogen bond geometry.

Table 8
Non Watson-Crick hydrogen bonds commonly observed in free-RNA helices

[...] hydrogen bond. In addition, cross-strand interactions between the two purine bases are highly favoured in AC/GU and AU/AU steps in all RNA datasets and equivalent steps in DNA. A cross-strand N-H..O hydrogen bond is present in 83% of AC/GU steps in free-RNA, between the 6-amino group of Adenine and the O6 atom of Guanine. An even larger number (~90%) of AU/AU steps show a cross-strand N-H..N interaction between the 6-amino groups of Adenines, in both free-RNA and BDNA datasets. A combination of high negative average propeller-twist, smaller slide and positive roll values, seen in AU/AU steps in RNA, favors this cross-strand interaction, while the intra-strand N-H..O hydrogen bond is relatively infrequent. The GC/GC step does not show significant occurrence of any intra-strand or cross-strand interaction between its exo-cyclic groups.
In A-form helices, the large negative slide leads to high cross-strand overlap between the purine bases in YR steps (Table 7). Interestingly, the relative displacement of neighbouring bases within a strand, along with large positive roll, gives rise to favourable orientation of exo-cyclic groups in the A-form structure and hence all possible N-H..O and N-H..N hydrogen bonds are seen in large numbers. An intra-strand N-H..N interaction is present in the CA/UG step between the 6-amino group of Adenine and the 4-amino group of Cytosine, in 91% of the steps in free-RNA (Figure 7 and Additional file 3). However, the relative orientation of the 6-amino group of Adenine and the O6 oxygen atom of Guanine in the major groove does not favour a cross-strand hydrogen bond between them. The intra-strand N-H..O interaction between the 4-amino group of Cytosine and the O6 oxygen atom of Guanine is seen in >60% of CG/CG steps, in both strands of RNA structures, while it is absent in the BDNA dataset. Almost 100% of UA/UA steps in free-RNA and 80-95% in protein bound-RNA helices form an intra-strand N-H..O interaction between the 6-amino group of Adenine and the O4 oxygen atom of Uracil. A cross-strand N-H..N interaction between the two 6-amino groups of Adenine is also present in more than 60% of the A-like steps.
A rather unusual cross-strand N-H..N interaction is frequently observed between the 2-amino group of Guanine and the N9 atom of the Purine base in CA/UG and CG/CG steps in A-like structures (Table 8 and Additional file 3). Unlike other hydrogen bonds that are present in both model structure and RNA datasets, these N2..N9 interactions are much more favourable in the crystal dataset, with the mean Donor-Acceptor distance (DA) being ~3.6 Å (while it is ~4.1 Å in the ModelRNA structure). This type of hydrogen bond is observed in ~65% of CA/UG steps (Figure 7a). A similar type of hydrogen bond is seen in ~50% of CG/CG steps, with 31% showing a pair of reciprocal hydrogen bonds (Figure 7b). A combination of relatively higher values for negative cup, negative propeller twist and negative slide is characteristic of steps with reciprocal interactions between the 2-amino groups and N9 atoms of Guanines in CG/CG steps.
Thus, a number of non-canonical hydrogen bonds are present in the WC steps of both free as well as bound RNA crystal structures and their presence can be related to the sequence dependent geometries seen in the various dinucleotide steps. Overall, the percentage occurrence of these non-WC hydrogen bonds is smaller in the bound dataset compared to the free-RNA dataset.

Figure 7
Stick drawings of dinucleotide steps favouring cross-strand and intra-strand hydrogen bonds in YR steps. a) In CA/UG step, an N-H..N intra-strand hydrogen bond is shown, with the distance between 3′-Ade-N6 and 5′-Cyt-N4 indicated. Also, an unusual N-H..N cross-strand hydrogen bond is observed between 3′-Gua-N2 and 3′-Ade-N9. b) In CG/CG step, two N-H..O intra-strand hydrogen bonds are shown. The distance between 5′-Cyt-N4 and 3′-Gua-O6 in each strand is marked. Also, two N-H..N cross-strand hydrogen bonds are shown. The distance between strand II, 3′-Gua-N2 (donor) and strand I, 3′-Gua-N9 (acceptor) is marked. Similarly, distance between strand I, 3′-Gua-N2 (donor) and strand II, 3′-Gua-N9 (acceptor) is marked. c) In UA/UA step, two N-H..O intra-strand hydrogen bonds are observed. The distance between 3′-Ade-N6 and 5′-Ura-O4 is marked in each strand. In addition, an N-H..N cross-strand hydrogen bond is shown between the two 3′-Ade-N6 groups. Other details are as in Figure 5. See Additional file 2 for details on the structures selected and hydrogen bond parameters.

Discussion
Contrary to the generally accepted view that RNA helices are uniform and rigid, the various dinucleotide steps have characteristic features and can contribute to heterogeneity in the RNA helical regions. The intra-basepair parameters, propeller-twist and buckle, of the AU and GC basepairs show the usual preferences (large propeller-twist in AT and larger buckle in GC basepairs), which can influence the dinucleotide step geometry. Roll values differentiate the three types of steps in RNA. RY steps have small roll (except for AU/AU steps), YR steps have high roll and RR steps have intermediate roll values. Interestingly, while roll values in DNA vary from small negative to small positive, they follow a similar trend (YR > RR > RY). In B-DNA, the difference in parameters between RR, RY and YR steps is attributed to the effect of exo-cyclic groups on slide, roll and twist. Unlike the dinucleotide steps in B-DNA, which show a large variation in twist value that is strongly correlated with roll and moderately with slide, the twist values of steps in RNA helices cluster within a small range and show a significant correlation only with slide. The larger positive roll and negative slide values lead to a large number of favourable intra-strand as well as cross-strand interactions involving the exo-cyclic groups, particularly in YR steps. Interestingly, the slide values of WC steps show a significant negative correlation with average propeller-twist in RNA and a positive correlation in B-DNA. Steps with large average propeller-twist (AA/UU, AU/AU and UA/UA) have smaller negative slide values in RNA. The proximity of atoms on the major groove side, arising due to high roll and negative propeller-twist, prevents large negative slide. Thus, the slide parameter is directly influenced by the propeller-twist of the constituent basepairs, but as the basepairs become near planar (average propeller-twist ≈ 0°), slide becomes large negative in RNA and small positive in B-DNA steps. 
Overall, the six dinucleotide step parameters in RNA helices show mean values as well as correlated variations that are different from those observed in B-form DNA [2,4,6,19,49].
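The correlated variations discussed above (for example, slide against average propeller-twist) can be quantified with a correlation coefficient. The following is a minimal sketch, assuming a Pearson coefficient; this section does not state which statistic was used, and the function name `pearson_r` and its inputs are hypothetical. The inputs are meant to be per-step parameter values extracted from a structure dataset; no values from the study are reproduced here.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences.

    xs, ys: per-step values of two parameters (e.g. slide and average
    propeller-twist), in the same step order. Assumption: a linear
    (Pearson) measure is appropriate for the comparison.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / math.sqrt(sxx * syy)
```

A negative return value would correspond to the kind of inverse relationship described for slide and average propeller-twist in RNA, and a positive value to the relationship described for B-DNA.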
The specificity in RNA-protein interaction is thought to be mainly brought about by the exposed bases [15], which are present at helix termini, in bulges and in loops. In our analysis, we find that the interactions occur mainly with the phosphate backbones and very few interactions are seen between proteins and the base atoms. In our study, the majority of steps in the protein contacting (cont) and non-contacting (non-cont) datasets showed statistically significant differences from the free-RNA dataset for roll, slide and average propeller-twist. This suggests that, apart from inducing conformational change on direct contact, protein binding can allosterically affect steps that are not in direct physical contact with the protein.
Recently, various additional hydrogen bond interactions between the base and phosphate group oxygens (BPh) have been reported from RNA double helical regions [26,27,43]. Also, the presence of weak hydrogen bonds between cross-strand amino groups in AA/TT and GA/TC steps of B-form DNA is well documented in crystal structures [50] and supported by theoretical quantum chemical calculations [51]. Similarly, the presence of C-H..O interactions was reported in B-DNA crystal structures [30]. The presence of cross-strand C-H..O interactions in AA/UU and GA/UC steps and of an N-H..N interaction in AU/AU steps has been reported from MD simulations of A-RNA duplex sequences [36]. Two other cross-strand N-H..O interactions on the minor groove side in AG/CU and GG/CC steps, reported in the MD studies, occur in less than 20% of these steps in our RNA dataset. However, our analysis has confirmed the presence of other cross-strand interactions and identified novel cross-strand and intra-strand hydrogen bonds that can potentially provide added stability to the RNA dinucleotide step. The cross-strand C-H..O interaction in AT basepair containing steps, AA/TT and GA/TC, reported on the minor groove side of B-DNA crystal structures [29], is surprisingly also found to occur in a majority of AA/UU and GA/UC steps in RNA. In B-DNA helices the AA/TT and GA/TC steps have large negative average propeller-twist, but near zero roll and slide, while the AA/UU and GA/UC steps in RNA have large negative average propeller-twist, but moderately positive roll and large negative slide values. Thus, it appears that the same interaction is brought about by a combination of high negative propeller-twist and two different roll-slide geometries, both of which bring the pairing atoms close. Similarly, intra-strand N-H..N interactions in GG/CC and AC/GU steps, cross-strand N-H..O interactions in the AC/GU step and cross-strand N-H..N interactions in the AU/AU step are present in both A-form and B-form helices.
In RNA, when compared to RR and RY steps, the YR steps show a larger number of these potential hydrogen bonds, due to their unique cross-strand stacking. The weak interactions identified in this study can contribute to stacking and thus to the overall stability of the steps. This is in agreement with the stacking energies calculated using QM, where the values for RY and YR steps are comparable even though the base overlaps, as shown in Table 7, are considerably lower for YR steps.
Among YR steps, the CG/CG and CA/UG steps, in addition to intra-strand N-H..O and N-H..N hydrogen bonds, also have an unusual N-H..N interaction between the 2-amino groups of Guanines and the N9 atoms of Purine bases. Hydrogen bonds are generally associated with the electronegative character of the donor and acceptor atoms. Electrostatic potential (ESP) derived charges, as well as partial charges in the AMBER and CHARMM force fields, assign near zero charges to the N9 atoms in Adenosine and Guanosine [52][53][54]. However, partial charges calculated for Adenosine and Guanosine using Natural Bond Orbital (NBO) analysis [55] indicate that the N9 atom is quite negative (Additional file 4). Thus, the presence of this potential hydrogen bond needs to be further examined by quantum chemical methods.
It should also be mentioned that X-ray determined crystal structures do not have coordinates of hydrogen atoms. Programs that add hydrogen atoms to the nucleotide ring atoms, as well as to the N atoms in the pendent amino groups, place the hydrogen atoms in the plane of the base, though many QM studies suggest that these amino groups can have pyramidal geometry with the hydrogen atoms being out-of-plane [35,56-59]. The introduction of non-planar pyramidal amino hydrogen atoms can facilitate further improvement in the geometry of the N-H..N as well as N-H..O hydrogen bonds discussed here. The hydrogen bonds reported here are more prevalent in free RNA structures than in protein-bound RNA helices, where they may be replaced by interactions with proteins or nearby water molecules. Flanking basepairs can also affect the formation of these weak hydrogen bonds. This sequence dependency can be studied by analysing all possible tetramer sequences in helices with the dinucleotide step of interest in the centre. However, the currently available RNA crystal structures do not have sufficient representation of all possible tetramer sequences for a meaningful analysis.
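Because the quality of a putative hydrogen bond depends on where the hydrogen atom is placed, it can help to make the geometric evaluation explicit. The following is a minimal sketch of how a donor-hydrogen-acceptor triplet might be assessed from Cartesian coordinates (in Å). The function names and the cutoffs `max_da` and `min_angle` are illustrative assumptions, not the criteria used in this study.

```python
import math


def _sub(a, b):
    """Component-wise difference a - b of two 3D points."""
    return tuple(ai - bi for ai, bi in zip(a, b))


def _norm(v):
    """Euclidean length of a 3D vector."""
    return math.sqrt(sum(c * c for c in v))


def hbond_geometry(donor, hydrogen, acceptor):
    """Return (D..A distance, H..A distance, D-H..A angle in degrees)."""
    d_a = _norm(_sub(acceptor, donor))
    h_a = _norm(_sub(acceptor, hydrogen))
    # Angle at the hydrogen between the H->D and H->A vectors.
    hd = _sub(donor, hydrogen)
    ha = _sub(acceptor, hydrogen)
    cos_angle = sum(p * q for p, q in zip(hd, ha)) / (_norm(hd) * _norm(ha))
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding
    return d_a, h_a, math.degrees(math.acos(cos_angle))


def is_candidate_hbond(donor, hydrogen, acceptor, max_da=3.6, min_angle=120.0):
    """Flag a triplet as a candidate hydrogen bond.

    max_da (Angstrom) and min_angle (degrees) are hypothetical cutoffs
    chosen only for illustration.
    """
    d_a, _, angle = hbond_geometry(donor, hydrogen, acceptor)
    return d_a <= max_da and angle >= min_angle
```

Evaluating the same donor and acceptor with an in-plane and an out-of-plane (pyramidal) hydrogen position would show how the D-H..A angle, and hence the assessment of the bond, changes with the placement of the hydrogen.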
Apart from using the well-known Watson-Crick edge, a base can pair with other bases using the Hoogsteen or Sugar edges. In our crystal dataset we focused only on the dinucleotide steps formed by WC basepairs that belong to the cisWW family (~74%), to compare with equivalent dinucleotide steps in DNA helices. However, more than 32 types of cisWW steps, containing at least one non-Watson-Crick basepair, such as GU, AG and UU, are also present and constitute ~17% of the total number of steps, while ~9% of the steps contain bases that are paired along the Hoogsteen or Sugar edge. Hence, to get a complete picture of the RNA helical geometries, the non-canonical basepair containing steps were analysed, but the small number present for each of these step types in crystal structures poses a challenge in arriving at any statistically significant result.