- Short Report
- Open Access
Incomplete homogenization of 18 S ribosomal DNA coding regions in Arabidopsis thaliana
BMC Research Notes volume 4, Article number: 93 (2011)
As a result of concerted evolution, coding regions of ribosomal DNA sequences are highly conserved within species and variation is generally thought to be limited to a few nucleotides. However, rDNA sequence variation has not been systematically examined in plant genomes, including that of the model plant Arabidopsis thaliana whose genome was the first to be sequenced.
Both genomic and transcribed 18 S sequences were sampled and revealed that most deviation from the consensus sequence was limited to single nucleotide substitutions except for a variant with a 270 bp deletion from position 456 to 725 in Arabidopsis numbering. The deletion maps to the functionally important and highly conserved 530 loop or helix18 in the structure of E. coli 16 S. The expression of the deletion variant is tightly controlled during developmental growth stages. Transcripts were not detectable in young seedlings but could be amplified from RNA extracts of mature leaves, stems, flowers and roots of Arabidopsis thaliana ecotype Columbia. We also show polymorphism for the deletion variant among four Arabidopsis ecotypes examined.
Despite a strong purifying selection that might be expected against functionally impaired rDNAs, the newly identified variant is maintained in the Arabidopsis genome. The expression of the variant and the polymorphism displayed by Arabidopsis ecotypes suggest a transition state in concerted evolution.
In eukaryotes, the 18 S, 5.8 S, and 25 S rRNAs are encoded as a single transcript from rDNA repeats, arranged in head to tail arrays and separated by spacer regions. Within a given species, rDNA repeats are often identical, which has led to the proposal that rDNA loci undergo concerted evolution [1, 2]. Concerted evolution results in rapid horizontal homogenization of a select variant through a number of molecular processes such as unequal crossing over and gene conversion. Additionally, functionally constrained regions such as those encoding the 18 S and 25 S genes are subject to strong purifying selection, resulting in a high degree of conservation across species.
Nevertheless, an increasing number of studies motivated by phylogenetic analysis have uncovered the presence of divergent rDNA paralogs and pseudogenes in a number of taxa [4–7]. Systematic studies on the extent and nature of sequence variation have been limited. Indeed, in fully sequenced genomes, regions of rDNA repeats cannot be tiled into contigs and thus megabase size gaps remain unassembled. In Drosophila and several fungal genomes rDNA variability was examined using traces from whole-genome shotgun sequencing projects [8–11]. All four studies point to the existence of a small degree of polymorphism among rDNA repeats, with the vast majority of polymorphic sites located in the spacer regions. For example, out of 227 polymorphic sites detected in several yeast strains, only 44 sites mapped to the rRNA-encoding genes. Within the coding regions, polymorphisms were three to eight times more frequent in the expansion segments compared with the conserved core regions that are functionally constrained.
To date, sequence level variation in Arabidopsis rDNA coding regions remains unexplored. The BAC-end sequencing approach taken for the Arabidopsis thaliana genome precludes the use of trace sequences to examine sequence variation. There are approximately 1200-1500 repeats per diploid genome, at the tips of chromosomes 2 and 4, with two rDNA arrays that could be distinguished by RFLP analysis [12, 13]. A single complete rDNA unit is also found in the centromeric region of chromosome 3. Here we report on the presence of a 270 bp deletion variant of 18 S rDNA that suggests Arabidopsis rDNA arrays are in transition stages of concerted evolution.
BAC-end sequencing of the Arabidopsis genome precludes direct analysis from trace sequences; thus, to sample sequence variation of 18 S rDNA, genomic DNA was amplified. A total of 47 individual clones were generated. After sequence assembly with CAP3, two contigs were identified with 44 clones in the first contig and 3 clones in the second contig. The first contig was identical to the 18 S sequence from At3g41768 (GenBank ID: 186510611) and the vast majority of the clones fell in that contig group. Among those 44 clones, sequence variation was very limited, with only a total of 6 clones showing single nucleotide polymorphisms (Table 1). The level of polymorphism captured is 0.3% per fragment length and 13.6% per sequenced clones. Thus sequence analysis limited to the first contig supports the view that rDNA sequences are overall highly homogeneous. However, the second contig deviated drastically from the canonical sequence. It lacked a 270 bp fragment corresponding to positions 456-725 in the canonical 18 S sequence (Figure 1). The deletion encompasses the functionally important helix 18, which is critical for ribosome function. In addition, an A151C mutation was found to be characteristic of the second contig. BLAST searches were carried out to identify similar variants in GenBank but none were detected. Thus the 270 bp deletion variant was new and unusual. A total of 3 clones, obtained from 2 independent samples, fell in the second contig group. The deleted variant was thus detected in this experiment at a frequency of 6.4%.
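The deletion size and the clone frequencies quoted above follow from simple arithmetic, which the following minimal sketch reproduces. The function and constant names are illustrative and are not part of the original analysis; the coordinates and clone counts are those reported in the text.

```python
# Illustrative check of the reported deletion size and clone frequencies.
# Names are hypothetical; coordinates and counts come from the text above.

DELETION_START = 456  # first deleted position (canonical 18S numbering)
DELETION_END = 725    # last deleted position, inclusive


def deletion_length(start: int, end: int) -> int:
    """Length of a deletion given inclusive, 1-based coordinates."""
    return end - start + 1


def clone_frequency(clones_in_contig: int, total_clones: int) -> float:
    """Percentage of sequenced clones that fall in a given contig."""
    return 100.0 * clones_in_contig / total_clones


total_clones = 44 + 3  # clones in contig 1 plus clones in contig 2

print(deletion_length(DELETION_START, DELETION_END))   # 270
print(round(clone_frequency(3, total_clones), 1))      # 6.4
print(round(clone_frequency(44, total_clones), 1))     # 93.6
```

Because the coordinates are inclusive, the deleted span is 725 - 456 + 1 = 270 bp, and 3 of 47 clones corresponds to the 6.4% frequency reported for the deletion variant.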
To confirm that the observed deletion is not a PCR or cloning artifact, we directly amplified the deleted variant from genomic DNA using a new reverse primer. The new reverse primer was designed such that the primer overlapped the deleted region and would only anneal to the new variant. PCR was carried out with a polymerase lacking proofreading activity. Plasmids with the full length and deleted 18 S variants served as negative and positive controls. A bright band at 465 bp was consistently obtained from the deleted variant while no visible band or a faint band of 730 bp was amplified from the full length 18 S plasmid. As shown in Figure 2, the expected 465 bp fragment was generated from Arabidopsis genomic DNA. The PCR product was further sequenced to ensure an 18 S fragment was amplified. These results confirm the presence of a new rDNA variant with 270 bp deletion in the Arabidopsis genome.
To examine whether particular sequences were enriched in the transcribed pool of 18 S genes, total RNA was extracted, DNAseI treated and reverse transcribed. PCR was performed on cDNA, 18 S fragments were cloned, and a total of 34 clones were sequenced. All sequences aligned with the consensus; however, a high number of polymorphisms was observed (Table 2). Indeed, deviation from the consensus sequence was observed at 30 different positions, with most clones having single nucleotide polymorphisms. The level of polymorphism is 1.7% per fragment length, and 88.2% per sequenced clones, thus about 6-fold higher than that observed for rDNA. As shown in Table 3, the majority of polymorphisms correspond to transitions (76.7%) with T/C transitions being the most frequent (43.3%).
The deleted variant was not found among the 34 rRNA clones, raising the question of its expression. We sought to characterize its expression in roots, leaves, stems and flowers of WT Arabidopsis. RT-PCR was performed after treatment of the RNA extracts with DNAseI to prevent any amplification from genomic DNA. Lack of contamination with genomic DNA was verified by the sole amplification of a 560 bp fragment from the internal control eIF4A. The deleted variant was detected in mature roots, leaves, stems and flowers but not in young seedlings (Figure 3). Thus the deleted variant is transcribed throughout the plant, except in young seedlings, suggesting developmental control of expression.
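The logic of the eIF4A control can be sketched as a small decision rule. The expected sizes (560 bp from cDNA, 872 bp from genomic DNA) are those given in the Methods; the function name, the return labels and the 20 bp sizing tolerance are assumptions made for illustration only, not part of the original protocol.

```python
# Illustrative interpretation of the eIF4A internal control.
# Expected sizes come from the Methods; the tolerance is an assumed
# allowance for gel sizing error, not a value from the paper.

CDNA_BAND_BP = 560      # product expected from cDNA
GENOMIC_BAND_BP = 872   # product expected from genomic DNA


def interpret_control(band_sizes_bp, tolerance_bp=20):
    """Classify an RT-PCR sample from the eIF4A bands seen on the gel."""
    def has_band(expected):
        return any(abs(size - expected) <= tolerance_bp for size in band_sizes_bp)

    if has_band(GENOMIC_BAND_BP):
        return "genomic DNA contamination"
    if has_band(CDNA_BAND_BP):
        return "cDNA only"
    return "no expected band"


print(interpret_control([560]))        # cDNA only
print(interpret_control([560, 872]))   # genomic DNA contamination
print(interpret_control([]))           # no expected band
```

Only a sample that yields the 560 bp band alone would be considered free of genomic DNA, which is the condition the text describes as "sole amplification" of that fragment.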
To examine whether the 270 bp deletion is unique to A. thaliana ecotype Columbia, we tested for its presence in 3 other ecotypes: Landsberg erecta, Bay and Shahdara. The deletion variant was found in Landsberg erecta, but not in Bay or Shahdara, indicating polymorphism in Arabidopsis populations (Figure 4).
Discussion and Conclusion
Systematic analysis of sequence variation in 18 S rDNA of Arabidopsis thaliana revealed that overall sequences were highly homogeneous except for the 270 bp deletion variant. More single nucleotide variants were observed in rRNA sequences, most likely due to the limited fidelity of reverse transcriptase. The discovery of a 270 bp deletion variant highlights the poor characterization of rDNA variants in sequenced genomes. This variant was not detected during BAC-end sequencing of the Arabidopsis genome, suggesting it is not found as an isolated repeat. Rather, it is likely to be found embedded among the canonical variants at the tips of chromosomes 2 or 4, the two regions that harbor rDNA repeats. The frequency with which we have detected it in this study, 6.4%, is surprisingly high. We have not yet ruled out the possibility of preferential amplification of the deleted variant because of its smaller size and simpler secondary structure. However, given estimates of 1200-1500 rDNA repeats, it is possible that multiple copies of this novel variant may be present in the A. thaliana diploid genome.
The existence of such a divergent 18 S sequence is at odds with the highly homogeneous sequences that might be expected from concerted evolution. Intra-individual polymorphic rDNAs may subsist either when concerted evolution is impaired by the location of rDNA repeats in non-homologous chromosomes, in polyploids and interspecific hybrids [14–19], or when the rate of mutation exceeds that of concerted evolution [20, 21]. Given the ancient polyploidy of the Arabidopsis genome and the otherwise homogeneous 18 S rDNAs observed, the presence of the deletion variant is unlikely to reflect impaired homogenization. Instead, it probably represents a relatively new mutation and repeats in transition stages of concerted evolution.
Our results show that Arabidopsis thaliana accessions are polymorphic for the presence of the deletion variant. Arabidopsis accessions generally exhibit a relatively high degree of polymorphism that is often shared worldwide; yet some population structure and isolation by distance is evident [22–24]. The availability of such large scale population data in Arabidopsis will enable studies on the inheritance pattern of the deletion variant and evolution of rDNA in Arabidopsis.
The extent of the observed deletion is unprecedented. Indels reported for rDNA coding regions within a species generally concern one or two nucleotides. The 270 bp deletion encompasses the universally conserved helix 18 or '530 loop', critical for ribosome function. Crystal structures of the bacterial 30 S ribosomal subunit have established the role of helix 18 in decoding. Correct binding of mRNA and cognate tRNA in the A site of the ribosome induces a conformational change of G530, which then interacts with the second position of the anticodon and the third position of the codon. Analysis of 16 S mutants in bacteria corroborates the importance of helix 18. Typically, mutations in helix 18 result in lethal phenotypes [26, 27]. The importance of this region is also underscored by the fact that it is the target of several antibiotics.
Despite the fact that a functionally critical helix is missing, our results show that the deleted variant is expressed. Several studies in Arabidopsis have revealed that rDNA expression is modulated by a number of factors. Only a fraction of all the rDNA repeats are transcribed because of dosage compensation mechanisms that involve large scale silencing, and a similar mechanism operates in silencing specific arrays in hybrids [28–30]. Additionally, this large scale silencing is under the control of developmental switches, as evidenced by its onset during early stages of seed germination [31, 32]. Studies focusing on 5 S rDNA have shown that aberrant repeats are also silenced. Indeed, specific 5 S rRNA loci were shown to be methylated to prevent production of aberrant transcripts [32, 33].
If misfolded, mutated or non-functional rRNAs are transcribed, several quality control mechanisms lead to their degradation before or after ribosome biogenesis. These include the exosome mediated quality control of misfolded pre-rRNAs following their polyadenylation [34–37] as well as the 'non-functional rRNA decay' leading to decreased stability of the mature rRNA contained in fully assembled ribosomes and ribosomal subunits.
It is thus surprising that we have been able to detect the expression of a significantly truncated variant. The expression of the deleted variant appears to be under tight control in young seedlings and relaxed during later stages of development. Whether this control is strictly dependent on developmental stage or sequence is not clear. Thus, this finding opens up the possibility of investigating factors controlling the expression of such aberrant 18 S rRNAs in plants.
In conclusion, this study documents the existence of a new 18 S variant with a 270 bp deletion and demonstrates the incomplete homogenization of rDNA coding regions in Arabidopsis thaliana. Our results also show that Arabidopsis accessions are polymorphic for this variant, which opens up the possibility of investigating the evolution and inheritance pattern of aberrant rDNA variants in the context of studies that examine the population structure of Arabidopsis. In addition, its expression is dependent on the developmental stage of the plant, with tight control in seedlings, suggesting that transcriptional or post-transcriptional silencing mechanisms are at play.
For cloning 18 S rDNA, genomic DNA was extracted from Arabidopsis thaliana ecotype Columbia (Col-1) grown for 10 days under sterile conditions on MS media using a modified CTAB method. For cloning transcribed 18 S rDNA, total RNA was extracted from 100 mg plant material using a Plant RNAeasy kit (Qiagen) and treated with DNAseI (Invitrogen) according to the manufacturer's protocol. The total RNA was then reverse transcribed with random primers using the SuperScript RT-PCR system (Invitrogen).
18 S sequences were PCR amplified from two independent extracts using Platinum Pfx polymerase (Invitrogen) with FwdFull 5' CACC TACCTGGTTGATCCTGCCA 3' and RevFull 5' ATCCTTCCGCAGGTTCAC 3' primers. The primer pair amplifies a 1803 bp fragment from the 18 S rDNA, At3g41768 (GenBank ID: 186510611). The PCR enhancer reagent was included in the reaction. The PCR reaction was carried out for 30 cycles with an annealing temperature of 58°C for 30 seconds and an extension time of two and a half minutes at 68°C. The PCR product running at about 1.8 kb was excised from the gel, cleaned and cloned in the directional pENTR-D-TOPO vector (Invitrogen). The presence of an insert was verified by digesting plasmid preps with Hpa I and Eco RV.
A total of 47 clones generated from genomic DNA and 34 clones generated from cDNA were sequenced with universal FwdM13 and RevM13 as well as 18 S internal primers 5' TCGATGGTAGGATAGTGG 3' and 5' ACATCTAAGGGCATCACA 3' to cover the entire length of the insert. Trace files were imported into CodonCode Aligner V.3.0.1 (CodonCode Corp.); vector sequences were removed and the reads assembled using CAP3. Sequences were analyzed and processed in CodonCode. Low quality bases (q < 20) were automatically replaced by that of the consensus sequence and all other discrepancies resolved manually. The level of polymorphism was calculated per fragment length (total number of clones deviating from the consensus/length of 18 S rDNA, which is 1808 bp) and per sequenced clones (total number of clones deviating from the consensus/total number of sequenced clones).
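The two polymorphism measures defined above can be reproduced with a short sketch. The function name is illustrative. The genomic counts (6 deviating clones among the 44 clones of the first contig) are stated in the text; the count of 30 deviating cDNA clones is an assumption inferred from the reported 88.2% of 34 clones and is not stated explicitly in the source.

```python
# Illustrative calculation of the two polymorphism measures defined above.
# The function name is hypothetical. The cDNA count of 30 deviating clones
# is an inference from the reported 88.2% of 34 clones, not a stated value.

FRAGMENT_LENGTH_BP = 1808  # length of 18S rDNA used as the denominator


def polymorphism_levels(deviating_clones, sequenced_clones,
                        fragment_length=FRAGMENT_LENGTH_BP):
    """Return (% per fragment length, % per sequenced clones)."""
    per_length = 100.0 * deviating_clones / fragment_length
    per_clone = 100.0 * deviating_clones / sequenced_clones
    return per_length, per_clone


genomic = polymorphism_levels(deviating_clones=6, sequenced_clones=44)
cdna = polymorphism_levels(deviating_clones=30, sequenced_clones=34)

print([round(value, 1) for value in genomic])  # [0.3, 13.6]
print([round(value, 1) for value in cdna])     # [1.7, 88.2]
print(round(cdna[1] / genomic[1], 1))          # 6.5
```

Under these assumptions the ratio between the cDNA and genomic values is roughly 5 to 6.5, depending on which measure and rounding is used, which is consistent with the "about 6-fold" difference reported in the results.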
For the detection of the presence and expression of the deleted variant, genomic DNA and total RNA were extracted from Arabidopsis thaliana ecotype Columbia (Col-1) as above. RNA samples were obtained from 10-day-old seedlings or roots of two-week-old plants grown under sterile conditions as well as mature leaves, stems and flowers from plants grown in soil for 5 weeks. For testing polymorphisms, DNA was extracted from ecotypes Landsberg erecta, Bay and Shahdara.
To specifically detect the presence of the deleted variant, the FwdFull primer as above was used with a new reverse primer, RevDel 5' AGGCACGACCCGGCCAGG 3'. The two primers amplify a 460 bp fragment exclusively from the deleted variant when a polymerase lacking proofreading activity is used. Hence, the Platinum SuperMix (Invitrogen) was used for the detection of the deleted variant. Amplification of eukaryotic translation initiation factor 4A fragments (At3G19760) served as internal controls. Primers FwdeIF4A: 5' TAGAAGAGGCGGTGGAGCTA 3' and ReveIF4A: 5' TCTGGTCCTTGAACCCTCTG 3' were designed such that amplification from genomic DNA would result in an 872 bp fragment and amplification from cDNA would result in a 560 bp fragment.
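As a rough illustration of how the deletion-specific primer relates to the template, the sketch below computes the reverse complement of RevDel, which is the sense-strand sequence the primer is expected to match across the deletion junction, together with the length and GC content of each primer. The primer sequences are taken from the text; the helper names are hypothetical, and this is not the procedure the authors used to design the primers.

```python
# Illustrative primer bookkeeping; helper names are hypothetical.
# Primer sequences are copied from the Methods text above.

COMPLEMENT = str.maketrans("ACGT", "TGCA")

PRIMERS = {
    "RevDel": "AGGCACGACCCGGCCAGG",
    "FwdeIF4A": "TAGAAGAGGCGGTGGAGCTA",
    "ReveIF4A": "TCTGGTCCTTGAACCCTCTG",
}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


# Sense-strand site that RevDel should match across the deletion junction.
rev_del_site = reverse_complement(PRIMERS["RevDel"])
print(rev_del_site)

for name, seq in PRIMERS.items():
    print(name, len(seq), round(gc_percent(seq), 1))
```

Searching a candidate 18 S sequence for `rev_del_site` would indicate whether the junction created by the deletion is present, which is the property that makes the primer specific to the deleted variant.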
For visualization of the deleted region on the secondary structure of Arabidopsis 18 S ribosomal RNA, the structure was downloaded from the RNA STRAND database and edited using Inkscape version 0.48.0-1.
The sequence of the 18 S variant with the 270 bp deletion has been deposited in GenBank under Accession No. GQ380689.
Dover GA: Linkage disequilibrium and molecular drive in the rDNA gene family. Genetics. 1989, 122: 249-52.
Eickbush TH, Eickbush DG: Finely orchestrated movements: evolution of the ribosomal RNA genes. Genetics. 2007, 175: 477-85. 10.1534/genetics.107.071399.
Nei M, Rooney AP: Concerted and birth-and-death evolution of multigene families. Annu Rev Genet. 2005, 39: 121-52. 10.1146/annurev.genet.39.073003.112240.
Marquez LM, Miller DJ, MacKenzie JB, Van Oppen MJ: Pseudogenes contribute to the extreme diversity of nuclear ribosomal DNA in the hard coral Acropora. Mol Biol Evol. 2003, 20: 1077-86. 10.1093/molbev/msg122.
Ruggiero MV, Procaccini G: The rDNA ITS region in the lessepsian marine angiosperm Halophila stipulacea. J Mol Evol. 2004, 58: 115-21. 10.1007/s00239-003-2536-0.
Keller I, Chintauan-Marquier IC, Veltsos P, Nichols RA: Ribosomal DNA in the grasshopper Podisma pedestris: escape from concerted evolution. Genetics. 2006, 174: 863-74. 10.1534/genetics.106.061341.
Xu J, Zhang Q, Xu X, Wang Z, Qi J: Intragenomic variability and pseudogenes of ribosomal DNA in stone flounder Kareius bicoloratus. Mol Phylogenet Evol. 2009, 52: 157-66. 10.1016/j.ympev.2009.03.031.
Stage DE, Eickbush TH: Sequence variation within the rRNA gene loci of 12 Drosophila species. Genome Res. 2007, 17: 1888-97. 10.1101/gr.6376807.
Ganley AR, Kobayashi T: Highly efficient concerted evolution in the ribosomal DNA repeats: total rDNA. Genome Res. 2007, 17: 184-91. 10.1101/gr.5457707.
Simon UK, Weiss M: Intragenomic variation of fungal ribosomal genes is higher than previously. Mol Biol Evol. 2008, 25: 2251-4. 10.1093/molbev/msn188.
James SA, O'Kelly MJ, Carter DM, Davey RP, van Oudenaarden A, Roberts IN: Repetitive sequence variation and dynamics in the ribosomal DNA array of. Genome Res. 2009, 19: 626-35. 10.1101/gr.084517.108.
AGI: Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature. 2000, 408: 796-815. 10.1038/35048692.
Copenhaver GP, Pikaard CS: Two-dimensional RFLP analyses reveal megabase-sized clusters of rRNA gene. Plant J. 1996, 9: 273-82. 10.1046/j.1365-313X.1996.09020273.x.
Jellen EN, Phillips RL, Rines HW: Chromosomal localization and polymorphisms of ribosomal DNA in oat (Avena spp.). Genome. 1994, 37: 23-32. 10.1139/g94-004.
Vogler AP, DeSalle R: Evolution and phylogenetic information content of the ITS-1 region in the tiger beetle Cicindela dorsalis. Mol Biol Evol. 1994, 11: 393-405.
Campbell CS, Wojciechowski MF, Baldwin BG, Alice LA, Donoghue MJ: Persistent nuclear ribosomal DNA sequence polymorphism in the Amelanchier agamic complex (Rosaceae). Mol Biol Evol. 1997, 14: 81-90.
Wendel JF: Genome evolution in polyploids. Plant Mol Biol. 2000, 42: 225-49. 10.1023/A:1006392424384.
Gaut BS, Le Thierry d'Ennequin M, Peek AS, Sawkins MC: Maize as a model for the evolution of plant nuclear genomes. Proc Natl Acad Sci USA. 2000, 97: 7008-15. 10.1073/pnas.97.13.7008.
Peterson A, Levichev IG, Peterson J: Systematics of Gagea and Lloydia (Liliaceae) and infrageneric classification of Gagea based on molecular and morphological data. Mol Phylogenet Evol. 2008, 46: 446-65. 10.1016/j.ympev.2007.11.016.
Crease T, Lynch M: Ribosomal DNA Variation in Daphnia pulex. Mol Biol Evol. 1991, 8: 620-640.
Linares AR, Bowen T, Dover GA: Aspects of nonrandom turnover involved in the concerted evolution of intergenic spacers within the ribosomal DNA of Drosophila melanogaster. J Mol Evol. 1994, 39: 151-9.
Nordborg M, Hu TT, Ishino Y, Jhaveri J, Toomajian C, Zheng H, Bakker E, Calabrese P, Gladstone J, Goyal R, et al: The pattern of polymorphism in Arabidopsis thaliana. PLoS Biol. 2005, 3: e196-10.1371/journal.pbio.0030196.
Francois O, Blum MG, Jakobsson M, Rosenberg NA: Demographic history of european populations of Arabidopsis thaliana. PLoS Genet. 2008, 4: e1000075-10.1371/journal.pgen.1000075.
Platt A, Horton M, Huang YS, Li Y, Anastasio AE, Mulyati NW, Agren J, Bossdorf O, Byers D, Donohue K: The scale of population structure in Arabidopsis thaliana. PLoS Genet. 6: e1000843-10.1371/journal.pgen.1000843.
Ogle JM, Brodersen DE, Clemons WM, Tarry MJ, Carter AP, Ramakrishnan V: Recognition of cognate transfer RNA by the 30 S ribosomal subunit. Science. 2001, 292: 897-902. 10.1126/science.1060612.
Yassin A, Fredrick K, Mankin AS: Deleterious mutations in small subunit ribosomal RNA identify functional sites. Proc Natl Acad Sci USA. 2005, 102: 16620-5. 10.1073/pnas.0508444102.
Fan-Minogue H, Bedwell DM: Eukaryotic ribosomal RNA determinants of aminoglycoside resistance and their role. RNA. 2008, 14: 148-57. 10.1261/rna.805208.
Lawrence RJ, Earley K, Pontes O, Silva M, Chen ZJ, Neves N, Viegas W, Pikaard CS: A concerted DNA methylation/histone methylation switch regulates rRNA gene dosage control and nucleolar dominance. Mol Cell. 2004, 13: 599-609. 10.1016/S1097-2765(04)00064-4.
Probst AV, Fagard M, Proux F, Mourrain P, Boutet S, Earley K, Lawrence RJ, Pikaard CS, Murfett J, Furner I, et al: Arabidopsis histone deacetylase HDA6 is required for maintenance of transcriptional gene silencing and determines nuclear organization of rDNA repeats. Plant Cell. 2004, 16: 1021-34. 10.1105/tpc.018754.
Preuss SB, Costa-Nunes P, Tucker S, Pontes O, Lawrence RJ, Mosher R, Kasschau KD, Carrington JC, Baulcombe DC, Viegas W, et al: Multimegabase silencing in nucleolar dominance involves siRNA-directed DNA methylation and specific methylcytosine-binding proteins. Mol Cell. 2008, 32: 673-84. 10.1016/j.molcel.2008.11.009.
Pontes O, Lawrence RJ, Silva M, Preuss S, Costa-Nunes P, Earley K, Neves N, Viegas W, Pikaard CS: Postembryonic establishment of megabase-scale gene silencing in nucleolar dominance. PLoS One. 2007, 2: e1157-10.1371/journal.pone.0001157.
Douet J, Tourmente S: Transcription of the 5 S rRNA heterochromatic genes is epigenetically controlled in Arabidopsis thaliana and Xenopus laevis. Heredity. 2007, 99: 5-13. 10.1038/sj.hdy.6800964.
Blevins T, Pontes O, Pikaard CS, Meins F: Heterochromatic siRNAs and DDM1 independently silence aberrant 5 S rDNA transcripts in Arabidopsis. PLoS One. 2009, 4: e5932-10.1371/journal.pone.0005932.
Chekanova JA, Gregory BD, Reverdatto SV, Chen H, Kumar R, Hooker T, Yazaki J, Li P, Skiba N, Peng Q, et al: Genome-wide high-resolution mapping of exosome substrates reveals hidden features. Cell. 2007, 131: 1340-53. 10.1016/j.cell.2007.10.056.
Slomovic S, Laufer D, Geiger D, Schuster G: Polyadenylation of ribosomal RNA in human cells. Nucleic Acids Res. 2006, 34: 2966-75. 10.1093/nar/gkl357.
Fulnecek J, Kovarik A: Low abundant spacer 5 S rRNA transcripts are frequently polyadenylated in Nicotiana. Mol Genet Genomics. 2007, 278: 565-73. 10.1007/s00438-007-0273-6.
Kadaba S, Wang X, Anderson JT: Nuclear RNA surveillance in Saccharomyces cerevisiae: Trf4p-dependent polyadenylation of nascent hypomethylated tRNA and an aberrant form of 5 S rRNA. RNA. 2006, 12: 508-21. 10.1261/rna.2305406.
LaRiviere FJ, Cole SE, Ferullo DJ, Moore MJ: A late-acting quality control process for mature eukaryotic rRNAs. Mol Cell. 2006, 24: 619-26. 10.1016/j.molcel.2006.10.008.
Murashige T, Skoog F: A revised medium for rapid growth and bioassays with tobacco tissue culture. Plant Physiol. 1962, 15: 497-
Stewart CN, Via LE: A rapid CTAB DNA isolation technique useful for RAPD fingerprinting and other PCR applications. Biotechniques. 1993, 14: 748-50.
Huang X, Madan A: CAP3: A DNA sequence assembly program. Genome Res. 1999, 9: 868-77. 10.1101/gr.9.9.868.
Andronescu M, Bereg V, Hoos HH, Condon A: RNA STRAND: the RNA secondary structure and statistical analysis database. BMC Bioinformatics. 2008, 9: 340-10.1186/1471-2105-9-340.
This work was supported by NSF grant MCB-0615534, Spelman College and an HHMI Fellowship to RAF. We are grateful to Drs. Aditi Pai and Cynthia Bauerle for comments on the manuscript.
The authors declare that they have no competing interests.
MJ amplified, cloned and prepared 18 S rDNA and rRNA for sequencing. RF carried out experiments to confirm the presence and expression of the 18 S deletion. AM conceived the experiments, analyzed sequences and wrote the manuscript.