EcoBrowser: a web-based tool for visualizing transcriptome data of Escherichia coli
© Li et al; licensee BioMed Central Ltd. 2011
Received: 1 April 2011
Accepted: 13 October 2011
Published: 13 October 2011
Escherichia coli has been extensively studied as a prokaryotic model organism, and its whole genome was sequenced in 1997. However, it is difficult to identify all the gene products involved in diverse functions from whole genome sequences alone. High-resolution transcriptome mapping using tiling arrays has proved effective in improving the annotation of transcription units and in discovering new transcripts, including ncRNAs. Although abundant tiling array data have been generated, there is a lack of visualization tools that can accommodate and integrate data from multiple sources.
EcoBrowser is a web-based tool for visualizing genome annotations and transcriptome data of E. coli. Key tiling array datasets of E. coli from different experimental platforms were collected and processed for querying, and an AJAX-based genome browser is embedded for visualization. Genome annotations can thus be compared with transcript profiles and genome occupancy profiles from independent experiments. Such comparisons can help users discover new transcripts, including novel mRNAs and ncRNAs, and build a detailed description of transcription unit architecture. This in turn provides clues for investigating prokaryotic transcriptional regulation, which has proved to be far more complex than previously thought.
With EcoBrowser, users can obtain a systematic view of the data from both vertical and parallel perspectives, as well as ideas for designing new experiments that will expand our understanding of regulatory mechanisms.
In the past decade, advances in high-throughput sequencing technologies have had a huge impact on microbiology, providing a fast and economical means of determining the whole genome sequences of bacteria. For instance, most of the completed genome sequencing projects listed in the Genomes OnLine Database are microbial. Each genome then needs to be annotated by identifying the locations and functions of its genes. In particular, the in-depth organizational structure of bacterial genomes has yet to be fully elucidated.
Escherichia coli has been widely used as a prokaryotic model organism, and its whole genome was sequenced as early as 1997. Information about its genes, proteins, intergenic regions and biochemical machinery has been collected in well-known databases, including EcoGene, EcoCyc and EcoliWiki [3–5]. However, identifying all the gene products involved in diverse functions has proved difficult to accomplish based solely on whole genome sequences. Microarray data therefore serve as useful complementary information for functional genomics, and some databases, such as GenExpDB, are built on microarray data. GenExpDB brings together an extensive collection of gene expression data from the E. coli community, so that gene expression levels under different conditions and on different platforms can be easily compared. Recent advances in biology suggest a widespread involvement of noncoding RNAs in transcriptional regulation, but gene microarrays are designed to cover only the coding regions of the genome, and many new techniques aim to investigate the regulation of non-coding regions. As an unbiased tool for investigating protein binding, gene expression and gene structure on a genome-wide scale, tiling arrays have improved the annotation of transcription units and led to the discovery of many new non-coding and natural antisense transcripts [7, 8].
Although abundant tiling array data have been generated, there is a lack of visualization tools that can accommodate and integrate data from multiple sources. Widely used genome browsers such as the UCSC Genome Browser and Ensembl Bacteria reload the entire page on every action [9, 10]. These discontinuous page transitions impair the user's sense of which genomic locus is being viewed and how the displayed data points relate to one another. In addition, because tiling array datasets are usually very large, uploading and displaying them on the browser server is time-consuming.
We therefore built EcoBrowser, a web-based visualization tool for searching genome annotations alongside transcriptome expression profiles of E. coli. The major difference between EcoBrowser and GenExpDB is that GenExpDB focuses on gene expression data, whereas EcoBrowser focuses on visualizing whole-transcriptome mapping data such as tiling arrays, so that the expression levels of both coding and non-coding regions can be included and used in further integrative analysis. The expression values were transformed into shades of blue to draw the heatmaps. The whole-genome heatmap was pre-rendered as image tiles at multiple zoom levels and stored on the server side. With AJAX technology, smooth panning and zooming are achieved by dynamically changing the positional offset of these tiles and fetching new tile images only when necessary, without reloading the whole page. Expression patterns from independent experiments can thus be compared directly with one another and with the genome annotation across the whole genome, which can help in discovering new transcripts, including non-coding RNAs, and in generating a detailed description of transcription unit architecture. It could also provide clues for further investigation of condition-specific transcriptional regulation.
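The tile-based panning described above can be illustrated with a small calculation of which pre-rendered tiles a viewport needs and how far the first tile must be shifted. This is a minimal sketch under stated assumptions: the tile width, the doubling-per-level zoom scheme, the approximate genome length and all names (plan_tiles, tiles_to_fetch, TilePlan) are hypothetical and are not taken from EcoBrowser's source.

```python
import math
from dataclasses import dataclass

# Illustrative assumptions only; EcoBrowser's real settings are not given in the text.
TILE_PX = 256              # width of one pre-rendered tile image, in pixels
GENOME_LEN = 4_640_000     # approximate E. coli K-12 chromosome length (bp), assumed
MAX_BP_PER_PX = 16384.0    # resolution at the most zoomed-out level (zoom 0), assumed


@dataclass
class TilePlan:
    zoom: int
    indices: list[int]     # tile columns to display, left to right
    offset_px: int         # left offset of the first tile relative to the viewport


def bp_per_pixel(zoom: int) -> float:
    """Each zoom level doubles the resolution (a hypothetical scheme)."""
    return MAX_BP_PER_PX / (2 ** zoom)


def plan_tiles(view_start_bp: float, view_width_px: int, zoom: int) -> TilePlan:
    """Find the pre-rendered tiles that cover the viewport and where to place them."""
    bpp = bp_per_pixel(zoom)
    start_px = view_start_bp / bpp                      # viewport start in global pixels
    first = int(start_px // TILE_PX)
    last = int((start_px + view_width_px - 1) // TILE_PX)
    n_tiles = math.ceil(GENOME_LEN / (TILE_PX * bpp))   # tiles that exist at this zoom
    indices = [i for i in range(first, last + 1) if 0 <= i < n_tiles]
    offset_px = round(first * TILE_PX - start_px)       # <= 0 when the tile starts left of view
    return TilePlan(zoom, indices, offset_px)


def tiles_to_fetch(old: TilePlan, new: TilePlan) -> list[int]:
    """After a pan, only tiles not already on screen need to be requested."""
    if old.zoom != new.zoom:
        return list(new.indices)                        # a zoom change needs a new image set
    return [i for i in new.indices if i not in old.indices]


before = plan_tiles(0, 800, zoom=4)
after = plan_tiles(300_000, 800, zoom=4)   # viewport moved about 293 px to the right
print(before.indices, after.indices, after.offset_px, tiles_to_fetch(before, after))
# [0, 1, 2, 3] [1, 2, 3, 4] -37 [4]
```

In this sketch a pan only changes the CSS-style offset and requests one new tile, which is the property that avoids a full page reload; a zoom change swaps to a different pre-rendered image set.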
where $S_i$ is the signal value of the $i$th gene, $S = [S_1, S_2, \ldots, S_n]$, and $n$ is the number of genes. The shade of blue represents the relative expression level of the probes, which continuously cover the entire genome in each track. JBrowse is used to navigate through the gene and transcription unit predictions [11, 17]. This AJAX-based browser offers faster and smoother navigation through the genome without reloading the page. The genome annotations are rendered on the client side, while the transcriptome expression heatmaps are pre-rendered and stored on the server.
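To make the color mapping concrete, the sketch below assumes that the signals $S$ of one track are min-max scaled to [0, 1] and then blended linearly from white (lowest) to dark blue (highest). This scaling rule, the two end colors and the function names (relative_levels, blue_shade, track_colors) are hypothetical illustrations, not necessarily the formula EcoBrowser uses.

```python
# Hypothetical mapping: min-max scaling within one track, then a linear blend
# from white (lowest signal) to dark blue (highest signal).
WHITE = (255, 255, 255)
DARK_BLUE = (0, 0, 139)


def relative_levels(signals: list[float]) -> list[float]:
    """Scale every S_i into [0, 1] relative to the min and max of S (assumed rule)."""
    lo, hi = min(signals), max(signals)
    if hi == lo:                       # flat track: avoid dividing by zero
        return [0.0] * len(signals)
    return [(s - lo) / (hi - lo) for s in signals]


def blue_shade(level: float, light=WHITE, dark=DARK_BLUE) -> str:
    """Linearly blend light -> dark and return a hex color such as '#00008b'."""
    level = min(max(level, 0.0), 1.0)  # clamp to the valid range
    r, g, b = (round(lc + (dc - lc) * level) for lc, dc in zip(light, dark))
    return f"#{r:02x}{g:02x}{b:02x}"


def track_colors(signals: list[float]) -> list[str]:
    """One color per probe, in genomic order, ready to paint into a heatmap tile."""
    return [blue_shade(x) for x in relative_levels(signals)]


print(track_colors([2.0, 4.0, 6.0]))  # ['#ffffff', '#8080c5', '#00008b']
```

Because the scaling is computed per track, the same shade means "relatively high within this experiment", which matches the description of shades as relative expression levels; comparing absolute values across tracks would need a shared normalization.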
Results and Discussion
In the case of groS (b4142) and groL (b4143), two adjacent genes belonging to the same operon, the tracks RNA_heat_plus and RNA_logphase_plus show that they are co-expressed. RNA polymerase (RNAP) binds to the groS and groL gene regions under heat-pulse conditions (GB_heat) but not in log phase (GB_logphase). This indicates, first, that transcription of groS and groL is activated by the heat pulse and, second, that groS and groL transcripts are nevertheless maintained at a high level in log phase, owing to their essential roles in protein maintenance and cell growth. By also consulting the static map of rifampicin-induced RNAP-binding promoter regions (GB_logphase_rif), users can gain a better understanding of how groS and groL are transcribed. Further findings may emerge by extending this analysis to more genes across the whole genome, as well as to other species.
Of the hundreds of sRNA candidates predicted in silico, about 80 have been experimentally validated in E. coli. However, many more of the predicted sRNAs located in intergenic regions show high expression levels in EcoBrowser. A recent study identified 10 new non-coding sRNAs in E. coli using a genome-wide deep-sequencing approach, and 9 of them display clearly high expression levels in EcoBrowser (details in Additional file 1). Biologists can therefore use EcoBrowser as a reference before experimentally validating a new sRNA candidate. To help users make more effective use of the browser, we have collected predicted E. coli sRNAs from several papers [19–23]; this prediction information is available on the "Help" page.
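The kind of check described above, asking whether a predicted intergenic sRNA overlaps probes with high signal, can be sketched as follows. This is a hypothetical screening rule for illustration only: it averages the probe signals inside each candidate interval of one strand-specific track and compares the mean with a track-wide nearest-rank percentile. The function names, the percentile cutoff and the toy data are assumptions, not part of EcoBrowser.

```python
import math
from bisect import bisect_left, bisect_right
from statistics import mean


def interval_mean(positions, signals, start, end):
    """Mean signal of probes with start <= position <= end; None if no probe falls inside.

    `positions` must be sorted ascending and aligned with `signals`.
    """
    lo = bisect_left(positions, start)
    hi = bisect_right(positions, end)
    return mean(signals[lo:hi]) if hi > lo else None


def percentile(values, q):
    """Nearest-rank percentile, q in (0, 100]."""
    ordered = sorted(values)
    k = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[k - 1]


def flag_candidates(positions, signals, candidates, q=90):
    """Return names of (name, start, end) candidates whose mean probe signal
    reaches the track-wide q-th percentile (a hypothetical screening rule)."""
    cutoff = percentile(signals, q)
    flagged = []
    for name, start, end in candidates:
        m = interval_mean(positions, signals, start, end)
        if m is not None and m >= cutoff:
            flagged.append(name)
    return flagged


positions = list(range(100, 1001, 100))   # ten toy probes, 100 bp apart
signals = [1, 1, 1, 9, 9, 1, 1, 1, 1, 1]
candidates = [("cand_A", 350, 550), ("cand_B", 700, 900), ("cand_C", 1050, 1200)]
print(flag_candidates(positions, signals, candidates))  # ['cand_A']
```

A flagged candidate is only a prioritization hint in this sketch; as the text notes, experimental validation is still required.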
EcoBrowser is a valuable tool for researchers. With its integrated genome browser, users can obtain a systematic view of the data from both vertical and parallel perspectives, as well as ideas for designing new experiments that will expand our understanding of regulatory mechanisms. Next-generation datasets, such as RNA-seq, will also be included in the future as next-generation sequencing technologies become more widely applied.
Availability and requirements
Project name: EcoBrowser project
Project home page: http://ecobrowser.biosino.org
Operating systems: Platform independent
Other requirements: None
This work was supported by the State Key Basic Research Program (973) grants 2010CB910200 and 2010CB529200, and by the Research Program of CAS (KSCX2-YW-R-112).
- MacLean D, Jones JD, Studholme DJ: Application of 'next-generation' sequencing technologies to microbial genetics. Nat Rev Microbiol. 2009, 7 (4): 287-296.
- Blattner FR, Plunkett G, Bloch CA, Perna NT, Burland V, Riley M, Collado-Vides J, Glasner JD, Rode CK, Mayhew GF: The complete genome sequence of Escherichia coli K-12. Science. 1997, 277 (5331): 1453-1462. 10.1126/science.277.5331.1453.
- Rudd KE: EcoGene: a genome sequence database for Escherichia coli K-12. Nucleic Acids Res. 2000, 28 (1): 60-64. 10.1093/nar/28.1.60.
- Keseler IM, Collado-Vides J, Santos-Zavaleta A, Peralta-Gil M, Gama-Castro S, Muniz-Rascado L, Bonavides-Martinez C, Paley S, Krummenacker M, Altman T: EcoCyc: a comprehensive database of Escherichia coli biology. Nucleic Acids Res. 2011, 39 (Database): D583-590.
- EcoliWiki. [http://ecoliwiki.net/colipedia/index.php/Welcome_to_EcoliWiki]
- GenExpDB. [http://genexpdb.ou.edu/main/]
- Kampa D, Cheng J, Kapranov P, Yamanaka M, Brubaker S, Cawley S, Drenkow J, Piccolboni A, Bekiranov S, Helt G: Novel RNAs identified from an in-depth analysis of the transcriptome of human chromosomes 21 and 22. Genome Res. 2004, 14 (3): 331-342. 10.1101/gr.2094104.
- Cho BK, Zengler K, Qiu Y, Park YS, Knight EM, Barrett CL, Gao Y, Palsson BO: The transcription unit architecture of the Escherichia coli genome. Nat Biotechnol. 2009, 27 (11): 1043-1049. 10.1038/nbt.1582.
- UCSC Genome Browser. [http://genome.ucsc.edu/]
- Ensembl Bacteria. [http://bacteria.ensembl.org/index.html]
- Skinner ME, Uzilov AV, Stein LD, Mungall CJ, Holmes IH: JBrowse: a next-generation genome browser. Genome Res. 2009, 19 (9): 1630-1638. 10.1101/gr.094607.109.
- Sayers EW, Barrett T, Benson DA, Bolton E, Bryant SH, Canese K, Chetvernin V, Church DM, DiCuccio M, Federhen S: Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 2011, 39 (Database): D38-51.
- Thomassen GO, Weel-Sneve R, Rowe AD, Booth JA, Lindvall JM, Lagesen K, Kristiansen KI, Bjoras M, Rognes T: Tiling array analysis of UV treated Escherichia coli predicts novel differentially expressed small peptides. PLoS One. 2010, 5 (12): e15356. 10.1371/journal.pone.0015356.
- Thomassen GO, Rowe AD, Lagesen K, Lindvall JM, Rognes T: Custom design and analysis of high-density oligonucleotide bacterial tiling microarrays. PLoS One. 2009, 4 (6): e5943. 10.1371/journal.pone.0005943.
- Mooney RA, Davis SE, Peters JM, Rowland JL, Ansari AZ, Landick R: Regulator trafficking on bacterial transcription units in vivo. Mol Cell. 2009, 33 (1): 97-108. 10.1016/j.molcel.2008.12.021.
- Peters JM, Mooney RA, Kuan PF, Rowland JL, Keles S, Landick R: Rho directs widespread termination of intragenic and stable RNA transcription. Proc Natl Acad Sci USA. 2009, 106 (36): 15406-15411. 10.1073/pnas.0903846106.
- Skinner ME, Holmes IH: Setting up the JBrowse genome browser. Curr Protoc Bioinformatics. 2010, Chapter 9: Unit 9.13.
- Raghavan R, Groisman EA, Ochman H: Genome-wide detection of novel regulatory RNAs in E. coli. Genome Res. 2011.
- Argaman L, Hershberg R, Vogel J, Bejerano G, Wagner EG, Margalit H, Altuvia S: Novel small RNA-encoding genes in the intergenic regions of Escherichia coli. Curr Biol. 2001, 11 (12): 941-950. 10.1016/S0960-9822(01)00270-6.
- Rivas E, Klein RJ, Jones TA, Eddy SR: Computational identification of noncoding RNAs in E. coli by comparative genomics. Curr Biol. 2001, 11 (17): 1369-1373. 10.1016/S0960-9822(01)00401-8.
- Chen S, Lesnik EA, Hall TA, Sampath R, Griffey RH, Ecker DJ, Blyn LB: A bioinformatics based approach to discover small RNA genes in the Escherichia coli genome. Biosystems. 2002, 65 (2-3): 157-177. 10.1016/S0303-2647(02)00013-8.
- Yachie N, Numata K, Saito R, Kanai A, Tomita M: Prediction of non-coding and antisense RNA genes in Escherichia coli with Gapped Markov Model. Gene. 2006, 372: 171-181.
- Tran TT, Zhou F, Marshburn S, Stead M, Kushner SR, Xu Y: De novo computational prediction of non-coding RNA genes in prokaryotic genomes. Bioinformatics. 2009, 25 (22): 2897-2905. 10.1093/bioinformatics/btp537.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.