A collection of Bioconductor methods to visualize gene-list annotations
- Gang Feng†1,
- Pan Du†1,
- Nancy L Krett1,
- Michael Tessel1,
- Steven Rosen1,
- Warren A Kibbe1 and
- Simon M Lin1 (corresponding author)
© Lin et al.; licensee BioMed Central Ltd. 2010
Received: 25 August 2009
Accepted: 19 January 2010
Published: 19 January 2010
Gene-list annotations are critical for researchers exploring the complex relationships between genes and their functions. Currently, the annotations of a gene list are usually summarized by a table or a barplot. As a result, potentially important biological complexities, such as one gene belonging to multiple annotation categories, are difficult to extract. We have devised explicit and efficient visualization methods that provide an intuitive way to interrogate the intrinsic connections between biological categories and genes.
We have constructed a data model and now present two novel methods in a Bioconductor package, "GeneAnswers", to simultaneously visualize genes, concepts (a.k.a. annotation categories), and concept-gene connections (a.k.a. annotations): the "Concept-and-Gene Network" and the "Concept-and-Gene Cross Tabulation". These methods have been tested and validated with microarray-derived gene lists.
These new visualization methods can effectively present annotations using Gene Ontology, Disease Ontology, or any other user-defined gene annotations that have been pre-associated with an organism's genome by human curation, automated pipelines, or a combination of the two. The gene-annotation data model and associated methods are available in the Bioconductor package "GeneAnswers" described in this publication.
The gene list from a microarray study is usually summarized by Gene Ontology [1] or Disease Ontology [2] annotations to provide a higher-level understanding of the functions of the genes identified in such an experiment. We explored the existing methods available in Bioconductor (http://www.bioconductor.org) to visualize annotation results, and then extended those methods to create the GeneAnswers package.
The Gene-annotation Data Model
Note that gene-to-concept mappings are usually many-to-many: one gene can belong to multiple concepts, and one concept can include multiple genes.
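A minimal Python sketch of this many-to-many relation, assuming annotations are held as a plain dictionary from concept ID to a set of gene IDs. The identifiers are hypothetical placeholders, not data from this study. Inverting the dictionary gives the gene-to-concepts view, which exposes genes annotated to more than one concept:

```python
from collections import defaultdict

# Hypothetical toy annotations: concept ID -> set of gene IDs.
concept_to_genes = {
    "conceptA": {"g1", "g2", "g3"},
    "conceptB": {"g2", "g4"},
    "conceptC": {"g2", "g3", "g5"},
}


def invert_mapping(concept_to_genes):
    """Return the gene -> concepts view of the same many-to-many relation."""
    gene_to_concepts = defaultdict(set)
    for concept, genes in concept_to_genes.items():
        for gene in genes:
            gene_to_concepts[gene].add(concept)
    return dict(gene_to_concepts)


gene_to_concepts = invert_mapping(concept_to_genes)

# Genes annotated to more than one concept: the case that a table or
# barplot of per-concept counts does not show directly.
multi_concept_genes = sorted(
    gene for gene, concepts in gene_to_concepts.items() if len(concepts) > 1
)
print(multi_concept_genes)  # ['g2', 'g3']
```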
The GeneAnswers Class
* Category list
* Gene list:
  - Gene IDs (required)
  - Fold change (optional)
  - Expression profile (optional)
* Hypergeometric test result (calculated)
* Concept-and-gene network
* Concept-and-gene cross tabulations
The GeneAnswers package calls different modules to generate the figures depending on the users' requirements.
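To make the components listed above concrete, here is a hypothetical Python analogue of the container. It is not the R class and does not use its slot names; it is only a sketch that assumes the category list is stored as concept ID to gene IDs and the optional gene-level data as per-gene dictionaries:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class GeneInput:
    """Gene list: IDs are required; fold change and expression profile are optional."""
    gene_ids: List[str]
    fold_change: Optional[Dict[str, float]] = None  # gene ID -> fold change
    expression_profile: Optional[Dict[str, List[float]]] = None  # gene ID -> per-sample values


@dataclass
class GeneAnswersSketch:
    """Hypothetical Python analogue of the GeneAnswers container (not the R class)."""
    categories: Dict[str, Set[str]]  # category list: concept ID -> annotated gene IDs
    genes: GeneInput
    # Calculated later: concept ID -> hypergeometric test p-value.
    enrichment: Dict[str, float] = field(default_factory=dict)

    def genes_in_category(self, concept: str) -> Set[str]:
        """Input genes that are annotated to the given concept."""
        return self.categories.get(concept, set()) & set(self.genes.gene_ids)

    def has_fold_change(self) -> bool:
        """Whether the optional fold-change values were supplied."""
        return self.genes.fold_change is not None


ga = GeneAnswersSketch(
    categories={"c1": {"g1", "g2", "g9"}, "c2": {"g3"}},
    genes=GeneInput(gene_ids=["g1", "g2", "g3"]),
)
print(sorted(ga.genes_in_category("c1")))  # ['g1', 'g2']
print(ga.has_fold_change())  # False
```

Keeping the optional fields as None, rather than empty dictionaries, makes it explicit whether fold changes or expression profiles were supplied at all.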
Testing Data Set
The testing dataset was obtained from a microarray experiment using Affymetrix human HG-U133A+ version 2.0 chips on a multiple myeloma cell line treated with dexamethasone for 24 hours (three biological replicates). Dexamethasone is a synthetic glucocorticoid. Glucocorticoids are used to treat several hematologic malignancies, including multiple myeloma; however, their mechanism of action is not completely understood. The gene list from the microarray experiment can help us better understand glucocorticoid-induced gene regulation in myeloma cell lines.
In this dataset, 319 genes were identified as significantly differentially expressed, using cut-off criteria of a fold change greater than 2 and a False Discovery Rate (FDR) below 0.01. Using our new Bioconductor package, GeneAnswers, we conducted a hypergeometric test [3] for Gene Ontology (GO) terms over-represented in the input gene list. The final GeneAnswers output, describing the GO analysis results for the input gene list (GO identifiers, GO terms, gene numbers, hypergeometric test p-values, and genes), was generated, and the results were sorted by hypergeometric test p-value. Six relevant GO terms were selected for further analysis based on their biological importance and statistical significance. This testing data set is included in the GeneAnswers package as an example.
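The over-representation test can be sketched as the upper tail of a hypergeometric distribution. For a concept with K annotated genes in a background of N genes, the p-value for observing at least k of them in an input list of n genes is P(X >= k) = sum over i = k..min(n, K) of C(K, i) C(N - K, n - i) / C(N, n). The Python sketch below assumes the background is the set of all genes in the supplied annotation, uses toy counts rather than the study's data, and applies no multiple-testing adjustment; the package's exact background and options are not described here:

```python
from math import comb


def hypergeom_upper_p(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric: N background genes, K in the concept, n drawn.

    k: input genes annotated to the concept
    n: size of the input gene list
    K: background genes annotated to the concept
    N: size of the background
    """
    upper = min(n, K)
    # Exact integer sum of the tail terms, then a single division.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def enrichment_table(input_genes, concept_to_genes, background):
    """Test each concept for over-representation; sort rows by p-value (ascending)."""
    input_set = set(input_genes) & background
    n, N = len(input_set), len(background)
    rows = []
    for concept, members in concept_to_genes.items():
        members = members & background
        hits = sorted(input_set & members)
        if not hits:
            continue  # no input genes in this concept
        p = hypergeom_upper_p(len(hits), n, len(members), N)
        rows.append((concept, len(hits), p, hits))
    rows.sort(key=lambda row: row[2])
    return rows


# Toy example (hypothetical IDs): 10 background genes, 4 input genes.
background = {f"g{i}" for i in range(1, 11)}
concepts = {"c1": {"g1", "g2", "g3"}, "c2": {"g4", "g5", "g6", "g7", "g8"}}
rows = enrichment_table(["g1", "g2", "g3", "g4"], concepts, background)
for concept, count, p, genes in rows:
    print(concept, count, round(p, 4), genes)
# c1 3 0.0333 ['g1', 'g2', 'g3']
# c2 1 0.9762 ['g4']
```

If SciPy is available, scipy.stats.hypergeom.sf(k - 1, N, K, n) should return the same tail probability; the pure-Python version is shown only to keep the sketch dependency-free.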
One of the most important contributions of the GeneAnswers package is that it formally introduces the GeneAnswers class (the concept-to-gene mapping class) into Bioconductor. With this class, the package can visually represent the relationship between genes and any given concepts (Gene Ontology, Disease Ontology, etc.) in two different ways: the concept-and-gene network, which highlights the involvement of a gene in multiple annotation concepts; and the concept-and-gene cross tabulation, which enables a more traditional heat map visualization with annotations.
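The network view can be sketched as a bipartite graph with concept nodes on one side and gene nodes on the other. In this hypothetical Python sketch, the input is a dictionary from each selected concept to the input genes annotated to it; counting each gene's edges identifies genes shared by several concepts. The graph is written in Graphviz DOT only as a neutral stand-in for a rendering back end (the package lists igraph among its requirements; its own drawing code is not reproduced here):

```python
from collections import Counter


def concept_gene_network(selected):
    """Bipartite edges (concept, gene) plus the number of concepts linked to each gene.

    selected: concept ID -> input genes annotated to that concept.
    """
    edges = [
        (concept, gene)
        for concept, genes in sorted(selected.items())
        for gene in sorted(genes)
    ]
    gene_degree = Counter(gene for _, gene in edges)
    return edges, gene_degree


def to_dot(edges, highlight):
    """Write the undirected graph in Graphviz DOT, filling the highlighted gene nodes."""
    lines = ["graph concept_gene {"]
    for concept, gene in edges:
        lines.append(f'  "{concept}" -- "{gene}";')
    for gene in sorted(highlight):
        lines.append(f'  "{gene}" [style=filled, fillcolor=orange];')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical concepts and genes.
selected = {"c1": {"g1", "g2"}, "c2": {"g2", "g3"}, "c3": {"g2", "g4"}}
edges, degree = concept_gene_network(selected)
shared = sorted(g for g, d in degree.items() if d >= 2)  # genes in 2+ concepts
print(shared)  # ['g2']
print(to_dot(edges, shared))
```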
Researchers may also be interested in the relationship between gene expression profiles and the related GO terms. The GeneAnswers package therefore provides a second view, the "concept-and-gene cross tabulation", to show such details.
Concept-and-Gene Cross Tabulations
Users can combine these two types of visualization to identify which genes and biological pathways are potentially involved in the treatment effects. For instance, Figure 2 and Figure 3 make clear that genes associated with DNA replication, such as CDC6, CDC45L, CDC25A, MCM4, MCM10, and MCM6, are all down-regulated after dexamethasone treatment.
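The cross tabulation, and the kind of question asked of it above, can be sketched as a gene-by-concept 0/1 table with each gene's fold change alongside. This Python sketch uses hypothetical gene and concept IDs with made-up signed log2 fold changes (negative meaning down-regulated); none of the values come from Figure 2 or Figure 3:

```python
def cross_tabulation(selected):
    """Gene x concept 0/1 table: 1 where the gene is annotated to the concept."""
    concepts = sorted(selected)
    genes = sorted(set().union(*selected.values()))
    table = {g: [1 if g in selected[c] else 0 for c in concepts] for g in genes}
    return concepts, genes, table


def all_down_regulated(concept, selected, log2_fc):
    """True if every gene in the concept has a negative log2 fold change."""
    return all(log2_fc[g] < 0 for g in selected[concept])


def format_table(concepts, genes, table, log2_fc):
    """Tab-separated text view, with each gene's log2 fold change in the last column."""
    lines = ["gene\t" + "\t".join(concepts) + "\tlog2FC"]
    for g in genes:
        cells = "\t".join(str(v) for v in table[g])
        lines.append(f"{g}\t{cells}\t{log2_fc[g]:+.1f}")
    return "\n".join(lines)


# Hypothetical IDs and made-up log2 fold changes (negative = down-regulated).
selected = {"c1": {"g1", "g2", "g3"}, "c2": {"g3", "g4"}}
log2_fc = {"g1": -1.6, "g2": -2.1, "g3": -1.2, "g4": 1.8}
concepts, genes, table = cross_tabulation(selected)
print(format_table(concepts, genes, table, log2_fc))
print(all_down_regulated("c1", selected, log2_fc))  # True
print(all_down_regulated("c2", selected, log2_fc))  # False
```

In a heat-map rendering, the 0/1 columns would act as the annotation panel and the fold-change (or full expression-profile) columns as the colored cells; that layout is an assumption about presentation, not a description of the package's plotting code.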
The "Concept-and-Gene Network" and the "Concept-and-Gene Cross Tabulation" visualization methods provided by the new Bioconductor package "GeneAnswers" are powerful tools that give investigators a macroscopic view of the relationships between a given gene list and relevant annotations. In addition to an easy-to-understand overview of all genes and statistically significant annotation categories, researchers can identify key genes involved in several different categories and see how gene expression patterns relate to each potential pathway and biological function. These visualization methods can be incorporated into large-scale genomic data processing pipelines. As of this release, the GeneAnswers Bioconductor package generates only static images. When the number of genes reaches the thousands, the gene symbols and concept categories in the figure become cramped, although the overall structure may remain legible. This limitation could be addressed by interactive figures with zooming or scrolling capabilities.
Availability and requirements
Project name: GeneAnswers
Project home page: http://bioconductor.org/packages/2.5/bioc/html/GeneAnswers.html
Operating system(s): Platform independent
Programming language: R version 2.9.0 or higher
Other requirements: annotate 1.20.0, AnnotationDbi 1.6.0, Biobase 2.2.0, DBI 0.2.4, GO.db 2.2.11, KEGG.db 2.2.11, and the most up-to-date metadata packages (org.Hs.eg.db, org.Mm.eg.db, org.Rn.eg.db) in Bioconductor; igraph 0.5.1, RSQLite 0.7.1, XML 1.98.1, xtable 1.5.4, Heatplus 1.12.0, MASS 7.2.44, and RColorBrewer 1.0.2 in R
The authors would like to thank Drs. Spencer Huang, Hongmei Jiang, and Julie Zhu for critical reading of the manuscript. This project was supported in part by Award Number UL1RR025741 from the National Center for Research Resources, National Institutes of Health; a Senior Investigator award from the Multiple Myeloma Research Foundation to NK; and R01 CA085915 to SR. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Center for Research Resources, the Multiple Myeloma Research Foundation, or the National Institutes of Health.
- Ashburner M, Ball CA, Blake JA: Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet. 2000, 25: 25-29. doi:10.1038/75556.
- Du P, Feng G, Flatow J: From disease ontology to disease-ontology lite: statistical methods to adapt a general-purpose ontology for the test of gene-ontology associations. Bioinformatics. 2009, 25: i63-8. doi:10.1093/bioinformatics/btp193.
- Adams WT, Skopek TR: Statistical test for the comparison of samples from mutational spectra. J Mol Biol. 1987, 194: 391-6. doi:10.1016/0022-2836(87)90669-3.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.