CoreGenes3.5: a webserver for the determination of core genes from sets of viral and small bacterial genomes
© Turner et al.; licensee BioMed Central Ltd. 2013
Received: 17 November 2012
Accepted: 25 March 2013
Published: 8 April 2013
CoreGenes3.5 is a webserver that determines sets of core genes from viral and small bacterial genomes as an automated batch process. Previous versions of CoreGenes have been used to classify bacteriophage genomes and mine data from pathogen genomes.
CoreGenes3.5 accepts GenBank accession numbers of genomes as input and performs iterative BLASTP analyses to output a set of core genes. When the program run completes, the results are either displayed in a new window (for one pair of reference and query genomes) or emailed to the user (for multiple pairs of small genomes) in tabular format.
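One plausible reading of "iterative" is a successive filter: reference proteins are compared with the first query genome, and only those with a match are carried forward to the next query. The Python sketch below illustrates that reading under stated assumptions. It is not the CoreGenes3.5 implementation, which is written in Java and runs BLASTP. The function names, the toy `similar` predicate (a stand-in for a BLASTP hit, built on the standard-library `difflib`), the 0.8 cutoff, and the short made-up sequences are all hypothetical.

```python
from difflib import SequenceMatcher


def similar(a: str, b: str, cutoff: float = 0.8) -> bool:
    """Toy stand-in for a BLASTP hit (assumption): character-level similarity ratio."""
    return SequenceMatcher(None, a, b).ratio() >= cutoff


def core_genes(reference: dict[str, str], queries: list[dict[str, str]]) -> list[str]:
    """Return names of reference proteins that have a similar protein in every query."""
    surviving = dict(reference)
    for query in queries:
        # Keep only the reference proteins that also match something in this query.
        surviving = {
            name: seq
            for name, seq in surviving.items()
            if any(similar(seq, q_seq) for q_seq in query.values())
        }
    return sorted(surviving)


# Made-up protein names and sequences, for illustration only.
reference = {"polA": "MKTAYIAKQR", "capsid": "MSLLNVPAGK", "orf3": "MWWCCHHDDE"}
queries = [
    {"q1_a": "MKTAYIAKQK", "q1_b": "MSLLNVPAGR"},
    {"q2_a": "MKTAYLAKQR", "q2_b": "GGGGGGGGGG"},
]
print(core_genes(reference, queries))
```

In this toy data only `polA` has a counterpart in both query genomes, so it is the only name that survives both rounds of filtering. A real analysis would replace `similar` with a BLASTP comparison and a score threshold.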
As the number of sequenced genomes increases daily, along with interest in determining phylogenetic relationships, CoreGenes3.5 provides a user-friendly web interface for wet-bench biologists to process multiple small genomes for core gene determinations. CoreGenes3.5 is available at http://binf.gmu.edu:8080/CoreGenes3.5.
Keywords: Core genes; Bacteriophage; Taxonomy; Viral genomics; Data mining
Genes that are common to a set of genomes are known as core genes. Core sets of genes have been used to better understand bacterial genome evolution, orthology in viral genomes, and viral evolutionary complexity, and to mine pathogen genomes. Core genes have also been used to investigate the origins of photosynthesis, as well as to classify and untangle the taxonomy of bacteriophages [6–8]. With such a myriad of uses for core genes and the growing numbers of whole genome sequences, it is important to provide user-friendly and validated software tools for the determination of these genes from sets of genomes. Originally developed in 2002, CoreGenes, a tool for the identification of shared and unique genes among (small) genomes, has been continually updated and refined in response to user demands. These changes include increased robustness of the tool, as well as the ability to upload custom and proprietary data not deposited in GenBank. The major update in this version is the ability to batch process multiple pairs of small genomes, freeing the user from repetitive and time-consuming manual entry of genome sets. This benefits users who wish to analyze several large sets of genomes, for example a family of bacteriophages.
Other software tools have been developed for the determination of core genes, including mGenomeSubtractor, CEGMA, nWayComp, and GenomeBlast. mGenomeSubtractor and GenomeBlast both use BLAST-based algorithms to identify core genes. Of the four tools, mGenomeSubtractor is primarily intended for use with bacterial genomes, whilst CEGMA is intended primarily for eukaryotic genomes; nWayComp and GenomeBlast are no longer accessible online, nor is another genome comparison tool called GOAT. In contrast, CoreGenes has been continuously available online since 2002 and has been shown to be invaluable in characterizing and re-determining the taxonomy and relationships of bacteriophages based on coding sequences [6, 7, 16–19]. It is anticipated that this timely update of CoreGenes will enable the analysis of shared proteins among viral and small bacterial genomes in a faster and more efficient manner.
As the BLASTP comparisons are performed ab initio and not pre-computed, CoreGenes3.5 is limited in practice to genome sizes of 2 Mb or less. Larger genomes are accepted as input, but the time taken to process them increases with genome size. Therefore, it is recommended that users submit genomes within the aforementioned limit.
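A user preparing a batch may want to check genome lengths against this recommendation before submitting. The short Python sketch below shows one way to do so. It is a hypothetical pre-submission helper, not part of CoreGenes3.5; it reads "2 Mb" as 2,000,000 bp (an assumption), and the genome names and lengths are placeholders rather than real GenBank records.

```python
SIZE_LIMIT_BP = 2_000_000  # "2 Mb" from the text, read here as 2,000,000 bp (assumption)


def split_by_size(genome_lengths: dict[str, int], limit_bp: int = SIZE_LIMIT_BP):
    """Separate genomes at or under the size limit from those above it."""
    within = {name: n for name, n in genome_lengths.items() if n <= limit_bp}
    over = {name: n for name, n in genome_lengths.items() if n > limit_bp}
    return within, over


# Placeholder names and lengths, not real GenBank records.
lengths = {"phage_A": 45_000, "phage_B": 168_000, "bacterium_C": 4_600_000}
within, over = split_by_size(lengths)
print("within recommended limit:", sorted(within))
print("above limit, expect long run times:", sorted(over))
```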
Results and discussion
The advent and continued development of next-generation sequencing technologies has substantially increased the throughput and fidelity of genome sequence data. With falling costs, the number of viral and bacterial genomes deposited in the International Nucleotide Sequence Databases/GenBank has grown rapidly and continues to do so. It is therefore crucial to continue developing and improving novel and existing software tools that can efficiently mine this expanding wealth of sequence data and facilitate comparisons of multiple closely or distantly related genomes.
CoreGenes3.5 is the latest and most versatile update to a user-friendly tool for locating and identifying core genes from viral and small bacterial genomes. Like previous versions of CoreGenes, this newest version will be continually updated in response to demands from the user community. The ability of CoreGenes to deal with larger bacterial genomes is actively being addressed.
The batch processing feature of CoreGenes3.5 enables researchers to analyze multiple small genomes expeditiously using a web interface. This allows users to data mine the increasing numbers of genomes in sequence databases and to determine quickly the phylogenetic relationships amongst them.
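For a family of genomes, the reference and query pairs for a batch run can be enumerated programmatically rather than typed by hand. The Python sketch below lists every ordered (reference, query) pair of distinct genomes. It is a hypothetical helper: the input format that CoreGenes3.5 expects for batch submission is not described here, so the tab-separated output is purely illustrative, and the genome identifiers are placeholders rather than real accession numbers.

```python
from itertools import permutations


def reference_query_pairs(genomes: list[str]) -> list[tuple[str, str]]:
    """Return all ordered (reference, query) pairs of distinct genomes."""
    return list(permutations(genomes, 2))


family = ["GENOME_A", "GENOME_B", "GENOME_C"]  # placeholders, not real accessions
pairs = reference_query_pairs(family)
for ref, query in pairs:
    print(f"{ref}\t{query}")
```

Ordered pairs are used because each comparison distinguishes a reference from a query; if a study needs only one comparison per pair, `itertools.combinations` halves the list.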
Availability and requirements
Project name: CoreGenes3.5
Project home page: http://binf.gmu.edu:8080/CoreGenes3.5
Operating system(s): Platform independent
Programming language: Java
Any restrictions to use by non-academics: License required for commercial usage
We thank Chris Ryan for maintaining the server on which CoreGenes3.5 is hosted and Jason Seto for critical comments and software validation. We also thank Andrew Kropinski for suggestions and comments over the years to improve these software tools. Publication of this article was funded in part by the George Mason University Libraries Open Access Publishing Fund.
- Liang W, Zhao Y, Chen C, Cui X, Yu J, Xiao J, Kan B: Pan-Genomic analysis provides insights into the genomic variation and evolution of Salmonella Paratyphi A. PLoS One. 2012, 7: e45346.
- Garavaglia MJ, Miele SAB, Iserte JA, Belaich MN, Ghiringhelli PD: The ac53, ac78, ac101 and ac103 are newly discovered core genes in the family Baculoviridae. J Virol. 2012, 86: 12069-12079.
- Yutin N, Koonin EV: Hidden evolutionary complexity of Nucleo-Cytoplasmic Large DNA viruses of eukaryotes. Virol J. 2012, 9: 161.
- Mahadevan P, King JF, Seto D: Data mining pathogen genomes using GeneOrder and CoreGenes and CGUG: gene order, synteny and in silico proteomes. Int J Comput Biol Drug Des. 2009, 2: 100-114.
- Mulkidjanian AY, Koonin EV, Makarova KS, Mekhedov SL, Sorokin A, Wolf YI, Dufresne A, Partensky F, Burd H, Kaznadzey D, Haselkorn R, Galperin MY: The cyanobacterial genome core and the origin of photosynthesis. Proc Natl Acad Sci USA. 2006, 103: 13126-13131.
- Lavigne R, Darius P, Summer EJ, Seto D, Mahadevan P, Nilsson AS, Ackermann HW, Kropinski AM: Classification of Myoviridae bacteriophages using protein sequence similarity. BMC Microbiol. 2009, 9: 224.
- Lavigne R, Seto D, Mahadevan P, Ackermann H-W, Kropinski AM: Unifying classical and molecular taxonomic classification: analysis of the Podoviridae using BLASTP-based tools. Res Microbiol. 2008, 159: 406-414.
- Mahadevan P, Seto D: Taxonomic parsing of bacteriophages using core genes and in silico proteome-based CGUG and applications to small bacterial genomes. Adv Exp Med Biol. 2010, 680: 379-385.
- Zafar N, Mazumder R, Seto D: CoreGenes: a computational tool for identifying and cataloging “core” genes in a set of small genomes. BMC Bioinforma. 2002, 3: 12.
- Mahadevan P, King JF, Seto D: CGUG: in silico proteome and genome parsing tool for the determination of “core” and unique genes in the analysis of genomes up to ca. 1.9 Mb. BMC Res Notes. 2009, 2: 168.
- Shao Y, He X, Harrison EM, Tai C, Ou H-Y, Rajakumar K, Deng Z: mGenomeSubtractor: a web-based tool for parallel in silico subtractive hybridization analysis of multiple bacterial genomes. Nucleic Acids Res. 2010, 38: W194-200.
- Parra G, Bradnam K, Korf I: CEGMA: a pipeline to accurately annotate core genes in eukaryotic genomes. Bioinformatics. 2007, 23: 1061-1067.
- Yao J, Lin H, Doddapaneni H, Civerolo EL: nWayComp: a genome-wide sequence comparison tool for multiple strains/species of phylogenetically related microorganisms. In Silico Biol (Gedrukt). 2007, 7: 195-200.
- Lu G, Jiang L, Helikar RMK, Rowley TW, Zhang L, Chen X, Moriyama EN: GenomeBlast: a web tool for small genome comparison. BMC Bioinforma. 2006, 7 (Suppl 4): S18.
- Kaluszka A, Gibas C: Interactive gene-order comparison for multiple small genomes. Bioinformatics. 2004, 20: 3662-3664.
- Chibeu A, Lingohr EJ, Masson L, Manges A, Harel J, Ackermann H-W, Kropinski AM, Boerlin P: Bacteriophages with the ability to degrade uropathogenic Escherichia coli biofilms. Viruses. 2012, 4: 471-487.
- Kropinski AM, Van den Bossche A, Lavigne R, Noben J-P, Babinger P, Schmitt R: Genome and proteome analysis of 7-7-1, a flagellotropic phage infecting Agrobacterium sp H13-3. Virol J. 2012, 9: 102.
- Lehman SM, Kropinski AM, Castle AJ, Svircev AM: Complete genome of the broad-host-range Erwinia amylovora phage phiEa21-4 and its relationship to Salmonella phage felix O1. Appl Environ Microbiol. 2009, 75: 2139-2147.
- Villegas A, She Y-M, Kropinski AM, Lingohr EJ, Mazzocco A, Ojha S, Waddell TE, Ackermann H-W, Moyles DM, Ahmed R, Johnson RP: The genome and proteome of a virulent Escherichia coli O157:H7 bacteriophage closely resembling Salmonella phage Felix O1. Virol J. 2009, 6: 41.
- Celamkoti S, Kundeti S, Purkayastha A, Mazumder R, Buck C, Seto D: GeneOrder3.0: software for comparing the order of genes in pairs of small bacterial genomes. BMC Bioinforma. 2004, 5: 52.
- Mahadevan P, Seto D: Rapid pair-wise synteny analysis of large bacterial genomes using web-based GeneOrder4.0. BMC Res Notes. 2010, 3: 41.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.