Technical Note
ColorTree: a batch customization tool for phylogenic trees
BMC Research Notes volume 2, Article number: 155 (2009)
Genome sequencing projects and comparative genomics studies typically aim to trace the evolutionary history of large gene sets, often requiring human inspection of hundreds of phylogenetic trees. If trees are checked for compatibility with an explicit null hypothesis (e.g., the monophyly of certain groups), this daunting task is greatly facilitated by an appropriate coloring scheme.
In this note, we introduce ColorTree, a simple yet powerful batch customization tool for phylogenic trees. Based on pattern matching rules, ColorTree applies a set of customizations to an input tree file, e.g., coloring labels or branches. The customized trees are saved to an output file, which can then be viewed and further edited by Dendroscope (a freely available tree viewer). ColorTree runs on any Perl installation as a stand-alone command line tool, and its application can thus be easily automated. This way, hundreds of phylogenic trees can be customized for easy visual inspection in a matter of minutes.
ColorTree allows efficient and flexible visual customization of large tree sets through the application of a user-supplied configuration file to multiple tree files.
Studies in comparative genomics, e.g., analyzing protein family evolution [1–3] or lateral gene transfers [4–7], typically generate large sets of phylogenic trees. Visual inspection of these trees is often necessary, as automated algorithms are not yet sufficiently flexible and reliable [8, 9]. The aim of such analyses is often to check for consistency with given null hypotheses (e.g., the clustering of gene copies from known monophyletic groups). This task is often simplified by manual customization of the trees prior to inspection. Customizations usually involve changes to the foreground and background colors of specific labels, the line width and color of associated branches, and other aspects of a phylogenic tree. The majority of existing tree viewing programs allow the customization of one or a few opened trees within reasonable time; few also allow customized results to be saved and re-opened [10, 11]. However, such manual customization becomes time-consuming and error-prone for large trees (the tree of life or the phylogenic tree of the NCBI taxonomy, for example) or large numbers of trees. Some recently published tree editors, for example TreeDyn and Dendroscope, introduce scripting and command-line consoles to tackle the problem; in both programs, users can manipulate leaves and nodes through a command-line window (the console). With TreeDyn, users can even save their commands to script files and re-apply them to other tree files afterwards. The advantage of this approach is that manual customization is greatly facilitated in a semi-automatic way. The disadvantages are also obvious: users have to learn yet another language (although both are as simple as plain English and easy to learn for those with programming experience), and it is still difficult to apply the same set of commands to multiple tree files.
Here we introduce a new program, ColorTree, which quickly and automatically customizes phylogenic trees based on a user-supplied configuration file. Results are saved in a format that can be read by Dendroscope, a powerful tree viewer and editor freely available from its authors (http://www-ab.informatik.uni-tuebingen.de/software/dendroscope). The advantages of ColorTree over existing customization methods are: (1) it is a standalone program that can be run from the command line, making it ideally suited for batch use; (2) customized results are saved for further viewing and editing; and (3) the user-supplied configuration files, based on pattern matching logic, guarantee the stability and flexibility of customization results.
ColorTree takes two input text files: a tree file in either Newick or NEXUS format, and a user-defined configuration file detailing the desired customizations. Input tree files may contain multiple phylogenic trees, delimited by ';'. Bootstrap scores on input tree branches are preserved.
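As an illustration of the multi-tree input described above, the following Python sketch splits a Newick text into individual trees at the ';' delimiter. It is not ColorTree's own code (ColorTree is written in Perl); the function name `split_newick_trees` is hypothetical, and the sketch assumes that no ';' occurs inside quoted labels or comments. NEXUS files are not handled.

```python
def split_newick_trees(text):
    """Split a text holding several Newick trees into one string per tree.

    Assumption: ';' only ever terminates a tree, i.e. it never appears
    inside a quoted label or a bracketed comment.
    """
    trees = []
    for chunk in text.split(";"):
        chunk = chunk.strip()
        if chunk:  # skip the empty remainder after the last ';'
            trees.append(chunk + ";")
    return trees


# Two small trees; the first carries a bootstrap score (95) on an internal node.
example = "((A:0.1,B:0.2)95:0.3,C:0.4);\n(D,(E,F));\n"
trees = split_newick_trees(example)
for tree in trees:
    print(tree)
```

Each tree string is returned unchanged apart from surrounding whitespace, so branch lengths and bootstrap scores pass through untouched.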
Each line of the configuration file specifies one individual customization command. Each consists of five tab-delimited columns, specifying:
how the keyword will be searched in branch labels
the keyword to be searched in branch labels
foreground color to be applied to branches and node labels
background color to be applied to node labels
line width to be applied to branches
The first two columns are obligatory, while the other three columns can be left blank. Detailed descriptions of the configuration file, as well as instructions for its generation, are available in the software package.
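The five-column layout can be sketched as a small Python parser. This is an illustration only, not the parser shipped with ColorTree: the names `Rule` and `parse_config`, the sample keywords, and the color and width values are all hypothetical, and it is an assumption that the first column holds one of the four search modes the text names ("prefix", "suffix", "complete", "contain") spelled exactly like that. The authoritative description of the file is the one in the software package.

```python
from typing import NamedTuple, Optional

# Assumed spellings of the search modes named in the text.
SEARCH_MODES = {"prefix", "suffix", "complete", "contain"}


class Rule(NamedTuple):
    mode: str                  # how the keyword is searched (obligatory)
    keyword: str               # the keyword itself (obligatory)
    foreground: Optional[str]  # color for branches and node labels
    background: Optional[str]  # color behind node labels
    line_width: Optional[str]  # kept as text; the unit is not specified here


def parse_config(text):
    """Parse tab-delimited customization rules, one rule per line."""
    rules = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        cols = line.split("\t")
        cols += [""] * (5 - len(cols))  # optional columns may be missing
        mode, keyword, fg, bg, width = (c.strip() for c in cols[:5])
        if mode not in SEARCH_MODES or not keyword:
            raise ValueError(f"line {line_no}: search mode and keyword are required")
        rules.append(Rule(mode, keyword, fg or None, bg or None, width or None))
    return rules


# Hypothetical configuration: the second rule leaves columns 4 and 5 out.
sample = "prefix\tOsa\tgreen\t\t2\ncontain\tEctocarpus\tbrown\n"
rules = parse_config(sample)
for rule in rules:
    print(rule)
```

Blank optional columns are mapped to `None`, so later code can tell "not set" apart from an explicit value.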
To customize trees in the input file, terminal node labels of each tree are searched using the user-supplied keywords. Four ways of searching are supported: "prefix", "suffix", "complete", and "contain". The user-defined "background color" will be applied to all matching labels, "branch width" will be applied to the branches that directly connect to the corresponding terminal nodes, and "foreground color" to both labels and directly connecting branches. When all descendant terminal nodes of any internal node have the same color, all intervening branches will also receive that color. This is particularly useful to find the common ancestor of a group of genes, or to pinpoint the separation of two clades during evolution (see examples in Figure 1).
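The matching and color propagation rules can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not ColorTree's implementation: a tree is modeled as nested tuples with strings as terminal node labels, the function names and the example labels are hypothetical, and when several rules match one label the first rule wins (the text does not say how such conflicts are resolved).

```python
def matches(label, mode, keyword):
    """Apply one of the four search modes to a terminal node label."""
    if mode == "prefix":
        return label.startswith(keyword)
    if mode == "suffix":
        return label.endswith(keyword)
    if mode == "complete":
        return label == keyword
    if mode == "contain":
        return keyword in label
    raise ValueError(f"unknown search mode: {mode}")


def leaf_color(label, rules):
    """Foreground color of the first matching rule (assumption), else None."""
    for mode, keyword, foreground in rules:
        if matches(label, mode, keyword):
            return foreground
    return None


def color_subtree(node, rules, out):
    """Return the color of the branch leading to `node`.

    A leaf is a string; an internal node is a tuple of child nodes.
    (node, color) pairs are appended to `out` in post-order.
    """
    if isinstance(node, str):
        color = leaf_color(node, rules)
    else:
        child_colors = {color_subtree(child, rules, out) for child in node}
        # The branch is colored only if every descendant leaf agrees.
        color = child_colors.pop() if len(child_colors) == 1 else None
    out.append((node, color))
    return color


rules = [("prefix", "Osa", "green"), ("prefix", "Esi", "brown")]
tree = (("Osa_1", "Osa_2"), ("Esi_1", ("Osa_3", "Esi_2")))
colored = []
color_subtree(tree, rules, colored)
for node, color in colored:
    print(color, node)
```

In this example the clade ("Osa_1", "Osa_2") receives a green internal branch, while the mixed clade and the root stay uncolored, which is what makes the separation point between two groups easy to spot.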
Customized trees are saved in ".dendro" format, which can be viewed and further edited by Dendroscope. It should be pointed out that Dendroscope itself provides a range of tree customization methods, but these have to be applied to individually opened tree files, which tends to be time-consuming.
Examples of customized trees
In a genome sequencing project, evolutionary paths of selected protein families in multiple organisms were investigated. This required visual inspection of several hundred gene families, each containing orthologous genes from different organisms as well as paralogous copies within organisms. Using ColorTree, hundreds of phylogenic trees can be customized within a few minutes on a standard desktop computer. All customized trees were then visually inspected in Dendroscope.
Several examples of customized trees are shown in Figure 1. Figure 1a shows a phylogenetic tree for the sorbitol transporter protein. This represents a typical scenario of lineage-specific gene duplications. An ancestral sorbitol transporter gene is found in the common ancestor of green plants (highlighted in green) and brown algae (highlighted in brown). After the separation of green plants and brown algae, the ancestral gene remained single copy in brown algae, duplicating only in the terminal branches. While there are also duplications specific to rice and poplar, one duplication is evident before the separation of these two species.
Figure 1b shows a phylogenic tree for the glutamine synthetase protein, which was adopted from . Species of archaea (red) and bacteria (black) are intermingled. The tree is thus incompatible with the accepted monophyly of the two kingdoms. If the tree faithfully reflects the evolutionary history of the gene, this would indicate possible lateral gene transfers (LGT) between bacteria and archaea. Thus, visual inspection using ColorTree and Dendroscope is a simple and intuitive way to identify certain types of inconsistencies in genetic data. However, users may also wish to look at alternative, more sophisticated methods to detect such inconsistencies, e.g., Neighbour-Net or the 'tree-of-tree' approach.
Availability and requirements
The program described in this note is freely downloadable from http://code.google.com/p/colortree. ColorTree is written in Perl and should run on any platform with Perl and BioPerl installed. For users who have no programming experience or no Perl installation, we also provide pre-packaged executables that run on computers without Perl and the BioPerl modules.
1. Ganfornina MD, Gutierrez G, Bastiani M, Sanchez D: A phylogenetic analysis of the lipocalin protein family. Mol Biol Evol. 2000, 17: 114-126.
2. Thornton JW, DeSalle R: Gene family evolution and homology: genomics meets phylogenetics. Annu Rev Genomics Hum Genet. 2000, 1: 41-73. 10.1146/annurev.genom.1.1.41.
3. Zardoya R: Phylogeny and evolution of the major intrinsic protein family. Biol Cell. 2005, 97: 397-414. 10.1042/BC20040134.
4. Bapteste E, Boucher Y, Leigh J, Doolittle WF: Phylogenetic reconstruction and lateral gene transfer. Trends Microbiol. 2004, 12: 406-411. 10.1016/j.tim.2004.07.002.
5. Brown JR: Ancient horizontal gene transfer. Nat Rev Genet. 2003, 4: 121-132. 10.1038/nrg1000.
6. Obornik M, Van de Peer Y, Hypsa V, Frickey T, Slapeta JR, Meyer A, Lukes J: Phylogenetic analyses suggest lateral gene transfer from the mitochondrion to the apicoplast. Gene. 2002, 285: 109-118. 10.1016/S0378-1119(02)00427-4.
7. Soria-Carrasco V, Castresana J: Estimation of phylogenetic inconsistencies in the three domains of life. Mol Biol Evol. 2008, 25: 2319-2329. 10.1093/molbev/msn176.
8. Castresana J: Topological variation in single-gene phylogenetic trees. Genome Biol. 2007, 8: 216. 10.1186/gb-2007-8-10-r216.
9. Roger AJ, Hug LA: The origin and diversification of eukaryotes: problems with molecular phylogenetics and molecular clock estimation. Philos Trans R Soc Lond B Biol Sci. 2006, 361: 1039-1054. 10.1098/rstb.2006.1845.
10. Trees and Tree – softwares for visualisation and manipulations. [http://bioinfo.unice.fr/biodiv/Tree_editors.html]
11. Phylogeny Programs. [http://evolution.genetics.washington.edu/phylip/software.html]
12. Chevenet F, Brun C, Banuls AL, Jacq B, Christen R: TreeDyn: towards dynamic graphics and annotations for analyses of trees. BMC Bioinformatics. 2006, 7: 439. 10.1186/1471-2105-7-439.
13. Clark RM, Schweikert G, Toomajian C, Ossowski S, Zeller G, Shinn P, Warthmann N, Hu TT, Fu G, Hinds DA, et al: Common sequence polymorphisms shaping genetic diversity in Arabidopsis thaliana. Science. 2007, 317: 338-342. 10.1126/science.1138632.
14. Bryant D, Moulton V: Neighbor-net: an agglomerative method for the construction of phylogenetic networks. Mol Biol Evol. 2004, 21: 255-265. 10.1093/molbev/msh018.
15. Huson DH: SplitsTree: analyzing and visualizing evolutionary data. Bioinformatics. 1998, 14: 68-73. 10.1093/bioinformatics/14.1.68.
We would like to acknowledge the helpful suggestions given by the Lercher group.
The authors declare that they have no competing interests.
WHC conceived the project and implemented ColorTree in consultation with MJL. WHC and MJL wrote the manuscript.