Comparison with existing software
Several programs have been designed for the analysis of biological networks, the best known of which is Cytoscape [17]. Although Cytoscape is a first-class visualization program, we chose to develop gViz in order to add some unique and convenient functionalities. The first of these is the capacity to compute and display sub-networks using several different built-in filters (see Figures 1 and 2 for an overview of the gViz interface and functionalities). This functionality, although potentially important biologically, has no counterpart in Cytoscape (see Figure 3). We also found gViz to be much more user-friendly and easier to use than most alternative software packages.
Although it is technically possible to display a huge network containing tens of thousands of nodes and a hundred times more edges, it is not practical for a user to identify specific parts of such a network by eye. gViz therefore offers various features that let the user display only the parts of the network that are of interest. The user can select one or more identifiers from a list derived from the data set provided and display the sub-network containing only the relationships to be studied. A "deepness" slider gradually adjusts the neighborhood of the selected identifiers: if n identifiers are selected and a deepness of d is chosen, the displayed sub-network includes the n identifiers and all their neighbors at a distance of at most d edges. For example, selecting a single node with a deepness of 2 displays the selected node, its immediate neighbors and the neighbors of those immediate neighbors. Deepness can also be set to "maximum", which displays all nodes reachable from the selected identifiers, regardless of their distance.
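The deepness rule amounts to a breadth-first search truncated at depth d. The following R sketch assumes the network is stored as a symmetric adjacency matrix with gene identifiers as row and column names; `neighborhood_nodes` is a hypothetical helper for illustration, not a gViz function.

```r
# Hypothetical helper (not part of gViz): nodes within distance d of the selection.
# Assumes 'adj' is a symmetric adjacency matrix with node names as dimnames;
# d = Inf plays the role of the "maximum" deepness setting.
neighborhood_nodes <- function(adj, selected, d = Inf) {
  visited <- selected
  frontier <- selected
  depth <- 0
  while (length(frontier) > 0 && depth < d) {
    # Nodes adjacent to at least one node of the current frontier
    hits <- colSums(adj[frontier, , drop = FALSE] != 0) > 0
    frontier <- setdiff(colnames(adj)[hits], visited)
    visited <- c(visited, frontier)
    depth <- depth + 1
  }
  visited
}

# Example: a chain A - B - C - D plus an isolated node E
nodes <- c("A", "B", "C", "D", "E")
adj <- matrix(0, 5, 5, dimnames = list(nodes, nodes))
edges <- rbind(c("A", "B"), c("B", "C"), c("C", "D"))
for (i in seq_len(nrow(edges))) {
  adj[edges[i, 1], edges[i, 2]] <- 1
  adj[edges[i, 2], edges[i, 1]] <- 1
}
neighborhood_nodes(adj, "A", d = 2)  # returns "A" "B" "C"
neighborhood_nodes(adj, "A")         # returns "A" "B" "C" "D"
```

Each pass of the loop adds exactly one ring of neighbors, so the number of passes equals the deepness value.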
gViz can also filter the displayed relationships (edges) by excluding those whose weight is below a given threshold. The weights are computed beforehand in MINET and represent the certainty of the inferred interaction: for a pair of nodes i and j, the weight of the edge between them is the maximum of the MRMR (maximum relevance/minimum redundancy) scores computed in both directions. By raising or lowering the threshold, the user can at any time reduce or increase the number of displayed edges, eliminating those that are most likely to be false positives.
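As a sketch of this weighting and filtering, assume the directional MRMR scores are available as a square matrix `scores` in which `scores[i, j]` is the score of gene j when gene i is the target. This layout, the helper names and the numbers below are illustrative assumptions, not MINET or gViz internals. The symmetric weight is then the element-wise maximum of the matrix and its transpose, and the threshold filter zeroes the weaker edges:

```r
# Illustrative helpers, not MINET or gViz code.
# scores[i, j]: directional MRMR-style score of gene j for target gene i (assumed layout).
symmetric_weights <- function(scores) {
  w <- pmax(scores, t(scores))  # weight of edge i-j = max of the two directional scores
  diag(w) <- 0                  # no self-loops
  w
}

# Hide every edge whose weight is below the user-chosen threshold.
filter_edges <- function(w, threshold) {
  w[w < threshold] <- 0
  w
}

# Example with invented scores for three genes
genes <- c("g1", "g2", "g3")
scores <- matrix(c(0,   0.8,  0.1,
                   0.2, 0,    0.4,
                   0.3, 0.05, 0),
                 nrow = 3, byrow = TRUE, dimnames = list(genes, genes))
w <- symmetric_weights(scores)
w_filtered <- filter_edges(w, threshold = 0.35)  # drops the g1-g3 edge (weight 0.3)
```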
Apart from filtering by edge weight, one can filter the node list by degree (i.e., number of neighbors) and/or by cluster (i.e., sub-networks with no connection between them). Clusters are recalculated whenever the user changes the edge-weight threshold, because edges excluded by the filter are not taken into account. Another prominent feature allows the user to find identifiers by providing annotation criteria (such as a description of a gene or its involvement in a biological process) and to generate a sub-network from all or part of the search results.
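Because clusters in this sense are connected components of the filtered graph, recomputing them after a threshold change is a traversal over the surviving edges only. A minimal R sketch, assuming a symmetric weighted adjacency matrix with node names; all helpers are hypothetical:

```r
# Hypothetical helpers (not gViz code). 'w' is a symmetric weighted adjacency
# matrix with node names; only edges with weight >= threshold are kept.
kept_edges <- function(w, threshold) w != 0 & w >= threshold

# Connected components ("clusters") by breadth-first search on the kept edges.
components_after_threshold <- function(w, threshold) {
  a <- kept_edges(w, threshold)
  comp <- setNames(rep(NA_integer_, nrow(a)), rownames(a))
  id <- 0L
  for (start in seq_len(nrow(a))) {
    if (!is.na(comp[start])) next
    id <- id + 1L
    comp[start] <- id
    queue <- start
    while (length(queue) > 0) {
      v <- queue[1]
      queue <- queue[-1]
      nb <- which(a[v, ] & is.na(comp))  # unlabeled neighbors of v
      comp[nb] <- id
      queue <- c(queue, nb)
    }
  }
  comp
}

# Names of nodes whose degree, counted on the kept edges, is at least min_degree.
degree_filter <- function(w, threshold, min_degree) {
  deg <- rowSums(kept_edges(w, threshold))
  names(deg)[deg >= min_degree]
}

# Example: path g1 - g2 - g3 - g4 with one weak middle edge
genes <- paste0("g", 1:4)
w <- matrix(0, 4, 4, dimnames = list(genes, genes))
w["g1", "g2"] <- w["g2", "g1"] <- 0.9
w["g2", "g3"] <- w["g3", "g2"] <- 0.2
w["g3", "g4"] <- w["g4", "g3"] <- 0.7
components_after_threshold(w, 0.5)  # two clusters: {g1, g2} and {g3, g4}
degree_filter(w, 0.1, 2)            # returns "g2" "g3"
```

Raising the threshold from 0.1 to 0.5 removes the g2-g3 edge, which is exactly why the cluster labels must be recomputed.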
It is worth mentioning that, to our knowledge, gViz is one of the few software packages capable of displaying and analyzing GraphML-based networks at the genome scale. The yEd graph editor [18] can also display GraphML; however, it is less powerful and offers fewer features than gViz. GraphML was introduced around the year 2000 as a common format for exchanging network information, so gViz is a serious candidate for the widespread analysis of GraphML networks.
In summary, gViz offers the following main advantages. First, it can render dynamic networks, a feature missing from most network visualization software (e.g., Rhodobase and Starnet [19, 20]), although present in Cytoscape. Second, gViz can read and display GraphML-based networks, a capability shared by few other tools. Third, gViz can filter networks on the basis of different criteria, both topological (e.g., number of neighbors or edge strength) and biological (e.g., membership in a given pathway or known interactions). Filtering on several criteria is available in other programs; however, the set of criteria on which networks can be filtered in gViz is unique. We believe that some of the filters available in gViz will be very useful to biologists.
gViz functionalities
Networks displayed by gViz are dynamic and can be manipulated: each node or group of nodes can be moved with the mouse. Several layout algorithms are available to position graph nodes automatically, so the user can arrange nodes differently depending on the complexity of the network. Other 'visual' features include the possibility to change the size and transparency of nodes, to set edge thickness as a function of edge weight, and to set node diameter as a function of node degree. Specific nodes can be selected directly on the network graph with the mouse or from a list of displayed identifiers, and their neighbors are then highlighted (the extent of the highlighted neighborhood is set by choosing a distance). Users can also generate colored clusters by choosing how many edges to remove in order to split the network into highly connected sub-networks; the edges to remove are identified using the Girvan and Newman clustering algorithm [21].
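The Girvan and Newman procedure removes edges in order of decreasing edge betweenness (the number of shortest paths between node pairs that run through an edge, counted fractionally when several shortest paths exist), recomputing betweenness after each removal. The sketch below assumes an unweighted, undirected 0/1 adjacency matrix and computes betweenness with Brandes' accumulation; `k` stands for the number of edges to remove, and the helper names are hypothetical rather than gViz's implementation.

```r
# Hypothetical helpers illustrating Girvan and Newman edge removal (not gViz code).
# 'a' is an unweighted, undirected 0/1 adjacency matrix.
edge_betweenness <- function(a) {
  n <- nrow(a)
  eb <- matrix(0, n, n)
  for (s in seq_len(n)) {
    # Breadth-first search from s, counting shortest paths (sigma) to each node
    dist <- rep(-1, n); sigma <- rep(0, n)
    dist[s] <- 0; sigma[s] <- 1
    visit_order <- integer(0); queue <- s
    while (length(queue) > 0) {
      v <- queue[1]; queue <- queue[-1]
      visit_order <- c(visit_order, v)
      for (u in which(a[v, ] != 0)) {
        if (dist[u] < 0) { dist[u] <- dist[v] + 1; queue <- c(queue, u) }
        if (dist[u] == dist[v] + 1) sigma[u] <- sigma[u] + sigma[v]
      }
    }
    # Brandes' dependency accumulation, from the farthest nodes back to s
    delta <- rep(0, n)
    for (u in rev(visit_order)) {
      for (p in which(a[u, ] != 0)) {
        if (dist[p] == dist[u] - 1) {
          flow <- sigma[p] / sigma[u] * (1 + delta[u])
          eb[p, u] <- eb[p, u] + flow
          eb[u, p] <- eb[u, p] + flow
          delta[p] <- delta[p] + flow
        }
      }
    }
  }
  eb / 2  # every unordered node pair was counted once from each endpoint
}

# Remove k edges, always deleting the current highest-betweenness edge.
girvan_newman_remove <- function(a, k) {
  for (i in seq_len(k)) {
    if (!any(a != 0)) break
    eb <- edge_betweenness(a)
    top <- arrayInd(which.max(eb), dim(eb))  # ties: first maximum in column order
    a[top[1], top[2]] <- 0
    a[top[2], top[1]] <- 0
  }
  a
}

# Example: two triangles (1-2-3 and 4-5-6) joined by the bridge 3-4
a <- matrix(0, 6, 6)
for (e in list(c(1, 2), c(1, 3), c(2, 3), c(4, 5), c(4, 6), c(5, 6), c(3, 4))) {
  a[e[1], e[2]] <- 1
  a[e[2], e[1]] <- 1
}
clustered <- girvan_newman_remove(a, k = 1)  # the bridge 3-4 is removed first
```

The bridge carries all nine shortest paths between the two triangles, so it has the highest betweenness and its removal yields the two expected clusters.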
When one or more nodes of the network are selected, gViz displays a range of annotation information. From the displayed sub-network, it shows the list of direct neighbors (whether or not they are displayed in the sub-network) and the neighbors reached through an edge whose weight is below the current edge-weight threshold. From a biological database, it shows the probeset identifier, information on associated genes, the biological processes, cellular components and molecular functions involved (from Gene Ontology [22]), proteins (sequences from SwissProt and domains from InterPro [23]), references in the literature (from PubMed [24]), diseases in which the corresponding genes are involved (from OMIM), and chemical reactions and pathways in which these genes are involved (from KEGG [25] and GenMAPP [26]). gViz also provides two further features: one displays the shortest path between two nodes, and the other highlights the edges of the sub-network that can be identified in a pre-selected KEGG pathway. Various statistics on the sub-network can be obtained, including the diameter of the sub-network and histograms of the node degree distribution, the edge weight distribution, the clustering coefficient distribution [27] and/or the cluster size distribution. Finally, the displayed (sub-)network and its statistics can be exported to different image and data file formats.
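Several of these statistics follow directly from the adjacency matrix. The sketch below uses the standard definitions (degree counts, local clustering coefficient as closed triangles over possible neighbor pairs, diameter as the longest shortest path between connected nodes) on an unweighted 0/1 matrix; gViz's own definitions and edge handling may differ, and the helper names are hypothetical.

```r
# Hypothetical helpers for some of the statistics listed above (not gViz code).
# 'a' is an unweighted, undirected 0/1 adjacency matrix without self-loops.

# Number of nodes having each degree
degree_distribution <- function(a) table(rowSums(a != 0))

# Local clustering coefficient: closed triangles / possible neighbor pairs
local_clustering <- function(a) {
  a <- (a != 0) * 1
  k <- rowSums(a)
  triangles <- diag(a %*% a %*% a) / 2  # each triangle is a closed 3-walk in 2 directions
  ifelse(k >= 2, triangles / (k * (k - 1) / 2), 0)
}

# Diameter: longest shortest-path length between two connected nodes
diameter <- function(a) {
  n <- nrow(a)
  longest <- 0
  for (s in seq_len(n)) {
    dist <- rep(-1, n)
    dist[s] <- 0
    queue <- s
    while (length(queue) > 0) {
      v <- queue[1]; queue <- queue[-1]
      for (u in which(a[v, ] != 0 & dist < 0)) {
        dist[u] <- dist[v] + 1
        queue <- c(queue, u)
      }
    }
    longest <- max(longest, dist)
  }
  longest
}

# Example: triangle 1-2-3 with a pendant node 4 attached to node 3
a <- matrix(0, 4, 4)
for (e in list(c(1, 2), c(1, 3), c(2, 3), c(3, 4))) {
  a[e[1], e[2]] <- 1
  a[e[2], e[1]] <- 1
}
degree_distribution(a)  # degrees 1, 2 and 3 occur 1, 2 and 1 times
local_clustering(a)     # 1, 1, 1/3, 0
diameter(a)             # 2
```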
Case study
In this section, we give an example of how to collect data, compute a network file using Minet and R, and explore the resulting graph in gViz.
- Data collection: Several web repositories host microarray data (e.g., GEO [28] and ArrayExpress [29]). We recommend our PathEx database for easier and faster data collection.
- Network computation:
We use the following packages to compute our networks: GCRMA, Minet and Infotheo (all freely available for download from Bioconductor and CRAN).
The following R code computes the network from microarray .CEL files (for R >= 2.10):
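library(affy)      # ReadAffy, list.celfiles; attaches Biobase (exprs)
library(gcrma)     # gcrma normalization
library(minet)     # mrnet
library(infotheo)  # discretize, mutinformation
cel <- list.celfiles()               # .CEL files in the working directory
raw <- ReadAffy(filenames = cel)
eset <- gcrma(raw)                   # GCRMA background correction and normalization
expr <- exprs(eset)                  # probesets x arrays
d <- t(expr)                         # arrays (samples) x probesets
disc <- discretize(d, disc = "equalfreq", nbins = sqrt(nrow(d)))
mim <- mutinformation(disc)          # mutual information matrix
net <- mrnet(mim)                    # MRNET weighted adjacency matrix
write.table(net, file = "net.txt", sep = "\t")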
Once these steps are complete (which may take a long time, depending on the computer's speed and the size of the data set), the computed network can be imported into gViz.
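The script writes a tab-separated matrix. If a GraphML file is preferred, a weighted adjacency matrix such as the one returned by `mrnet` can be serialized with base R, as in the sketch below. The key name `weight`, the graph id and the `write_graphml` helper are illustrative assumptions; they do not describe the attributes gViz expects, for which the gViz manual should be consulted.

```r
# Illustrative GraphML writer (not part of MINET or gViz). Assumes 'net' is a
# symmetric weighted adjacency matrix with node names, e.g. the result of mrnet().
xml_escape <- function(x) {
  x <- gsub("&", "&amp;", x, fixed = TRUE)  # must be replaced first
  x <- gsub("<", "&lt;", x, fixed = TRUE)
  x <- gsub(">", "&gt;", x, fixed = TRUE)
  gsub("\"", "&quot;", x, fixed = TRUE)
}

write_graphml <- function(net, file) {
  ids <- xml_escape(rownames(net))
  # Upper triangle only, so each undirected edge is written once
  idx <- which(upper.tri(net) & net > 0, arr.ind = TRUE)
  lines <- c(
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<graphml xmlns="http://graphml.graphdrawing.org/xmlns">',
    '  <key id="weight" for="edge" attr.name="weight" attr.type="double"/>',
    '  <graph id="G" edgedefault="undirected">',
    sprintf('    <node id="%s"/>', ids),
    sprintf('    <edge source="%s" target="%s"><data key="weight">%s</data></edge>',
            ids[idx[, 1]], ids[idx[, 2]], as.character(net[idx])),
    '  </graph>',
    '</graphml>'
  )
  writeLines(lines, file)
  invisible(nrow(idx))  # number of edges written
}

# Usage after the script above: write_graphml(net, "net.graphml")
```

Zero weights are treated as absent edges, matching the convention that `mrnet` uses 0 for pairs it does not link.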
- gViz exploration
Once the network is loaded in gViz (using the 'open' button {1}; see Figure 4 and the gViz manual), the list of available nodes (genes) is shown in the lower left panel {2}. To display the entire graph, first select the 'circle' layout {3}, then click on 'display full graph' {4}. The circle layout is recommended for its low computational requirements, since rendering a very large network can be extremely resource-intensive; this step nevertheless provides a first overview of the network. At this stage, one may want to filter the graph using the filter on the Minet score {5} or the filter on the number of neighbors {6}. One can also use the clustering option {7} to group similar nodes by progressively removing the weakest edges in the graph (controlled by the 'edge removed for clustering' slider {8}). Next, use the selection tool {9} to highlight the nodes of interest, then click the 'remove non-selected nodes' button {10}. The resulting sub-graph can then be shown using another, more explicit layout (for example, 'force-directed').
The layouts available in gViz serve different purposes. The 'circle' layout is best suited to very large graphs, as it requires the least computational power to display. For mid-sized sub-graphs (fewer than 1,000 nodes), the 'Kamada-Kawai' or 'Fruchterman Reingold' layouts are suggested. The 'Force directed' and 'Meyer's self organizing' layouts are best suited to small networks (fewer than 100 nodes): they need more computational power but separate the edges more clearly.
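The cost difference between layouts shows in how positions are obtained. A circle layout places node k of n at angle 2πk/n in a single pass, whereas force-directed methods such as Fruchterman-Reingold iterate, and a naive iteration evaluates repulsion between every pair of nodes. A minimal sketch of the circle placement (the helper name is hypothetical, not a gViz function):

```r
# Hypothetical helper (not gViz code): circle layout coordinates.
# Node k (k = 0, ..., n - 1) is placed at angle 2 * pi * k / n: one pass, O(n) work.
circle_layout <- function(nodes, radius = 1) {
  n <- length(nodes)
  theta <- 2 * pi * (seq_len(n) - 1) / n
  data.frame(node = nodes, x = radius * cos(theta), y = radius * sin(theta))
}

circle_layout(c("g1", "g2", "g3", "g4"))  # g1 at (1, 0), g2 at (0, 1), and so on
```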
To compute the shortest path between two nodes, first select the 'shortest path' tool {11}, click on the first node of interest, then hold down the SHIFT key and click on the second node. gViz automatically computes the shortest path between these two nodes, taking into account any filters applied. This feature can also be controlled via the left panel {11'}. The shortest path can be exported using the 'Export shortest path' button {12}.
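A shortest path that respects the active filters can be sketched as a breadth-first search over the edges that pass the weight threshold. This sketch assumes path length is counted in edges (the exact metric gViz uses is not stated here), a symmetric weighted adjacency matrix with node names, and a hypothetical `shortest_path` helper:

```r
# Hypothetical helper (not gViz code): fewest-edge path between two nodes found by
# breadth-first search over the edges that pass the current weight filter.
# Assumes 'w' is a symmetric weighted adjacency matrix with node names.
shortest_path <- function(w, from, to, threshold = 0) {
  a <- w != 0 & w >= threshold
  nodes <- rownames(a)
  parent <- setNames(rep(NA_character_, length(nodes)), nodes)
  seen <- setNames(rep(FALSE, length(nodes)), nodes)
  seen[from] <- TRUE
  queue <- from
  while (length(queue) > 0 && !seen[[to]]) {
    v <- queue[1]
    queue <- queue[-1]
    nb <- nodes[a[v, ] & !seen]  # unvisited neighbors of v
    parent[nb] <- v
    seen[nb] <- TRUE
    queue <- c(queue, nb)
  }
  if (!seen[[to]]) return(NULL)  # no path under the current filter
  path <- to
  while (path[1] != from) path <- c(parent[[path[1]]], path)  # walk back to 'from'
  path
}

# Example: square g1 - g2 - g3 - g4 - g1 with one weak edge (g2 - g3)
genes <- paste0("g", 1:4)
w <- matrix(0, 4, 4, dimnames = list(genes, genes))
w["g1", "g2"] <- w["g2", "g1"] <- 0.9
w["g2", "g3"] <- w["g3", "g2"] <- 0.2
w["g3", "g4"] <- w["g4", "g3"] <- 0.7
w["g4", "g1"] <- w["g1", "g4"] <- 0.6
shortest_path(w, "g1", "g3")                   # returns "g1" "g2" "g3"
shortest_path(w, "g1", "g3", threshold = 0.5)  # returns "g1" "g4" "g3"
```

Raising the threshold hides the weak g2-g3 edge, so the path is rerouted through g4; if no route survives the filter, the helper returns `NULL`.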
The current graph can be exported as an image (using the 'export network (png)' button {12}) or in various text formats (using the 'export network (text)' button {13}).
To obtain information on a given node, click on its name in the lower right panel {14}; gViz then displays all the information contained in the PathEx database for that gene (for more information about PathEx, see [16]).