Network module detection: Affinity search technique with the multi-node topological overlap measure
© Horvath et al; licensee BioMed Central Ltd. 2009
Received: 09 December 2008
Accepted: 20 July 2009
Published: 20 July 2009
Many clustering procedures only allow the user to input a pairwise dissimilarity or distance measure between objects. We propose a clustering method that can input a multi-point dissimilarity measure d(i1, i2, ..., iP) where the number of points P can be larger than 2. The work is motivated by gene network analysis where clusters correspond to modules of highly interconnected nodes. Here, we define modules as clusters of network nodes with high multi-node topological overlap. The topological overlap measure is a robust measure of interconnectedness which is based on shared network neighbors. In previous work, we have shown that the multi-node topological overlap measure yields biologically meaningful results when used as input of network neighborhood analysis.
We adapt network neighborhood analysis for the use of module detection. We propose the Module Affinity Search Technique (MAST), which is a generalized version of the Cluster Affinity Search Technique (CAST). MAST can accommodate a multi-node dissimilarity measure. Clusters grow around user-defined or automatically chosen seeds (e.g. hub nodes). We propose both local and global cluster growth stopping rules. We use several simulations and a gene co-expression network application to argue that the MAST approach leads to biologically meaningful results. We compare MAST with hierarchical clustering and partitioning around medoid clustering.
Our flexible module detection method is implemented in the MTOM software which can be downloaded from the following webpage: http://www.genetics.ucla.edu/labs/horvath/MTOM/
While most clustering procedures use a pairwise dissimilarity (distance) measure as input, we present a clustering procedure that can accommodate a multi-point dissimilarity measure d(i1, i2, ..., i P ) where P > 1 is the number of points and the indices i k = 1, ..., n run over the n objects. Since we are mainly interested in a network application, we will refer to the objects as nodes and the corresponding measure as multi-node dissimilarity.
Note that this definition reduces to that of a pairwise dissimilarity when P = 2.
The proposed module affinity search technique (MAST) can be used for any pairwise or multi-node dissimilarity measure. But our particular focus is the multi-node topological overlap measure (reviewed below) since it was found useful in network neighborhood analysis . The topological overlap measure is defined for weighted and unweighted networks that can be represented by a (weighted) adjacency matrix A = [a(ij)], i.e. a symmetric similarity matrix with entries between 0 and 1. In an unweighted network, a(ij) = 1 if nodes i and j are connected and 0 otherwise. In a weighted network, 0 ≤ a(ij) ≤ 1 encodes the pairwise connection strength. Examples of such networks include gene co-expression networks and physical interaction networks.
A simple approach for measuring the dissimilarity between nodes i and j is to define dissA(ij) = 1 - a(ij). However, this measure is not very robust with respect to erroneous adjacencies. Spurious or weak connections in the adjacency matrix may lead to 'noisy' networks and modules . Therefore, we and others have explored the use of dissimilarity measures that are based on common interacting partners or on topological metrics [3–7]. In this article, we will define modules as sets of nodes that have high topological overlap.
Review of the pairwise topological overlap measure
where N(i1, i2) denotes the set of common neighbors shared by i1 and i2, N (i1, - i2) denotes the set of the neighbors of i1 excluding i2 and |·| denotes the number of elements (cardinality) of its argument. The Binomial coefficient = 1 in the denominator of Eq. (2) is an upper bound of a(i1i2). One can show that 0 ≤ a(i1i2) ≤ 1 implies 0 ≤ t(i1i2) ≤ 1 [1, 8, 9], i.e. t(i1i2) can be considered a normalized measure of the number of shared direct neighbors. In the following, we will review how to extend this pairwise measure to multiple nodes.
Review of the multi-node topological overlap measure
where N(i1, i2, i3) is the set of the neighbors shared by i1, i2 and i3 and N (i1, i2, - i3) is the set of the neighbors shared by i1 and i2 excluding i3. The Binomial coefficient = 3 in the denominator of Eq. (3) is the upper bound of a(i1i2) + a(i1i3) + a(i2i3) and equals the number of connections that can be formed between i1, i2, and i3. One can easily prove 0 ≤ a(i, j) ≤ 1 implies that 0 ≤ t(i1i2i3) ≤ 1. The stringent MTOM extension is related to the average pairwise TOM extension (Eq. 1) but it puts particular weight onto neighbors that are truly shared between the nodes. Below, we briefly compare of the two multi-node extensions (and corresponding dissimilarity measures).
MTOM for weighted networks
Note that these formulas remain mathematically well defined for a weighted adjacency matrix 0 ≤ a(i1i2) ≤ 1. Therefore, it is natural to use these algebraic formulas for defining the MTOM measure for a weighted networks. It is straightforward to extend the above equations for P = 3 nodes to more than 3 nodes, e.g. our MAST software can deal with any integer P > 1.
Since the topological overlap measure accounts for connections to shared neighbors, it is particularly useful in the context of sparse network, e.g. protein-protein interaction networks [3, 8, 10]. A comparison of the topological overlap measure to alternative measures (e.g. adjacency and correlation-based measures) can be found in [1, 7].
Comparison of MTOM based dissimilarity measures
On real data, we find that the two 3 point dissimilarity measures (Eq. 5) and (Eq. 6) are highly correlated (r ≈ .9) across index triplets with distinct indices, i.e. when i1 ≠ i2 ≠ i3. But if 2 indices are equal and different from the third index, (e.g. when i1 = i2 ≠ i3 which entails dissTOM 2(i1, i2) = 0) then the average pairwise dissimilarity (Eq. 6) takes on substantially lower values than dissimilarity (Eq. 5). Across all (possibly non-distinct) index triplets, we find a weak correlation (r ≈ 0.3) between the two 3 point dissimilarity measures. In the following, we proceed with the non-pairwise dissimilarity measures (e.g. Eq. 5).
Review of network neighborhood analysis using MTOM
In a previous publication, we have shown that the multi-node topological overlap measure (MTOM) can perform better than a pairwise measure in the context of network neighborhood analysis . Since network neighborhood analysis is an important step in our proposed MAST procedure for module detection, we will review it in the following. Neighborhood analysis aims to find a set of nodes (the neighborhood) that is similar to an initial 'seed' set of nodes. Intuitively speaking, a neighborhood is composed of nodes that are highly connected to the given seed set of nodes. Since many of our applications involve networks comprised of genes, we will use the words 'node' and 'gene' interchangeably in this article. Neighborhood analysis facilitates a guilt-by-association screening strategy for finding genes that interact with a given set of biologically interesting genes. Informally, we refer to the resulting neighborhood as MTOM neighborhood. The MTOM-based neighborhood analysis requires as input an initial seed neighborhood composed of S0 ≥ 1 node(s) and the requested final size of the neighborhood S t = S0 + S, where S is the number of nodes that will be added to the initial neighborhood. For each node outside of the initial (seed) neighborhood, the MTOM software computes the MTOM value with the initial neighborhood. Next the S nodes with the highest MTOM values are selected and added to the initial seed neighborhood.
In the following, we report computation times for carrying out MTOM neighborhood analysis on a computer with the following specifications: Intel Pentium 4 CPU 2.40 GHz 2.39 GHz, 1.00 G of RAM. To search for 10 neighbors of a gene, MTOM required 10 seconds in a network comprised of 2000 nodes, 2 minutes in a network of 5000 genes, 12 minutes in a network comprised of 10 k genes, and over 1 hour in a network comprised of 150 k genes.
Finding modules (clusters) in a network
The detection of modules in a network is an important task in systems biology since genes and their protein products carry out cellular processes in the context of functional modules. Here we focus on module identification methods that are based on using a node dissimilarity measure in conjunction with a clustering method. We define modules as clusters of densely interconnected nodes. Although our method can be easily be adapted to any (dis-)similarity measure, our software implementation uses an extension of the topological overlap measure, which has been used in many network applications [1, 3, 7, 8, 11, 12]. Numerous clustering methods exist for pairwise dissimilarity measures. We review hierarchical clustering and partitioning around medoids in more detail since we compare them to the proposed MAST procedure.
Review of hierarchical clustering with the pairwise topological overlap measure
Before we describe the details of the MAST procedure, we method we first review a widely used module detection method which defines modules as branches of a hierarchical clustering tree. The standard approach for using the pairwise topological overlap measure for module detection is to use it as input of a hierarchical clustering procedure which results in a cluster tree (dendrogram) . Next clusters (modules) are defined as branches of the cluster tree. Toward this end, one can use the R package dynamicTreeCut which implements several branch cutting methods . Hierarchical clustering procedure has led to biologically meaningful modules in several applications [3, 8, 10, 11, 15–19], but it has one major limitation: it can only accommodate a pairwise dissimilarity measure. In contrast, our MAST procedure can accommodate a multi-node measure.
Review of partitioning around medoid clustering
Partitioning around medoids (PAM, ), also known as k-medoid clustering, is a widely used clustering method, which generalizes k-means clustering. Compared to the k-means approach that is based on a Euclidean distance measure, PAM accepts any pairwise dissimilarity measure. The number k of clusters (partitions) is a user defined parameter. The PAM-algorithm iteratively constructs representative objects (medoids) for each of the k clusters. After finding a set of k initial medoids, corresponding k clusters are constructed by assigning each observation to the nearest medoid. After a preliminary clusters assignment, new medoids are determined by minimizing the within cluster dissimilarity measure. The procedure is iterated until convergence to a (local) minimum.
Similar to hierarchical clustering PAM can only accept a pairwise dissimilarity measure. Another limitation of PAM is that it does not allow for un-assigned (un-clustered) nodes. In many gene co-expression network applications, there are thousands of background genes that should not be forced into a module [8, 14].
Review of CAST
The Cluster Affinity Search Technique (CAST)  has been found to be an attractive clustering procedure for module detection in gene expression data. CAST is a sequential procedure that defines clusters one at a time. Since we have network applications in mind, we will refer to the clusters as modules. After detecting a module, CAST removes the corresponding nodes from consideration and initializes the next module. Module detection proceeds by adding and removing nodes based on a similarity measure between the nodes and the module members. A node is added if its similarity to the module exceeds a user-defined threshold. At each iteration the 'loosest' node will be dropped if its similarity to the module falls below the threshold. Note that the threshold is constant for the whole clustering procedure, which is why we refer to it as global threshold.
Module Affinity Search Technique (MAST)
MAST extends the CAST procedure to module detection in a network. There are two major extensions. First, we propose to use a multi-node similarity measure instead of a pair-wise similarity since we have found evidence that a multi-node topological overlap measure (MTOM) can be more meaningful in network neighborhood analysis . The second extension (described below) describes a local stopping rule for module growth that is used along the global stopping rule.
The starting point of the MAST procedure is an initial set of seeds for network neighborhood analysis (described above). These seeds can be user-defined or the procedure will automatically choose them based on their network connectivity k(i) = sumj ≠ ia(ij). Since the automatic procedure uses the most highly connected nodes in a network as seeds, we often refer to the seeds informally as 'hub nodes'. Around these hub nodes, MAST defines an MTOM neighborhood of size 20 (referred to as hub neighborhood). We have found that the hub neighborhood size of 20 works well in our simulations and applications but this parameter value could be changed.
MAST stopping rules
After a hub neighborhood has been identified, MTOM neighborhood analysis is used to grow a preliminary module around it. An important question is when to stop the module growth. Here we propose two stopping rules. At each step of the module growth, the closest neighbor to the module is identified based on the MTOM measure. The module growth proceeds until the closest neighbor falls below a global or a local control threshold. Both stopping rules for module growth consider a measure of relative similarity between a node and a module. We will make use of the following notation. Lower case indices denote individual nodes (e.g node i); upper case indices denote modules (e.g. module K). We denote the set of indices of nodes within module K by C(K). The module size (i.e., the number of nodes in module K) is denoted by N(K).
In the following, we describe measure for assessing a) the average similarity of module nodes, b) the similarity between a node and a module, and c) the similarity between two modules.
The pairwise similarity between nodes i and j is denoted as s(i, j). Below, we choose the pairwise topological overlap measure for s(i, j).
Below, we will refer to this quantity informally as module tightness.
The first stopping rule, referred to as global control, stops module growth when the relative similarity between the module and its closest neighbor falls below a global threshold. Analogous to its use in CAST, the global stopping rule is used to prevent the addition of nodes that do not exceed a minimum similarity threshold with regard to the module. The Global Control assesses whether the relative similarity r Between (K, i) between module K and the closest node i is less than a the global threshold.
Steps of the MAST procedure
In the following, we outline the steps of the MAST procedure in more detail.
Step 1, find hub neighborhoods
The MAST procedure is based on finding MTOM neighborhoods around initial seed nodes. The software allows the user to specify an initial set of seed nodes or, as alternative, we have also implemented an automatic procedure for picking seed nodes. Since each node is part of a network, one can compute its connectivity (degree) to other nodes: k i = Σj ≠ ia(ij). We have found that seeds with high connectivity tend to produce more meaningful neighborhoods . Therefore, the first seed is chosen as the most highly connected 'hub' node in the network.
Next we use MTOM neighborhood analysis to arrive at a tight neighborhood around the hub node. We refer to the result as hub neighborhood. The default hub neighborhood size of 20 can be changed by the user. Using the stopping rules described above, the hub neighborhood is then grown to a preliminary module. If the size of the preliminary module is below a user-defined threshold, the initial hub node is rejected and the next most highly connected node will be chosen as seed. But if the size of the preliminary module passes the threshold, then the preliminary module nodes will not be considered when choosing a second seed. Thus, the second seed will be the most highly connected node outside the first preliminary module. This process of choosing seeds is repeated until all nodes have been considered.
Step 2, extend the hub neighborhoods to preliminary modules
In this step, the initial hub neighborhoods comprised of 20 nodes are grown into larger modules in a stepwise fashion. We have implemented a competitive module growth process where the modules compete for the remaining nodes. At each step, the algorithm finds the closest MTOM neighbor of each module. If a node is the closest neighbor for 2 or more modules, it will be assigned to the module with which it has the highest MTOM value. The module growth is subject to the stopping rules described above.
Step 3, merge the preliminary modules
Since some of the preliminary modules may be very similar, it can be advantageous to merge them into larger modules. The merging procedure computes the maximum relative between-module similarity across all possible pairs of modules. If the maximum is above a user-defined threshold T merge , the corresponding modules will be merged. The merging step is repeated until no pair of modules can be found with a relative between-module similarity larger than the threshold.
As outlined above, the MAST procedure allows the user to specify several parameters. Changing the parameters will lead to different modules. We find that the resulting modules are highly dependent on the initial seeds. While we allow the user to specify initial module seeds, we have implemented an automatic seed detection method in step 1. Other parameters are the threshold for the global control T global (step 1 and step 2), the threshold for the local control T local (step 2), the minimum module size, and the merging threshold T merge . The software uses the following default settings T global = 0.6, T local = 0.9999, and minimum module size of 50. In our applications, we illustrate how the results change as a function of different parameter settings. In general, the higher T global and the lower T local , the tighter will be the resulting modules.
Application to a gene co-expression network
with β ≥ 1. The adjacency in Equation (14) implies that the weighted adjacency a(ij) between two genes is proportional to their similarity on a logarithmic scale, log(a(ij)) = β × log(s(i, j)).
Comparing modules to known gene ontologies
We applied the MAST procedure to find modules in a subset of a brain cancer gene co-expression network [8, 12]. The gene co-expression network was constructed on the basis of 55 brain cancer microarrays. A weighted adjacency matrix was defined as follows a(ij) = |cor(x i , x j )|4 where x i and x j are the expression profiles of gene i and j, respectively. Our findings remain largely unchanged with regard to different choices of the power β = 4. To be able to relate modules to known gene ontologies, we restricted the analysis to the 520 genes with the following ontologies: apoptosis (205 genes), neurogenesis (229 genes) and DNA replication (86 genes). To assess the performance of different module detection methods, we compared module membership to the 3 known gene ontologies. We used the Rand index to measure agreement between the resulting modules and the gene ontologies. The Rand index is a widely used measure to evaluate the agreement between two partitions .
Here we present a simple simulation study for highlighting the differences between MAST, PAM, and average linkage hierarchical clustering. We do not claim that the simulated example represents a good approximation to real gene expression data. A more complex model is described below.
where the indicator function I(condition) equals 1 if the condition is true and 0 otherwise.
Simulation Study II
Here we provide a more complex simulation study. Similar to simulation study I, we specify seed genes that lie at the center of the module. We simulated 5 modules. The seeds of the first two modules m(1) and m(2) correspond to vectors whose entries were randomly chosen to be 0 or 1. We added noise from a standard normal distribution N(0, 1). The seed of modules 3 was given by m(3) = a × m(1) + ϵ3 where the coefficient a controls the dependence between the seeds and ϵ represents standard normal noise.
Here we simulated data with a handful of modules. To simulate more realistic gene expression data involving an arbitrary number of modules, future studies could make use of the R function simulateDatExpr in the WGCNA R package .
It is difficult to argue that one clustering procedure is better than another due to the lack of universally accepted benchmark data and a gold standard. We present simulation studies that illustrate that the proposed MAST procedure can be superior to hierarchical clustering and partitioning around medoids when dealing with simulated cluster data whose modules are generated around 'seed' nodes. In this situation, seed-based clustering method such as MAST are expected to perform well. Future comparisons should evaluate MAST on simulated cluster data that are not seed-centered.
Numerous clustering procedures have been proposed for the analysis of gene expression data [24–26]. Bi-clustering procedures cluster genes and samples simultaneously [27, 28]. Here we propose a clustering method that can be used for a multi-node dissimilarity measure. The proposed MAST procedure is well suited for the use of any multi-node dissimilarity measure. In our application and software, we use the multi-node topological overlap measure and the intuition of network neighborhood analysis described in  but it is straightforward to adapt the MAST procedure to any other multi-node measure. For example, by averaging a pairwise dissimilarity measure, one can easily define a multi-node measure as described above. Therefore, the MAST procedure can also be used for any pairwise dissimilarity measure.
Using a gene co-expression network application and several simulated examples we provide evidence that the MAST procedure can outperform hierarchical clustering and PAM. But a more comprehensive comparison with alternative methods is desirable. Several procedures for finding modules in networks have been proposed, e.g. Cytoscape implements several procedures . MAST can be used to find modules in gene expression data or other quantitative data after a correlation network is defined between the quantitative variables. When applied to a network (specified an adjacency matrix), the MAST procedure yields clusters of tightly interconnected modules. Since module sizes and module tightness can vary greatly in practice we have found it useful to implement an adaptive local stopping rule that considers the module growth history.
Our preliminary data (simulations and a co-expression network analysis) are encouraging but more comprehensive comparisons are needed. Our software implementation should facilitate an evaluation and comparison with other methods.
Availability and requirements
The MAST procedure is implemented in the MTOM software. Project name: MTOM Software
Project home page: http://www.genetics.ucla.edu/labs/horvath/MTOM/
Operating system(s): Windows
Programming language: C
Licence: GNU GPL 3
We would like to acknowledge the grant support from 1U19AI063603-01, 5P30CA016042-28, P50CA092131, and DK072206. The authors would like to thank UCLA collaborators Peter Langfelder, Jun Dong, Tova Fuller, Jake Lusis, Tom Drake, Dan Geschwind, Wen Lin, Paul Mischel, Mike Oldham, Anja Presson, and Wei Zhao for useful discussions.
- Li A, Horvath S: Network Neighborhood Analysis with the Multi-node Topological Overlap Measure. Bioinformatics. 2007, 23 (2): 222-31. 10.1093/bioinformatics/btl581.View ArticlePubMedGoogle Scholar
- Lin N, Zhao H: Are scale-free networks robust to measurement errors?. BMC Bioinformatics. 2005, 6: 119-10.1186/1471-2105-6-119.PubMed CentralView ArticlePubMedGoogle Scholar
- Ravasz E, Somera AL, Mongru DA, Oltvai ZN, Barabasi AL: Hierarchical organization of modularity in metabolic networks. Science. 2002, 297 (5586): 1551-5. 10.1126/science.1073374.View ArticlePubMedGoogle Scholar
- Zhao W, Serpedin E, Dougherty E: Information theoretic method for recovering temporal gene regulations from time series microarray data. Bioinformatics. 2006, 22: 2129-2135. 10.1093/bioinformatics/btl364.View ArticlePubMedGoogle Scholar
- Chen J, Hsu W, Lee M, Ng S: Increasing confidence of protein interactomes using network topological metrics. Bioinformatics. 2006, 22: 1998-2004. 10.1093/bioinformatics/btl335.View ArticlePubMedGoogle Scholar
- Chua NH, Sung W, Wong L: Exploiting indirect neighbours and topological weight to predict protein function from protein protein interactions. Bioinformatics. 2006, 22: 1623-1630. 10.1093/bioinformatics/btl145.View ArticlePubMedGoogle Scholar
- Yip A, Horvath S: Gene Network Interconnectedness and the Generalized Topological Overlap Measure. BMC Bioinformatics. 2007, 8: 22-10.1186/1471-2105-8-22.PubMed CentralView ArticlePubMedGoogle Scholar
- Zhang B, Horvath S: A General Framework for Weighted Gene Co-Expression Network Analysis. Stat Appl Genet Mol Biol. 2005, 4: Article17-10.2202/1544-6115.1128.PubMedGoogle Scholar
- Carlson MR, Zhang B, Fang Z, Horvath S, Mishel PS, Nelson SF: Gene Connectivity, Function, and Sequence Conservation: Predictions from Modular Yeast Co-expression Networks. BMC Genomics. 2006, 7 (40):Google Scholar
- Dong J, Horvath S: Understanding Network Concepts in Modules. BMC Systems Biology. 2007, 1: 24-10.1186/1752-0509-1-24.PubMed CentralView ArticlePubMedGoogle Scholar
- Oldham MC, Horvath S, Geschwind DH: Conservation and evolution of gene coexpression networks in human and chimpanzee brains. PNAS. 2006, 103 (47): 17973-17978. 10.1073/pnas.0605938103.PubMed CentralView ArticlePubMedGoogle Scholar
- Horvath S, Zhang B, Carlson M, Lu K, Zhu S, Felciano R, Laurance M, Zhao W, Shu Q, Lee Y, Scheck A, Liau L, Wu H, Geschwind D, Febbo P, Kornblum H, TF C, Nelson S, Mischel P: Analysis of Oncogenic Signaling Networks in Glioblastoma Identifies ASPM as a Novel Molecular Target. Proc Natl Acad Sci U S A. 2006, 103 (46): 17402-17407. 10.1073/pnas.0608396103.PubMed CentralView ArticlePubMedGoogle Scholar
- Kaufman L, Rousseeuw PJ: Finding groups in data. 1990, Wiley, New YorkView ArticleGoogle Scholar
- Langfelder P, Zhang B, Horvath S: Defining clusters from a hierarchical cluster tree: the Dynamic Tree Cut library for R. Bioinformatics. 2007, 24 (5): 719-20. 10.1093/bioinformatics/btm563.View ArticlePubMedGoogle Scholar
- Ghazalpour A, Doss S, Zhang B, Plaisier C, Wang S, Schadt E, Thomas A, Drake T, Lusis A, Horvath S: Integrating Genetics and Network Analysis to Characterize Genes Related to Mouse Weight. PLoS Genet. 2006, 2 (8): e130-10.1371/journal.pgen.0020008.PubMed CentralView ArticlePubMedGoogle Scholar
- Gargalovic P, Imura M, Zhang B, Gharavi N, Clark M, Pagnon J, Yang W, He A, Truong A, Patel S, Nelson S, Horvath S, Berliner J, Kirchgessner T, Lusis A: Identification of inflammatory gene modules based on variations of human endothelial cell responses to oxidized lipids. PNAS. 2006, 103 (34): 12741-6. 10.1073/pnas.0605457103.PubMed CentralView ArticlePubMedGoogle Scholar
- Fuller T, Ghazalpour A, Aten J, Drake T, Lusis A, Horvath S: Weighted gene coexpression network analysis strategies applied to mouse weight. Mammalian Genome. 2007, 18 (6–7): 463-472. 10.1007/s00335-007-9043-3.PubMed CentralView ArticlePubMedGoogle Scholar
- Oldham MC, Konopka G, Iwamoto K, Langfelder P, Kato T, Horvath S, Geschwind DH: Functional organization of the transcriptome in human brain. Nature Neuroscience. 2008, 11 (11): 1271-1282. 10.1038/nn.2207.PubMed CentralView ArticlePubMedGoogle Scholar
- Presson A, Sobel E, Papp J, Suarez C, Whistler T, Rajeevan M, Vernon S, Horvath S: Integrated weighted gene co-expression network analysis with an application to chronic fatigue syndrome. BMC Systems Biology. 2008, 2 (95):Google Scholar
- Ben-Dor A, Shamir R, Yahkini Z: Clustering gene expression patterns. Journal of Computational Biology. 1999, 6 (3/4): 281-297. 10.1089/106652799318274.View ArticlePubMedGoogle Scholar
- Langfelder P, Horvath S: WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics. 2008, 9: 559-10.1186/1471-2105-9-559.PubMed CentralView ArticlePubMedGoogle Scholar
- Horvath S, Dong J: Geometric interpretation of Gene Co-expression Network Analysis. PLoS Computational Biology. 2008, 4 (8):Google Scholar
- Cohen J: A coefficient of agreement for nominal scales. Educational and Psychological Measurement. 1960, 20: 37-46. 10.1177/001316446002000104.View ArticleGoogle Scholar
- Eisen M, Spellman P, Brown P, Botstein D: Cluster analysis and display of genome-wide expression patterns. PNAS. 1998, 95 (25): 14863-14868. 10.1073/pnas.95.25.14863.PubMed CentralView ArticlePubMedGoogle Scholar
- Tseng VS, Kao CP: Efficiently Mining Gene Expression Data via a Novel Parameterless Clustering Method. IEEE/ACM Trans Comput Biol Bioinformatics. 2005, 2 (4): 355-365. 10.1109/TCBB.2005.56.View ArticleGoogle Scholar
- Thalamuthu A, Mukhopadhyay I, Zheng X, Tseng GC: Evaluation and comparison of gene clustering methods in microarray analysis. Bioinformatics. 2006, 22 (19): 2405-2412. 10.1093/bioinformatics/btl406.View ArticlePubMedGoogle Scholar
- Madeira SC, Oliveira AL: Biclustering Algorithms for Biological Data Analysis: A Survey. IEEE/ACM Transactions on Computational Biology and Bioinformatics. 2004, 01: 24-45. 10.1109/TCBB.2004.2.View ArticleGoogle Scholar
- Turner HL, Bailey TC, Krzanowski WJ, Hemingway CA: Biclustering Models for Structured Microarray Data. IEEE/ACM Transactions on Computational Biology and Bioinformatics. 2005, 2 (4): 316-329. 10.1109/TCBB.2005.49.View ArticlePubMedGoogle Scholar
- Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T: Cytoscape: A Software Environment for Integrated Models of Biomolecular Interaction Networks. Genome Research. 2003, 13 (11): 2498-2504. 10.1101/gr.1239303.PubMed CentralView ArticlePubMedGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.