While most clustering procedures use a pairwise dissimilarity (distance) measure as input, we present a clustering procedure that can accommodate a multipoint dissimilarity measure d(i_{1}, i_{2}, ..., i_{P}), where P > 1 is the number of points and the indices i_{k} = 1, ..., n run over the n objects. Since we are mainly interested in a network application, we will refer to the objects as nodes and to the corresponding measure as a multinode dissimilarity.
A multinode (P-point) dissimilarity measure d(i_{1}, i_{2}, ..., i_{P}) is defined to satisfy the following properties:

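i) it takes on nonnegative values, i.e. d(i_{1}, i_{2}, ..., i_{P}) ≥ 0,
ii) it equals 0 when all indices are equal, i.e. d(i, i, ..., i) = 0,
iii) it is symmetric with respect to index permutations, i.e. d(i_{1}, i_{2}, ..., i_{P}) = d(i_{π(1)}, i_{π(2)}, ..., i_{π(P)}) for every permutation π of 1, ..., P.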
Note that this definition reduces to that of a pairwise dissimilarity when P = 2.
Several approaches can be used to define a multinode dissimilarity measure. For example, a pairwise dissimilarity measure d(i_{1}, i_{2}) gives rise to a multinode dissimilarity measure by averaging all pairwise dissimilarities, e.g. a 3-node measure can be defined as follows
d(i_{1}, i_{2}, i_{3}) = [d(i_{1}, i_{2}) + d(i_{1}, i_{3}) + d(i_{2}, i_{3})] / 3. (1)
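The averaging construction can be sketched in a few lines of Python. This is a minimal illustration and not part of the MAST software: the function name `average_pairwise_dissimilarity` and the toy matrix `d` are assumptions made for this example only.

```python
from itertools import combinations


def average_pairwise_dissimilarity(d, nodes):
    """Average the pairwise dissimilarity d[i][j] over all unordered pairs of `nodes`."""
    pairs = list(combinations(nodes, 2))
    return sum(d[i][j] for i, j in pairs) / len(pairs)


# Toy symmetric pairwise dissimilarity matrix for n = 3 objects (made-up values).
d = [
    [0.0, 0.2, 0.7],
    [0.2, 0.0, 0.6],
    [0.7, 0.6, 0.0],
]

# 3-node dissimilarity: the mean of d[0][1], d[0][2] and d[1][2].
three_node = average_pairwise_dissimilarity(d, (0, 1, 2))
```

Because the pairwise measure is nonnegative, symmetric and zero on the diagonal, the averaged measure inherits the three defining properties listed above.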
The proposed module affinity search technique (MAST) can be used for any pairwise or multinode dissimilarity measure. But our particular focus is the multinode topological overlap measure (reviewed below) since it was found useful in network neighborhood analysis [1]. The topological overlap measure is defined for weighted and unweighted networks that can be represented by a (weighted) adjacency matrix A = [a(ij)], i.e. a symmetric similarity matrix with entries between 0 and 1. In an unweighted network, a(ij) = 1 if nodes i and j are connected and 0 otherwise. In a weighted network, 0 ≤ a(ij) ≤ 1 encodes the pairwise connection strength. Examples of such networks include gene coexpression networks and physical interaction networks.
A simple approach for measuring the dissimilarity between nodes i and j is to define dissA(ij) = 1 − a(ij). However, this measure is not very robust with respect to erroneous adjacencies. Spurious or weak connections in the adjacency matrix may lead to 'noisy' networks and modules [2]. Therefore, we and others have explored the use of dissimilarity measures that are based on common interacting partners or on topological metrics [3–7]. In this article, we will define modules as sets of nodes that have high topological overlap.
Review of the pairwise topological overlap measure
The topological overlap of two nodes reflects their similarity in terms of the commonality of the nodes they connect to. Two nodes have high topological overlap if they are connected to roughly the same group of nodes in the network (i.e. they share the same neighborhood). In an unweighted network, the definition of the pairwise topological overlap measure in the supplementary material of [3] can be expressed in set notation as
t(i_{1}i_{2}) = [ |N(i_{1}, i_{2})| + a(i_{1}i_{2}) ] / [ min{ |N(i_{1}, −i_{2})|, |N(i_{2}, −i_{1})| } + \binom{2}{2} ], (2)
where N(i_{1}, i_{2}) denotes the set of common neighbors shared by i_{1} and i_{2}, N(i_{1}, −i_{2}) denotes the set of the neighbors of i_{1} excluding i_{2}, and |·| denotes the number of elements (cardinality) of its argument. The binomial coefficient \binom{2}{2} = 1 in the denominator of Eq. (2) is an upper bound of a(i_{1}i_{2}). One can show that 0 ≤ a(i_{1}i_{2}) ≤ 1 implies 0 ≤ t(i_{1}i_{2}) ≤ 1 [1, 8, 9], i.e. t(i_{1}i_{2}) can be considered a normalized measure of the number of shared direct neighbors. In the following, we will review how to extend this pairwise measure to multiple nodes.
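As an illustration of the set notation, the following Python sketch computes the pairwise topological overlap in a small unweighted network. It assumes the form t = (|N(i, j)| + a(ij)) / (min{|N(i, −j)|, |N(j, −i)|} + 1) and a network without self-loops; the function name `pairwise_tom` and the toy network `neighbors` are hypothetical and chosen only for this example.

```python
def pairwise_tom(neighbors, i, j):
    """Pairwise topological overlap of two distinct nodes in an unweighted network.

    `neighbors` maps each node to the set of its direct neighbors (no self-loops).
    """
    n_i = neighbors[i] - {j}               # N(i, -j): neighbors of i excluding j
    n_j = neighbors[j] - {i}               # N(j, -i): neighbors of j excluding i
    shared = n_i & n_j                     # N(i, j): common neighbors
    a_ij = 1 if j in neighbors[i] else 0   # adjacency between i and j
    return (len(shared) + a_ij) / (min(len(n_i), len(n_j)) + 1)


# Toy network with edges 0-1, 0-2, 1-2 and 2-3 (made-up example).
neighbors = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}

t_01 = pairwise_tom(neighbors, 0, 1)  # connected and sharing neighbor 2
t_03 = pairwise_tom(neighbors, 0, 3)  # not connected, but sharing neighbor 2
```

In this toy network, nodes 0 and 1 are directly connected and share their only other neighbor, so their overlap is maximal, whereas nodes 0 and 3 are linked only through the shared neighbor 2.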
Review of the multinode topological overlap measure
The topological overlap measure presented in Eq. (2) is a pairwise similarity measure. While pairwise similarities are widely used in clustering procedures, we have shown that it can be advantageous to consider multinode similarity measures [1]. There are several ways to extend this measure to a multinode similarity. The first extension (referred to as average pairwise extension) is to simply average the pairwise measure across all pairwise index selections as described in Eq. (1). The second extension (referred to as stringent MTOM extension) keeps track of the numbers of neighbors that are truly shared among multiple nodes. For example, the MTOM involving 3 different nodes i_{1}, i_{2}, i_{3} is defined as
t(i_{1}i_{2}i_{3}) = [ |N(i_{1}, i_{2}, i_{3})| + a(i_{1}i_{2}) + a(i_{1}i_{3}) + a(i_{2}i_{3}) ] / [ min{ |N(i_{1}, i_{2}, −i_{3})|, |N(i_{1}, i_{3}, −i_{2})|, |N(i_{2}, i_{3}, −i_{1})| } + \binom{3}{2} ], (3)
where N(i_{1}, i_{2}, i_{3}) is the set of the neighbors shared by i_{1}, i_{2} and i_{3}, and N(i_{1}, i_{2}, −i_{3}) is the set of the neighbors shared by i_{1} and i_{2} excluding i_{3}. The binomial coefficient \binom{3}{2} = 3 in the denominator of Eq. (3) is the upper bound of a(i_{1}i_{2}) + a(i_{1}i_{3}) + a(i_{2}i_{3}) and equals the number of connections that can be formed between i_{1}, i_{2}, and i_{3}. One can easily prove that 0 ≤ a(ij) ≤ 1 implies 0 ≤ t(i_{1}i_{2}i_{3}) ≤ 1. The stringent MTOM extension is related to the average pairwise TOM extension (Eq. 1), but it puts particular weight onto neighbors that are truly shared between the nodes. Below, we briefly compare the two multinode extensions (and the corresponding dissimilarity measures).
MTOM for weighted networks
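To extend the MTOM measure to weighted networks, we express the set notation in terms of the entries of the adjacency matrix as follows
|N(i_{1}, i_{2}, i_{3})| = Σ_{u ≠ i_{1}, i_{2}, i_{3}} a(i_{1}u) a(i_{2}u) a(i_{3}u),
|N(i_{1}, i_{2}, −i_{3})| = Σ_{u ≠ i_{1}, i_{2}, i_{3}} a(i_{1}u) a(i_{2}u). (4)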
Note that these formulas remain mathematically well defined for a weighted adjacency matrix with 0 ≤ a(i_{1}i_{2}) ≤ 1. Therefore, it is natural to use these algebraic formulas for defining the MTOM measure for weighted networks. It is straightforward to extend the above equations for P = 3 nodes to more than 3 nodes, e.g. our MAST software can deal with any integer P > 1.
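The algebraic form can be sketched for a general number P of distinct nodes as follows. This is a hedged illustration rather than the MAST implementation: it assumes that shared-neighbor counts are replaced by sums of products of adjacencies over all nodes outside the set, and that the P = 3 pattern (minimum over the "leave one node out" counts plus the number of node pairs) carries over unchanged to larger P. The name `mtom` and the toy matrix are hypothetical.

```python
from itertools import combinations
from math import comb, prod


def mtom(adj, nodes):
    """Multinode topological overlap of P >= 2 distinct nodes.

    `adj` is a symmetric (weighted) adjacency matrix with entries in [0, 1].
    """
    nodes = tuple(nodes)
    p = len(nodes)
    if p < 2 or len(set(nodes)) != p:
        raise ValueError("need at least two distinct nodes")
    outside = [u for u in range(len(adj)) if u not in nodes]
    # Weighted analogue of |N(i_1, ..., i_P)|: neighbors shared by all P nodes.
    shared_all = sum(prod(adj[i][u] for i in nodes) for u in outside)
    # Weighted analogue of |N(..., -i_k)|: neighbors shared by all nodes but i_k.
    shared_without = [
        sum(prod(adj[i][u] for i in nodes if i != left_out) for u in outside)
        for left_out in nodes
    ]
    # Sum of the adjacencies among the P nodes themselves.
    within = sum(adj[i][j] for i, j in combinations(nodes, 2))
    return (shared_all + within) / (min(shared_without) + comb(p, 2))


# Toy weighted adjacency matrix for 4 nodes (made-up values).
adj = [
    [0.0, 0.8, 0.6, 0.5],
    [0.8, 0.0, 0.4, 0.5],
    [0.6, 0.4, 0.0, 1.0],
    [0.5, 0.5, 1.0, 0.0],
]

t_012 = mtom(adj, (0, 1, 2))
diss_012 = 1 - t_012  # corresponding 3-node dissimilarity
```

For P = 2 the function reduces to the pairwise measure, and for a 0/1 adjacency matrix the sums of products count shared neighbors exactly as in the set notation.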
Since the topological overlap measure accounts for connections to shared neighbors, it is particularly useful in the context of sparse networks, e.g. protein-protein interaction networks [3, 8, 10]. A comparison of the topological overlap measure to alternative measures (e.g. adjacency and correlation-based measures) can be found in [1, 7].
Comparison of MTOM based dissimilarity measures
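The MTOM measure is a similarity measure that has an upper bound of 1. To turn it into a dissimilarity, we subtract it from 1. For example, a two-node dissimilarity is given by
dissTOM_{2}(i_{1}, i_{2}) = 1 − t(i_{1}i_{2})
and a three-node dissimilarity measure is given by
dissTOM_{3}(i_{1}, i_{2}, i_{3}) = 1 − t(i_{1}i_{2}i_{3}). (5)
As outlined in Eq. (1), an alternative 3-node dissimilarity can be defined as the average of the pairwise dissimilarities,
[dissTOM_{2}(i_{1}, i_{2}) + dissTOM_{2}(i_{1}, i_{3}) + dissTOM_{2}(i_{2}, i_{3})] / 3. (6)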
On real data, we find that the two 3-point dissimilarity measures (Eq. 5) and (Eq. 6) are highly correlated (r ≈ 0.9) across index triplets with distinct indices, i.e. when i_{1} ≠ i_{2} ≠ i_{3}. But if 2 indices are equal and different from the third index (e.g. when i_{1} = i_{2} ≠ i_{3}, which entails dissTOM_{2}(i_{1}, i_{2}) = 0), then the average pairwise dissimilarity (Eq. 6) takes on substantially lower values than the dissimilarity in Eq. 5. Across all (possibly non-distinct) index triplets, we find a weak correlation (r ≈ 0.3) between the two 3-point dissimilarity measures. In the following, we proceed with the non-pairwise dissimilarity measures (e.g. Eq. 5).
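The lower values of the average pairwise dissimilarity (Eq. 6) for repeated indices follow directly from the averaging. As a short worked step, assuming only that the pairwise dissimilarity of a node with itself is 0 (as stated above) and that it is bounded by 1, setting i_{1} = i_{2} gives:

```latex
\frac{\mathrm{dissTOM}_2(i_1,i_1) + \mathrm{dissTOM}_2(i_1,i_3) + \mathrm{dissTOM}_2(i_1,i_3)}{3}
  = \frac{2}{3}\,\mathrm{dissTOM}_2(i_1,i_3) \le \frac{2}{3}
```

Hence, for such triplets the average pairwise dissimilarity can never exceed 2/3, regardless of how dissimilar the third node is.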
Review of network neighborhood analysis using MTOM
In a previous publication, we have shown that the multinode topological overlap measure (MTOM) can perform better than a pairwise measure in the context of network neighborhood analysis [1]. Since network neighborhood analysis is an important step in our proposed MAST procedure for module detection, we will review it in the following. Neighborhood analysis aims to find a set of nodes (the neighborhood) that is similar to an initial 'seed' set of nodes. Intuitively speaking, a neighborhood is composed of nodes that are highly connected to the given seed set of nodes. Since many of our applications involve networks comprised of genes, we will use the words 'node' and 'gene' interchangeably in this article. Neighborhood analysis facilitates a guilt-by-association screening strategy for finding genes that interact with a given set of biologically interesting genes. Informally, we refer to the resulting neighborhood as the MTOM neighborhood. The MTOM-based neighborhood analysis requires as input an initial seed neighborhood composed of S_{0} ≥ 1 node(s) and the requested final size of the neighborhood S_{t} = S_{0} + S, where S is the number of nodes that will be added to the initial neighborhood. For each node outside of the initial (seed) neighborhood, the MTOM software computes the MTOM value with the initial neighborhood. Next, the S nodes with the highest MTOM values are selected and added to the initial seed neighborhood.
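The selection step described above can be sketched as follows. This is an illustrative sketch and not the MTOM software: it assumes that the "MTOM value with the initial neighborhood" of a candidate node is the multinode overlap of the seed nodes together with that candidate, and it uses a compact, assumed algebraic form of the measure. The names `mtom` and `mtom_neighborhood` and the toy network are hypothetical.

```python
from itertools import combinations
from math import comb, prod


def mtom(adj, nodes):
    """Multinode topological overlap of a list of distinct nodes (assumed form)."""
    outside = [u for u in range(len(adj)) if u not in nodes]
    shared_all = sum(prod(adj[i][u] for i in nodes) for u in outside)
    shared_without = [
        sum(prod(adj[i][u] for i in nodes if i != k) for u in outside)
        for k in nodes
    ]
    within = sum(adj[i][j] for i, j in combinations(nodes, 2))
    return (shared_all + within) / (min(shared_without) + comb(len(nodes), 2))


def mtom_neighborhood(adj, seed, extra):
    """Return the seed plus the `extra` outside nodes with the highest MTOM value."""
    seed = list(seed)
    candidates = [u for u in range(len(adj)) if u not in seed]
    # Score each candidate by its overlap with the seed set, highest first.
    ranked = sorted(candidates, key=lambda u: mtom(adj, seed + [u]), reverse=True)
    return seed + ranked[:extra]


# Toy unweighted network with edges 0-1, 0-2, 1-2, 2-3 and 3-4 (made-up example).
adj = [
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
]

neighborhood = mtom_neighborhood(adj, seed=[0], extra=2)  # S_0 = 1, S = 2
```

Here the seed node 0 is extended by the two nodes of its triangle, while the more distant nodes 3 and 4 receive lower MTOM values and are left out.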
Computational Speed
In the following, we report computation times for carrying out MTOM neighborhood analysis on a computer with the following specifications: Intel Pentium 4 CPU 2.40 GHz 2.39 GHz, 1.00 GB of RAM. To search for 10 neighbors of a gene, MTOM required 10 seconds in a network comprised of 2000 nodes, 2 minutes in a network of 5000 genes, 12 minutes in a network comprised of 10 k genes, and over 1 hour in a network comprised of 150 k genes.
Finding modules (clusters) in a network
The detection of modules in a network is an important task in systems biology since genes and their protein products carry out cellular processes in the context of functional modules. Here we focus on module identification methods that are based on using a node dissimilarity measure in conjunction with a clustering method. We define modules as clusters of densely interconnected nodes. Although our method can easily be adapted to any (dis)similarity measure, our software implementation uses an extension of the topological overlap measure, which has been used in many network applications [1, 3, 7, 8, 11, 12]. Numerous clustering methods exist for pairwise dissimilarity measures. We review hierarchical clustering and partitioning around medoids in more detail since we compare them to the proposed MAST procedure.
Review of hierarchical clustering with the pairwise topological overlap measure
Before we describe the details of the MAST procedure, we first review a widely used module detection method that defines modules as branches of a hierarchical clustering tree. The standard approach for using the pairwise topological overlap measure for module detection is to use it as input of a hierarchical clustering procedure, which results in a cluster tree (dendrogram) [13]. Next, clusters (modules) are defined as branches of the cluster tree. Toward this end, one can use the R package dynamicTreeCut, which implements several branch cutting methods [14]. The hierarchical clustering procedure has led to biologically meaningful modules in several applications [3, 8, 10, 11, 15–19], but it has one major limitation: it can only accommodate a pairwise dissimilarity measure. In contrast, our MAST procedure can accommodate a multinode measure.
Review of partitioning around medoid clustering
Partitioning around medoids (PAM, [13]), also known as k-medoid clustering, is a widely used clustering method, which generalizes k-means clustering. Compared to the k-means approach, which is based on a Euclidean distance measure, PAM accepts any pairwise dissimilarity measure. The number k of clusters (partitions) is a user-defined parameter. The PAM algorithm iteratively constructs representative objects (medoids) for each of the k clusters. After finding a set of k initial medoids, the corresponding k clusters are constructed by assigning each observation to the nearest medoid. After a preliminary cluster assignment, new medoids are determined by minimizing the within-cluster dissimilarity measure. The procedure is iterated until convergence to a (local) minimum.
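The alternation between assignment and medoid update can be sketched as below. This is a simplified k-medoid iteration written for illustration; it is not the full PAM algorithm of [13] (which uses dedicated build and swap phases), and it assumes that distinct objects have a strictly positive dissimilarity. The names `assign` and `k_medoids` and the toy data are hypothetical.

```python
def assign(d, medoids):
    """Assign every object to the cluster of its nearest medoid."""
    clusters = {m: [] for m in medoids}
    for i in range(len(d)):
        nearest = min(medoids, key=lambda m: d[i][m])
        clusters[nearest].append(i)
    return clusters


def k_medoids(d, initial_medoids, max_iter=100):
    """Simplified k-medoid clustering on a pairwise dissimilarity matrix `d`."""
    medoids = list(initial_medoids)
    for _ in range(max_iter):
        clusters = assign(d, medoids)
        # New medoid: the member with the smallest total dissimilarity
        # to all other members of its cluster.
        new_medoids = [
            min(members, key=lambda c: sum(d[c][j] for j in members)) if members else m
            for m, members in clusters.items()
        ]
        if new_medoids == medoids:  # converged to a (local) minimum
            break
        medoids = new_medoids
    return medoids, assign(d, medoids)


# Toy example: six points on a line, dissimilarity = absolute difference.
positions = [0, 1, 2, 10, 11, 12]
d = [[abs(p - q) for q in positions] for p in positions]

medoids, clusters = k_medoids(d, initial_medoids=[0, 1])
```

Note that every object ends up in some cluster, which illustrates why this family of methods cannot leave background nodes unassigned.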
Similar to hierarchical clustering, PAM can only accept a pairwise dissimilarity measure. Another limitation of PAM is that it does not allow for unassigned (unclustered) nodes. In many gene coexpression network applications, there are thousands of background genes that should not be forced into a module [8, 14].
Review of CAST
The Cluster Affinity Search Technique (CAST) [20] has been found to be an attractive clustering procedure for module detection in gene expression data. CAST is a sequential procedure that defines clusters one at a time. Since we have network applications in mind, we will refer to the clusters as modules. After detecting a module, CAST removes the corresponding nodes from consideration and initializes the next module. Module detection proceeds by adding and removing nodes based on a similarity measure between the nodes and the module members. A node is added if its similarity to the module exceeds a user-defined threshold. At each iteration, the 'loosest' node will be dropped if its similarity to the module falls below the threshold. Note that the threshold is constant for the whole clustering procedure, which is why we refer to it as a global threshold.
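The add/remove logic with a global threshold can be sketched as follows. This is a loose, CAST-style illustration and not the algorithm of [20] in detail: it assumes that the similarity of a node to a module is its average pairwise similarity to the other module members, that each module is seeded with the lowest-index unassigned node, and that a step limit (`max_steps`) guards against cycling. All names and the toy similarity matrix are hypothetical.

```python
def affinity(sim, node, module):
    """Average similarity of `node` to the other members of `module`."""
    others = [m for m in module if m != node]
    return sum(sim[node][m] for m in others) / len(others)


def cast_modules(sim, threshold, max_steps=1000):
    """Sequentially build modules using one global similarity threshold."""
    unassigned = set(range(len(sim)))
    modules = []
    while unassigned:
        module = [min(unassigned)]  # seed the next module (assumed seeding rule)
        for _ in range(max_steps):
            changed = False
            # Add step: take the closest outside node if it exceeds the threshold.
            outside = sorted(unassigned - set(module))
            if outside:
                best = max(outside, key=lambda u: affinity(sim, u, module))
                if affinity(sim, best, module) > threshold:
                    module.append(best)
                    changed = True
            # Remove step: drop the loosest member if it falls below the threshold.
            if len(module) > 1:
                loosest = min(module, key=lambda u: affinity(sim, u, module))
                if affinity(sim, loosest, module) < threshold:
                    module.remove(loosest)
                    changed = True
            if not changed:
                break
        modules.append(sorted(module))
        unassigned -= set(module)  # remove the finished module from consideration
    return modules


# Toy similarity matrix: nodes 0-2 and nodes 3-4 form two tight groups.
high, low = 0.9, 0.1
groups = [0, 0, 0, 1, 1]
sim = [[1.0 if i == j else (high if groups[i] == groups[j] else low)
        for j in range(5)] for i in range(5)]

modules = cast_modules(sim, threshold=0.5)
```

In this sketch a node that is not similar enough to any other node simply ends up in a module of size one, which can be interpreted as unassigned.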