IMHRC

Description

IMHRC (Inter-Module Hub Removal Clustering) is a graph clustering algorithm for weighted graphs that is based on removing inter-module hubs and that can detect overlapping clusters. These properties make it especially useful for detecting protein complexes in protein-protein interaction (PPI) networks with associated confidence values. By removing some of the inter-module hubs and module hubs, IMHRC eliminates a meaningful percentage of the noise in the dataset and indirectly accounts for the different times at which PPIs occur in the network. After the hubs are removed, some proteins are selected as seeds, and each seed creates a primary cluster. The removed module hubs are then added back to the resulting clusters according to the amount of their interactions with the other proteins in each cluster. Finally, clusters are merged based on their overlaps. IMHRC is available as a standalone command-line application.
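The final merging step can be illustrated with a minimal Python sketch. It is not the IMHRC implementation: the overlap score used here (the squared size of the intersection divided by the product of the two cluster sizes) is an assumption, since this page does not state which measure IMHRC uses, and the names `overlap_score` and `merge_overlapping` are hypothetical. The 0.53 threshold is simply the max-overlap value from the example command on this page.

```python
def overlap_score(a, b):
    """Assumed overlap measure: |A & B|^2 / (|A| * |B|)."""
    if not a or not b:
        return 0.0
    shared = len(a & b)
    return shared * shared / (len(a) * len(b))


def merge_overlapping(clusters, max_overlap):
    """Merge pairs of clusters until no pair overlaps by more than max_overlap."""
    merged = [set(c) for c in clusters]
    changed = True
    while changed:
        changed = False
        for i in range(len(merged)):
            for j in range(i + 1, len(merged)):
                if overlap_score(merged[i], merged[j]) > max_overlap:
                    # Fold cluster j into cluster i, then rescan from the start.
                    merged[i] |= merged[j]
                    del merged[j]
                    changed = True
                    break
            if changed:
                break
    return merged


# Toy clusters of protein names (illustrative data, not from any dataset).
clusters = [{"A", "B", "C", "D"}, {"B", "C", "D", "E"}, {"X", "Y", "Z"}]
result = merge_overlapping(clusters, max_overlap=0.53)
```

In this toy input the first two clusters share three of their four proteins, so under the assumed measure they score 9/16 and are merged, while the third cluster is left unchanged.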

Overview

Our algorithm provides an accurate and scalable method for detecting and predicting protein complexes in PPI networks. In our assessment, four experimental yeast PPI datasets were used: Gavin (Gavin, et al., 2006), Collins (Collins, et al., 2007), Krogan Core and Krogan Extended (Krogan, et al., 2006). All of these datasets are weighted. To evaluate the results of the methods, two gold standards were used as benchmarks: the MIPS catalog of protein complexes and the Gene Ontology based protein complex annotations from SGD. To assess the robustness of IMHRC against other complex detection algorithms, we selected seven of the best algorithms in this area: AP (Frey and Dueck, 2007), CFinder (Palla, et al., 2005), CMC (Liu, et al., 2009), MCL (Pereira-Leal, et al., 2004), ClusterONE (Nepusz, et al., 2012), Core of RNSC (King, et al., 2004) and RRW (Macropol, et al., 2009).

Datasets and results:

Datasets

Gold standards

Algorithm for download:

IMHRC V1

The following options control IMHRC:

--back-penalty sets the node back penalty value

--black-list specifies the black list expression

-d,--min-density specifies the minimum density of clusters (default: auto)

--debug turns on the debug mode

--growth-penalty sets the node growth penalty value

-h,--help shows this help message

--max-overlap specifies the maximum allowed overlap between two clusters

-s,--min-size specifies the minimum size of clusters

-smax,--max-size specifies the maximum size of clusters

-v,--version shows the version number
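To make the size and density options more concrete, here is a small Python sketch of how such filters could be applied to a candidate cluster. This page does not define how IMHRC computes density, so the definition below (total weight of the edges inside the cluster divided by the number of possible node pairs) is an assumption, and `cluster_density` and `keep_cluster` are hypothetical names, not part of IMHRC.

```python
def cluster_density(members, weights):
    """Assumed definition: total internal edge weight / number of node pairs.

    `weights` maps an undirected edge (u, v), stored once, to its weight.
    """
    members = set(members)
    n = len(members)
    if n < 2:
        return 0.0
    internal = sum(
        w for (u, v), w in weights.items() if u in members and v in members
    )
    return internal / (n * (n - 1) / 2)


def keep_cluster(members, weights, min_size, max_size, min_density):
    """Return True if the cluster passes the size and density thresholds."""
    n = len(set(members))
    if not (min_size <= n <= max_size):
        return False
    return cluster_density(members, weights) >= min_density


# Toy weighted interactions (illustrative confidence values only).
weights = {("A", "B"): 0.9, ("B", "C"): 0.6, ("A", "C"): 0.3, ("C", "D"): 0.2}
passes = keep_cluster({"A", "B", "C"}, weights,
                      min_size=3, max_size=10000, min_density=0.3)
```

Under this assumed definition a fully connected cluster with high-confidence edges has a density close to 1, and sparse or low-confidence clusters fall toward 0.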

Example:

C:\ >java -jar IMHRC-V1.jar collins2007.txt -min-size 3 -max-size 10000 -black-list (0.008,0.012) -max-overlap 0.53 -growth-penalty 2.1 -back-penalty 2 -min-density 0.3

You should see the following output:

Loaded graph with 1622 nodes and 9074 edges

[====================] 100% Growing clusters from seeds...

[====================] 100% Supplementary Growing clusters ...

[====================] 100% Supplementary Growing clusters ...

[====================] 100% Finding highly overlapping clusters...

[====================] 100% Merging highly overlapping clusters...

New graph with 1622 nodes and 6306 edges has processed

Detected 188 complexes
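When IMHRC is run from a script over several datasets, the summary numbers can be pulled out of the console output. The following Python sketch does this with regular expressions modeled only on the sample output shown above; other IMHRC versions may print different messages, and `parse_summary` is a hypothetical helper, not part of the tool.

```python
import re


def parse_summary(log_text):
    """Extract node, edge and complex counts from IMHRC console output."""
    summary = {}
    loaded = re.search(r"Loaded graph with (\d+) nodes and (\d+) edges", log_text)
    if loaded:
        summary["nodes"] = int(loaded.group(1))
        summary["edges"] = int(loaded.group(2))
    processed = re.search(r"New graph with (\d+) nodes and (\d+) edges", log_text)
    if processed:
        # Edge count of the graph reported after processing.
        summary["processed_edges"] = int(processed.group(2))
    detected = re.search(r"Detected (\d+) complexes", log_text)
    if detected:
        summary["complexes"] = int(detected.group(1))
    return summary


# Summary lines copied from the sample output above.
sample = """Loaded graph with 1622 nodes and 9074 edges
New graph with 1622 nodes and 6306 edges has processed
Detected 188 complexes"""
summary = parse_summary(sample)
```

Lines that do not match any of the patterns, such as the progress bars, are ignored, and a missing line simply leaves its key out of the result.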

+98 21 22431653
ch-eslahchi@sbu.ac.ir
Copyright 2016 Eslahchi Lab | All Rights Reserved