Computational prediction of protein-protein complexes
© Mishra; licensee BioMed Central Ltd. 2012
Received: 13 October 2011
Accepted: 5 July 2012
Published: 9 September 2012
Protein-protein interactions form the core of several biological processes. With protein-protein interfaces now being considered as drug targets, studies of their interactions and molecular mechanisms are gaining ground. The number of protein complexes in databases is small compared with the number of individual protein structures. For this reason, computational approaches are being used to derive and assess plausible complex models more quickly. This study documents an approach for generating a protein-protein heterocomplex in silico and for identifying the most probable complex among the thousands generated. The approach is especially useful when little or no information is available about the binding site between the interacting protein molecules.
A plausible protein-protein heterocomplex was selected from 10 docked complexes. These 10 complexes are a representative set obtained by clustering 2000 complexes generated with protein-protein docking software. The interfacial area of this complex was predicted by two “hotspot” prediction programs that employ different algorithms. In addition, among all the complexes with the same interfacial residues, this complex had the lowest energy and the largest buried surface area.
Various software tools were used to generate a plausible protein heterocomplex. The most important were protein-protein docking methods, prediction of ‘hotspots’ (the amino acid residues likely to lie in an interface) and measurement of the buried surface area of the complexes. The consensus among their predictions lends credence to the combined use of these tools.
Keywords: Protein-protein complex prediction; Protein-protein interface; Unbound protein-protein docking; HHsearch; ZDOCK; ClusPro; MetaPPISP; Optimal docking area; Surface racer
Protein-protein interactions (PPIs) are a hallmark of many biological processes. In recent years, protein-protein complexes have emerged as prospective drug targets. Studies of protein-protein complexes in the Protein Data Bank draw three distinctions. The first is between complexes formed by identical (homocomplex) and non-identical (heterocomplex) protein molecules. The second is between obligate and non-obligate complexes; non-obligate heterocomplexes are those in which the interacting partners are not initially co-localized. The third is between transient and permanent complexes, depending on the complex’s lifetime. Many PPIs, however, do not fall neatly into these types. Protein-protein contacts in these different types of complexes differ in surface complementarity, steric, electrostatic, hydrophobic and hydrogen-bonding forces, accessible surface area, residue propensity and planarity[2, 3].
Despite high-throughput experimental efforts in proteomics, the number of interacting protein complexes in databases remains low. In silico protein-protein interaction studies were scarce earlier, mainly because the protein folding problem resisted a practical solution. They are now gaining ground because computational prediction tools have become more accurate. This study develops an approach that can be used in computational studies of any two interacting proteins that form a heterocomplex. As a worked example of in silico exploration, we studied a hypothetical protein from H. pylori, HP986, which binds to tumor necrosis factor receptor 1 (TNFR1) as observed by surface plasmon resonance.
A combination of several bioinformatics tools was used to predict the HP986-TNFR1 complex. The software and web servers were chosen on three grounds. First, they are widely used in the literature, as evidenced by PubMed searches. Second, they have performed consistently well in the community-wide Critical Assessment of Protein Structure Prediction (CASP) and Critical Assessment of PRedicted Interactions (CAPRI) evaluations. Third, they performed well in preliminary validation studies using known crystal structures of protein complexes. In this validation, the programs Optimal Docking Area (ODA) and ZDOCK 2.3 correctly identified the binding interface of some published experimental complex structures (data not shown). The programs used were:
- HHsearch/HHpred in the SWISS-MODEL workspace, for generating the HP986 model;
- ZDOCK 2.3 and ClusPro, for unbound protein-protein docking;
- Optimal Docking Area and MetaPPISP, for predicting interfacial residues;
- the Surface Racer program, for calculating buried surface area.
The HHsearch/HHpred programs implemented in the SWISS-MODEL workspace are sensitive techniques for detecting remote homologues when no closely homologous template is found for the target protein. Their sensitivity comes from the pairwise comparison of profile hidden Markov models (HMMs). A profile HMM records the amino acid frequencies in each column of a multiple sequence alignment. It also records the frequency of insertions and deletions at each column, and this extra information improves sensitivity significantly. Not surprisingly, HHpred was ranked first among automatic structure prediction servers in template-based modeling in the most recent CASP (CASP9).
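The Python sketch below makes the column comparison concrete. It scores a pair of profile columns with the log-odds form log Σ_a q(a)p(a)/f(a), which is used for HMM-HMM comparison. It is a simplification that rests on several assumptions:
- column frequencies are taken from raw alignment columns with a flat pseudocount;
- the background frequencies f(a) are uniform;
- the insertion/deletion transition scores and secondary structure terms that HHsearch also uses are left out.

The helper names `column_profile` and `column_score` are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def column_profile(column: str, pseudocount: float = 1.0) -> dict[str, float]:
    """Amino acid frequencies in one MSA column, with a flat pseudocount; gaps are ignored."""
    counts = Counter(c for c in column if c in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return {a: (counts[a] + pseudocount) / total for a in AMINO_ACIDS}

def column_score(q: dict[str, float], p: dict[str, float], background: dict[str, float]) -> float:
    """Log-odds score for aligning two profile columns: log sum_a q(a) * p(a) / f(a)."""
    return math.log(sum(q[a] * p[a] / background[a] for a in AMINO_ACIDS))

# Assumption: uniform background frequencies instead of real amino acid composition
UNIFORM_BACKGROUND = {a: 1.0 / len(AMINO_ACIDS) for a in AMINO_ACIDS}

# Hypothetical columns: a conserved leucine column, a similar L/I column, and an aspartate column
leu = column_profile("LLLLL")
leu_ile = column_profile("LLLLI")
asp = column_profile("DDDDD")
similar = column_score(leu, leu_ile, UNIFORM_BACKGROUND)    # positive: columns agree
dissimilar = column_score(leu, asp, UNIFORM_BACKGROUND)     # negative: columns disagree
```

Similar columns score positive and dissimilar columns score negative. Summing such column scores along an alignment of two HMMs is what lets distant homologues stand out.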
The ZDOCK web server has performed consistently well in several CAPRI rounds and is also implemented in Accelrys’ commercial Discovery Studio software. This rigid-body protein-protein docking technique is based on Fast Fourier Transform (FFT) correlation. It returns about 2000 complexes, which can be clustered with ClusPro for ease of analysis. ClusPro first retains complexes with favorable surface complementarity. It then filters these to keep those with favorable electrostatic and desolvation free energies. Next, ClusPro generates cluster centers, each of which represents the complexes in its cluster. Finally, the cluster centers are ranked by cluster size.
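The FFT correlation step can be illustrated with a minimal Python sketch in the style of Katchalski-Katzir grid docking. It rests on these assumptions:
- the receptor and ligand are already discretized onto small cubic grids;
- receptor surface cells are 1, receptor interior cells are -15 (an overlap penalty), and empty cells are 0;
- only a single ligand orientation is scored, and translations wrap around the grid.

ZDOCK itself uses a richer scoring function and samples many ligand rotations. All function names here are hypothetical. The brute-force version is included only to show that a single FFT pass gives the same score for every translation as an explicit loop.

```python
import numpy as np

def fft_correlation(receptor: np.ndarray, ligand: np.ndarray) -> np.ndarray:
    """Score every circular translation t of the ligand grid against the receptor grid.

    score[t] = sum_y receptor[y] * ligand[(y - t) mod n], computed with one FFT pass.
    """
    return np.real(np.fft.ifftn(np.fft.fftn(receptor) * np.conj(np.fft.fftn(ligand))))

def brute_force_correlation(receptor: np.ndarray, ligand: np.ndarray) -> np.ndarray:
    """Same scores by explicitly shifting the ligand (quadratic cost, for checking only)."""
    scores = np.zeros(receptor.shape)
    for t in np.ndindex(*receptor.shape):
        # np.roll(ligand, t)[y] == ligand[y - t], i.e. the ligand translated by t
        scores[t] = np.sum(receptor * np.roll(ligand, shift=t, axis=(0, 1, 2)))
    return scores

def toy_grids(n: int = 8) -> tuple[np.ndarray, np.ndarray]:
    """Hypothetical receptor/ligand grids for illustration only."""
    receptor = np.zeros((n, n, n))
    receptor[2:6, 2:6, 0:3] = 1.0      # receptor surface layer
    receptor[3:5, 3:5, 0:2] = -15.0    # receptor interior: penalizes ligand overlap
    ligand = np.zeros((n, n, n))
    ligand[0:2, 0:2, 0:2] = 1.0        # small cubic ligand
    return receptor, ligand

receptor, ligand = toy_grids()
scores = fft_correlation(receptor, ligand)
best_shift = np.unravel_index(np.argmax(scores), scores.shape)  # highest-scoring translation
```

The FFT route scores all N translations in O(N log N) time rather than O(N²). This speed-up is what makes exhaustive rigid-body searches over thousands of poses practical.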
MetaPPISP and Molsoft’s Optimal Docking Area (ODA) tools are used to predict the interfacial residues in a protein-protein complex. The two tools are based on different algorithms, so a consensus interface derived from both can be used to identify the possible docking site. Meta-PPISP combines three individual servers: cons-PPISP, ProMate and PINUP. Each uses different attributes for prediction, so Meta-PPISP’s output is itself a consensus. It takes an amino acid sequence as input and outputs a list of residues likely to be in an interface. The ODA tool takes a three-dimensional (3D) structure as input. It generates surface patches of different sizes on the protein and calculates the docking surface energy of each patch. This energy is based on the atomic accessible surface area (ASA) of the component residues. In a recent paper in Nucleic Acids Research, ODA was used to identify the binding sites of spTranslin, both for self-association and for binding to spTRAX, and these predictions were supported by experimental evidence.
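The patch idea behind ODA can be sketched as follows. The sketch makes these assumptions:
- each surface atom defines a spherical patch of fixed radius;
- a patch’s energy is minus the sum, over its atoms, of a per-atom solvation parameter times that atom’s accessible surface area;
- the lowest-energy patch is treated as the most likely docking site.

The solvation parameters and the 10 Å radius are illustrative placeholders. They are not the parameter set or patch sizes used by the ODA tool, and `patch_energies` is a hypothetical helper.

```python
import numpy as np

# Illustrative placeholder solvation parameters per element (energy units per Å^2);
# NOT the values used by the ODA tool.
SOLVATION_PARAM = {"C": 0.016, "N": -0.006, "O": -0.006, "S": 0.021}

def patch_energies(coords, elements, asa, radius: float = 10.0) -> np.ndarray:
    """Energy of the spherical patch centred on each surface atom (lower = more favorable)."""
    coords = np.asarray(coords, dtype=float)
    weighted = np.array([SOLVATION_PARAM[e] for e in elements]) * np.asarray(asa, dtype=float)
    energies = np.empty(len(coords))
    for i, center in enumerate(coords):
        in_patch = np.linalg.norm(coords - center, axis=1) <= radius
        # Burying hydrophobic (positive-parameter) surface gives a negative, favorable energy
        energies[i] = -weighted[in_patch].sum()
    return energies

# Hypothetical surface: a carbon-rich cluster and a distant oxygen-rich cluster, 20 Å^2 ASA each
coords = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (50, 0, 0), (51, 0, 0), (50, 1, 0)]
elements = ["C", "C", "C", "O", "O", "O"]
energies = patch_energies(coords, elements, asa=[20.0] * 6)
best_patch_center = int(np.argmin(energies))  # index of the most docking-prone patch centre
```

In this toy case the hydrophobic carbon patch is picked. That reflects the general observation that buried hydrophobic surface favors binding.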
Materials and methods
Website addresses of the software and web servers employed in the study
Swiss PDB Viewer
Optimal Docking Area
Molecular visualization and general analyses of the model were performed with DeepView version 4.0 and Accelrys’ ViewerLite 4.2. For model validation (Ramachandran plot calculations), the PROCHECK tool available through PDBsum was used. Secondary structure was predicted with the program Jpred3.
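A Ramachandran plot is built from two backbone torsion angles per residue. These are φ (defined by atoms C(i-1), N, Cα, C) and ψ (defined by atoms N, Cα, C, N(i+1)). The Python sketch below computes them from atomic coordinates using the standard signed dihedral formula. It is for illustration only: PROCHECK performs this calculation internally, along with many other stereochemical checks. The function names are hypothetical.

```python
import numpy as np

def dihedral(p0, p1, p2, p3) -> float:
    """Signed dihedral angle (degrees, in [-180, 180]) defined by four points."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 /= np.linalg.norm(b1)
    # Project the outer bonds onto the plane perpendicular to the central bond
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)))

def phi_psi(c_prev, n, ca, c, n_next) -> tuple[float, float]:
    """Backbone phi and psi of one residue from its own and neighbouring backbone atoms."""
    return dihedral(c_prev, n, ca, c), dihedral(n, ca, c, n_next)
```

Each residue’s (φ, ψ) pair is one point on the plot. PROCHECK then reports how many of these points fall in the most favoured, allowed and disallowed regions.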
Molsoft ICM Browser was used to visualize the ODA (Optimal Docking Area) regions that the online ODA tool identified for TNFR1 and the HP986 model. Regions likely to be involved in an interface are shown as red spheres, and regions unlikely to be in an interface are shown as blue spheres. Protein-protein interaction sites were predicted with MetaPPISP. DeepView version 4.0 was used for energy minimization and for identifying residues within 4.5 Å of residues in the other protein. Solvent accessible surface area (SASA) was calculated with the program Surface Racer 3.0. Buried surface area (BSA) was calculated with the formula $\mathrm{BSA} = [\mathrm{SASA}(\text{receptor}) + \mathrm{SASA}(\text{ligand}) - \mathrm{SASA}(\text{receptor} + \text{ligand})]/2$.
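The sketch below is a worked example of the BSA formula applied to three SASA values. The numbers are hypothetical and serve only to show the arithmetic: (9000 + 7000 − 14000)/2 = 1000 Å². The check rejects inputs where the complex would expose more surface than its two separate components, since such values indicate a calculation error.

```python
def buried_surface_area(sasa_receptor: float, sasa_ligand: float, sasa_complex: float) -> float:
    """BSA = [SASA(receptor) + SASA(ligand) - SASA(receptor + ligand)] / 2, in Å^2."""
    buried_total = sasa_receptor + sasa_ligand - sasa_complex
    if buried_total < 0:
        raise ValueError("complex SASA exceeds the sum of its components")
    return buried_total / 2.0

# Hypothetical SASA values in Å^2, for illustration only
bsa = buried_surface_area(9000.0, 7000.0, 14000.0)
```

Dividing by 2 gives the average area buried per partner, a common way to report interface size.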
Protein-protein docking was performed with the web version of ZDOCK 2.3. Unliganded TNFR1 exists as a dimer in its crystal structure (PDB ID: 1NCF). Therefore, only one TNFR1 molecule (the receptor) was used for unbound protein-protein docking with the HP986 model (the ligand). The 2000 predictions returned by ZDOCK 2.3 were clustered with ClusPro to identify a representative set of complexes. ClusPro returned 10 such complexes, and the highest-ranking (first) complex represents the most populated cluster.
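ClusPro’s grouping of docked poses can be approximated by a greedy procedure. The pose with the most neighbours within a fixed ligand RMSD radius becomes a cluster center. That pose and its neighbours are then removed, and the process repeats. The sketch below makes the following assumptions:
- ligand coordinates are already expressed in the fixed receptor frame, so RMSD is computed without superposition;
- the 9 Å radius is an assumed default.

It also omits ClusPro’s energy-based filtering and CHARMm minimization. The helper names are hypothetical.

```python
import numpy as np

def ligand_rmsd(a: np.ndarray, b: np.ndarray) -> float:
    """RMSD between two poses of the same ligand atoms (no superposition: receptor frame is fixed)."""
    return float(np.sqrt(np.mean(np.sum((a - b) ** 2, axis=1))))

def greedy_cluster(poses: list[np.ndarray], radius: float = 9.0,
                   max_clusters: int = 10) -> list[tuple[int, list[int]]]:
    """Return (center index, member indices) pairs, largest cluster first."""
    n = len(poses)
    dist = np.array([[ligand_rmsd(poses[i], poses[j]) for j in range(n)] for i in range(n)])
    unassigned = set(range(n))
    clusters = []
    while unassigned and len(clusters) < max_clusters:
        # Neighbour lists are recomputed among poses not yet assigned to a cluster
        neighbours = {i: [j for j in unassigned if dist[i, j] <= radius] for i in unassigned}
        center = max(sorted(unassigned), key=lambda i: len(neighbours[i]))
        members = sorted(neighbours[center])
        clusters.append((center, members))
        unassigned -= set(members)
    return clusters

# Hypothetical poses: five near one site and three near another site 50 Å away
base = np.zeros((4, 3))
poses = [base + 0.5 * i for i in range(5)] + [base + 50.0 + 0.5 * i for i in range(3)]
clusters = greedy_cluster(poses)
```

Neighbour counts can only shrink as poses are removed, so clusters come out ordered by size. This matches the size-based ranking of cluster centers described earlier.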
Results and discussion
Protein structure prediction
Because no experimental 3-D structure of the HP986 protein is available, a 3-D model was built with SWISS-MODEL in automated mode. A simple BLASTp search found no significant hits with high sequence homology in the database. The template identified by the HHsearch method implemented in the SWISS-MODEL workspace was 1XMXA. This is an ExPDB code; ExPDB is a template library extracted from the Protein Data Bank (PDB).
The HHsearch/HHpred programs implemented in the SWISS-MODEL workspace work as follows. To detect distantly related template structures, the target sequence is searched against a template library based on hidden Markov models (HMMs). Each HMM in the library is built from a multiple sequence alignment of the template sequence. This alignment is obtained by a PSI-BLAST search and enriched with secondary structure assignments. In the latest Critical Assessment of Protein Structure Prediction (CASP9), HHpred ranked first among automatic structure prediction servers in template-based modeling, which increases confidence in the model’s reliability. The same template was also identified consistently by the Phyre structure prediction program.
Unbound protein-protein docking was carried out with ZDOCK 2.3 using default parameters. It generated 2000 predictions with TNFR1 (PDB ID 1NCF) as the receptor and the HP986 model as the ligand. These 2000 complexes were submitted to ClusPro for clustering. ClusPro returned 10 cluster centers, and the first-ranked cluster center contained the largest number of complexes.
A list of putative interacting residues in the protein-protein interface of TNFR1 and HP986, predicted with a combination of the Meta-PPISP and ODA tools
TNFR1: C104(117), S105(118), L106(119), L108(121), N109(122), T111(124), H113(126), L114(127), C116(129), N121(134) to E136(149)
HP986: D1(97), F2(98), R3(99), K4(100), Y5(101), I6(102), I7(103), G9(105) to F11(107), E13(109), Y14(110), Y16(112) to E18(114), L20(116), R32(128) to I36(132), K72(168), L76(172), I104(200), D105(201), I124(220)
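The sketch below shows one way to combine two interface predictors. The text does not state whether the published list was formed by intersection or by another rule, so both a strict intersection and a simple vote threshold are shown. The residue sets in the usage example are hypothetical inputs, not the actual MetaPPISP or ODA outputs, and the function names are illustrative.

```python
from collections import Counter

def consensus_interface(predictions: list[set[str]]) -> set[str]:
    """Residues predicted by every tool (strict intersection)."""
    if not predictions:
        return set()
    return set.intersection(*predictions)

def vote_interface(predictions: list[set[str]], min_votes: int = 2) -> set[str]:
    """Residues predicted by at least min_votes tools."""
    votes = Counter(res for pred in predictions for res in pred)
    return {res for res, n in votes.items() if n >= min_votes}

# Hypothetical predictions for one protein (not the real tool outputs)
meta_ppisp = {"C104", "S105", "L106", "N121"}
oda = {"S105", "L106", "H113"}
strict = consensus_interface([meta_ppisp, oda])
lenient = vote_interface([meta_ppisp, oda], min_votes=1)
```

With two tools, a threshold of two votes is the same as the intersection and a threshold of one is the union. With more predictors, the threshold becomes a tunable trade-off between precision and coverage.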
The complexes had no steric clashes, because ClusPro itself subjects them to CHARMm minimization. Nevertheless, complexes 1, 5 and 10 were further subjected to a short in vacuo minimization in DeepView with the GROMOS96 force field, and their minimization energy scores were compared. The buried surface area (BSA) of each complex was calculated with the Surface Racer program to identify the complex with the largest contact area between the two proteins.
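Surface Racer computes accessible surface areas analytically. The sketch below swaps in a simpler technique, the Shrake-Rupley numerical method, which approximates per-atom SASA in three steps:
1. place test points on each atom’s probe-expanded sphere;
2. discard the points that fall inside any neighbouring atom’s expanded sphere;
3. scale the sphere’s area by the fraction of points left.

The BSA formula from Materials and methods can then be applied to the per-molecule totals. The 1.4 Å probe radius is a common convention. The atom radii, point count and function names are illustrative assumptions.

```python
import numpy as np

def sphere_points(n: int = 960) -> np.ndarray:
    """Roughly uniform points on the unit sphere (golden-angle spiral)."""
    i = np.arange(n) + 0.5
    z = 1.0 - 2.0 * i / n
    r = np.sqrt(1.0 - z * z)
    theta = np.pi * (3.0 - np.sqrt(5.0)) * i
    return np.column_stack((r * np.cos(theta), r * np.sin(theta), z))

def shrake_rupley_sasa(coords, radii, probe: float = 1.4, n_points: int = 960) -> np.ndarray:
    """Approximate per-atom solvent accessible surface area (Å^2)."""
    coords = np.asarray(coords, dtype=float)
    expanded = np.asarray(radii, dtype=float) + probe
    unit = sphere_points(n_points)
    areas = np.zeros(len(coords))
    for i in range(len(coords)):
        points = coords[i] + expanded[i] * unit
        accessible = np.ones(n_points, dtype=bool)
        for j in range(len(coords)):
            if j != i:  # a point is buried if it lies inside another atom's expanded sphere
                accessible &= np.linalg.norm(points - coords[j], axis=1) > expanded[j]
        areas[i] = 4.0 * np.pi * expanded[i] ** 2 * accessible.mean()
    return areas

# Hypothetical two-atom "receptor" and one-atom "ligand" in contact
receptor_xyz, ligand_xyz = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)], [(4.0, 0.0, 0.0)]
sasa_r = shrake_rupley_sasa(receptor_xyz, [1.7, 1.7]).sum()
sasa_l = shrake_rupley_sasa(ligand_xyz, [1.7]).sum()
sasa_rl = shrake_rupley_sasa(receptor_xyz + ligand_xyz, [1.7, 1.7, 1.7]).sum()
bsa = (sasa_r + sasa_l - sasa_rl) / 2.0
```

Accuracy improves with more test points, at higher cost. Exact analytical methods such as Surface Racer avoid this trade-off.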
Minimization energy (kJ/mol), buried surface area (Å²) and interacting residues within 4.5 Å of residues in the other protein for three complexes returned by ClusPro
Minimization energy (kJ/mol)
Buried surface area* (Å²)
C104, S105, L106, L108, N109, T111, H113, L114, C116
R3, I7, L76
Same as above
D1, R3, K4, Y5, I7, G9, W10, E13, R32, L33, N34, M35, I36, L76
Same as above
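Finding residues within 4.5 Å of the partner protein comes down to a minimum interatomic distance test over pairs of residues. The numpy sketch below assumes a hypothetical input format: each residue is given as a label mapped to an array of its atom coordinates in Å. DeepView itself performs this selection interactively, and `interface_residues` is a hypothetical helper.

```python
import numpy as np

def interface_residues(receptor: dict, ligand: dict, cutoff: float = 4.5) -> tuple[set, set]:
    """Residues of each protein with any atom within `cutoff` Å of any atom of the other."""
    rec_hits, lig_hits = set(), set()
    for r_label, r_xyz in receptor.items():
        r_xyz = np.asarray(r_xyz, dtype=float)
        for l_label, l_xyz in ligand.items():
            l_xyz = np.asarray(l_xyz, dtype=float)
            # All pairwise atom-atom distances between the two residues
            d = np.linalg.norm(r_xyz[:, None, :] - l_xyz[None, :, :], axis=-1)
            if d.min() <= cutoff:
                rec_hits.add(r_label)
                lig_hits.add(l_label)
    return rec_hits, lig_hits

# Hypothetical single-atom residues, for illustration only
tnfr1 = {"C104": [[0.0, 0.0, 0.0]], "S105": [[20.0, 0.0, 0.0]]}
hp986 = {"R3": [[3.0, 0.0, 0.0]], "I7": [[40.0, 0.0, 0.0]]}
contacts = interface_residues(tnfr1, hp986)
```

For whole proteins, a spatial index such as a k-d tree would avoid comparing every residue pair. The nested loop is kept here for clarity.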
This paper describes an approach for predicting the most plausible protein-protein heterocomplex from the thousands of complexes generated when little or no information is available about the interface between the two binding partners. Notably, all the prediction tools used here agreed in their results, whether they took the sequence or the structure as input. This agreement lends greater confidence to the approach. Structurally, the HP986 protein domain appears to belong to the alpha + beta protein fold family, whereas its interacting partner, TNFR1, is an elongated all-beta structure. Simulations that model the conformational changes of interacting proteins may include molecular dynamics studies of protein mutants. Such studies provide valuable insight into the conformational behaviour and dynamics of a particular protein. The accuracy of any complex prediction ultimately depends on experimental verification. Experimental verification of predictions made with these tools has been reported, most recently by Eliahoo et al. (2010). The approach taken here can be used for the in silico characterization of any protein-protein heterocomplex and can help generate hypotheses for later experimental work.
HP986: Hypothetical protein found in Helicobacter pylori
TNFR1: Tumor necrosis factor receptor 1
spTranslin: Schizosaccharomyces pombe protein translin
spTRAX: Translin paralog associated with translin
Hypothetical protein from Vibrio cholerae.
- Jones S, Thornton JM: Principles of protein-protein interactions. Proc Natl Acad Sci U S A. 1996, 93: 13-20. 10.1073/pnas.93.1.13.
- Nooren IMA, Thornton JM: Diversity of protein–protein interactions. EMBO J. 2003, 22: 3486-3492. 10.1093/emboj/cdg359.
- Archakov AI, Govorun VM, Dubanov AV, Ivanov YD, Veselovsky AV, Lewi P, Janssen P: Protein-protein interactions as a target for drugs in proteomics. Proteomics. 2003, 3: 380-391. 10.1002/pmic.200390053.
- Alvi A, Ansari SA, Ehtesham NZ, Rizwan M, Devi S, Sechi LA, Qureshi IA, Hasnain SE, Ahmed N: Concurrent proinflammatory and apoptotic activity of a Helicobacter pylori protein (HP986) points to its role in chronic persistence. PLoS One. 2011, 6 (7): e22530. 10.1371/journal.pone.0022530.
- Söding J: Protein homology detection by HMM-HMM comparison. Bioinformatics. 2005, 21: 951-960. 10.1093/bioinformatics/bti125.
- Arnold K, Bordoli L, Kopp J, Schwede T: The SWISS-MODEL Workspace: a web-based environment for protein structure homology modelling. Bioinformatics. 2006, 22: 195-201.
- Chen R, Li L, Weng Z: ZDOCK: an initial-stage protein docking algorithm. Proteins Struct Funct Genet. 2003, 52: 80-87. 10.1002/prot.10389.
- Comeau SR, Gatchell DW, Vajda S, Camacho CJ: ClusPro: an automated docking and discrimination method for the prediction of protein complexes. Bioinformatics. 2004, 20: 45-50. 10.1093/bioinformatics/btg371.
- Qin SB, Zhou H-X: meta-PPISP: a meta web server for protein-protein interaction site prediction. Bioinformatics. 2007, 23: 3386-3387. 10.1093/bioinformatics/btm434.
- Fernandez-Recio J, Totrov M, Skorodumov C, Abagyan R: Optimal docking area: a new method for predicting protein-protein interaction sites. Proteins. 2005, 58: 134-143.
- Eliahoo E, Yosef RB, Pérez-Cano L, Fernández-Recio J, Glaser F, Manor H: Mapping of interaction sites of the Schizosaccharomyces pombe protein Translin with nucleic acids and proteins: a combined molecular genetics and bioinformatics study. Nucleic Acids Res. 2010, 38: 1-15. 10.1093/nar/gkp829.
- Kelley LA, Sternberg MJE: Protein structure prediction on the web: a case study using the Phyre server. Nat Protoc. 2009, 4: 363-371.
- Laskowski RA, Chistyakov VV, Thornton JM: PDBsum more: new summaries and analyses of the known 3D structures of proteins and nucleic acids. Nucleic Acids Res. 2005, 33: D266-D268.
- Cole C, Barber JD, Barton GJ: The Jpred 3 secondary structure prediction server. Nucleic Acids Res. 2008, 36 (Web Server issue): W197-W201. 10.1093/nar/gkn238.
- Tsodikov OV, Record MT, Sergeev YV: A novel computer program for fast exact calculation of accessible and molecular surface areas and average surface curvature. J Comput Chem. 2002, 23: 600-609. 10.1002/jcc.10061.
- Purohit R, Rajendran V, Sethumadhavan R: Studies on adaptability of binding residues and flap region of TMC-114 resistance HIV-1 protease mutants. J Biomol Struct Dyn. 2011, 29: 137-152. 10.1080/07391102.2011.10507379.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.