- Research note
- Open Access
Structural and functional annotation of hypothetical proteins of human adenovirus: prioritizing the novel drug targets
BMC Research Notes volume 10, Article number: 706 (2017)
Human adenoviruses (HAdV) are small double-stranded DNA viruses that provoke a vast array of human diseases. Next-generation sequencing techniques are rapidly increasing the genomic data available for HAdV, and with it the number of recognized serotypes. The complete genome sequence of human adenovirus shows that it encodes a large number of proteins with unknown cellular or biochemical function, known as hypothetical proteins. Hence, it is indispensable to annotate these proteins functionally and structurally to gain a better understanding of novel drug targets. The purpose of this study was to characterize 38 randomly retrieved hypothetical proteins by determining their physicochemical properties, subcellular localization, function, structure and ligand binding sites using various sequence- and structure-based bioinformatics tools.
The functions of six hypothetical proteins, P03269, P03261, P03263, Q83127, Q1L4D7 and I6LEV1, were predicted confidently, and these proteins were then used for structure analysis. We found that these proteins may act as DNA terminal protein, DNA polymerase, DNA binding protein, adenovirus E3 region protein CR1 and adenoviral protein L1. Functional and structural annotation, leading to the detection of binding sites by means of docking analysis, can indicate potential targets for therapeutics against adenoviral infection.
Human adenoviruses are non-enveloped dsDNA viruses with a genome of almost 35 kb. HAdV can infect a variety of tissues and cause a wide range of complications such as gastroenteritis, hepatitis, myocarditis, keratoconjunctivitis and pneumonia [2, 3]. Infection is contagious, spreading through direct contact or fomites, and the virus is resistant to various physical and chemical agents. Children younger than 5 years and immunocompromised persons, especially pediatric patients, are most susceptible to these viruses. Worldwide, 5–7% of respiratory tract infections in pediatric patients are ascribed to HAdV, and persons of all ages are susceptible to infections caused by these viruses.
The seven known human adenovirus species, HAdV-A to HAdV-G, belong to the genus Mastadenovirus, in which all human adenoviruses are categorized, and are further divided into different strains. To date, 67 types of HAdV have been reported. Their number is rapidly increasing owing to advances in bioinformatics and genomics and the availability of whole genome sequences [8, 9].
Despite immense effort, only 50–60% of genes have a known function in most completely sequenced genomes. Every organism's genome contains a number of genes of unknown function, whose products are called hypothetical proteins (HPs). To understand the biology and genome of an organism, it is important to discover the function of its hypothetical proteins; although HAdV has a small genome, it still encodes several hypothetical proteins. Therefore, to treat infectious diseases such as those caused by HAdV, functional annotation of these HPs might open avenues for prioritizing novel drug targets.
In silico strategies for annotating hypothetical proteins are cost-effective and fast enough to explore their function. In this study, multiple algorithm-based software tools were used to predict hypothetical protein function, which may lead to the identification of novel pharmacological targets for screening, drug discovery and drug design for the treatment of HAdV infections.
Human adenovirus proteins of unknown function were retrieved from UniProt [12, 13]. Thirty-eight hypothetical proteins belonging to eight different types of HAdV were selected at random (Additional file 1: Table S1). Sequence analysis was performed on the FASTA sequences of these proteins together with their UniProt IDs. For characterization, a number of software tools based on different algorithms were used, as shown in Fig. 1.
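As an illustration of the retrieval step, the following Python sketch parses UniProt-style FASTA text into accession and sequence pairs. It is a minimal sketch and not the procedure used in this study: the function name parse_fasta is hypothetical, and the two toy records use made-up sequences and entry names, with real accessions from this study attached only as labels.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict mapping accession -> sequence."""
    records = {}
    accession = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            parts = header.split("|")
            if len(parts) >= 3:
                # UniProt headers look like "sp|ACCESSION|ENTRY_NAME description"
                accession = parts[1]
            else:
                # Fallback for other headers: first whitespace-separated token
                tokens = header.split()
                accession = tokens[0] if tokens else ""
            records[accession] = []
        elif accession is not None:
            # Sequence lines may be wrapped; collect and join them later
            records[accession].append(line)
    return {acc: "".join(chunks) for acc, chunks in records.items()}


# Toy input: sequences and entry names are invented for illustration only.
EXAMPLE = """>sp|P03269|TOY_ENTRY made-up record for illustration
MKTAYIAK
QRQISFVK
>tr|Q83127|TOY_ENTRY2 second made-up record
MSDE
"""

proteins = parse_fasta(EXAMPLE)
for acc, seq in proteins.items():
    print(acc, len(seq))
```

Keeping the accession as the dictionary key makes it straightforward to carry the UniProt ID alongside each sequence through the later analysis steps.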
The physicochemical properties of all HPs were analyzed with the ProtParam tool on the ExPASy online server. This server performs a theoretical evaluation of physicochemical properties such as isoelectric point, molecular weight, aliphatic index, grand average of hydropathicity (GRAVY) and instability index.
To predict the cellular function of a protein, it is important to know its subcellular localization, i.e. whether the protein is present in the outer membrane, inner membrane, periplasm, extracellular space or cytoplasm. The subcellular localization of the viral proteins was predicted using the Virus-PLoc online server, TMHMM [19, 20] and HMMTOP [21, 22].
The most basic step in predicting the function of a protein is to search for its sequence homologs in the available genomics and proteomics databases. The popular bioinformatics tool BLASTp was used for this purpose [23, 24].
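Two of the listed properties have simple closed forms and can be sketched in a few lines of Python. The sketch below is an independent illustration, not the ProtParam implementation: GRAVY is taken as the mean Kyte-Doolittle hydropathy value over all residues, and the aliphatic index as X(Ala) + 2.9 X(Val) + 3.9 (X(Ile) + X(Leu)), where X is the mole percent of each residue. The function names and the toy peptide are assumptions made for illustration.

```python
# Kyte-Doolittle hydropathy scale for the 20 standard amino acids
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq):
    """Grand average of hydropathicity: mean hydropathy per residue.

    Raises KeyError for non-standard residue codes (e.g. X, B, Z).
    """
    seq = seq.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(seq):
    """Aliphatic index from the mole percent of Ala, Val, Ile and Leu."""
    seq = seq.upper()
    n = len(seq)

    def mole_percent(aa):
        return 100.0 * seq.count(aa) / n

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


# Toy peptide, invented for illustration (not one of the 38 HPs)
toy = "MKTAYIAKQRQISFVK"
print(round(gravy(toy), 3), round(aliphatic_index(toy), 2))
```

A negative GRAVY value means that hydrophilic residues dominate the sequence on average, and a positive value means that hydrophobic residues dominate.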
Function and disulfide bridges prediction
For precise function annotation, various tools such as SVM-Prot, ProtoNet [25, 26], Pfam, Motif [27, 28], CDART [24, 29], CATH [30, 31], SMART [32, 33], Superfamily [34, 35] and InterProScan [27, 36] were used to classify all 38 HAdV proteins into families and subfamilies on the basis of their sequence, structure and function [16, 37]. The DISULFIND server was used to evaluate the occurrence of disulfide bonds between cysteine residues.
Structure prediction and validation
Functions of proteins inferred from structural analysis are considered more reliable than sequence-based function annotation, because homologous proteins conserve structure more strongly than sequence during evolution. For this purpose, we used ProFunc and COACH [40, 45].
Thirty-eight hypothetical proteins belonging to eight different types of HAdV were randomly selected from UniProt (Additional file 1: Table S1). Their lengths range from 81 amino acids for the shortest protein to 1198 amino acids for the longest (Additional file 1: Table S1). The ProtParam tool was used to predict the physicochemical properties of all hypothetical proteins (Additional file 2: Table S2). Subcellular localization and transmembrane helix prediction software predicted most of the HPs to be localized in the host cytoplasm and a few in the host cell membrane and nucleus (Additional file 3: Table S3). Multiple software tools were used for the function prediction of the 38 hypothetical proteins (Additional file 4: Table S4, Additional file 5: Table S5). Out of the 38 proteins, the six HPs whose function was predicted consistently by ≥ 6 software tools were selected (Table 1). These confidently annotated HPs were then used for structure prediction, structure analysis and disulfide bridge prediction. The detailed results of structure prediction and analysis are shown in Additional file 6: Table S6 and Additional file 7: Table S7. DISULFIND did not find disulfide bonds in any of the HPs and characterized them as thermally unstable proteins.
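The selection rule of agreement among at least six tools can be sketched as a simple majority count. The Python sketch below is one possible reading of that rule and not the authors' procedure: the function name consensus_function is hypothetical, the assumption that agreement means identical function labels is ours, and the toy votes are invented rather than taken from Additional file 4 or 5.

```python
from collections import Counter


def consensus_function(predictions, min_votes=6):
    """Return the most frequent predicted function if enough tools agree.

    predictions: dict mapping tool name -> predicted function label,
                 or None when a tool returned no prediction.
    """
    votes = Counter(label for label in predictions.values() if label)
    if not votes:
        return None
    label, count = votes.most_common(1)[0]
    return label if count >= min_votes else None


# Invented votes for illustration only
toy_confident = {
    "Pfam": "DNA terminal protein",
    "SMART": "DNA terminal protein",
    "CATH": None,
    "CDART": "DNA terminal protein",
    "InterProScan": "DNA terminal protein",
    "Superfamily": None,
    "SVM-Prot": "DNA terminal protein",
    "ProtoNet": "DNA terminal protein",
    "Motif": "DNA terminal protein",
}
toy_weak = {
    "Pfam": "DNA binding protein",
    "SMART": None,
    "CATH": "DNA binding protein",
    "CDART": "DNA binding protein",
    "InterProScan": None,
}

print(consensus_function(toy_confident))  # seven of nine tools agree
print(consensus_function(toy_weak))       # only three tools agree
```

In practice, labels from different tools rarely match character for character, so a real pipeline would need to normalize them before counting votes.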
In this study, we carried out structural and functional annotation of 38 HPs of human adenovirus, which is responsible for a variety of clinical diseases. Prediction of physicochemical properties showed that the isoelectric point (pI) of the HPs ranges from 4.1 to 12.43. The isoelectric point is the pH at which the net charge on the protein is zero; at this pH the protein becomes less soluble, compact and stable, which favors crystallization. Purification and crystallization of a protein can therefore be guided by designing a buffer system around the computed pI [47, 48] (Additional file 2: Table S2).
The extinction coefficient of the HPs computed by the ProtParam tool ranges from 1490.0 to 179,580.0 M⁻¹ cm⁻¹ at 280 nm. This computed extinction coefficient can be helpful for quantitative studies of protein–ligand and protein–protein interactions. A protein is predicted to be stable if its instability index is less than 40 and unstable if it is greater than 40. The instability index of the 38 hypothetical proteins ranges from 20.1 to 106.56; by this criterion only nine proteins are stable and the rest are unstable. The GRAVY index of the proteins ranges from −0.908 to 0.166, and 32 of the 38 HPs have a negative GRAVY index, which indicates that these proteins are hydrophilic (polar) in nature.
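The extinction coefficient and the stability cut-off discussed above can be illustrated with a short Python sketch. It assumes the commonly documented formula for the molar extinction coefficient at 280 nm, namely 1490 per tyrosine, 5500 per tryptophan and 125 per cystine (disulfide-bonded cysteine pair); the function names are hypothetical, and treating an instability index of exactly 40 as not stable is our assumption. Under this formula, the reported minimum of 1490.0 is consistent with a sequence containing a single tyrosine and no tryptophan or cystine.

```python
def extinction_coefficient(seq, cystines=0):
    """Molar extinction coefficient at 280 nm, in M^-1 cm^-1.

    cystines: number of disulfide-bonded cysteine pairs assumed to be
              present (0 means all cysteines are treated as reduced).
    """
    seq = seq.upper()
    return 1490 * seq.count("Y") + 5500 * seq.count("W") + 125 * cystines


def is_stable(instability_index, threshold=40.0):
    """Apply the cut-off from the text: below 40 is predicted stable.

    A value of exactly 40 is treated as not stable (an assumption).
    """
    return instability_index < threshold


# Toy sequences, invented for illustration
print(extinction_coefficient("Y"))                  # one Tyr only
print(extinction_coefficient("WYCC", cystines=1))   # Trp + Tyr + one cystine
print(is_stable(20.1), is_stable(106.56))           # range limits from the text
```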
The detailed functional and structural annotation of the six hypothetical proteins is as follows:
P03269 is predicted to be an adenoviral DNA terminal protein that functions in the initiation of viral DNA replication. This protein is covalently bound to the viral DNA and acts as a primer for viral genome replication by DNA strand displacement. Seven software tools confidently predicted the function of this protein, and the Virus-PLoc server supported this function by predicting its location in the host nucleus. The predicted three-dimensional structure with the highest C-score (−2.25) was selected (Additional file 8: Figure S1), and structure verification by Ramachandran plot showed that 76.9% of residues are in the most favored region and 18.8% of residues are in the additional allowed region (Additional file 9: Figure S2). For pharmaceutical and docking analysis, COACH was used; out of many ligand binding sites, those with the maximum C-score were selected for further molecular docking analysis (Additional file 6: Table S6). Structure-based function analysis further predicted an adenoviral DNA terminal protein motif in HP P03269, and the structure motifs Ala159-Arg161, Gly558-Gly560, Leu406-Glu408, Gln241-Ala243 and Pro275-Arg277 are also predicted to be conserved in this HP and may have a similar function (Additional file 7: Table S7). Gene Ontology analysis shows that HP P03269 may have a role in the biological processes of DNA replication, cellular process, cellular metabolic process and cellular biosynthetic process, with the biochemical functions of DNA binding and nucleic acid binding.
P03261 belongs to Human adenovirus C serotype 2 and is predicted to contain a DNA polymerase type-B family catalytic domain and to be localized in the host nucleus. DNA-directed DNA polymerases have both exonuclease and polymerase activity and play a role in recombination, repair and DNA replication. Out of five 3D models predicted by I-TASSER, the structure with the highest C-score (−0.20) was selected (Additional file 10: Figure S3), and structure verification shows that 67.1% of residues are in the favored region and 25.7% of residues are in the additional allowed regions of the Ramachandran plot (RC-plot) (Additional file 11: Figure S4). The ProFunc server predicted a DNA polymerase family B signature.
Gene Ontology analysis showed that this HP may play a role in DNA replication and cellular process, with biochemical functions in nucleotide binding, nucleic acid binding and DNA-directed DNA polymerase activity. The "DNA polymerase type B, organellar and viral" and "DNA-directed DNA polymerase family B" signature motifs were found in the HP; these results further validate the sequence-based function prediction. Five other structure motifs were also identified: Leu323-Asp326, His955-Leu957, Ser926-Pro928, Leu645-Pro647 and Lys850-Asn853 (Additional file 7: Table S7).
P03263 is predicted to be an adenoviral protein L1 52/55-kDa, which performs multiple functions in DNA packaging by facilitating stable interactions between the empty capsid and viral DNA and is expressed in both the early and late stages of the infection cycle (Additional file 12: Figure S5).
The model with the highest C-score (−3.74) was selected, and structure validation shows that 69.6% of residues are in favored regions and 23.2% of residues are in the additional allowed regions of the RC-plot (Additional file 13: Figure S6). The functional analysis server verified the sequence-based function prediction by predicting an adenoviral protein L1 52/55-kDa motif in HP P03263, along with three conserved structure motifs, Ala105-Ala107, Glu7-Asp9 and Asp4-Glu6 (Additional file 7: Table S7). According to the Gene Ontology results, HP P03263, HP Q83127 and HP I6LEV1 are involved in the biological processes of virion assembly, anatomical structure formation, anatomical structure formation involved in morphogenesis and cellular component assembly involved in morphogenesis.
Q83127 is annotated as adenovirus E3 region protein CR1, which is responsible for controlling viral interactions with the host. Virus-PLoc also predicted that this is a transmembrane protein, and HMMTOP predicts two transmembrane helices. The three-dimensional structure with the highest C-score (−4.78) was selected (Additional file 14: Figure S7), and structure verification using SAVES shows that 42.9% of residues are in the favored region and 44.1% of residues are in the additional allowed regions (Additional file 15: Figure S8). The HP contains adenovirus E3 region protein CR2 and adenovirus E3 region protein CR1 motifs along with one conserved structural motif, Gln171-Pro173 (Additional file 7: Table S7).
Q1L4D7 is predicted to be adenoviral protein L1, with a confidence level of seven out of nine. This protein is expressed in both the early and late stages of the viral life cycle and plays multiple roles in DNA packaging. We modeled its three-dimensional structure and, out of five models, selected the one with C-score −4.53 (Additional file 16: Figure S9). The RC-plot shows that 34.7% of residues are in the favored region and 46.0% are in the additional allowed regions (Additional file 17: Figure S10); the model contains two structure motifs, Glu65-Ala67 and Val115-Gly117 (Additional file 7: Table S7).
I6LEV1 is also predicted to be adenoviral protein L1, like HP Q1L4D7. Structure verification of the model with C-score −4.56 (Additional file 18: Figure S11) shows that 36.0% of residues are in favored regions and 43.2% of residues are in the additional allowed regions of the RC-plot (Additional file 19: Figure S12). The sequence-based function prediction was verified by structural analysis, which predicted three structural motifs, Leu22-Leu24, Val98-Glu100 and Arg126-His128 (Additional file 7: Table S7).
To summarize, this study helped to assign function to hypothetical proteins of human adenovirus whose exact role in the infectious cycle was still unknown. The computational analysis carried out in the present study may help in better understanding the biology of adenovirus as a whole and in identifying potential therapeutic leads at the molecular level, and may facilitate a better understanding of human biology.
As our study is based on a small sample size, a larger sample could provide more information about the function of HPs and help identify novel drug targets. This study is based entirely on in silico analysis; with parallel wet-lab analysis, these proteins could be evaluated experimentally as drug targets.
GRAVY: grand average of hydropathicity
I-TASSER: iterative threading assembly refinement
Davison AJ, Benko M, Harrach B. Genetic content and evolution of adenoviruses. J Gen Virol. 2003;84(11):2895–908.
Robinson CM, Singh G, Lee JY, et al. Molecular evolution of human adenoviruses. Sci Rep. 2013;3:1812. http://doi.org/10.1038/srep01812.
Ramke M, et al. The 5′ UTR in human adenoviruses: leader diversity in late gene expression. Sci Rep. 2017;7(1):618.
Ghebremedhin B. Human adenovirus: viral pathogen with increasing importance. Eur J Microbiol Immunol. 2014;4(1):26–33.
Scott MK, et al. Human adenovirus associated with severe respiratory infection, Oregon, USA, 2013–2014. Emerg Infect Dis. 2016;22(6):1044.
Huang G, Xu W. Recent advance in new types of human adenovirus. Chin J Virol. 2013;29(3):342–8.
Li X, et al. An outbreak of acute respiratory disease in China caused by human adenovirus type B55 in a physical training facility. Int J Infect Dis. 2014;28:117–22.
Lion T. Adenovirus infections in immunocompetent and immunocompromised patients. Clin Microbiol Rev. 2014;27(3):441–62.
Sayers EW, et al. Database resources of the national center for biotechnology information. Nucleic Acids Res. 2011;39(suppl 1):D38–51.
Sivashankari S, Shanmughavel P. Functional annotation of hypothetical proteins—a review. Bioinformation. 2006;1(8):335–8.
Barragán-Osorio L, et al. Computational analysis and functional prediction of ubiquitin hypothetical protein: a possible target in Parkinson disease. Cent Nerv Syst Agents Med Chem. 2016;16(1):4–11.
Breuza L, Poux S, Estreicher A, et al. The UniProtKB guide to the human proteome. Database J Biol Databases Curation. 2016;2016:bav120. http://doi.org/10.1093/database/bav120.
UniProt Consortium. The universal protein resource (UniProt). Nucleic Acids Res. 2008;36(suppl 1):D190–5.
Gasteiger E, Hoogland C, Gattiker A, Duvaud S, Wilkins MR, Appel RD, Bairoch A. Protein identification and analysis tools on the ExPASy server. In: Walker JM, editor. The proteomics protocols handbook, Humana Press; 2005. pp. 571–607.
Gasteiger E, et al. Protein identification and analysis tools on the ExPASy server. Berlin: Springer; 2005.
Gazi MA, et al. Functional, structural and epitopic prediction of hypothetical proteins of Mycobacterium tuberculosis H37Rv: an in silico approach for prioritizing the targets. Gene. 2016;591(2):442–55.
Shen H-B, Chou K-C. Virus-PLoc: a fusion classifier for predicting the subcellular localization of viral proteins within host and virus-infected cells. Biopolymers. 2007;85:233–40. http://doi.org/10.1002/bip.20640.
Shen HB, Chou KC. Virus-PLoc: a fusion classifier for predicting the subcellular localization of viral proteins within host and virus-infected cells. Biopolymers. 2007;85(3):233–40.
Krogh A, et al. Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J Mol Biol. 2001;305(3):567–80.
Zhou H, Zhou Y. Predicting the topology of transmembrane helical proteins using mean burial propensity and a hidden-Markov-model-based method. Protein Sci Publ Protein Soc. 2003;12(7):1547–55.
Welner S, Nielsen M, Rasmussen M, Buus S, Jungersen G, Larsen LE. Prediction and in vitro verification of potential CTL epitopes conserved among PRRSV-2 strains. Immunogenetics. 2017;69(10):689–702. http://doi.org/10.1007/s00251-017-1004-8.
Tusnady GE, Simon I. The HMMTOP transmembrane topology prediction server. Bioinformatics. 2001;17(9):849–50.
Mahram A, Herbordt MC. Fast and accurate NCBI BLASTP: acceleration with multiphase FPGA-based prefiltering. In: Proceedings of the 24th ACM international conference on supercomputing. New York: ACM; 2010.
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol. 1990;215:403–10.
Li YH, Xu JY, Tao L, et al. SVM-Prot 2016: a web-server for machine learning prediction of protein functional families from sequence irrespective of similarity. PLoS ONE. 2016;11(8):e0155290. http://doi.org/10.1371/journal.pone.0155290.
Sasson O, et al. ProtoNet: hierarchical classification of the protein space. Nucleic Acids Res. 2003;31(1):348–52.
Venkataraman A, Chew TH, Hussein ZAM, Shamsir MS. A protein short motif search tool using amino acid sequence and their secondary structure assignment. Bioinformation. 2011;7(6):304–306.
Bateman A, et al. The Pfam protein families database. Nucleic Acids Res. 2004;32(suppl 1):D138–41.
Geer LY, et al. CDART: protein homology by domain architecture. Genome Res. 2002;12(10):1619–23.
Knudsen M, Wiuf C. The CATH database. Hum Genom. 2010;4(3):207–12. http://doi.org/10.1186/1479-7364-4-3-207.
Pearl FM, et al. The CATH extended protein-family database: providing structural annotations for genome sequences. Protein Sci. 2002;11(2):233–44.
Letunic I, Doerks T, Bork P. SMART 7: recent updates to the protein domain annotation resource. Nucleic Acids Res. 2012;40(D1):D302–5.
Schultz J, Copley RR, Doerks T, Ponting CP, Bork P. SMART: a web-based tool for the study of genetically mobile domains. Nucleic Acids Res. 2000;28(1):231–4.
Wilson D, et al. SUPERFAMILY—sophisticated comparative genomics, data mining, visualization and phylogeny. Nucleic Acids Res. 2009;37(suppl 1):D380–6.
Wilson D, Madera M, Vogel C, Chothia C, Gough J. The SUPERFAMILY database in 2007: families and functions. Nucleic Acids Res. 2007;35(Database issue):D308–13. http://doi.org/10.1093/nar/gkl910.
Zdobnov EM, Apweiler R. InterProScan—an integration platform for the signature-recognition methods in InterPro. Bioinformatics. 2001;17(9):847–8.
Rentzsch R, Orengo CA. Protein function prediction using domain families. BMC Bioinform. 2013;14(Suppl 3):S5. http://doi.org/10.1186/1471-2105-14-S3-S5.
Ceroni A, Passerini A, Vullo A, Frasconi P. DISULFIND: a disulfide bonding state and cysteine connectivity prediction server. Nucleic Acids Res. 2006;34(Web Server issue):W177–81. http://doi.org/10.1093/nar/gkl266.
Ceroni A, et al. DISULFIND: a disulfide bonding state and cysteine connectivity prediction server. Nucleic Acids Res. 2006;34(suppl 2):W177–81.
Yang J, Yan R, Roy A, Xu D, Poisson J, Zhang Y. The I-TASSER suite: protein structure and function prediction. Nat Methods. 2015;12:7–8.
Naveed M, et al. Bioinformatics based structural characterization of glucose dehydrogenase (gdh) gene and growth promoting activity of Leclercia sp. QAU-66. Braz J Microbiol. 2014;45(2):603–11.
Wang W, Xia M, Chen J, et al. Data set for phylogenetic tree and RAMPAGE Ramachandran plot analysis of SODs in Gossypium raimondii and G. arboreum. Data Br. 2016;9:345–348. http://doi.org/10.1016/j.dib.2016.05.025.
Naveed M, et al. In-silico analysis of non-synonymous-SNPs of STEAP2: to provoke the progression of prostate cancer. Open Life Sci. 2016;11(1):402–16.
Kumar K, et al. Structure-based functional annotation of hypothetical proteins from Candida dubliniensis: a quest for potential drug targets. 3 Biotech. 2015;5(4):561–76.
Laskowski RA, Watson JD, Thornton JM. ProFunc: a server for predicting protein function from 3D structure. Nucleic Acids Res. 2005;33(suppl 2):W89–93.
Gustin KE, Lutz P, Imperiale MJ. Interaction of the adenovirus L1 52/55-kilodalton protein with the IVa2 gene product during infection. J Virol. 1996;70(9):6463–7.
Kantardjieff KA, Rupp B. Protein isoelectric point as a predictor for increased crystallization screening efficiency. Bioinformatics. 2004;20(14):2162–8.
School K, et al. Predictive characterization of hypothetical proteins in Staphylococcus aureus NCTC 8325. Bioinformation. 2016;12(3):209.
Islam M, et al. In silico structural and functional annotation of hypothetical proteins of Vibrio cholerae O139. Genom Inform. 2015;13(2):53–9.
Tamanoi F, Stillman BW. Function of adenovirus terminal protein in the initiation of DNA replication. Proc Natl Acad Sci. 1982;79(7):2221–5.
Lieber A, He C-Y, Kay MA. Adenoviral preterminal protein stabilizes mini-adenoviral genomes in vitro and in vivo. Nat Biotechnol. 1997;15(13):1383–7.
Garg P, Burgers PM. DNA polymerases that propagate the eukaryotic DNA replication fork. Crit Rev Biochem Mol Biol. 2005;40(2):115–28.
Deryckere F, Burgert H-G. Early region 3 of adenovirus type 19 (subgroup D) encodes an HLA-binding protein distinct from that of subgroups B and C. J Virol. 1996;70(5):2832–41.
MN, ST, MU, ZC and GA carried out characterization of hypothetical proteins following methodology designed by ST. ST and MU wrote the manuscript. All authors read and approved the final manuscript.
The authors declare that they have no competing interests.
Availability of data and materials
All data generated or analyzed during this study are included in this published article as additional information files.
Consent to publish
Ethics approval and consent to participate
The data used in this study were retrieved from publicly available online databases, so no ethical approval was required, as no living organisms were used.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.