- Research note
- Open access
Fe(2)OG: an integrated HMM profile-based web server to predict and analyze putative non-haem iron(II)- and 2-oxoglutarate-dependent dioxygenase function in protein sequences
BMC Research Notes volume 14, Article number: 80 (2021)
Abstract
Objective
Non-haem iron(II)- and 2-oxoglutarate-dependent dioxygenases (i2OGdd) are a taxonomically and functionally diverse group of enzymes. The active site comprises ferrous iron in a hexa-coordinated distorted octahedron with the apoenzyme, 2-oxoglutarate and a displaceable water molecule. Current information on novel i2OGdd members is sparse and relies on computationally derived annotation schema. The dissimilar amino acid composition and variable active site geometry of these enzymes result in differing reaction chemistries amongst i2OGdd members. Researchers also need a curated list of sequences with putative i2OGdd function that can be probed further for empirical data.
Results
This work reports the implementation of \(Fe\left(2\right)OG\), a web server with dual functionality and an extension of previous work on i2OGdd enzymes \(\left(Fe\left(2\right)OG\equiv \{H2OGpred,DB2OG\}\right)\). In this form, \(Fe\left(2\right)OG\) is completely revised and updated (URL, scripts, repository) and will strengthen the knowledge base of investigators on i2OGdd biochemistry and function. \(Fe\left(2\right)OG\) utilizes the predictive capacity of HMM-profiles of laboratory-validated i2OGdd members to predict probable active site geometries in user-defined protein sequences. \(Fe\left(2\right)OG\) also provides researchers with a pre-compiled list of analyzed and searchable i2OGdd-like sequences, many of which may be clinically relevant. \(Fe(2)OG\) is freely available (http://204.152.217.16/Fe2OG.html) and supersedes all previous versions, i.e., H2OGpred and DB2OG.
Introduction
Dioxygenases, unlike monooxygenases, are oxidoreductases that incorporate both atoms of molecular oxygen, one each into a substrate and co-substrate (Fig. 1). These enzymes are classified on the basis of the metal co-factor (iron, cobalt, nickel, copper) and the presence of a haem prosthetic group. Iron-based dioxygenases may be mononuclear (extradiol catechol, \(EC\ 1.13.11.x\); 2-oxoglutarate-dependent, \(EC\ 1.14.11.x\)), possess Rieske clusters (naphthalene 1,2-dioxygenase, \(EC\ 1.14.12.12\)) or utilize haem (indoleamine 2,3-dioxygenase, \(EC\ 1.13.11.52\); tryptophan 2,3-dioxygenase, \(EC\ 1.13.11.11\)) [1,2,3,4,5]. Extradiol and 2OG-dependent dioxygenases possess a triad of catalytically competent residues \(\left(H{X}_{n}[DE]{X}_{n}H\right)\) that comprises one face of a distorted octahedral co-ordination sphere with iron(II) [3,4,5,6]. The other face is formed by three displaceable water molecules, a factor that contributes significantly to the architecture of the active site [3,4,5,6]. The subset that comprises non-haem iron(II)- and 2OG-dependent dioxygenases (i2OGdd) is characterized by \(HX\left[DE\right]{X}_{n}H\ \left(n\in [50,120]\right)\) in a jelly-roll or double-stranded beta-helical fold [4,5,6]. The i2OGdd-superfamily \((EC\ 1.14.11.x)\) is characterized by the dependence of catalysis on 2-oxoglutarate, which co-ordinates mononuclear ferrous iron in a bi-dentate manner through \(C1\) (carboxylic acid) and \(C2\) (2-oxo/alpha-keto) (Fig. 1). The last dative bond is formed with the cognate substrate after the solitary water molecule has been displaced. The reaction chemistry involves a progressive increase in the oxidation state of iron (\(II\to IV\)) and is followed by proton abstraction and the formation of a substrate-radical. This in turn leads to catalytic conversion of the substrate by the incorporation of a single oxygen atom (Fig. 1).
The transformation is therefore a de facto oxidative hydroxylation, although in most cases this is accompanied by a concomitant desaturation, cyclization, stereo-isomerization or sulfate cleavage (Fig. 1) [7,8,9]. The second oxygen atom is incorporated into 2OG with the release of succinic acid and \({CO}_{2}\) (Fig. 1). Members of the i2OGdd-superfamily participate in cell signaling under hypoxic conditions, DNA repair, stress response mechanisms, metabolism (lipids, growth factors) and biodegradation of herbicides (Fig. 1) [5, 10,11,12,13,14,15,16,17].
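The net hydroxylation described above can be summarized as an overall stoichiometry. This is a generic sketch only: \(R\text{-}H\) and \(R\text{-}OH\) are placeholder symbols (not used in the original text) for the cognate substrate and its hydroxylated product, and the coupled desaturation, cyclization and related outcomes are not shown.

\[
R\text{-}H + 2OG + {O}_{2} \;\overset{Fe(II)}{\longrightarrow}\; R\text{-}OH + succinate + {CO}_{2}
\]

One oxygen atom of \({O}_{2}\) appears in the product and the other in succinate, which is what distinguishes this dioxygenase chemistry from a monooxygenase reaction.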
The work presented revises, updates and integrates the functionality of two servers, i.e., \(Fe(2)OG\equiv \left\{H2OGpred, DB2OG\right\}\) [18, 19]. \(Fe(2)OG\) can be used by researchers as a single-point web resource to screen protein sequence(s) for potential i2OGdd-activity and to shortlist putative i2OGdd members from the available pre-compiled sequence repository. The latter is searchable on the basis of taxonomy, cellular compartment and HMM-profiles of the sequences. A novel feature of \(Fe\left(2\right)OG\) is the inclusion of clinically relevant non-haem iron(II)- and 2OG-dependent dioxygenases. This includes links to, and preliminary analyses of, several putative human i2OGdd members. The coding and interfacing are done using in-house developed Perl scripts.
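As an illustration of the \(HX[DE]{X}_{n}H\) motif described in the Introduction, the following minimal Python sketch scans a protein sequence for candidate facial triads with \(50\le n\le 120\). It is not the method used by \(Fe(2)OG\), which relies on HMM-profiles; the function name `find_triads` and the toy sequence in the usage note are assumptions made for illustration.

```python
import re

# Facial-triad motif from the text: H-X-[DE]-X_n-H with 50 <= n <= 120.
# The pattern sits inside a lookahead so that overlapping candidates that
# start at different histidines are all reported. The lazy quantifier means
# that, for each proximal His, only the nearest valid distal His is returned.
TRIAD = re.compile(r"(?=(H.[DE].{50,120}?H))")


def find_triads(sequence):
    """Return (start, end, n) for each candidate triad; positions are 1-based."""
    hits = []
    for match in TRIAD.finditer(sequence.upper()):
        span = match.group(1)
        start = match.start(1) + 1          # 1-based position of the proximal His
        end = start + len(span) - 1         # 1-based position of the distal His
        n = len(span) - 4                   # residues between [DE] and the distal His
        hits.append((start, end, n))
    return hits
```

For example, `find_triads("AA" + "HAD" + "G" * 60 + "H" + "KK")` reports a single candidate spanning residues 3 to 66 with `n = 60`. A regular expression of this kind only flags the spacing of the residues; it says nothing about the fold or the reaction chemistry, which is why a profile-based approach is used by the server.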
Main text
Rationale for incorporating empirical data into a profile-based search application
Non-haem iron(II)- and 2OG-dependent dioxygenases are characterized by variable reaction chemistry and a broad spectrum of substrates. The reverse mapping of substrate descriptors to the active site of known enzymes is well documented and can be utilized to repurpose pharmacological agents. Several theoretically sound statistical tools such as multi-class support vector machines (SVMs), artificial neural networks (ANNs) and hidden Markov models (HMMs) have been utilized to garner insights into the active site geometry of an enzyme in the presence of a pharmacophore [18,19,20,21,22]. Although HMMs are non-committal as a predictive modality, this can be rectified by mathematical filters. The transformed output can then be utilized by clustering algorithms and ANNs to generate unambiguous predictors [21, 23, 24]. In fact, a rigorously derived integrated HMM-ANN algorithm has been presented and used to characterize sequences that are few and closely related, such as those from an enzyme family or sub-family [23, 24].
Mathematical basis for the algorithms deployed by \(Fe\left(2\right)OG\)
Whilst a detailed description of the computational pipeline deployed and its relevance has already been published, its mathematical basis has not been addressed [18, 19]. Briefly, HMM-profiles of catalytically relevant clusters and laboratory-validated enzymes of the i2OGdd-superfamily \(\left({a}_{i}\in A\subseteq \mathcal{H}\right)\) are utilized to score regions of an amino acid sequence. The empirical data considered are the presence of one or more 3D-structures, kinetic and mutagenesis data, and mRNA expression levels [18]. A suitable mathematical representation is as follows:
Theorem:
A unique set of HMM profiles \(\left(A,B\subseteq \mathcal{H}\right)\) can exist iff there is at least one unique sub-profile.
Proof:
\(Fe\left(2\right)OG\), then, is an implementation of a particular instance of the combined HMM of sequences and available structures \(\left(A=\left\{{a}_{i}\mid 1\le i\le 28,\ 2\le \#{a}_{i}\le 4\right\}\right)\) [19]; URL-http://janelia.org. The lower limit of the number of sequences in each profile, \(\mathrm{min}\left(\#{a}_{i}\right)\) (Eq. (1)), is implied by definition. The upper limit, however, is estimated as a proportion of the total number of sequences,
Description and utilization of \(Fe(2)OG\)
i) \(Fe(2)OG\), a predictor of the catalytic spectrum of an unknown or single function enzyme
The algorithm and code that \(Fe\left(2\right)OG\) utilizes to predict the dominant profile in a user-defined sequence(s) have been described in detail [18]. Briefly, i2OGdd enzymes \((n>220)\) with available empirical data (structure, kinetic, mRNA expression) are clustered on the basis of the substrates catalyzed and/or the reaction chemistry (Fig. 1) [18]. The enzymes present in each ‘functional’-group \(\left(2\le \#{a}_{i}\le 4\right)\) (Eqs. (1) and (2)) are then aligned and assigned an HMM-profile (Figs. 1 and 2) [18]. A database of these HMM-profiles is used to probe the catalytic spectrum of a user-defined sequence as per the stringency specified. Unlike \(H2OGpred\), \(Fe\left(2\right)OG\) compares a query sequence(s) with all, rather than isolated, HMM-profiles (Fig. 2) [18]. The rationale for this alteration is that, since the catalytic profile of an unknown sequence(s) is debatable, a generic analysis is a better indicator of i2OGdd-like activity than a specific one. Furthermore, sequences with known function can also be investigated for other reaction chemistries. In both cases, the analysis with individual profiles is superfluous and may be omitted (Table 1A). The tabulated list of relevant cognate substrates for each profile is also available and may be used as a reference (Figs. 1 and 2). In addition to the overt directives of use, users can sample the functionality of \(Fe\left(2\right)OG\) by clicking the “Examples” button \((Step\ P1)\) (Fig. 2). This loads bona fide i2OGdd sequences into the text area, which can be analyzed in accordance with the steps outlined subsequently. These include the choice of threshold parameter (E-value or bit score) and the assignment of a suitable numerical value \((Steps\ P2,P3)\) (Fig. 2). The output comprises a tabular summary of suitably matched profiles with detailed statistics and exhaustive pair-wise alignments of all supra-threshold matches (Fig. 2).
Since \(Fe\left(2\right)OG\) has dual functionality, the user can submit this query independently \(\left(Steps\ P1\text{-}P3\to Submit\right)\) (Fig. 2).
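The thresholding step \((Steps\ P2,P3)\) can be pictured with a short Python sketch. It assumes that each match of a query against the profile database has already been reduced to a profile name, an E-value and a bit score; the `Hit` record, the function names, the profile labels and the numbers in `example_hits` are all hypothetical and are not taken from the \(Fe(2)OG\) code base, which is written in Perl.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    profile: str     # hypothetical profile label, e.g. "profile_03"
    evalue: float    # expectation value of the match (lower is better)
    bitscore: float  # bit score of the match (higher is better)


def filter_hits(hits, mode, cutoff):
    """Keep supra-threshold hits, best first.

    mode="evalue":   keep hits with evalue <= cutoff, sorted ascending
    mode="bitscore": keep hits with bitscore >= cutoff, sorted descending
    """
    if mode == "evalue":
        kept = [h for h in hits if h.evalue <= cutoff]
        return sorted(kept, key=lambda h: h.evalue)
    if mode == "bitscore":
        kept = [h for h in hits if h.bitscore >= cutoff]
        return sorted(kept, key=lambda h: h.bitscore, reverse=True)
    raise ValueError("mode must be 'evalue' or 'bitscore'")


def dominant_profile(hits, mode, cutoff):
    """Return the best-scoring profile name, or None if nothing passes."""
    kept = filter_hits(hits, mode, cutoff)
    return kept[0].profile if kept else None


# Illustrative values only; these are not results from the server.
example_hits = [
    Hit("profile_11", 2e-5, 28.4),
    Hit("profile_03", 1e-30, 110.2),
    Hit("profile_20", 0.8, 9.1),
]
```

With these made-up values, `dominant_profile(example_hits, "evalue", 1e-3)` selects `profile_03`, while a bit-score cutoff of 200 leaves no supra-threshold match. Comparing the query against every profile before filtering mirrors the "all, rather than isolated" comparison described above.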
ii) \(Fe\left(2\right)OG\), a repository of i2OGdd-like sequences
The second component of \(Fe\left(2\right)OG\) is a flat-file database. This comprises a pre-compiled and updated list of i2OGdd-like sequences \(({n}_{AB}=4496)\) (Fig. 2). The list is compiled by constructing a generic HMM after combining representative \(\left(n\sim 80\right)\) i2OGdd enzymes from each ‘functional’-group. This is then used to query UniProtKB for probable matches \(\left({n}_{AB}\right)\) [19]. The downloaded sequences are analyzed and assigned a dominant cellular compartment \(\left({n}_{A}\right)\) [19]. Sequences that are not amenable to these preliminary investigations are annotated as such \(\left({n}_{B}\right)\). Users can download updated lists of these sequences \(\left({n}_{A}=3429,{n}_{B}=1067\right)\) (Fig. 2). This is facilitated by arranging the sequences as a matrix of compartments \((p)\) and taxonomy \((q)\), \(\left(AB=\left\{{y}_{pqr}\in \left({ab}_{pq}\right);p=10,q=7,r\in {\mathbb{N}}\right\}\right)\). \(Fe\left(2\right)OG\) also uses the logical operators (\(\left\{AND,OR\right\}\)) to formulate an advanced HMM profile-based query to partition the sequences (\(Step\ S1\); Fig. 2) [19]. Another modification introduced in \(Fe\left(2\right)OG\) is the omission of the “All sequences” option (\(Step\ S1\)) (Fig. 2). The rationale for this amendment is that users may require sequences specific to one or more HMM-profiles (Figs. 1 and 2). Since each profile is based on a specific reaction chemistry, users will also possess, a priori, a definitive list of probable ligands with which to characterize the kinetics of their search results (Fig. 1, Table 1A). Furthermore, the entire database \(\left({n}_{A}\right)\) is accessible with the “OR” and “Include these profile(s)” options, if the user so chooses (\(Steps\ S1,S2\)) (Fig. 2). The other fraction could not be classified further and is presented only in terms of the respective taxonomies \(\left({n}_{B}=1067\right)\).
Here, too, the user can submit this query independently \(\left(Steps\ S1,S2\to Submit\right)\) (Fig. 2).
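The logical query of the repository \((Steps\ S1,S2)\) can be sketched with Python sets. This is a hedged illustration under stated assumptions, not the server's Perl implementation: the record layout, the identifiers `seq1` to `seq3`, the profile labels and the behaviour of `include=False` (read here as the complement of the "Include these profile(s)" option) are all hypothetical.

```python
# Hypothetical records: sequence ID -> (compartment, taxonomy, matched profiles)
RECORDS = {
    "seq1": ("cytosol", "Bacteria", {"p01", "p05"}),
    "seq2": ("mitochondrion", "Eukaryota", {"p05"}),
    "seq3": ("cytosol", "Bacteria", {"p02"}),
}


def query(records, profiles, operator="OR", include=True):
    """Select sequence IDs by HMM profile.

    operator="AND": the sequence must match every requested profile
    operator="OR":  the sequence must match at least one requested profile
    include=False returns the complement (an assumed reading of the form).
    """
    wanted = set(profiles)
    selected = set()
    for seq_id, (_compartment, _taxonomy, matched) in records.items():
        if operator == "AND":
            ok = wanted <= matched
        elif operator == "OR":
            ok = bool(wanted & matched)
        else:
            raise ValueError("operator must be 'AND' or 'OR'")
        if ok == include:
            selected.add(seq_id)
    return selected


def as_matrix(records, ids):
    """Count the selected sequences per (compartment, taxonomy) cell."""
    counts = {}
    for seq_id in ids:
        compartment, taxonomy, _matched = records[seq_id]
        key = (compartment, taxonomy)
        counts[key] = counts.get(key, 0) + 1
    return counts
```

Here `query(RECORDS, ["p01", "p05"], "AND")` returns only `seq1`, whereas the same profiles combined with `"OR"` return `seq1` and `seq2`; selecting every profile with `"OR"` would recover the whole classified set, which corresponds to the access route to \({n}_{A}\) described above. `as_matrix` mirrors the compartment-by-taxonomy arrangement of the repository.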
Comparative analysis and biomedical relevance of \(Fe\left(2\right)OG\)
Despite the similarity in algorithms and general usage, \(Fe\left(2\right)OG\) offers several new and upgraded features (Table 1). These include links to i2OGdd members that are uncharacterized and clinically relevant, as well as a tool with which researchers can extend the catalytic profiles of known enzymes. Additionally, the list of sequences with putative i2OGdd function is updated and non-redundant. The i2OGdd are amongst the largest groups of non-haem dioxygenases and can arguably compete in importance with the more established cytochrome P450 (\({CYP}_{450}\)) superfamily of haem monooxygenases (Fig. 1). The differential activity of i2OGdd members in response to fluctuating concentrations of oxygen and iron also suggests a system-level function in sensing, and thence regulating, the uptake, utilization and release of these micronutrients [25, 26]. In fact, clinical data are available for several i2OGdd enzymes. These include phytanoyl-CoA hydroxylase, hypoxia-inducible proline hydroxylases, collagen modifiers (proline- and lysine-hydroxylases) and DNA/mRNA-demethylases (Table 1B) [27,28,29,30,31,32,33,34,35,36,37,38]. The analysis of human i2OGdd-like sequences by \(Fe\left(2\right)OG\) assigns HMM-profiles to a small subset \((\approx 24\%,n=17)\) of enzymes, which are grouped into mitochondrial, cytosolic and extracellular fractions (Additional file 1: Text S1a). However, a larger proportion \((\approx 76\%,n=53)\) remains unclassified and merits deeper investigation (Additional file 1: Text S1b).
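The two proportions quoted above follow directly from the counts, assuming that the profiled and unprofiled human sequences together make up the whole set analyzed (17 + 53 = 70):

\[
\frac{17}{17+53}=\frac{17}{70}\approx 24.3\%,\qquad \frac{53}{17+53}=\frac{53}{70}\approx 75.7\%
\]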
Limitations
\(Fe\left(2\right)OG\) is an online web resource dedicated to expanding the knowledge base of the non-haem iron(II)- and 2OG-dependent dioxygenase superfamily of enzymes amongst scientists and clinicians. \(Fe\left(2\right)OG\) can predict whether an unknown protein sequence(s) possesses i2OGdd-activity. It also provides preliminary analyses (taxonomy, cellular compartment) and an analytic tool (sequence-based, logical) to shortlist enzyme candidates from a pre-compiled list of sequences. Since newer sequences are constantly becoming available, \(Fe\left(2\right)OG\) will require regular updates to its core of HMM-profiles, and to the raw sequences that are queried for putative function, to remain relevant to the biomedical community. However, since this information is dependent on available empirical data, an annual update might suffice. \(Fe\left(2\right)OG\) is also not exhaustive and lacks structural models and simulation data for its members. These shortcomings will be addressed in future studies.
Availability of data and materials
Data is available as supporting material with the manuscript.
Abbreviations
- Fe: Iron
- 2OG: 2-Oxoglutarate
- HMM: Hidden Markov Model
- i2OGdd: Non-haem iron(II)- and 2-oxoglutarate-dependent dioxygenases
References
Koehntop KD, Emerson JP, Que L Jr. The 2-His-1-carboxylate facial triad: a versatile platform for dioxygen activation by mononuclear non-heme iron(II) enzymes. J Biol Inorg Chem. 2005;10(2):87–93.
Barry SM, Challis GL. Mechanism and catalytic diversity of rieske non-heme iron-dependent oxygenases. ACS Catal. 2013. https://doi.org/10.1021/cs400087p.
Lipscomb JD. Mechanism of extradiol aromatic ring-cleaving dioxygenases. Curr Opin Struct Biol. 2008;18(6):644–9.
Islam MS, Leissing TM, Chowdhury R, Hopkinson RJ, Schofield CJ. 2-oxoglutarate-dependent oxygenases. Annu Rev Biochem. 2018;87:585–620.
McDonough MA, Loenarz C, Chowdhury R, Clifton IJ, Schofield CJ. Structural studies on human 2-oxoglutarate dependent oxygenases. Curr Opin Struct Biol. 2010;20(6):659–72.
Martinez S, Hausinger RP. Catalytic mechanisms of Fe(II)- and 2-oxoglutarate-dependent oxygenases. J Biol Chem. 2015;290(34):20702–11.
Clifton IJ, Doan LX, Sleeman MC, Topf M, Suzuki H, Wilmouth RC, et al. Crystal structure of carbapenem synthase (CarC). J Biol Chem. 2003;278(23):20843–50.
Eichhorn E, van der Ploeg JR, Kertesz MA, Leisinger T. Characterization of alpha-ketoglutarate-dependent taurine dioxygenase from Escherichia coli. J Biol Chem. 1997;272(37):23031–6.
Janc JW, Egan LA, Townsend CA. Purification and characterization of clavaminate synthase from Streptomyces antibioticus. A multifunctional enzyme of clavam biosynthesis. J Biol Chem. 1995;270(10):5399–404.
Fukumori F, Hausinger RP. Purification and characterization of 2,4-dichlorophenoxyacetate/alpha-ketoglutarate dioxygenase. J Biol Chem. 1993;268(32):24311–7.
Holland PJ, Hollis T. Structural and mutational analysis of Escherichia coli AlkB provides insight into substrate specificity and DNA damage searching. PLoS ONE. 2010;5(1):e8680.
Kershaw NJ, Mukherji M, MacKinnon CH, Claridge TD, Odell B, Wierzbicki AS, et al. Studies on phytanoyl-CoA 2-hydroxylase and synthesis of phytanoyl-coenzyme A. Bioorg Med Chem Lett. 2001;11(18):2545–8.
Koivunen P, Hirsila M, Gunzler V, Kivirikko KI, Myllyharju J. Catalytic properties of the asparaginyl hydroxylase (FIH) in the oxygen sensing pathway are distinct from those of its prolyl 4-hydroxylases. J Biol Chem. 2004;279(11):9899–904.
Bruick RK, McKnight SL. A conserved family of prolyl-4-hydroxylases that modify HIF. Science. 2001;294(5545):1337–40.
Lester DR, Phillips A, Hedden P, Andersson I. Purification and kinetic studies of recombinant gibberellin dioxygenases. BMC Plant Biol. 2005;5:19.
Reuter K, Pittelkow M, Bursy J, Heine A, Craan T, Bremer E. Synthesis of 5-hydroxyectoine from ectoine: crystal structure of the non-heme iron(II) and 2-oxoglutarate-dependent dioxygenase EctD. PLoS ONE. 2010;5(5):e10647.
Wehner KA, Schutz S, Sarnow P. OGFOD1, a novel modulator of eukaryotic translation initiation factor 2alpha phosphorylation and the cellular response to stress. Mol Cell Biol. 2010;30(8):2006–16.
Kundu S. Distribution and prediction of catalytic domains in 2-oxoglutarate dependent dioxygenases. BMC Res Notes. 2012;5:410. https://doi.org/10.1186/1756-0500-5-410.
Kundu S. Unity in diversity, a systems approach to regulating plant cell physiology by 2-oxoglutarate-dependent dioxygenases. Front Plant Sci. 2015;6:98.
Khater S, Mohanty D. In silico identification of AMPylating enzymes and study of their divergent evolution. Sci Rep. 2015;5:10804.
Kundu S, Sharma R. In silico identification and taxonomic distribution of plant class C GH9 endoglucanases. Front Plant Sci. 2016;7:1185.
Kundu S. Insights into the mechanism(s) of digestion of crystalline cellulose by plant class C GH9 endoglucanases. J Mol Model. 2019;25(8):240.
Kundu S. Mathematical basis of predicting dominant function in protein sequences by a generic HMM-ANN algorithm. Acta Biotheor. 2018;66(2):135–48.
Kundu S. Mathematical basis of improved protein subfamily classification by a HMM-based sequence filter. Math Biosci. 2017;293:75–80.
Wilson JW, Shakir D, Batie M, Frost M, Rocha S. Oxygen-sensing mechanisms in cells. FEBS J. 2020. https://doi.org/10.1111/febs.15374.
Kundu S. Co-operative intermolecular kinetics of 2-oxoglutarate dependent dioxygenases may be essential for system-level regulation of plant cell physiology. Front Plant Sci. 2015;6:489.
Mukherji M, Chien W, Kershaw NJ, Clifton IJ, Schofield CJ, Wierzbicki AS, et al. Structure-function analysis of phytanoyl-CoA 2-hydroxylase mutations causing Refsum’s disease. Hum Mol Genet. 2001;10(18):1971–82.
Hirsila M, Koivunen P, Gunzler V, Kivirikko KI, Myllyharju J. Characterization of the human prolyl 4-hydroxylases that modify the hypoxia-inducible factor. J Biol Chem. 2003;278(33):30772–80.
Patel N, Khan AO, Mansour A, Mohamed JY, Al-Assiri A, Haddad R, et al. Mutations in ASPH cause facial dysmorphism, lens dislocation, anterior-segment abnormalities, and spontaneous filtering blebs, or Traboulsi syndrome. Am J Hum Genet. 2014;94(5):755–9.
Pfeffer I, Brewitz L, Krojer T, Jensen SA, Kochan GT, Kershaw NJ, et al. Aspartate/asparagine-beta-hydroxylase crystal structures reveal an unexpected epidermal growth factor-like domain substrate disulfide pattern. Nat Commun. 2019;10(1):4910.
Guo H, Tong P, Liu Y, Xia L, Wang T, Tian Q, et al. Mutations of P4HA2 encoding prolyl 4-hydroxylase 2 are associated with nonsyndromic high myopia. Genet Med. 2015;17(4):300–6.
Kim JH, Lee SM, Lee JH, Chun S, Kang BH, Kwak S, et al. OGFOD1 is required for breast cancer cell proliferation and is associated with poor prognosis in breast cancer. Oncotarget. 2015;6(23):19528–41.
Pirskanen A, Kaimio AM, Myllyla R, Kivirikko KI. Site-directed mutagenesis of human lysyl hydroxylase expressed in insect cells. Identification of histidine residues and an aspartic acid residue critical for catalytic activity. J Biol Chem. 1996;271(16):9398–402.
Abdalla EM, Rohrbach M, Burer C, Kraenzlin M, El-Tayeby H, Elbelbesy MF, et al. Kyphoscoliotic type of Ehlers-Danlos Syndrome (EDS VIA) in six Egyptian patients presenting with a homogeneous clinical phenotype. Eur J Pediatr. 2015;174(1):105–12.
Webby CJ, Wolf A, Gromak N, Dreger M, Kramer H, Kessler B, et al. Jmjd6 catalyses lysyl-hydroxylation of U2AF65, a protein associated with RNA splicing. Science. 2009;325(5936):90–3.
Chang B, Chen Y, Zhao Y, Bruick RK. JMJD6 is a histone arginine demethylase. Science. 2007;318(5849):444–7.
Gerken T, Girard CA, Tung YC, Webby CJ, Saudek V, Hewitson KS, et al. The obesity-associated FTO gene encodes a 2-oxoglutarate-dependent nucleic acid demethylase. Science. 2007;318(5855):1469–72.
Daoud H, Zhang D, McMurray F, Yu A, Luco SM, Vanstone J, et al. Identification of a pathogenic FTO mutation by next-generation sequencing in a newborn with growth retardation and developmental delay. J Med Genet. 2016;53(3):200–7.
Acknowledgements
Not Applicable.
Funding
This work is funded by an early career intramural grant awarded to SK (Code A-766) by the All India Institute of Medical Sciences (AIIMS, New Delhi, INDIA).
Author information
Contributions
SK outlined and designed the study, designed and conceptualized the algorithm(s) and formulae for prediction, wrote the mathematical proofs, manually collated all the sequences, and their references, carried out the computational analysis, wrote all the code and the manuscript. All authors read and approved the final manuscript.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Yes.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Additional file 1: Text S1.
a HMM profiles of analyzed human i2OGdd-like sequences (n = 17). b Uniprot IDs of unprofiled human i2OGdd-like sequences (n = 53)
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Kundu, S. Fe(2)OG: an integrated HMM profile-based web server to predict and analyze putative non-haem iron(II)- and 2-oxoglutarate-dependent dioxygenase function in protein sequences. BMC Res Notes 14, 80 (2021). https://doi.org/10.1186/s13104-021-05477-z
DOI: https://doi.org/10.1186/s13104-021-05477-z