A computational protocol to evaluate the effects of protein mutants in the kinase gatekeeper position on the binding of ATP substrate analogues

Background The determination of specific kinase substrates in vivo is challenging due to the large number of protein kinases in cells, their substrate specificity overlap, and the lack of highly specific inhibitors. In the late 90s, Shokat and coworkers developed a protein engineering-based method addressing the question of identification of substrates of protein kinases. The approach was based on the mutagenesis of the gatekeeper residue within the binding site of a protein kinase to change the co-substrate specificity from ATP to ATP analogues. One of the challenges in applying this method to other kinase systems is to identify the optimal combination of mutation in the enzyme and chemical derivative such that the ATP analogue acts as substrate for the engineered, but not the native kinase enzyme. In this study, we developed a computational protocol for estimating the effect of mutations at the gatekeeper position on the accessibility of ATP analogues within the binding site of engineered kinases. Results We tested the protocol on a dataset of tyrosine and serine/threonine protein kinases from the scientific literature where Shokat’s method was applied and experimental data were available. Our protocol correctly identified gatekeeper residues as the positions to mutate within the binding site of the studied kinase enzymes. Furthermore, the approach well reproduced the experimental data available in literature. Conclusions We have presented a computational protocol that scores how different mutations at the gatekeeper position influence the accommodation of various ATP analogues within the binding site of protein kinases. We have assessed our approach on protein kinases from the scientific literature and have verified the ability of the approach to well reproduce the available experimental data and identify suitable combinations of engineered kinases and ATP analogues.

substrates of kinase enzymes is therefore of great importance for elucidating their functional role in the cell and to develop disease-specific therapies. However, the identification of specific kinase substrates is highly challenging due to the large number of protein kinases in cells, their substrate specificity overlap and the lack of absolute specificity of inhibitors [6,7].
The majority of protein kinases share a bilobal kinase domain fold, where the N-lobe is formed by five β-strands and a single α-helix and the C-lobe is predominantly α-helical [6,8]. These domains are connected by a short segment called the hinge region [9]. The C-lobe contains the activation segment that is typically composed of 20-30 residues. This lobe is composed of the activation loop that activates protein kinase when a specific residue is phosphorylated (usually a Tyr or a Thr) and the loop that is involved in substrate binding [8] (Fig. 1a). The ATP binding pocket is located in the cleft between the N-lobe and the C-lobe of the kinase domain. It contains a highly conserved Asp which has a significant role in the phosphorylation reaction catalyzed by kinase enzymes. The Asp acts as catalytic base to free up the hydroxyl oxygen of a Ser, Thr or Tyr on the protein substrate. The deprotonated residue is involved in a nucleophilic attack on the terminal (γ) phosphoryl group (PO 2− 3 ) of ATP [10]. The ATP binding site is made up of five areas, the "adenine region" which corresponds to the hinge region, the "ribose region", the "phosphates region", the "solvent accessible region", and the "buried region" [11,12] (Fig. 1). The "buried region" is a hydrophobic region located in the back of the ATP pocket and is not occupied by ATP. The size and the shape are controlled by the first amino acid of the hinge region-this amino acid act as a 'molecular gate' controlling the accessibility to the buried region. A residue with a small side chain 'opens the gate' to the buried region whereas a large side chain effectively 'closes the gate' making the buried region inaccessible. For that reason, this residue has been termed the 'gatekeeper' residue [13][14][15][16] (Fig. 1b). The gatekeeper residue is generally preceded by two hydrophobic residues and followed by an acidic residue and another hydrophobic amino acid. In 73% of human kinases a hydrophobic amino acid with a bulky side chain (Met, Phe or Leu) is observed at that position, 22% have a small residue, such as Thr or Val and the remaining 5% have one of the other amino acids [11,12,17,18].
By using isotope radiolabeled ATP (P 32 or P 33 ) as cosubstrate, the phosphorylation reaction can be monitored with high sensitivity in vitro. However, in an in vivo context this approach is not feasible due to the large number of kinases present. Therefore, Shokat and coworkers developed a protein engineering-based approach to enlarge the ATP binding pocket of a specific kinase to accommodate a chemically modified ATP as co-substrate, which would not bind to native kinase enzymes [19]. They engineered the nucleotide binding pocket of the prototypical viral proto-oncogene tyrosine protein kinase Src (v-Src) by mutating the gatekeeper residue Isoleucine at position 338 to Glycine. This point mutation enlarged the binding pocket making the buried region accessible to ATP-competitive analogues with non-polar substituents at the N6 position of the adenine base. The  [38]. Tyr belonging to the activation segment is represented as stick. b Surface representation of a kinase ligand-binding pocket. The ATP is represented as stick. The five regions belonging to the ATP binding pocket are represented in different colors with the buried region behind the ATP ATP analogue preferentially used by the engineered v-Src kinase as phosphodonor was N 6 -benzyl-adenosine-5′triphosphate (N6-(benzyl) ATP). The use of γ-phosphate radiolabeled [γ-32 P] N6-(benzyl) ATP resulted in the v-Src substrates being specifically radiolabeled and identified in the presence of other protein kinases and all other kinase substrates [13,20]. This approach allowed the identification of cofilin and calumenin as specific v-Src substrates [21]. The conservation of the ATP binding site between different protein kinases makes the approach widely applicable for identifying specific kinase substrates. The gatekeeper residue is identified by the sequence alignment of the kinase of interest with v-Src. In a similar approach, other kinases were engineered to bind specifically modified inhibitors [22][23][24][25][26][27][28]. One of the challenges in applying this method to other kinase systems is to identify the optimal combination of kinase binding pocket mutations and ATP derivatives such that the ATP analogue acts as substrate for the engineered, but not the native or other cellular kinases. The mutation should modify size and shape of the ATP binding pocket while the engineered kinases have to remain catalytically active. The ATP analogue has to bind to the engineered kinase at sufficient affinity and in a suitable geometry to accomplish its role as phosphodonor. It needs to enter the engineered binding site, provide the γ-phosphate and leave the binding site in order to allow the engineered protein to perform catalysis. An ATP analogue bound too tight or in the wrong geometry would decrease or abolish the activity of the engineered enzyme.
In this study, we developed a computational protocol that evaluates how mutations within the ATP binding site of protein kinases influence the accommodation of various ATP analogues. The protocol explores pairings of potential mutations and ligand analogues by identifying which residues within the binding pocket could be mutated to accommodate a specific ATP analogue. We tested the protocol on data for different protein kinases from the scientific literature where the Shokat's method was applied to mutate the gatekeeper position.
For each analogue, the ensemble was superposed onto the adenine moiety of the native ATP ligand within the binding pocket of the reference protein. If the distance between an atom of a protein residue and any atom of the substituent group of a ligand analogue in the ensemble is shorter than the sum of their van der Waals [32] radii, the corresponding residue is considered a potential candidate for single-point mutagenesis. If no residues were identified by this approach, the analogue was considered to act as substrate for the native target and thus not further considered. The method was implemented in Python 2.5.4 and contains functions from the OpenStructure software framework [33].
In the second step, the interaction between potential protein mutants and ligand analogues was evaluated using a protein-ligand scoring function. Amino acids at positions identified in the first step were replaced in silico to generate mutant proteins. When a residue was changed into Gly or Ala, the entire structure was relaxed by a minimization step performed using OPLS_2005 as force field in Maestro [34]. When a residue was mutated into an amino acid with a larger side chain, such as Met or Thr, a rotamer scan was performed to identify the most probable rotamer state using Rapid Torsion Scan tool available in Maestro. The kinase mutant-ligand conformer pairs were evaluated and ranked by the protein-ligand scoring function GlideScore [35]. The kinase mutant-ligand conformer structure with the lowest GlideScore was selected and the corresponding Glide energy was computed. The Glide energy is the sum of the Coulomb and van der Waals terms and represents an estimate for the protein-ligand interaction energy. Typically, predicted energies of interaction (Glide energies) correlate better with protein-ligand binding affinities or experimental IC 50 values than GlideScore [36]. We arbitrarily limited all positive energies to zero as we were only interested in identifying favorable interactions. In the case of engineered kinases and ATP analogue pairs, only the adenine base and the substituent group were scored by GlideScore.

Kinase data set
A set of 7 protein kinases and 15 mutants for which experimental data were available in literature was used as test set (Table 1). Unless stated otherwise, in silico mutagenesis was performed using Maestro and the structure was prepared with the Protein Preparation Wizard tool [34]. Residues are numbered as as in PDB structures. The crystal structure of JNK bound to ANP (an ATP analogue with an amino group in place of the oxygen between the β and γ phosphates that mimics the natural cofactor) and Mg 2+ was solved in 1998 (Homo sapiens, PDB:1JNK, resolution 2.30 Å, [37]). The crystal structure was prepared for molecular modelling by adding hydrogen atoms, optimizing the hydrogen bonding network, the orientation of the amide groups of Asn and Gln, and the orientation and protonation state of the imidazole ring of His. This optimization allowed for improving interactions between charged groups as well as hydrogen bonds within the structure. The optimization was performed at pH of 7. Finally, a minimization Fig. 2 Workflow of the computational protocol. The protocol is organized in two parts, the first part identifies residues to mutate and the 2nd part evaluates mutant-analogue interactions. The specific inputs are depicted in circles, steps of the workflow are shown in rectangles and outputs are depicted in rectangles with dashed lines. In case all analogue conformations are scored as having favorable interactions with the wild type, the analogue is considered to act as substrate for the wild-type protein and thus not further considered step was applied to relax the entire structure. OPLS_2005 was used as force field and the termination criterion was based on the rmsd of the heavy atoms relative to their initial location (rmsd less than or equal to 0.30 Å). The M108GL168A mutant was obtained by in silico replacing Met108 to Gly and Leu168 to Ala and the structure was prepared as described above.
The kinase domain of v-Src differs from that of the cellular protein kinase c-Src at position 338 within the binding pocket (Ile338 in v-Src and Thr338 in c-Src). The crystal structure of c-Src in complex with ANP has been solved (Homo sapiens, PDB:2SRC, resolution 1.50 Å, [38]). To obtain a model of v-Src bound to its natural cofactor, we substituted in silico Thr338 into Ile. The v-SrcI338A and v-SrcI338G mutants were obtained in the same way.
The complex of Fyn bound to the PP1 conformer with the best GlideScore was minimized in vacuo without constraints. We used the Polak-Ribier Conjugate Gradient (PRCG) as method for 2500 steps [45]. The same procedure was used for the complexes of FynT339A, Abl and AblT334A. The procedure was performed using MacroModel. Tyrosine and serine/threonine kinases IC 50

Data comparison
All plots reported in this paper were made using the Matplotlib [46] and NumPy packages [47]. In the plot of JNKM108GL168A, the interaction energies were scaled between 0 and 100 to fit the same range of observed phosphorylation values (expressed as percentage of phosphorylation). The lowest Glide energy was set to 0 and the highest to 100. The plots of v-Src, v-SrcI338A and v-SrcI338G in complex with ATP and N6-(benzyl) ATP were created by comparing the experimental catalytic efficiency (kcat/Km) and the predicted interaction energies (Glide energies). To correlate experimental and predicted data, we computed the negative logarithm of the kcat/Km ratio. The plots of tyrosine kinases and serine/ threonine kinases in complex with PP1 were made measuring the linear correlation between the predicted interaction energies and the experimental measured pIC 50 (−log(IC 50 )). For each family, the Pearson correlation coefficient was computed.

Results and discussion
The gatekeeper position in protein kinases controls the accessibility to a buried region at the end of the ATP binding pocket. Shokat has demonstrated that by mutating the gatekeeper residue, the size and shape of the ATP binding site can be modified such that the engineered kinases can use specific chemically modified ATP molecules as co-substrates. The gatekeeper residues of the kinases in our test set equivalent to position Ile338 in v-Src are shown in Fig. 4. Kinases with large gatekeeper residues, such as Ile or Met, do not allow for binding of ligand analogues with bulky side chains (e.g. v-Src or JNK) whereas those with smaller gatekeeper residues, e.g. Thr, can accommodate analogues within the binding pocket (for instance Fyn or Abl). We tested the performance of our computational protocol on a data set containing 7 wild-type protein kinases and 15 mutants ( Table 1). The ATP-competitive ligands used in the test set are N6-(substituent) ATPs with bulky hydrophobic groups at the N6 position of the adenine ring and the pyrazolopyrimidine PP1 (Fig. 3). The pyrazolopyrimidine core of PP1 mimics the adenine ring of ATP in binding within the nucleotide pocket [39]. The proteins belonging to the data set are from three independent experimental studies where Shokat's method was applied and tested. For JNK, the ability of the ATPcompetitive ligands to bind kinase mutants was tested by measuring their ability to inhibit the phosphorylation of a given substrate in presence of ATP (% substrate phosphorylation) [26]. For v-Src, the kinetic efficiency (kcat/Km) was used to measure the preference of protein kinases and/or mutants for different co-substrates [20]. For kinases belonging to tyrosine and serine/threonine families, the potency of PP1 to inhibit protein kinases and/or mutants (IC 50 ) was measured [48]. We applied our computational approach to identify residues to mutate within the ATP binding pocket of these protein kinases, and the predicted protein-ligand interaction energies (Glide energies) were then compared to the published experimental data.

JNK and N6-(substituted) ATPs
Habelhah and coworkers modified the JNK ATP binding site so that it binds N6-(substituted) ATPs that cannot be accommodated by the wild-type binding pocket. The designed JNK mutant-ATP analogue pair allowed for the identification of novel JNK substrates [26]. To determine the ATP analogue with the highest affinity for the engineered JNK, they compared four N6-(substituent) ATP analogues. Their efficiency as phosphodonor was tested by measuring their ability to prevent phosphorylation of substrates by ATP when they are added in excess with respect to ATP. For wild-type JNK and the ATP analogues the percentage of substrate phosphorylation ranged from 99 to 93%, showing the inability of the wildtype kinase to accommodate any of the four ATP analogues. On the other hand, the JNKM108GL168A mutant was able to accommodate N6-(substituent) ATPs and N6-(2-phenythyl) was the ligand with the highest affinity to the mutant (the percentage of substrate phosphorylation is 8%) ( Table 1).
We applied the computational protocol to JNK and the four N6-(substituent) ATPs. For the wild-type we could not identify a low energy binding conformation without steric hindrance, indicating that none of the ATP analogues can fit into the wild-type JNK ligand binding Fig. 4 Sequence alignment of the N-lobe and hinge region of the seven wild-type protein kinases belonging to our data set. The alignment is build using the T-Coffee web server [52]. Residues are colored by percentage of identity. Secondary structure elements are represented as follows: β strands as arrows, α helixes as cylinders, and coils as lines pocket ( Table 1). The computational protocol identified two residues within the JNK binding site as potential candidates for double mutagenesis in order to enlarge the binding pocket, the gatekeeper Met108 and Leu168. We in silico replaced them with Gly and Ala, respectively, and evaluated the interaction of the engineered JNK with each ATP analogue. The complex of JNKM108GL168A and N6-(2-phenythyl) ATP shows the lowest Glide energy, implying that N6-(2-phenythyl) ATP is the substrate with the best ability to bind the engineered JNK in a constructive manner (Table 1). Employing our computational protocol, in first instance we reproduce the experimental findings that identify Met108 and Leu168 as amino acids to mutate within the JNK binding pocket in order to enlarge it. Furthermore, we correctly reproduce the relative ranking of the four ATP analogues as substrates for the engineered JNK classifying N6-(2-phenythyl) ATP as the best substrate (Fig. 5).

v-Src and N6-(benzyl) ATP
Shokat and coworkers engineered v-Src to produce a kinase mutant that preferentially used N6-(benzyl) ATP as co-substrate instead of the natural nucleotide (ATP) [20]. They performed kinetic measurements revealing that wild-type v-Src had a substrate preference for ATP over the ATP analogue (1.6*10 5 min −1 M −1 vs 0) and the I338G mutant preferentially used N6-(benzyl) ATP as cosubstrate over the natural ATP (the kcat/Km ratio is 4-1).
We used our approach and identified the gatekeeper Ile338 as being a good candidate for point mutation to enlarge the v-Src ligand-binding site, in agreement with Shokat's experimental findings. We scored mutant models I338A and I338G in complex with N6-(benzyl) ATP and both had negative energy of interaction with the ATP analogue implying their ability to accommodate it within their engineered binding pocket. The predicted interaction energies well reproduced the trend of the experimental kinetic constants (Table 1). Wild-type v-Src, v-SrcI338A and v-SrcI338G are able to interact with ATP with almost equal interaction energies (Fig. 6a). Wild-type v-Src cannot accommodate N6-(benzyl) ATP because of the steric overlaps between the side chain of Ile338 and the benzyl group attached at the N6 position of the ATP analogue. V-SrcI338A and v-SrcI338G have enlarged binding pockets that accommodate the ATP analogue in a constructive interaction. V-SrcI338G has the best predicted energy of interaction and is confirmed as the best binder to the ATP analogue (Fig. 6b).

Tyrosine and serine/threonine protein kinases and PP1
A study conducted by Liu and coworkers analyzed how the gatekeeper residue controls the ability of PP1 to inhibit protein kinases [48]. The gatekeeper amino acid corresponds to Ile338 in v-Src, Thr339 in Fyn, Thr334 in Abl, Phe89 in CamKII, Phe80 inCdk2, and Thr106 in P38. The study showed that residues equal to or larger than Ile, such as Phe and Met, make PP1 a less potent inhibitor (IC 50 ≥ 1 μM) whereas residues smaller than Ile, such as Ser, Thr, Val, Cys and especially Ala and Gly increase the potency of PP1 (IC 50 values ranging from 0.05 to 0.82 μM).
We mutated the gatekeeper residues to obtain structural models of the engineered kinases and analyzed the correlation between predicted energies of interaction of wild-type and engineered kinases with PP1 and inhibition data (IC 50 ). For both tyrosine kinase and serine/threonine kinase families the predicted interaction energies reproduced the trend of the inhibitor potency (Table 1). A positive correlation between the experimental −log(IC 50 ) (pIC 50 ) and the predicted interaction energies was found for both families, with a Pearson correlation of 0.85 for the Src tyrosine kinases and of 0.75 for the serine/threonine kinases (Fig. 7). Our computational protocol discriminated well between protein variants that are inhibited by PP1 (negative interaction energies, e.g. v-SrcI338S or CamKIIF89G) and proteins that are not inhibited (positive interaction energies, e.g. v-Src, v-SrcI338F or Cdk2). In the specific case of v-Src, the protocol is able to reproduce the ranking of the mutants and identify which engineered kinases are the best binders to PP1, with v-SrcI338A and v-SrcI338G being identified as the best in agreement with IC 50 values (Table 1). Despite the overall good correlation between inhibition data and predicted interaction energies, in some cases GlideScore We explored to which extend energy minimization of the complex models before scoring would lead to better correlation between experimental and predicted data. For both Fyn and Abl and the respective mutants we considered the protein-PP1 complexes with the best GlideScore and minimize them without constraints. Although the introduction of a minimization step results in lower predicted protein-inhibitor interaction energies (Table 2), GlideScore was not capable of differentiating relative affinity between generally strong protein-inhibitor interactions. The use of scoring functions more sensitive to the subtle changes in protein-ligand interactions, or scoring functions tailored to specific binding site properties [49] might help to overcome the inability of GlideScore in discriminating relative binding affinity for good binders.
The main goal of this study is to identify, which binding-site residues of the target kinase could be mutated to accommodate a specific ATP analogue as co-substrate without interfere with the catalytic activity of the kinase protein. To reach this goal, we used a protein structure derived by X-ray crystallography in complex with   Table 1 the natural ATP substrate as starting point. In order to be able to act as co-substrate in catalysis, a ligand was assumed to be able to bind in place of the natural substrate in a low-energy conformation. We therefore modelled each modified ATP with adenine, ribose and phosphates geometry identical to the native ATP within the kinase binding site, and sampled the conformational ensemble of substituents for low energy conformations which could be accommodated in the binding site. Our computational approach reproduces the experimental data available in literature. The method is able to discriminate between residues that have to be mutated into smaller ones to allow the accommodation of ligand analogues, (e.g. Ile338 in v-Src) and residues that instead allow for the binding of specific analogues within the wild type enzyme (e.g. Thr339 of Fyn).
Shokat and coworkers tested 12 N6-(substituent) ATPs with 7 v-Src mutants in order to identify the optimal combination of a mutation within the v-Src ligandbinding pocket and a chemical derivative of ATP to use for identifying the specific v-Src substrates [19,20], and identified N6-(benzyl) ATP as suitable substrate for an engineered v-Src with an enlarged binding pocket, v-SrcI338G. Their approach was based on the 'bumpand-hole' model [50,51]. The gatekeeper residue was mutated into a small amino acid generating a 'hole' within the ligand-binding site that can accept ligands with bulky substituent groups, 'bumps' . The method was based on exploring shape complementarity between the enlarged kinase binding pocket and the ATP derivative.
The computational protocol we developed in this work can help to rationalize the experimental procedure to identify the substrates of a specific kinase: It aims to prescreen a large number of computationally modelled mutant-analogue complexes, in order to reduce the number of pairs to test in vitro and/or in vivo. Furthermore, in our procedure the gatekeeper position could be replaced into each of the other 19 amino acids. This would allow identifying new residues for mutation based on shape complementarity as well as specific protein-ligand interactions between side chains of mutated residues and substituent groups of ATP analogues.

Conclusions
We developed a computational protocol for evaluating how mutations at the gatekeeper position influence the accessibility of ATP-competitive ligands within the binding site of kinase mutants. Shokat and coworkers have experimentally identified the gatekeeper position as suitable for engineering kinases with modified cosubstrate specificity. Our computational protocol allows further exploration of this approach via two routes. The first route is able to provide a relative rank of various ATP analogues for a given gatekeeper residue mutation. The second route provides a way to evaluate for given ligand analogue, which mutations at the gatekeeper residue position would be compatible. The computational screen of a large ensemble of potential mutantanalogue pairs can reduce the number of experimental essays to perform resulting in a significant reduction of the time and the cost of the whole experiment. Besides protein-ligand shape complementarity, our computational protocol allows the evaluation of different types of interactions between an engineered kinase and an ATP derivative. This will allow exploring gatekeeper mutations exhibiting specific polar interactions with the ATP analog, which have not yet been explored in the literature.
Authors' contributions VR designed and developed the computational protocol, acquired and analyzed the computational data and was a major contributor in writing the manuscript. TdB and TS helped with interpreting computational data, provided guidance relative to the theoretical aspects of designing the computational protocol as well as revisions of the paper. All authors read and approved the final manuscript. 1 Biozentrum, University of Basel, Basel, Switzerland. 2 SIB Swiss Institute of Bioinformatics, Basel, Switzerland.