In silico work flow for scaffold hopping in Leishmania

Background Leishmaniasis,a broad spectrum of diseases caused by several sister species of protozoa belonging to family trypanosomatidae and genus leishmania , generally affects poorer sections of the populace in third world countries. With the emergence of strains resistant to traditional therapies and the high cost of second line drugs which generally have severe side effects, it becomes imperative to continue the search for alternative drugs to combat the disease. In this work, the leishmanial genomes and the human genome have been compared to identify proteins unique to the parasite and whose structures (or those of close homologues) are available in the Protein Data Bank. Subsequent to the prioritization of these proteins (based on their essentiality, virulence factor etc.), inhibitors have been identified for a subset of these prospective drug targets by means of an exhaustive literature survey. A set of three dimensional protein-ligand complexes have been assembled from the list of leishmanial drug targets by culling structures from the Protein Data Bank or by means of template based homology modeling followed by ligand docking with the GOLD software. Based on these complexes several structure based pharmacophores have been designed and used to search for alternative inhibitors in the ZINC database. Result This process led to a list of prospective compounds which could serve as potential antileishmanials. These small molecules were also used to search the Drug Bank to identify prospective lead compounds already in use as approved drugs. Interestingly, paromomycin which is currently being used as an antileishmanial drug spontaneously appeared in the list, probably giving added confidence to the ‘scaffold hopping’ computational procedures adopted in this work. Conclusions The report thus provides the basis to experimentally verify several lead compounds for their predicted antileishmanial activity and includes several useful data bases of prospective drug targets in leishmania, their inhibitors and protein – inhibitor three dimensional complexes. Electronic supplementary material The online version of this article (doi:10.1186/1756-0500-7-802) contains supplementary material, which is available to authorized users.


Background
Leishmaniasis, a broad spectrum of diseases, is caused by more than 20 sister species of protozoa belonging to the family Trypanosomatidae and genus Leishmania. These diseases are generally classified into three forms: visceral (VL), cutaneous (CL) and mucocutaneous (MCL), of which VL is lethal if left untreated, whereas CL, MCL generally self cure, though with the possibility of leaving permanent scars on the patient. The Indian subcontinent along with Sudan and Brazil account for the overwhelming majority of cases in VL, while the incidence of CL predominantly occurs in Afghanistan, Algeria, Brazil, Iran, Peru, Saudi Arabia and Syria [1]. Overall, about 10 million people are affected worldwide. The vector for this disease is the phlebotominae sand fly, which injects the parasites into the host in the course of a blood meal. The parasites thus exist in two forms: as mobile, flagellated promastigotes in the gut of the sandfly and non-motile, non-flagellated amastigotes which multiply within the phagolysosomal compartment of the macrophage in the mammalian host [2].
The first line of defense against the parasites traditionally continues to be generic pentavalent antimonials (sodium stibogluconate), especially in those regions where resistant strains have yet to appear. With the emergence of strains resistant to antimonials (especially in Bihar state of India) [3][4][5] second line drugs such as amphotericin-B (along with its liposomal formulations) and miltefosine are being extensively used [6]. However, both these drugs are more expensive than antimonials, toxic with reportedly severe side effects. Pentamidine and paromomycin are other drugs currently in use though their ready availability in endemic areas appears to be limited [7]. It is thus clear that there is an urgent need to search for and identify therapeutic alternatives to combat the disease.
With the availability of full genome sequences, search for drug targets in pathogenic organisms have been greatly facilitated. Comparative genomics allows the identification of genes unique to an organism, determination of parasitic genes absent in human and the evolutionary conservation of genes, probably reflecting upon their essentiality [8]. Gene conservation across pathogenic species also gives the added advantage that a single broad based antiparasitic targeting a conserved protein, could be used as a drug for several ailments. The genomes of five leishmanial species L. major, L. infantum, L. donovani , L. mexicana and L. braziliensis have been sequenced; with the first three consisting 36 chromosomes each, while L. braziliensis contains only 35. Notably, L. braziliensis has been assigned to a different subgenus Leishmania (Viannia) sp. and is thus somewhat distantly related to the others, which belong to the subgenus Leishmania (Leishmania) sp.. This reduction in chromosome number in L. braziliensis is due to a fusion event joining chromosome 20 and 34 (as numbered in L. major). Likewise, L. mexicana is two chromosomes less due to two fusions between four chromosomes (chromosome 8 and 29; chromosome 20 and 36). These genomes have approximately 8300 protein coding regions of which only about 40% can be ascribed a putative function [9][10][11]. In addition, the genomes of T. brucei (11 chromosomes) and T. cruzi are also available. Generally , the genomes of kinetoplastidae exhibit a high degree of synteny (conservation of gene order) in the organization of their genomes [12]. Comparison between the genomes of T. brucei, T. cruzi and L. major revealed a conserved core of approximately 6200 trypanosomatid genes and about 1000 ORFs [11] were notable for their presence in the genome of L. major alone. Further, upon comparing the genomes of leishmanial species, 5, 26 and 47 genes were identified to be exclusively and specifically present in the genomes of L. major , L. infantum and L. braziliensis respectively [13].
Leishmanial genomes consist of several novel metabolic pathways whose enzymes could serve as potential drug targets. Some of the distinctive features of these genomes include the presence of atypical protein kinases lacking the SH2, SH3, FN-III and immunoglobulin like domains which occur most frequently in humans [14,15]. The cellular surface of leishmania consists of several unique glycoproteins which are essential for immune evasion and hostparasite interaction. The most abundant of these glycoproteins are attached to the surface of the plasma membrane via GPI (glucosylphosphatidyl inositol) anchors, which are essential for parasitic survival. Other novel pathways involve trypanothione metabolism, essential for cell growth and differentiation, which is replaced by glutathione in humans. The first enzyme in trypanothione synthesis is the enzyme ornithine decarboxylase targeted by the drug diflouromethyl ornithine, prescribed for human sleeping sickness. Enzymes of the glycolytic pathway, ergosterol synthesis in sterol metabolism and the purine salvage pathway also offer potential drug targets for therapeutic intervention [14]. Some of these pathways will be discussed in greater detail in the later sections of this paper.
Due to the exponential increase in genomic information, researchers are now confronted with a rapidly expanded list of gene products from which to select prospective targets. Several scoring schemes have been proposed which surveys the genome of a pathogenic organism and ranks genes according to their potential as drug targets [8,16,17]. Most schemes give a high weightage to the essentiality of the protein in the life cycle of the parasite, conservation of the target among different sister species and its corresponding absence in humans. Experimentally, either the lethality of gene deletion or insertion of transposons into the selected gene has been used to determine its essentiality. Non-essential genes could also be selected as targets, provided they play a vital role in the infective virulence of the pathogens. Other considerations includes the assayability of the protein, expression level during the life cycle of the organism possibly determined by microarray data and computational flux based analysis to gauge the effect of protein inhibition on the integrity of biochemical networks. In this connection, the TDRtarget database is one of the most well cited amongst such databases [16]. This database consists of an exhaustive list of drug targets from the genomes of L. major, T. brucei, M. leprae and a host of other pathogens responsible for neglected tropical diseases. A useful feature of the database is its ability to prioritize a set of drug targets, where each criterion is assigned a weight and there is flexibility to change the weights associated with different factors (pathogenicity, essentiality, etc.) in the scoring scheme to extract a 'custom-made' list of targets relevant to the research interest of the user.
Several crystal structures of trypanosomal proteins, either individually or complexed with inhibitors are currently available in the Protein Data Bank, such as trypanothione reductase (T. brucei), trypanothione synthetase (L. major), pteridine reductase 1 (L. major), nucleoside hydrolase (T. vivax) and ATP dependent phosphofructokinase (T. brucei), which provide a detailed three dimensional structure of their active sites facilitating the design of specific inhibitors. In order to generate a library of prospective ligands which could have high affinities towards the active sites of targeted proteins, drug databases could be searched with structure based pharmacophores derived from protein ligand complexes. 'Scaffold hopping' or 'chemotype switching' [18][19][20], which involves identifying molecules with dissimilar backbone structures yet exhibiting very similar pharmacological properties, is one of the widely used techniques to generate compound libraries for eventual screening [21][22][23][24][25]. Lately, considerable success has been achieved in the application of structure based pharmacophores in the identification of lead compounds [21][22][23][24][25]. In the current work the human and the L. major genome have been compared to identify a set of proteins unique to the parasite. Crystal structures of these proteins or those of closely related homologues have been extracted from the Protein Data Bank (PDB) and the literature has been extensively surveyed to identify their specific high affinity inhibitors. Crystal structures of the protein (target) -inhibitor complexes have then been utilized to generate structure based pharmacophores. In case the crystal structures of the ligand bound targets were not available in the PDB, the inhibitors were computationally docked into the active sites of their receptors. Finally, the ZINC database and Drug Bank has been searched utilizing this set of pharmacophores to generate a set of compounds which could serve as a library in the search for prospective antileishmanial drugs.

Methods
The annotated coding sequences (CDS) from the genomes of L. major (8316 CDS), L. donovani (8032 CDS), L. mexicana (8249 CDS), L. braziliensis (8056 CDS), L. infantum (8227 CDS), T. brucei (9962 CDS) and human (~41961 CDS) were downloaded from the NCBI genome database (http://www.ncbi.nlm.nih.gov/genome/; updated as on September 1, 2013). The standalone BLAST [26] was obtained from the NCBI ftp server (ftp://ftp.ncbi.nlm. nih.gov/blast/executables/blast+/LATEST/). Here BLAST refers to the BLASTp program of the NCBI standalone BLAST package which aligns protein amino acid sequences. To compare the annotated protein sequences between human and L. major(CDS's) , the CDS's from L. major were first fed in as a query (in FASTA format) whereas the human CDS's were processed in the makeblastdb tool to form a BLAST database. This was followed by a second run of BLAST (with identical parameters) in which the human sequences were considered to be 'query' and the L. major proteins the database. Proteins which simultaneously passed identical filters in both the runs of BLAST were considered for the second step in the pipeline. A sole exception was made in the case of ATP dependent phosphofructokinase (PFK) in the first filter as the possibility of PFK being a drug target for trypanosomatids has been mentioned in the literature [27][28][29]. To compare L. major CDS's with those of closestrelated species, L. major proteins were fed in as a query and sequences from the other genome were made the database. All alignments with an E-value less than one were output from the program and default options were used for all other parameters. The software to analyze the BLAST outputs was developed locally in C/C++, Perl and the final results were displayed on Microsoft Excel sheets for further analysis.
In order to identify the unique metabolome in L. major with respect to its human host the 'Comparative Analysis and Statistics' option in the BioCyc Database (http://www.biocyc.org/) was used [30]. The metabolic pathways and enzymes associated with this unique set of metabolites from L.major were manually culled from the LeishCyc database (http://biocyc.org/LEISH/organism-summary?object=LEISH).
Template based homology modeling was performed using the MODELLER software both in the standalone mode and also implemented in Accelrys Discovery Studio 2.5. In addition, the GUI version of Modeller 9.11v (Easy Modeller 4.0) [31] was also used. GOLD 5.2 (http://www.ccdc.cam.ac.uk/Solutions/Gold-Suite/Pages/GOLD.aspx) was used to dock specific ligands onto the active site of their corresponding enzymes with the following parameters: population size 100, selection pressure 1.1, number of operations 100000(min) -125000 (max), islands 5, niche size 2, crossover frequency 95, mutation frequency 95, migration frequency 10 and search efficiency 100%. The program was run at least 10 times in order to confirm the best docking solution which was identified based on three criteria a) the CHEMPLP score b) rmsd between the docked solution and the initial placement of the ligand and c) visual survey and examination of the contacts between the docked ligand and protein. In case sufficient prior information was already available with regard to the position of the ligand in the active site of the protein (such as in Group B : See Section on The Protein -Ligand Complexes), the docking solution with the minimum rmsd (generally less than 1.0 Å) and whose CHEMPLP score was amongst the top three (comparable to the score derived from the original proteinligand complex available in the PDB), was accepted as the most reasonable solution, subsequent to the visual inspection of the ligand bound active site. In cases where such information was either ambiguous or limited (Group C : See Section on The Protein -Ligand Complexes) the threshold on rmsd was relaxed to about 2.0 -2.75 Å, the interactions of the ligand with active site were visually examined and the solutions grouped into sets with similar geometry with respect to the binding site. Amongst these sets, the pose with the maximum number of physically meaningful interactions and the best CHEMPLP score was accepted as the most favoured solution. All solutions which exhibited significant displacement of the ligand from the putative active site of the protein (>3.75 Å) were not considered. The decisions regarding both ligand flexibility and the flexibility of residues constituting the active site were decided on a case by case basis which will be discussed below (in the Results and Discussion section). Generally, in case the protein-ligand complex was available in the Protein Data Bank (PDB) and the leishmanial protein modeled utilizing the molecule from such a complex as a template, the ligand was transferred onto the modeled protein by utilizing the rotation matrix, translation vector derived from the superposition of the modeled protein to the template (Dali Server; http:// ekhidna.biocenter.helsinki.fi/dali_lite/start) [32]. Most often in such instances both the ligand and the active site were held rigid whilst docking with GOLD. For ligands, whose complexes with the targeted protein (from L. major) or its close homologue were not available in the PDB, the ligand coordinates were derived from its closest related structure (as found as a complex in the PDB) by fourth atom fixing techniques and energy minimized by the semi-empirical quantum mechanical method in the program GAMESS [33,34]. It goes without saying that the protein in such a complex would either be the target (L. major) or a close homologue from a sister trypanosomatid species. With the initial placement of ligand into the protein active site using the same techniques mentioned above, the newly added chemical groups to the parent compound (obtained from the PDB) were rendered flexible in the subsequent docking by GOLD, in addition to selected active site residues which could provide steric hindrance in the docking process. These residues were identified by visual examination of the initial docked position and examination of the list of contacts. In GAMESS the self-consistent field wave function with the semi-empirical basis set (AM1 model Hamiltonian) was used and the optimum tolerance of the energy minimization cycle was set to 1.0e-5. Where no information was available with regard to the association of the ligand with the target protein, blind docking was attempted subsequent to the placement of the ligand at the centroid of the putative active site of the enzyme.
LigandScout version 3.12 (build 20130912) was used to generate the structure based pharmacophores from the crystallographic and modeled/docked complexes, with manual monitoring of the entire process at every stage. To validate the pharmacophores, the ligand along with other active compounds (with relatively less IC 50 values though with similar structures) were made the kernel of a database which also included decoys generated from the D.U.D.E. decoy generator (http://dude. docking.org/generate). The specific ligand, along with other actives and decoys were next submitted to OMEGA 2.5.1.4 [35] to generate two conformers per decoy and each of the active molecules from the docked/crystal structures. Thus, the database for each ligand (inclusive of the other actives and decoys) consisted of about 1800 molecular conformers in all. The library generating tool of LigandScout was then utilized to convert the database into a library (*.ldb format), prior to searching the library with the corresponding pharmacophore. Invariably, the specific ligand used to derive the structure based pharmacophore would be detected at the topmost rank. Every validation database was split into two, one consisting of actives and the other of decoys. Ligand screening option in LigandScout was then invoked to search both the databases with the specific pharmacophore as the query (with the advanced options, scoring function: 'Pharmacophore -Fit'; Screening Mode: 'Match all query features'; Retrieval Mode: 'Get best matching conformation').For every case, the maximum number of omitted features were varied to get optimal results and the ROC curve. Those pharmacophore queries, which gave screening results with area under the ROC curve less than 0.75 were not utilized in subsequent calculations. Finally, the ZINC database was searched by all the structure based pharmacophores derived from the protein ligand complexes.

Results and discussion
Comparative genomicshuman versus Leishmania The whole set of annotated protein-coding genes from L. major genome (8316 CDS) was compared against the ones from human genome (41961 CDS) and those parasitic proteins (4991 among 8316 sequences) which could not align with any human gene, (first filter) with pident (percentage sequence identity) >35% and a simultaneous query coverage >50%, in two way reciprocal BLAST runs with L. major as query, human as database and vice versa (Materials & Methods), were selected for the next filter. Hypothetical sequences (4407 CDS) were removed from this list (second filter) and the Protein Databank (PDB) searched for each of the remaining sequences, which amounted to a total of 584 putative sequences. The PDB database was downloaded from the RCSB-PDB (http://www.rcsb.org/pdb/home/home.do) website and incorporated into BLAST using the methods given above (Materials and Methods). Only those genes were selected for subsequent analysis (a total 90 sequences) which recorded hits in the PDB with >40% sequence identity and a simultaneous query coverage >75% (third filter). Each gene from this set (consisting of L. major proteins alone) were then checked for BLAST hits (pident >40% & query coverage >75%) in the genomes of L. donovani, L. mexicana, L. braziliensis, L. infantum and also in the clusters of orthologous proteins in the Tritryp database (based on the OrthoMCL annotation [36]) . Upon merging both sets of data, only those (L. major) proteins with homologues/ BLAST hits in all the five genomes were retained (fourth filter). The final set of genes consisted of a total of 86 polypeptide chains (Additional file 1). The schematic representation of the successive filters to arrive final list of drug targets is described in Figure 1. The separated list of hypothetical sequences (4407 CDS) were independently searched in the PDB and eighteen sequences scored hits satisfying the above criteria (given in Additional file 2). Upon the application of more stringent criteria (pident >25 and query coverage >33) in the first filter, 47 out of the original 86 genes satisfied the new threshold values, of which 13 genes were retained from the first thirty proteins of the original set (Additional file 1).

Target prioritization
This list of parasitic proteins were then sorted according to a "weighted union" scoring scheme based on the following factors: i) essentiality as determined from experimental studies, ii) virulence factor iii) expression profile and iv) whether the natural substrate to the protein is a ligand unique to Leishmania spp. with respect to human. Information regarding essentiality and virulence were obtained by an extensive literature survey, in addition to consulting the TDR database. A list of metabolites unique to Leishmania spp. were identified by searching the leishmanial and human metabolome databases [37]. The metabolome of L. major was obtained from the Biocyc Database (http://biocyc.org/LEISH/class-tree? object=Compounds) (see Method) and the corresponding metabolome from human was available in the Human Metabolome database (HumanCyc; http://biocyc.org/ HUMAN/organism-summary?object=HUMAN). Utilizing bioinformatics tools available in the Biocyc, the two metabolomes were compared and the ligands unique to Leishmania spp. were filtered out. From this comparison 129 metabolites were identified to be unique to the parasite. The full set of enzymes corresponding to these substrates were assembled into a blast database and run against the initial list of 86 genes. Only eight polypeptide chains from this list of 86 proteins were found to be associated with unique ligands. As has been mentioned previously, leishmania has two stages in its life cycle (amastigote and promastigote) and the information whether the gene was 'constitutively expressed' in both stages or in only one of them was obtained primarily from GeneDB [38], which provides a convenient platform to cull information with regard to leishmanial gene expression from the work of Leifsko et al. [39]. A Figure 1 Schematic representation of steps involved in the selection of drug targets from the genome of L. major (Additional file 1). The separated set of hypothetical proteins were independently searched against the PDB (Additional file 2). score of 100 was awarded on the full satisfaction of any one of the criteria given above and thus the highest possible score obtained by any protein could be 400. Targets lacking experimental data with regard to essentiality were still given 50 in case strong arguments existed in favour of their being essential genes (e.g. phosphofructokinase in the glycolytic pathway) and 20 if the protein was found to be indispensable in a sister species. A score of 100 was assigned to the gene which was constitutively expressed in both stages and 50 if expressed in only one of them. The final list of prioritized proteins are given in (Additional file 1) and the first thirty proteins from this list ( Table 1) are described in some detail in the context of unique metabolic pathways of leishmania parasites. The only available information with regard to the 18 hypothetical proteins (Additional file 2) was that they are constitutively expressed in both stages of the leishmanial life cycle [39] , and thus upon prioritization, these proteins (all with an identical score of 100) did not find their place amongst the first thirty, in the list of annotated proteins (given in Table 1).
As expected, prominent amongst the list of prioritized proteins (Table 1) are three enzymes associated with trypanothione i)trypanothione reductase (TR: 1), ii)putative trypanothione synthetase (TS: 3) and iii)trypanothionedependent glyoxalase I (GLO1: 10). The trypanothione system in leishmania which replaces the ubiquitous glutathione system present in humans, enables the parasite to survive the high oxidative stress found in the host immune system and the presence of toxic heavy metals [40]. Both trypanothione synthetase (which synthesizes trypanothione from glutathione and spermidine) and trypanothione reductase (which keeps it in its reduced form in the presence of NADPH) are attractive drug targets [41], as this system is the only pathway involved in the crucial regulation of oxidative stress in the parasites. Reduced trypanothione in turn causes the reduction of tryparedoxin which then transfers electrons to the recycling enzyme tryparedoxin peroxidase [40]. Although TR and human glutathione reductase(GR) exhibits 35% sequence identity and shares many physicochemical properties , yet their corresponding active sites are different due to their diverse substrate specificities [42]; TR binding only to the oxidized forms of positively charged glutathionylpolyamine conjugates whereas human GR associates only with negatively charged glutathione [43]. The difference in specificity is primarily due to the presence of five amino acid residues in the TR active site, which confers enhanced hydrophobicity, negative charge and wider access to its binding pocket relative to human GR [43]. Several inhibitors specifically designed for TR have yet to be entirely successful as drugs, probably due to the wide active site of the enzyme which poses obstacles for structure-based drug design, coupled to the pharmacokinetic properties of the inhibitors [43]. In addition, trypanothione is also implicated in the Glyoxalase I, II systems in the parasite (again replacing glutathione in humans) which is responsible for the removal of toxic and mutagenic methylglyoxal formed as a byproduct of glycolysis. The crystal structure of leishmanial glyoxalase I (GLO I) reveals differences with respect to the corresponding human enzyme in its active site architecture [44], which includes increased negative charge and hydrophobicity along with the truncation of a loop which could be involved in the catalytic activity of the human enzyme.
The surface glycocalyx of Leishmania spp. consists of several unique sugars and glycoconjugates which mediate hostparasite interaction and virulence. A significant fraction of these glycoconjugates consists of lipophosphoglycans (LPG) implicated in the adhesion of leishmania to the host cell and glycoinositolphospholipids (GIPLs) involved in pathogenesis [45]. Both LPGs and GIPLs, have β-galactofuranose (β-Galf ) as one of their main constituents, an unusual sugar not found in vertebrates. Three proteins in (Table 1), UDP-galactopyranose mutase (UGM: 4), putative UDP-Glc 4′-epimerase (galE: 5), UDPsugar pyrophosphorylase (USP: 26) are constituents of biochemical pathways, either directly or indirectly responsible for the synthesis of β-Galf. β -Galf is synthesized from its precursor UDP-galactose (UDP-Gal) by the enzyme UDP-galactopyranose mutase (UGM), inhibition of which is known to regulate parasitic virulence and hence is an attractive target [46]. The cellular pool of UDP-Gal is contributed by the Isselbacher and Leloir pathways [45,46]. In the Leloir pathway, UDP-Gal is synthesized from galactose -1 -phosphate by UDP -sugar pyrophosphorylase (USP), whereas in the Isselbacher pathway galactose-1-phosphate is converted to UDP-Gal and glucose-1-phosphate by galactose-1-phosphate uridyltransferase. Within this pathway, the reversible and bidirectional enzyme UDP-Glc 4′-epimerase (GalE) can convert UDP -Gal to UDP -Glucose and vice versa. Despite low sequence identity of about 33% between human and parasitic GalE , high resolution crystal structures of both proteins reveal a common overall topology and similar protein-ligand interactions at the active site [47]. GalE holds great promise as a drug target in T. brucei.
Next, a set of five proteins implicated in purine/pyrimidine metabolism occupied fairly prominent positions in Table 1: putative deoxyuridine triphosphatase nucleotidohydrolase (dUTPase: 11), nonspecific nucleoside hydrolase (NNH: 24), and putative OMPDCase -OPRTase (OMPDC-OPRT: 29). Unlike their human and other mammalian hosts leishmania lack the molecular machinery to synthesize purine nucleotides de novo and is thus dependent on a purine salvage pathway [48]. Extensive genetic studies on the purine salvage pathway show it to be highly complex with several redundant links. For  Prioritized list of drug targets consisting of the first thirty proteins. Information regarding their metabolic pathway, genebank ID and association with a ligand unique to leishmania w.r.t human are also included in the table. Proteins given in bold were subsequently used for pharmacophore calculations.
example, mutant strains individually lacking one of the key enzymes of the pathway: adenine phosphoribosyl transferase (APRT), hypoxanthineguanine phosphoribosyltransferase (HGPRT), adenosine kinase (AK) and xanthine phosphoribosyl transferase (XPRT) were all found to be viable [48]. However, the phenotypic characterization of the double Δhgprt/Δxprt mutant indicated that purine salvage from extracellular sources is primarily funneled through XPRT, HGPRT with AK and APRT being by and large superfluous [49]. Thus, the central role played by these two enzymes (HGPRT & XPRT) confers functional importance to downstream molecules which distributes their products into adenylate and guanylate nucleotides. Adenylosuccinate synthetase (ADSS) and adenylosuccinate lyase (ASL) are two such enzymes which sequentially convert IMP to AMP, the former catalyzing the GTP dependent formation of adenylosuccinate from IMP and aspartic acid while the latter removes a fumarate from adenylosuccinate formed by ADSS. Knock out mutants of ASL shows highly attenuated infectivity of the parasites [49]. Null mutants of purine transporters LdNT1, LdNT2 (Δldnt1, Δldnt2 and Δldnt1 /Δldnt2) do not appear to interfere with parasitic growth based on natural purine sources (with the exception of xanthosine) [48]. Subsequently, the enzyme nonspecific nucleoside hydrolase (NNH) was identified to perform the non-specific conversion of purine nucleosides to nucleobases, which can then be transported by other transporters LdNT3 and LdNT4 [48]. In Leishmania spp. NNH hydrolyzes the N-glycosidic bond of both purine and pyrimidine nucleosides to yield ribose and other bases. Upon intake and conversion, adenine bases are irreversibly deaminated to hypoxanthine by the enzyme adenine aminohydrolase (AAH). However, despite its unique presence in the parasite (w.r.t mammals and humans), the enzyme was found to be nonessential as demonstrated by the viability of Δaah knockouts [50].
Both humans and leishmania are capable of the de novo synthesis of pyrimidines though there exists considerable discrepancy in the organization of their corresponding enzymes into multifunctional polypeptides, cellular localization and allosteric regulators [51]. This is especially true of the last two enzymes in the pyrimidine synthesis pathway, orotate phosphoribosyltransferase (OPRT) and orotidine monophosphate decarboxylase (OMPDC), which are fused into one bifunctional protein, in both human and parasite. However the order of the polypeptide chains are reversed in both cases [51]. As putative OMPDCase -OPRTase (OMPDC -OPRT) is active in the final step of pyrimidine biosynthesis , its inhibition is expected to be lethal for the parasite. Finally, putative deoxyuridine triphosphatase nucleotidohydrolase (dUTPase) hydrolyzes dUTP to dUMP and pyrophosphate leading to the maintenance of the dTTP:dUTP ratio in the cell ensuring precision in DNA replication [52].
Another unique feature of protozoa belonging generally to kinetoplastids is the compartmentalisation of the first seven glycolytic enzymes (and therefore glycolysis) into organelles called glycosomes, in contrast to other organisms where glycolysis generally occurs in the cytosol. This feature is essential for the regulation of glycolysis in the parasite and also to effectively switch over to anaerobic forms of respiration [53]. Three such glycolytic enzymes 2,3 -bisphosphoglycerateindependent phosphoglycerate mutase (PGM: 19), ATPdependent phosphofructokinase (PFK: 21) and glycerol -3 -phosphate dehydrogenase (23), including two other enzymes either upstream or downstream of the glycolytic pathway, putative NADP-dependent alcohol dehydrogenase (17) and glucokinase (22) appeared in Table 1. Since glycolysis is the only known source for ATP in leishmania these enzymes offer attractive targets, especially phosphoglycerate mutase (i-PGM) which is distinct in terms of structure, catalytic mechanism and whose reduced expression was also found to be lethal for cultured T. brucei. However, despite intense effort on some of these validated targets effective pharmaceutical interventions have yet to emerge.
Traditionally, drugs inhibiting folate metabolism, specifically dihydrofolate reductase (DHFR) and thymidylate synthase (TS) have been successful as antibacterials. DHFR maintains the THF (N 5 , N 10 -methylene tetrahydrofolate) pool in the cell by the NADPH dependent reduction of dihydrofolate (DHF), which in turn is utilized by TS to catalyze the conversion of dUMP to dTMP. Lack of dTMP curtails DNA replication leading to cell death [54]. In leishmania both these enzymes are conjoined into a bifunctional enzyme DHFR-TS which is the primary source of reduced folate and also the lone source of thymidylate in the parasite [55]. However, inhibition of this enzyme is ineffective in promoting lethality due to the presence of another short chain nonspecific dehydrogenase/reductase pteridine reductase 1 (PTR 1: 2), which acts both as a modulator and bypass for inhibitors targeting DHFR -TS. PTR 1 is responsible for the essential salvage of unconjugated pterins (such as biopterins) as it catalyzes the NADPH dependent two step reduction of oxidized pterins to their active tetrahydro forms. Deletion mutants of PTR-1 alone were nonviable and hypersensitive to the drug methotrexate (MTX) and had to be simultaneously inhibited [54], in case DHFR-TS was being targeted.
A wide range of proteases spanning most of the major classes have been identified in the leishmanial genomes, with L. braziliensis alone having at least 44 cysteine , 23 serine and 97 metalloproteases [56]. Of these, cysteine proteases (CP) have been confirmed as virulence factors playing a major role in mediating hostparasite interactions; with parasites (L.tropica) treated with CP inhibitors exhibiting reduced viability, growth and pathogenicity. Metalloproteinases have also been known to be expressed on the surface of Leishmania spp., protecting the pathogen from the defensive action of host enzymes and the phagolysozome of macrophages [56]. In addition, a CP inhibitor; inhibitor of cysteine peptidase (6) also appeared at a prominent position in Table 1 by virtue of its being a virulence factor and a probably protecting the parasite from the hydrolytic environment of the sandfly gut or the internal environment of host macrophages.

The proteinligand complexes
An exhaustive literature survey was conducted to identify inhibitors for the first thirty proteins from Table 1. Inhibitors with experimentally determined IC 50 or K i values were found in the literature for only 8 out of the 30 proteins. A total of 27 inhibitors (Table 2, Additional file 3) were shortlisted for the above mentioned eight target proteins by selecting those ligands with the lowest IC 50 values from a given family of compounds (that is a class of compounds with a conserved backbone/kernel and diverse peripheral substituents). Thus a total of 32 protein-ligand complexes (27 from L. major and 5 from T. brucei, iTb4-8) were divided into three sets (Table 3; Group A, B & C): (I) In the first set (Group A), the crystal structures of the ligand bound protein complexes were readily available in the protein data bank and were utilized directly for computing the structure based pharmacophores (Table 3, Additional file 3; inhibitors i1,i2 and i3). Henceforth the inhibitors will be referred to by the number enumerated in Table 2.
(II) In Group B, the crystal structures of the ligand-protein complexes were available, with the protein either being the actual targeted molecule (from L. major) or a closely related homologue, with sequence identity exceeding 60% with respect to the corresponding protein from L. major. In the latter cases the homologous protein present in the PDB was used as a template to model the parasitic protein. Likewise, the specific ligand used to form the complex could either be the original small molecule found in the PDB file; or the ligand coordinates from the crystal structure were used as a template to add peripheral chemical groups. Specific ligands were docked onto the corresponding parasitic proteins using the GOLD software. In addition, the original protein-ligand complexes present in the PDB (from other trypanosomes) were also included in the subsequent calculations (Table 3, The ability of the docking protocol as implemented in the GOLD software to independently locate the ligand position as found in the crystal structure was verified for all the complexes used in this study. (III) In Group C no information regarding both the proteins and their specific ligands were available in the PDB, though crystal structures exceeding a sequence identity of 50% with respect to the target were present in the database, which were used as templates to obtain the three dimensional models of the leishmanial proteins. The ligand coordinates were either obtained from the PubChem database (NCBI) or constructed ab initio (Material & Methods) by fourth atom fixing techniques. Blind docking (by GOLD) was used to position the inhibitor onto the putative active site (as obtained from the literature) of the enzyme (Table 3, Additional file 3; inhibitors i25, i26 & i27).
Thus 32 structure based pharmacophores were computed from their corresponding protein -ligand complexes, of which 3 complexes belonged to Group A, 21to Group B and 3 to Group C. In addition, 5 complexes (for inhibitor no. iTb4 to iTb8) whose proteins belonged to other trypanosomatids were also included in the calculation, by virtue of their being templates for the leishmanial proteins in Group B.
Complexes in Group A (Table 3, Additional file 3) consists of pteridine reductase 1 (PTR 1) bound to inhibitors methotrexate (1E7W; inhibitor no. i1), trimethoprim (2BFM; inhibitor no. i2) and 10propargyl-5, 8dideazafolic acid (2BFA; inhibitor no. i3). Crystal structures of these three complexes include the cofactor NADPH. PTR 1 is a homotetramer, with individual  Other protein targets in Group B apart from PTR -1 were TR, PFK, dUTPase and NNH. Five complexes of TR from T. brucei were included (inhibitor numbers iTb4 -iTb8; See Table 2, Additional file 4) directly from the crystal structures with PDB codes 2WP5, 2WP6, 2WPE, 2WPC and 2WPF [59]. TR from T. brucei has a sequence identity of 66.5% with respect to the corresponding protein from L. major and was used as a template to model the parasitic protein. TR (T. brucei) is a homodimer with each subunit consisting of three domains, the inhibitor binding cleft being formed by a congregation of α helices in domain I [60]. The binding site exhibits conformational flexibity indicating an induced fit of the ligand to the binding pocket. Thus for each ligand (i4 -i8) the leishmanial protein was repeatedly modeled from its original complex (2WP5, 2WP6 etc. as the case maybe). The ligands were initially placed in the active site Inhibitors identified for eight of the proteins from Table 1 along with their respective IC 50 's or Ki values with respect to their corresponding target proteins. The inhibitors will be referred to by the numbers assigned in the first column.   (Table 2, Additional file 4) was obtained from (4 s)-3-benzyl-6-chloro-2-methyl-4-phenyl-3,4dihydroquinazoline (inhibitor i8:2WP6) and the final docked position (allowing for flexibility in residues Glu 18, Trp 21 and Met 113 in the active site) had a rmsd of 0.66 Å (with respect to common atoms of 3,4-dihydro-quinazoline analogues) and a score of 34.69. The same set of core residues (Leu 17, Glu 18 etc.) including Ser 14 and Leu 120 also formed the active site for inhibitor i9 (Additional file 4).
The active site for fructose-6-phosphate (F6P) of the trypanosomatid ATP dependent PFK from T. brucei exhibits significant structural differences compared to its human counterpart and is located at the interface of two subunits in the homotetramer. Each subunit is composed of three domains (A, B, C) with the ATP binding site (between domains B and C) lying adjacent to the F6P site [61]. The complete tetramer of PFK from L. major was modeled based on the homologous protein from T. brucei (3F5M) with respect to which it shares a 71% sequence identity and ATP (along with the Mg 2+ ) were docked/placed in the model, prior to placement of the substrate. The crystal structure of PFK from T. brucei in 3F5M is in complex with ATP and does not contain the substrate F6P, whose coordinates were extracted from 1MTO which consists of PFK from B. stearothermophilus in complex with F6P [61] and docked onto the corresponding active site in leishmanial PFK, subsequent to initial positioning of the molecule, following similar methods given above. Beginning with the coordinates of the furan ring of F6P, three other inhibitors were built by making appropriate substitutions : a N,N0-substituted-1-amino-2, 5-anhydro-1-deoxy-1-D-mannonamide derivative (inhibitor no. i12); 2,5-anhydro-1-deoxy-1-(3,4-dichlorobenzylamino) -D-mannitol (inhibitor no. i13) and 2,5-Anhydro-1deoxy-1-(3,4-dichlorobenzylamino)-D-3,4-dichlorobenzylmannonamide (i14). Subsequent docking with GOLD on a rigid active site gave CHEMPLP scores 70.35, 67.80 and Figure 3 Active site of the modeled Trypanothione Reductase (TR) from L. major complexed with ligand i4 (red) along with iTb4 (blue) placed utilizing the superposition matrices and vectors obtained from superposing the protein model onto its template (2WP5). 46.81 for the inhibitors i12, i13 and i14 respectively. Repeated attempts to dock the ligands on a flexible site did not yield physically meaningful results. The only common feature between the inhibitors -i12, i13 and i14 was the furan ring (from F6P) and considerable variability in the remaining features tended to shift the ligands from their initial position depending on the length and chemical character of the substituents on either side of the furan ring. Consequently with a few exceptions, the constellation of residues constituting their binding pockets were significantly different (Additional file 4).
The crystal structure of dUTPase from L. major, is a dimeric all α protein (in contrast to its trimeric all β human homologue) was obtained from the PDB (2CJE) [62]. The active site is located in the vicinity of the interface between the rigid and mobile domains which constitute each subunit [62]. In addition, the site on one subunit is also constituted by a loop contributed by the other monomer. Crystal structures of the closed and open forms of the enzyme from T. cruzi revealed a significant movement of the mobile domain and rearrangement of the secondary structural elements [62]. The closed ligand bound form of dUTPase from 2CJE was used to model complexes with three other inhibitors. Since the enzyme sits in a special position in the crystal structure, the entire dimer of dUTPase was generated prior to the placement of the other ligands. The bound substrate analogue DUPNHP (2′-deoxyuridine 5′-alpha,beta-imidodiphosphate from 2CJE) was used to design the three inhibitors : a 5′-tert-butyldiphenylsilyloxy derivative (i15), a 2′-deoxyuridine 5′-alpha,beta-imido-diphosphate (i16), 5′-tritylamino-3′-fluoro-2′,3′,5′-trideoxyuridine (i17) and 5′-O-triphenylsilyl-2′,3′-didehydro-2′,3′-dideoxyuridine (i18). CHEMPLP scores for all the four inhibitors ranged from 90-100 and the rmsd's between the starting and final docked position ranged from 0.5 -1.2 Å. Based on the visual examination of the initial ligand position in the active site of the enzyme and survey of the ligandprotein atomic contacts, selected residues were rendered flexible in the docking process which could provide steric hindrance to the optimal orientation of the ligand or adopt alternative conformations in the binding pocket (inhibitor i15 -flexible residues Asn25, Glu48, Glu51, Glu76 , Tyr191 ; inhibitor-i17 : Asn25, Glu51 , Tyr 191; inhibitor-i18 : Glu48, Glu51, Glu76 and Tyr191).
For NNH (Nonspecific Nucleoside Hydrolase) six inhibitors (inhibitor no. i19 to i24) were chosen for docking. Complexes of three of these inhibitors (i22, i23, i24) with a homologous protein from T. vivax were available in the PDB (2FF2, 3EPW and 3EPX respectively), whereas the uncomplexed individual structure of NNH from L. major was found in 1EZR. The α|β enzyme from L. major is a homotetramer with an indispensable calcium ion in its active site [63]. Coordinates for the inhibitors i19, i20 were built starting from the pyrimidine group in the structure of immucilin -H present in 2FF2 and inhibitor i21from the ligand (i23) present in 3EPW. Docking of these inhibitors in the active site of the protein (including the Ca2+ ion), exhibited CHEMPLP scores and rmsd's (with respect to their original placement) ranging from 50 -60 and 1.2 -1.6 Å, respectively. Flexibility was allowed for residues Phe167 and His240 in the enzyme active site during the docking process for all ligands associated with this protein. Interactions with residues Asp 15, Asp 14 (with the exception Inhibitor -i22), Thr 126, Met 152 (except Inhibitor -i20) Asn 160, Glu 166, Phe 167, Asn 168, His 240 , Asp 241 and the calcium ion were common to all the ligands. Contacts with Leu 191 were found only for Inhibitors -i19 and i22 (Additional file 4).

Group C
Due to the lack of available prototypes or templates in terms of actual crystal structures depicting the position of the ligands in their binding sites, the confidence level associated with the docked complexes in Group C is necessarily low and thus the discussion of these complexes will be fairly brief. The crystal structure trypanothione sythetase (TS) from L. major (2VOB) has three putative binding sites for ATP, spermidine and glutathione (GSH). Inhibitor-7 (1-[3-(3-fluorophenyl) indazol-1-yl]-3, 3-dimethylbutan-2-one) binds with uncompetitive inhibition for both the putative ATP and GSH binding sites whereas exhibits competitive inhibition for the site associated with spermidine. Thus, the inhibitor (i25) was placed at the centroid of this site constituted by residues Arg 613, Arg 328, Ser 351, Glu 355, Phe 249 and Glu 407. As mentioned previously, inhibitor -i25 was constructed by ab initio fourth atom fixing techniques coupled to energy minimization. Several iterations with flexible ligand and rigid side chains of the active site led to a final CHEMPLP score of 47.32. Introduction of side chain flexibility did not appear to improve final docking poses. Likewise, inhibitor -i26 (ebselen) was positioned in the putative UDP binding pocket of UDP-glc-4′-epimerase (GalE) of L. major based on the centroid of residues R335, R268, N202 and H221. GalE from L. major was modeled based on the homologous enzyme from T. brucei (~58% sequence identity : 1GY8). Coordinates of ebselen were built by the same methods mentioned above and the final docked position had a CHEMPLP score of 36.96. The crystal structure of glyoxalase-I (GLO I) from L. major was obtained from 2C21 [44] and S-4bromobenzylglutathionylspermidine (inhibitor-i27) was docked into the putative active site of the enzyme constituted by residues : A chain -His8, Arg12, Arg33, Trp35, Val37, Glu52, Glu59, Asn63 and B chain -His77, Asp100, Tyr101, Phe107, Met108, Tyr118, Glu120, Met127 and Lys130. The CHEMPLP docking score of 79.43 was obtained for this docking. For all docking runs in case of i27 both the energy minimized ligand and the residues composing the protein active site were held rigid, as additional trials with either flexible ligands and/or side chains led to significant shifts in their position away from the putative binding sites. As mentioned previously the confidence level is relatively low for these complexes.
Pharmacophore design and screening of zinc database 34 structure based pharmacophores were derived from their corresponding ligand bound three dimensional structures using LigandScout version 3.12 (build 20130912). Those pharmacophores whose area under the ROC curve (See Methods), were less than 0.75 whilst validation, were filtered out (i8, i11, i17 and i20). In addition, pharmacophores with either too few (i2, i20, i26: 3 features) or too many features (i14:13 features, i16:16, i27:12) were removed, leaving a total of 23 pharmacophores for subsequent calculations (Table 4). These pharmacophores were used to search the ZINC database using ZINCPharmer (html search engine) with parameters: 'Max Hits per Conf' = 1, 'Max Hits per Mol' = 1, 'Max Total Hits' = 20 and 'Max RMSD' = 0.5, 0.75, 1. The Max RMSD was gradually increased only if no hits were recorded in the initial cutoffs. The topmost hit of every pharmacophore with the least RMSD, along with hits which were similar to approved drugs (generally greater than 90%) are shown in Table 4 and all the hits are given in Additional file 5.A total number of 344 hits were recorded from the ZINC database which were then used to search the Drug Bank (http://www.drugbank.ca) with a cutoff in similarity score set to 70%, so as to identify similar molecules actually in use as pharmaceutical products. From the 344 compounds distributed over 23 pharmacophores, 9 exhibited similarities to drugs under investigation, 319 showed similarities to experimental drugs (known to bind to specific proteins in mammals, bacteria, viruses, fungi, or parasites) and 16 were similar to approved drugs (in at least one country). Of these 344 hits, 40 were from complexes in Group A, 304 from Group B and none from Group C.
Interestingly, the search for approved drugs similar to ZINC compounds led to paromomycin (with a similarity score of 0.944 with respect to immucilin H in complex with nonspecific nucleoside hydrolase, i22)spontaneously appearing in the list, a drug having passed all clinical trials and is now currently being prescribed for visceral leishmaniasis. Paromomycin has also been successfully used in topical creams for the treatment ulcerative cutaneous leishmaniasis [64]. The inclusion of paromomycin provides some confidence that some of the listed drugs (in Table 5) could possibly exhibit some measure of antileishmanial activity. Likewise framycetin, neomycin, gentamicin, netilimicin and tobramycin all belong to the same class of aminoglycoside antibiotics generally known to inhibit protein synthesis. Framycetin and neomycin have found extensive use in topical ointments and creams. Didanosine and vidarbine are antiviral drugs the former being a nucleoside analogue of guanosine with hypoxanthine attached to the sugar ring and the latter an analogue of adenosine, in this case Darabinose replacing D-ribose. Nelarabine on the other hand is a purine nucleoside analogue currently being applied in the chemotherapy of T-cell acute lymphoblastic leukemia. Other drugs include     lidocaine (and its analog tocainide) an amino amide type local anesthetic , primaquine a member of the 8aminoquinoline group of drugs used in the treatment of malaria/ pneumocystis pneumonia, pralatrexate an anti-folate for anti-cancer therapy and triamterene a diuretic drug for hypertension. Notably, pteridine reductase, trypanothione reductase, deoxyuridine triphosphatase have been found to be essential for survival and nonspecific nucleoside hydrolase plays a central role in the purine salvage pathway. Currently, our aim is to experimentally test the antileishmanial character of these compounds/approved drugs.

Conclusions
The work reported in this paper demonstrates the series of computational steps beginning with the comparison of genomes, prioritization of prospective drug targets, culling or assembly of inhibitortarget complexes through template based model building and docking, generation of pharmacophores and their subsequent use for searching small molecule databases (such ZINC/Drug Bank), to rationally assemble a set of lead compounds for experimentally testing as potential antileishmanials. The natural appearance of paromomycin, a drug currently being employed against visceral leishmaniais, in the list of Approved drugs with similarity score greater than 0.70 to specific ZINC hits. Information of the proteininhibitor complex corresponding to the pharmacophore is also given.