Computational toxicology using the OpenTox application programming interface and Bioclipse
BMC Research Notes volume 4, Article number: 487 (2011)
Toxicity is a complex phenomenon involving the potential adverse effect on a range of biological functions. Predicting toxicity involves using a combination of experimental data (endpoints) and computational methods to generate a set of predictive models. Such models rely strongly on being able to integrate information from many sources. The required integration of biological and chemical information sources requires, however, a common language to express our knowledge ontologically, and interoperating services to build reliable predictive toxicology applications.
This article describes progress in extending the integrative bio- and cheminformatics platform Bioclipse to interoperate with OpenTox, a semantic web framework which supports open data exchange and toxicology model building. The Bioclipse workbench environment enables functionality from OpenTox web services and easy access to OpenTox resources for evaluating toxicity properties of query molecules. Relevant cases and interfaces based on ten neurotoxins are described to demonstrate the capabilities provided to the user. The integration takes advantage of semantic web technologies, thereby providing an open and simplifying communication standard. Additionally, the use of ontologies ensures proper interoperation and reliable integration of toxicity information from both experimental and computational sources.
A novel computational toxicity assessment platform was generated from integration of two open science platforms related to toxicology: Bioclipse, which combines a rich scriptable and graphical workbench environment for integration of diverse sets of information sources, and OpenTox, a platform for interoperable toxicology data and computational services. The combination provides improved reliability and operability for handling large data sets by the use of the Open Standards from the OpenTox Application Programming Interface. This enables simultaneous access to a variety of distributed predictive toxicology databases, and algorithm and model resources, taking advantage of the Bioclipse workbench handling the technical layers.
We here report the establishment of a new interoperable platform for computational toxicology that is able to dynamically discover computational services running the latest predictive algorithms and models, while hiding technicalities by reusing a graphics-oriented workbench for the life sciences. The OECD QSAR ToolBox [1, 2] and ToxTree [3, 4] are existing software packages that aggregate predictive toxicity models, but do not integrate easily with other functionality, such as online services. Bioclipse, however, is designed to integrate local and remote functionality [5–7]. In this paper we outline how we implemented a new platform, integrating the OpenTox Open Standards and the interactive but scriptable Open Source workbench for the life sciences, Bioclipse. This approach makes it possible for anyone to make new computational toxicology models available to Bioclipse without the need to change the software source code.
Predictive toxicology is a field where knowledge from many sources needs to be integrated to provide a weight of evidence on the toxicity of untested chemical compounds. Typical sources of information include databases with in vivo and in vitro experimental data such as ToxCast and SuperToxic [9, 10], literature databases summarizing adverse reactions like SIDER, and computational resources based on toxicity data for other compounds including DSSTox. Importantly, this information should be visualized, preferably linked to the chemical structure of the compound, or by visualizing relevant life science data, such as gene, protein and biological pathway information [13–15] or metabolic reactions. Bioclipse was designed to provide such interactive data analysis for the life sciences.
Moreover, predictive toxicology is an advancing science, aiming to develop new alternative testing methods that satisfy the demanding risk assessment requirements of the European REACH guidance. The dynamic discovery of new toxicology-related data and computational methods is therefore of utmost scientific and practical importance. The EU FP7 OpenTox project recently developed a framework that makes semantic integration of such new resources feasible.
We describe here the subsequent technological interoperation of Bioclipse and the OpenTox platform, as implemented, for example, by the AMBIT software. This short report outlines what functionality the new combined platform provides to the toxicologist and what development is ongoing. At the core of the interoperation lies the use of the Resource Description Framework (RDF) and related Open Standards. OpenTox uses RDF as a primary exchange format and the RDF query language SPARQL to discover data sets, algorithms and models. Bioclipse was recently extended to support these standards, simplifying the interoperation task with OpenTox.
We outline three applications that exemplify how the technologies used make this interoperability possible, starting with a computational toxicology example. Three technologies drive the interoperability. First, the SPARQL RDF query language is used to discover functionality on the OpenTox network. Second, the OpenTox web services are used for remote computation. Finally, all graphical user interfaces use a new Bioclipse Scripting Language (BSL) extension to interact with OpenTox servers, so that all interaction can also be scripted and automated.
Figure 1 shows how the interoperability of Bioclipse with the OpenTox API is designed, and in particular how it was used to extend the molecular descriptor calculation functionality in Bioclipse described previously. This functionality can be used to calculate properties such as logP and pKa, important to various aspects of toxicity, including membrane transport and receptor binding. Knowledge about such properties can be used under the European REACH regulation. For example, predicted physical and chemical properties can, under certain conditions, complement toxicity testing using animal experiments, and as such, calculation of such descriptors is increasingly relevant.
Bioclipse dynamically discovers descriptor algorithms exposed via the OpenTox servers, using the OpenTox ontology service's SPARQL endpoint. This SPARQL endpoint functions as a registry of available computational services on the OpenTox network, similar to the role of BioCatalogue. These services are described with the OpenTox ontology, which is available as a Web Ontology Language document at http://opentox.org/api/1_1/opentox.owl and discussed in detail elsewhere. Using the SPARQL query language, Bioclipse can retrieve a list of available services. Moreover, when a new descriptor algorithm or model is registered on the OpenTox ontology service, it will automatically be picked up by Bioclipse. Figure 2 shows several discovered OpenTox descriptor algorithms, along with algorithms from other local (CDK) and remote (CDK REST) providers. Using this approach, Bioclipse has access to the most recent descriptors relevant to toxicity predictions.
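A registry lookup of this kind might be phrased as the SPARQL query assembled below as a JavaScript string. This is only a sketch: the column names algo and desc follow the BSL script given later in this article, but the prefix IRIs for ot and bo, the ot:Algorithm class, and the bo:instanceOf property are assumptions that should be checked against the OpenTox ontology document, and buildDescriptorQuery is a hypothetical helper, not a Bioclipse command.

```javascript
// Hypothetical helper: assembles a SPARQL query that lists algorithms
// (?algo) together with the descriptor concept each one implements (?desc).
// The ot: and bo: IRIs and the terms used from them are assumptions.
function buildDescriptorQuery() {
  const prefixes = {
    rdf: "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    ot: "http://www.opentox.org/api/1.1#", // assumed OpenTox namespace
    bo: "http://www.blueobelisk.org/ontologies/chemoinformatics-algorithms/#" // assumed
  };
  // One PREFIX declaration per line.
  const header = Object.keys(prefixes)
    .map(function (p) { return "PREFIX " + p + ": <" + prefixes[p] + ">"; })
    .join("\n");
  // Select every resource typed as an algorithm and the concept it instantiates.
  const body = [
    "SELECT ?algo ?desc WHERE {",
    "  ?algo rdf:type ot:Algorithm ;",
    "        bo:instanceOf ?desc .",
    "}"
  ].join("\n");
  return header + "\n" + body;
}

console.log(buildDescriptorQuery());
```

Because the query is evaluated against the registry each time it is sent, a newly registered algorithm would appear in the result without any change on the client side.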
OpenTox provides web services to calculate a descriptor value for a given molecule. Using the linked resources idea of the semantic web, the descriptors discovered via the ontology server can be invoked via Bioclipse directly. As such, OpenTox-provided descriptor calculations can be mixed with descriptor calculations local to Bioclipse, or from other remote computational services, as described before. This creates a flexible application for the integration of numerical input for statistical modeling of toxicologically relevant end points, as well as the comparison of various predictive models for a more balanced property analysis.
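The idea of mixing descriptor values from several providers into one numerical input can be sketched in plain JavaScript as follows. Everything here is illustrative: the two providers are toy stand-ins (they only inspect SMILES text and compute nothing chemically meaningful), and buildDescriptorMatrix is a hypothetical helper rather than part of Bioclipse or OpenTox.

```javascript
// Each provider maps a list of molecules (here: SMILES strings) to one
// numeric value per molecule. Both providers are toy stand-ins.
const providers = {
  // stand-in for a "local" descriptor: number of characters in the SMILES
  localSmilesLength: function (mols) {
    return mols.map(function (m) { return m.length; });
  },
  // stand-in for a "remote" descriptor: how often the letter C/c occurs
  remoteLetterCCount: function (mols) {
    return mols.map(function (m) { return (m.match(/c/gi) || []).length; });
  }
};

// Hypothetical helper: one row per molecule, one column per provider.
function buildDescriptorMatrix(mols, providerMap) {
  const names = Object.keys(providerMap);
  const columns = names.map(function (n) { return providerMap[n](mols); });
  const rows = mols.map(function (_, i) {
    return columns.map(function (col) { return col[i]; });
  });
  return { columns: names, rows: rows };
}

const matrix = buildDescriptorMatrix(["CCO", "c1ccccc1"], providers);
console.log(matrix.columns.join(", "));
console.log(JSON.stringify(matrix.rows));
```

Keeping every provider behind the same "molecules in, one value per molecule out" shape is what would let local and remote calculations be combined without the consumer knowing where a column came from.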
All functionality for remote computing on the OpenTox network is also available as BSL scripting commands, allowing all OpenTox interoperation with the Bioclipse graphical user interface to be replicated using BSL scripts. Table 1 shows the BSL commands for service and data discovery and the invocation of remote services, under the categories Querying and Computation, respectively.
Using a second, data sharing use case, we explain how all graphical interoperation uses a BSL script extension. For example, Figure 3 shows the Bioclipse dialog for uploading a small data set with ten neurotoxins to an OpenTox server (see Additional file 1). This dialog asks which OpenTox server to upload to (the Ambit2 server is selected, http://apps.ideaconsult.net:8080/ambit2/), a title under which this data set will be available ("Ten neurotoxins found in Wikipedia"), and the data license or waiver under which the data will be available to others. Figure 3 indicates that the Creative Commons Zero waiver was selected. Other options include the ODC Public Domain Dedication and Licence, Open Database License, and the Open Data Commons Attribution License. Optionally, the user can specify a web location for a custom license agreement under which the data is available, though we encourage users to select a standard license.
Technically, the dialog makes use of the script commands createDataSet(service, molecules), setDatasetLicense(datasetURI, licenseURI), and setDatasetTitle(datasetURI, title) (see Table 1). The latter two methods use the data set Uniform Resource Identifier (URI) returned by the first method. When the upload has finished, the resulting OpenTox web page is opened in a browser window in Bioclipse (see Figure 4).
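The order of these three calls can be sketched as follows. The command names, the server address, the title and the CC0 address are taken from this article; the opentox object, however, is a stand-in defined here so that the snippet runs outside Bioclipse. It only records calls, uploads nothing, and the data set URI it returns is made up.

```javascript
// Stand-in for the Bioclipse `opentox` manager (an assumption for this
// sketch). It records which commands were called and uploads nothing.
const calls = [];
const opentox = {
  createDataSet: function (service, molecules) {
    calls.push("createDataSet");
    return service + "dataset/EXAMPLE"; // made-up URI, not a real data set
  },
  setDatasetLicense: function (datasetURI, licenseURI) {
    calls.push("setDatasetLicense");
  },
  setDatasetTitle: function (datasetURI, title) {
    calls.push("setDatasetTitle");
  }
};

const service = "http://apps.ideaconsult.net:8080/ambit2/";
const cc0 = "http://creativecommons.org/publicdomain/zero/1.0/";
const molecules = []; // placeholder; the dialog passes the ten neurotoxins

// The first call returns the data set URI that the other two calls need.
const datasetURI = opentox.createDataSet(service, molecules);
opentox.setDatasetLicense(datasetURI, cc0);
opentox.setDatasetTitle(datasetURI, "Ten neurotoxins found in Wikipedia");

console.log(calls.join(" -> "));
```

Inside Bioclipse the stand-in would be dropped and the same three calls issued against the opentox manager supplied by the scripting environment.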
This use case shows how the Bioclipse-OpenTox integration takes advantage of the fact that all graphical user interface (GUI) functionality in Bioclipse is matched by a scripted equivalent. Using the BSL directly allows interaction with the OpenTox network to be automated and combined with other Bioclipse functionality into larger workflows, and makes it easier to share procedures with others using social scientific sites like MyExperiment. An example BSL script for calculating molecular descriptors combines OpenTox functionality with cheminformatics functionality provided by the cdk script extensions (also available as Additional file 2):
// requires an unspecified Bioclipse development version
service = "http://apps.ideaconsult.net:8080/ambit2/";
serviceSPARQL = "http://apps.ideaconsult.net:8080/ontology/";
// discover the descriptor algorithms registered with the ontology service
stringMat = opentox.listDescriptors(serviceSPARQL);
stringMat.getColumn("algo"); // returns the descriptor services
stringMat.getColumn("desc"); // returns the BO entries
// take the first descriptor service found
descriptor = stringMat.get(1,1);
molecules = cdk.createMoleculeList();
// (the lines that add the two SMILES-derived structures to the list are not reproduced here; see Additional file 2)
descriptor + " - " +
  opentox.calculateDescriptor(service, descriptor, molecules)
The script returns:
http://apps.ideaconsult.net:8080/ambit2/algorithm/org.openscience.cdk.qsar.descriptors.molecular.XLogPDescriptor - [0.11900000274181366, 2.2190001010894775]
Table 1 shows an overview of the available BSL commands for uploading data to and downloading data from OpenTox servers under the heading Data exchange.
The third demonstration of Bioclipse-OpenTox interoperability is the support for accessing protected resources within the OpenTox network. Despite the preferences of the authors, we acknowledge that not all scientific data will be Open Data. As such, authentication and authorization (A&A) are important features of data access. OpenTox implements both aspects and provides web services for A&A, allowing users to log in to and out of OpenTox applications, accompanied by policy-based specification of OpenTox resource access permissions. Additionally, the same mechanism is used to restrict access to calculation procedures, allowing software with commercial licenses to be exposed as protected OpenTox resources. Bioclipse was extended to support the OpenTox authentication, allowing the OpenTox servers to properly authorize user access to particular web services and data sets. The OpenTox account information is registered with the Bioclipse keyring system, which centralizes logging in to and out of remote services and provides the graphical user interface for adding a new OpenTox account and for logging in and out. The corresponding script commands for authentication are given in the Authentication category in Table 1. Interested readers can create a free account at http://www.opentox.org/join_form.
We have described here an interoperability advance, enabling users to interactively explore and evaluate the toxicity properties of molecules based on a semantic web approach to toxicology resources. The integration into Bioclipse makes various components of the OpenTox platform available to the user, both via the graphical user interface and via the Bioclipse Scripting Language. The Bioclipse-OpenTox plugin makes it possible to upload data sets to and download them from any OpenTox server, calculate molecular descriptors, and apply predictive toxicology models to molecular structures. All functionality has support for user authentication using the OpenTox-adopted OpenSSO technology. Other components of OpenTox, like model building and validation, have not been added yet, as Bioclipse currently does not have a clear GUI for such functionality. Such functionality is being worked on, but is outside the scope of this report. The presented aspects set this integration apart: the result is a solution capable of dynamically discovering new services in the OpenTox network when it starts, which differentiates it from specialized software like ToxTree and the OECD QSAR ToolBox. These tools aggregate several predictive models, but need to be updated manually by the developers for each new model. However, it is noted that these tools can also be extended to support the OpenTox platform. An added value is that updates to computational modules are only done on the server side, so that the client software (Bioclipse) does not need to be updated; a feature in common with web-based solutions like ToxPredict. The scripting functionality makes it easy to automate data workflows, as do workflow applications such as Taverna and KNIME (http://knime.org), but the combination with the rich Bioclipse user interface makes it possible at the same time to work with OpenTox interactively.
The calculation results are cached by the OpenTox dataset service, which avoids time-consuming processing if the same calculation on the same dataset is requested more than once. Users of the integrated Bioclipse-OpenTox environment do not, therefore, need to worry about the performance of their own computer, though we are also exploring the options to have Bioclipse itself run an OpenTox server. The latter is technically possible, and would convert the integrated platform into a standalone application that does not require web access.
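The effect of such caching can be illustrated with a toy in-memory cache keyed on the pair of algorithm and dataset. This is a generic memoization sketch in plain JavaScript, not a description of how the OpenTox dataset service is implemented; makeCachedCalculator, the example URIs and the stand-in calculation are all hypothetical.

```javascript
// Toy result cache keyed on the (algorithm URI, dataset URI) pair.
function makeCachedCalculator(calculate) {
  const cache = new Map();
  let computations = 0; // how often the real calculation actually ran
  return {
    run: function (algorithmURI, datasetURI) {
      const key = JSON.stringify([algorithmURI, datasetURI]);
      if (!cache.has(key)) {
        computations += 1;
        cache.set(key, calculate(algorithmURI, datasetURI));
      }
      return cache.get(key);
    },
    computations: function () { return computations; }
  };
}

// Stand-in for a time-consuming remote calculation.
const calculator = makeCachedCalculator(function (algo, data) {
  return "result of " + algo + " on " + data;
});

calculator.run("algorithm/A", "dataset/1");
calculator.run("algorithm/A", "dataset/1"); // same request: served from cache
calculator.run("algorithm/A", "dataset/2"); // different dataset: computed
console.log(calculator.computations()); // prints 2
```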
From a technological perspective, the Bioclipse-OpenTox integration relies on semantic web technologies, which are seeing significant adoption in other areas of the life sciences too, including drug discovery, text mining, and neurosciences [33–35]. The OpenTox platform demonstrated the provision of a simple but well-defined and consistent ontology for the interaction with their services, providing functionality for both service discovery and service invocation. The SADI framework is the only known semantic alternative, but currently does not provide the same level of computational toxicology services as OpenTox does. However, while the integration is greatly simplified and semantically defines what services are available and what they do, the technologies used neither solve the problem of the chemical validity of the molecular structures that are sent around, nor semantically define and specify in detail how to interpret the computational results of toxicity predictions. The first problem reflects the fact that even with explicit meaning we can make incorrect claims. For example, we can always define a triple stating that :water :isToxicAtLowConcentrationsTo :human, by using ontologies for all aspects, but that would not make it true. Semantic technologies are not about correctness. Instead, they make it much easier to find inconsistencies between knowledge bases. The same argument applies to semantically marked up molecular structures and other data passed between Bioclipse and the OpenTox cloud (cf. Figure 1).
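How explicit semantics help to surface inconsistencies, rather than guarantee correctness, can be sketched with two toy knowledge bases held as arrays of triples. The predicate :isNotToxicAtLowConcentrationsTo, the subject :compoundX, the contradicts table and findConflicts are all hypothetical and introduced only for this illustration; a real setup would rely on an ontology and a reasoner instead.

```javascript
// Two toy knowledge bases as [subject, predicate, object] triples.
const kbA = [
  [":water", ":isToxicAtLowConcentrationsTo", ":human"]
];
const kbB = [
  [":water", ":isNotToxicAtLowConcentrationsTo", ":human"],
  [":compoundX", ":isNotToxicAtLowConcentrationsTo", ":human"]
];

// Hypothetical declaration of which predicates exclude each other.
const contradicts = {
  ":isToxicAtLowConcentrationsTo": ":isNotToxicAtLowConcentrationsTo"
};

// Report pairs of triples that share subject and object but use
// mutually exclusive predicates.
function findConflicts(a, b, opposites) {
  const conflicts = [];
  a.forEach(function (ta) {
    b.forEach(function (tb) {
      const sameEnds = ta[0] === tb[0] && ta[2] === tb[2];
      const opposed = opposites[ta[1]] === tb[1] || opposites[tb[1]] === ta[1];
      if (sameEnds && opposed) {
        conflicts.push([ta, tb]);
      }
    });
  });
  return conflicts;
}

const conflicts = findConflicts(kbA, kbB, contradicts);
console.log(conflicts.length + " conflict(s) found");
```

The check flags the disagreement about :water but cannot say which knowledge base is right, which is the point made above.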
An example of the second problem is that various services can indicate that a compound is mutagenic or carcinogenic, but express that statement in different ways. One service may return a binary yes/no answer, while another returns a more detailed answer, such as for which cell line or organism the prediction is made. Such semantic integration is currently outside the scope of this Bioclipse-OpenTox interoperability, but it is not a problem unique to our approach either.
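The mismatch can be made concrete with a small sketch that maps two differently shaped answers onto one common record. Both input shapes, the field names mutagenic and organism, and normalizePrediction itself are hypothetical; no actual OpenTox service output format is implied.

```javascript
// Hypothetical normalizer: accepts either a bare "yes"/"no" string or a
// detailed object, and returns one common shape.
function normalizePrediction(raw) {
  if (typeof raw === "string") {
    // Binary answer: no information about the test system is available.
    return { mutagenic: raw.toLowerCase() === "yes", context: null };
  }
  // Detailed answer: keep the organism or cell line, if one was given.
  return {
    mutagenic: raw.mutagenic === true,
    context: raw.organism || raw.cellLine || null
  };
}

const fromServiceA = normalizePrediction("yes");
const fromServiceB = normalizePrediction({ mutagenic: true, organism: "example organism" });

console.log(JSON.stringify(fromServiceA));
console.log(JSON.stringify(fromServiceB));
```

Even after normalization the two records are not equivalent: the first carries no context, so hand-written mappings of this kind do not replace a shared ontology for prediction results.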
To address these issues, the community needs to develop better capabilities to link automatically and reliably the various concepts in toxicology, such as links between chemical names and structures and links to toxicities based on current biological knowledge on effects, targets and pathways. The platform is ready for such semantic integration, but the community needs to develop a common language, which will be enabled through the creation of a public set of linked, harmonized and interoperable ontologies satisfying the predictive toxicology use cases of the future, supporting an integrated data analysis.
Availability and requirements
Project name: Bioclipse-OpenTox
Project home page: http://www.bioclipse.net/opentox/
Operating system(s): Platform independent
Programming language: Java
Other requirements: Java 6 or higher
License: Eclipse Public License
Any restrictions to use by non-academics: None
Abbreviations
A&A: Authorization and Authentication
API: Application Programming Interface
BSL: Bioclipse Scripting Language
CDK: Chemistry Development Kit
FP7: Seventh Framework Programme
GUI: Graphical User Interface
OECD: Organisation for Economic Co-operation and Development
QSAR: Quantitative Structure-Activity Relationship
RDF: Resource Description Framework
REACH: Registration, Evaluation, Authorisation and Restriction of Chemical substances
REST: Representational State Transfer
SPARQL: SPARQL Protocol and RDF Query Language
URI: Uniform Resource Identifier.
Diderichs R: Tools for Category Formation and Read-Across: Overview of the OECD (Q)SAR Application Toolbox. 2010, The Royal Society of Chemistry, 385-407. chap 16
QSAR ToolBox. [http://www.qsartoolbox.org/]
Patlewicz G, Jeliazkova N, Safford RJ, Worth AP, Aleksiev B: An evaluation of the implementation of the Cramer classification scheme in the Toxtree software. SAR and QSAR in environmental research. 2008, 19 (5-6): 495-524. 10.1080/10629360802083871.
Spjuth O, Helmus T, Willighagen E, Kuhn S, Eklund M, Wagener J, Rust PM, Steinbeck C, Wikberg J: Bioclipse: An open source workbench for chemo- and bioinformatics. BMC Bioinformatics. 2007, 8:
Spjuth O, Alvarsson J, Berg A, Eklund M, Kuhn S, Mäsak C, Torrance G, Wagener J, Willighagen E, Steinbeck C, Wikberg J: Bioclipse 2: A scriptable integration platform for the life sciences. BMC Bioinformatics. 2009, 10: 397-10.1186/1471-2105-10-397.
Spjuth O, Eklund M, Ahlberg Helgee E, Boyer S, Carlsson L: Integrated Decision Support for Assessing Chemical Liabilities. Journal of Chemical Information and Modeling. 2011, 51 (8): 1840-1847. 10.1021/ci200242c.
Hardy B, Douglas N, Helma C, Rautenberg M, Jeliazkova N, Jeliazkov V, Nikolova I, Benigni R, Tcheremenskaia O, Kramer S, Girschick T, Buchwald F, Wicker J, Karwath A, Gutlein M, Maunz A, Sarimveis H, Melagraki G, Afantitis A, Sopasakis P, Gallagher D, Poroikov V, Filimonov D, Zakharov A, Lagunin A, Gloriozova Ta, Novikov S, Skvortsova N, Druzhilovsky D, Chawla S, Ghosh I, Ray S, Patel H, Escher S: Collaborative development of predictive toxicology applications. Journal of Cheminformatics. 2010, 2: 7-10.1186/1758-2946-2-7.
Knudsen TB, Houck KA, Sipes NS, Singh AV, Judson RS, Martin MT, Weissman A, Kleinstreuer NC, Mortensen HM, Reif DM, Rabinowitz JR, Setzer RW, Richard AM, Dix DJ, Kavlock RJ: Activity profiles of 309 ToxCast™ chemicals evaluated across 292 biochemical targets. Toxicology. 2011, 282 (1-2): 1-15. 10.1016/j.tox.2010.12.010.
Schmidt U, Struck S, Gruening B, Hossbach J, Jaeger IS, Parol R, Lindequist U, Teuscher E, Preissner R: SuperToxic: a comprehensive database of toxic compounds. Nucleic Acids Research. 2009, 37 (suppl 1): D295-D299.
Kuhn M, Campillos M, Letunic I, Jensen LJJ, Bork P: A side effect resource to capture phenotypic effects of drugs. Molecular systems biology. 2010, 6 (343):
Williams-DeVane CR, Wolf MA, Richard AM: DSSTox chemical-index files for exposure-related experiments in ArrayExpress and Gene Expression Omnibus: enabling toxico-chemogenomics data linkages. Bioinformatics. 2009, 25 (5): 692-694. 10.1093/bioinformatics/btp042.
Huang DW, Sherman BT, Lempicki RA: Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nature protocols. 2009, 4: 44-57.
Ogata H, Goto S, Sato K, Fujibuchi W, Bono H, Kanehisa M: KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic acids research. 1999, 27: 29-34. 10.1093/nar/27.1.29.
Kelder T, Pico AR, Hanspers K, van Iersel MP, Evelo C, Conklin BR: Mining Biological Pathways Using WikiPathways Web Services. PLoS ONE. 2009, 4 (7): e6447.
Rydberg P, Gloriam DE, Olsen L: The SMARTCyp cytochrome P450 metabolism prediction server. Bioinformatics. 2010, 26 (23): 2988-2989. 10.1093/bioinformatics/btq584.
European Parliament C: Regulation (EC) No 1907/2006 of the European Parliament and of the Council of 18 December 2006 concerning the Registration, Evaluation, Authorisation and Restriction of Chemicals (REACH), establishing a European Chemicals Agency, amending Directive 1999/45/EC and repealing Council Regulation (EEC) No 793/93 and Commission Regulation (EC) No 1488/94 as well as Council Directive 76/769/EEC and Commission Directives 91/155/EEC, 93/67/EEC, 93/105/EC and 2000/21/EC. Tech rep. 2006, [http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=CELEX:32006R1907:en:NOT]
Jeliazkova N, Jeliazkov V: AMBIT RESTful web services: an implementation of the OpenTox application programming interface. Journal of Cheminformatics. 2011, 3: 18-10.1186/1758-2946-3-18.
Carroll JJ, Klyne G: Resource Description Framework (RDF): Concepts and Abstract Syntax. 2004, [http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/]
Prud'hommeaux E, Seaborne A: SPARQL Query Language for RDF. Tech rep, World-Wide-Web Consortium. 2008, [http://www.w3.org/TR/rdf-sparql-query/]
Willighagen E, Alvarsson J, Andersson A, Eklund M, Lampa S, Lapins M, Spjuth O, Wikberg J: Linking the Resource Description Framework to cheminformatics and proteochemometrics. Journal of Biomedical Semantics. 2011, 2 (Suppl 1): S6-10.1186/2041-1480-2-S1-S6.
Spjuth O, Willighagen E, Guha R, Eklund M, Wikberg J: Towards interoperable and reproducible QSAR analyses: Exchange of datasets. Journal of Cheminformatics. 2010, 2: 5-10.1186/1758-2946-2-5.
Bhagat J, Tanoh F, Nzuobontane E, Laurent T, Orlowski J, Roos M, Wolstencroft K, Aleksejevs S, Stevens R, Pettifer S, Lopez R, Goble CA: BioCatalogue: a universal catalogue of web services for the life sciences. Nucleic Acids Research. 2010, 38 (suppl 2): W689-W694. [http://dx.doi.org/10.1093/nar/gkq394]
W3C OWL Working Group: OWL 2 Web Ontology Language Document Overview. Tech rep, W3C. 2009, [http://www.w3.org/TR/2009/REC-owl2-overview-20091027/]
Steinbeck C, Hoppe C, Kuhn S, Floris M, Guha R, Willighagen EL: Recent developments of the chemistry development kit (CDK) - an open-source java library for chemo- and bioinformatics. Current pharmaceutical design. 2006, 12 (17): 2111-2120. 10.2174/138161206777585274.
CC0 1.0 Universal Public Domain Dedication. [http://creativecommons.org/publicdomain/zero/1.0/]
ODC Public Domain Dedication and Licence 1.0. [http://www.opendatacommons.org/licenses/pddl/1-0/]
Open Database License 1.0. [http://opendatacommons.org/licenses/odbl/1.0/]
Open Data Commons Attribution License 1.0. [http://opendatacommons.org/licenses/by/1.0/]
Goble CA, Bhagat J, Aleksejevs S, Cruickshank D, Michaelides D, Newman D, Borkum M, Bechhofer S, Roos M, Li P, De Roure D: myExperiment: a repository and social network for the sharing of bioinformatics workflows. Nucleic Acids Res. 2010, 38 (Suppl): W677-82.
Ideaconsult Ltd: ToxPredict. [http://toxpredict.org/]
Oinn T, Addis M, Ferris J, Marvin D, Senger M, Greenwood M, Carver T, Glover K, Pocock MR, Wipat A, Li P: Taverna: a tool for the composition and enactment of bioinformatics workflows. Bioinformatics. 2004, 20 (17): 3045-3054. 10.1093/bioinformatics/bth361.
Ruttenberg A, Clark T, Bug W, Samwald M, Bodenreider O, Chen H, Doherty D, Forsberg K, Gao Y, Kashyap V, Kinoshita J, Luciano J, Marshall MS, Ogbuji C, Rees J, Stephens S, Wong G, Wu E, Zaccagnini D, Hongsermeier T, Neumann E, Herman I, Cheung KH: Advancing translational research with the Semantic Web. BMC Bioinformatics. 2007, 8 (Suppl 3): S2-10.1186/1471-2105-8-S3-S2.
Splendiani A, Burger A, Paschke A, Romano P, Marshall M: Biomedical semantics in the Semantic Web. Journal of Biomedical Semantics. 2011, 2 (Suppl 1): S1-10.1186/2041-1480-2-S1-S1.
Willighagen EL, Brändle MP: Resource description framework technologies in chemistry. Journal of Cheminformatics. 2011, 3: 15-10.1186/1758-2946-3-15.
Chepelev L, Dumontier M: Semantic Web integration of cheminformatics resources with the SADI framework. Journal of Cheminformatics. 2011, 3: 16-10.1186/1758-2946-3-16.
This research was funded by a KoF grant from Uppsala University (KoF 07), the Swedish VR-M (04X-05957), the Swedish Cancer and Allergy Fund, the Swedish Research Council, the Swedish Fund for Research without Animal Experiments, OpenTox through the EU Seventh Framework Programme HEALTH-2007-1.3-3 (Health-F5-2008-200787), COLIPA, and ToxBank through the EU Seventh Framework Programme HEALTH-2010-4.2.9 Alternative Testing Strategies (Health-F5-2010-267042). Jonathan Alvarsson is acknowledged for his Bioclipse keyring extension which is used for the OpenTox authentication integration.
OS declares interest in Genetta Soft AB, Sweden. NJ declares interest in Ideaconsult Ltd., Bulgaria.
EW initiated the project at Uppsala University. OS and EW integrated the two platforms. NJ worked on OpenTox to improve internal consistency. BH and RG encouraged and discussed the work with co-authors and users. All authors contributed to the writing of the paper and approved the final version.
Electronic supplementary material
Additional file 2: Bioclipse Scripting Language script to calculate a molecular descriptor. Bioclipse Scripting Language script to calculate the first molecular descriptor it finds on the OpenTox server Ambit2 for two structures created from the molecular line notation format SMILES. A similar script is available from MyExperiment at http://www.myexperiment.org/workflows/1646. (JS 676 bytes)
Cite this article
Willighagen, E.L., Jeliazkova, N., Hardy, B. et al. Computational toxicology using the OpenTox application programming interface and Bioclipse. BMC Res Notes 4, 487 (2011). https://doi.org/10.1186/1756-0500-4-487