Computational toxicology using the OpenTox application programming interface and Bioclipse
© Willighagen et al; licensee BioMed Central Ltd. 2011
Received: 20 August 2011
Accepted: 10 November 2011
Published: 10 November 2011
Toxicity is a complex phenomenon involving potential adverse effects on a range of biological functions. Predicting toxicity involves using a combination of experimental data (endpoints) and computational methods to generate a set of predictive models. Such models rely strongly on being able to integrate information from many sources. Integrating biological and chemical information sources requires, however, a common language to express our knowledge ontologically, and interoperating services to build reliable predictive toxicology applications.
This article describes progress in extending the integrative bio- and cheminformatics platform Bioclipse to interoperate with OpenTox, a semantic web framework which supports open data exchange and toxicology model building. The Bioclipse workbench environment enables functionality from OpenTox web services and easy access to OpenTox resources for evaluating toxicity properties of query molecules. Relevant cases and interfaces based on ten neurotoxins are described to demonstrate the capabilities provided to the user. The integration takes advantage of semantic web technologies, thereby providing an open and simplifying communication standard. Additionally, the use of ontologies ensures proper interoperation and reliable integration of toxicity information from both experimental and computational sources.
A novel computational toxicity assessment platform was generated by integrating two open science platforms related to toxicology: Bioclipse, which combines a rich scriptable and graphical workbench environment for integration of diverse sets of information sources, and OpenTox, a platform for interoperable toxicology data and computational services. The combination provides improved reliability and operability for handling large data sets through the use of the Open Standards from the OpenTox Application Programming Interface. This enables simultaneous access to a variety of distributed predictive toxicology databases, and algorithm and model resources, taking advantage of the Bioclipse workbench handling the technical layers.
We report here the establishment of a new interoperable platform for computational toxicology that is able to dynamically discover computational services running the latest predictive algorithms and models, while hiding technicalities by reusing a graphics-oriented workbench for the life sciences. The OECD QSAR ToolBox [1, 2] and ToxTree [3, 4] are existing software packages that aggregate predictive toxicity models, but they do not integrate easily with other functionality, such as online services. Bioclipse, however, is designed to integrate local and remote functionality [5–7]. In this paper we outline how we implemented a new platform, integrating the OpenTox Open Standards and the interactive but scriptable Open Source workbench for the life sciences, Bioclipse. This approach makes it possible for anyone to make new computational toxicology models available to Bioclipse without the need to change the software source code.
Predictive toxicology is a field where knowledge from many sources needs to be integrated to provide a weight of evidence on the toxicity of untested chemical compounds. Typical sources of information include databases with in vivo and in vitro experimental data such as ToxCast and SuperToxic [9, 10], literature databases summarizing adverse reactions like SIDER, and computational resources based on toxicity data for other compounds, including DSSTox. Importantly, this information should be visualized, preferably linked to the chemical structure of the compound, or by visualizing relevant life science data, such as gene, protein and biological pathway information [13–15] or metabolic reactions. Bioclipse was designed to provide such interactive data analysis for the life sciences.
Moreover, predictive toxicology is an advancing science that aims to develop new alternative testing methods satisfying the demanding risk assessment requirements of the European REACH guidance. The dynamic discovery of new toxicology-related data and computational methods is therefore of utmost scientific and practical importance. The EU FP7 OpenTox project recently developed a framework that makes semantic integration of such new resources feasible.
We describe here the subsequent technological interoperation of Bioclipse and the OpenTox platform, as implemented by the AMBIT software. This short report outlines what functionality the new combined platform provides to the toxicologist and what development is ongoing. At the core of the interoperation lies the use of the Resource Description Framework (RDF) and related Open Standards. OpenTox uses RDF as a primary exchange format and the RDF query language SPARQL to discover data sets, algorithms and models. Bioclipse was recently extended to support these standards, simplifying the interoperation task with OpenTox.
We outline three applications that exemplify how the technologies used make this interoperability possible, starting with a computational toxicology example. Three technologies drive the interoperability. First, the SPARQL RDF query language is used to discover functionality on the OpenTox network. Second, the OpenTox web services are used for remote computation. Finally, all graphical user interfaces use a new Bioclipse Scripting Language (BSL) extension to interact with OpenTox servers, so that all interaction can also be scripted and automated.
OpenTox provides web services to calculate a descriptor value for a given molecule. Using the linked resources idea of the semantic web, the descriptors discovered via the ontology server can be invoked via Bioclipse directly. As such, OpenTox-provided descriptor calculations can be mixed with descriptor calculations local to Bioclipse, or from other remote computational services, as described before. This creates a flexible application for the integration of numerical input for statistical modeling of toxicologically relevant end points, as well as the comparison of various predictive models for a more balanced property analysis.
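The discovery step described above can be pictured as composing a SPARQL query and sending it to an ontology service. The following JavaScript sketch only builds the query text. The function name, the variable name and the example class IRI are assumptions made for illustration; they are not taken from the OpenTox API or from the Bioclipse source code.

```javascript
// Minimal sketch: compose a SPARQL SELECT query listing all resources of one RDF class.
// The class IRI is supplied by the caller; the IRI used below is a hypothetical placeholder.
function buildDiscoveryQuery(classIri, limit) {
  // Reject anything that is not an absolute http(s) IRI without spaces or angle brackets.
  if (!/^https?:\/\/[^\s<>"]+$/.test(classIri)) {
    throw new Error("classIri must be an absolute http(s) IRI");
  }
  const lines = [
    "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>",
    "SELECT DISTINCT ?resource WHERE {",
    "  ?resource rdf:type <" + classIri + "> .",
    "}"
  ];
  // An optional positive integer limit is appended as a LIMIT clause.
  if (Number.isInteger(limit) && limit > 0) {
    lines.push("LIMIT " + limit);
  }
  return lines.join("\n");
}

// Hypothetical class IRI, used only to show the shape of the generated query.
const query = buildDiscoveryQuery("http://example.org/toxicology#DescriptorAlgorithm", 25);
console.log(query);
```

A real client would send such a query to a SPARQL end point and read the bindings of ?resource; which classes and properties to query depends on the ontology the server actually uses.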
Table 1. BSL script commands for interacting with the OpenTox platform
Lists the predictive models available from the given service.
Returns information about a particular molecular feature (property).
Returns information about a set of molecular features.
Returns information for a computational model.
Returns information for a list of computational models.
Returns information for a computational algorithm.
Returns information for a list of computational algorithms.
Returns a list of algorithms.
Returns a list of descriptor algorithms.
Returns the data sets available at the given OpenTox server.
Returns matching data sets using a free text search.
Returns matching structures based on the InChI given.
Returns matching structures based on the molecule given.
calculateDescriptor(service, descriptor, molecules): Calculates a descriptor value for a set of molecules.
calculateDescriptor(service, descriptor, molecule): Calculates a descriptor value for a single molecule.
predictWithModel(service, model, molecules): Predicts modeled properties for the given list of molecules.
predictWithModel(service, model, molecule): Predicts modeled properties for the given molecule.
Creates a new data set on an OpenTox server.
Creates a new data set on an OpenTox server and adds the given molecules.
Creates a new data set on an OpenTox server and adds a single molecule.
Adds a molecule to an existing data set.
Adds a list of molecules to an existing data set.
Deletes a data set.
downloadCompoundAsMDLMolfile(service, dataset, molecule): Downloads a molecule from a data set as an MDL molfile.
downloadDataSetAsMDLSDfile(service, dataset, file-name): Downloads a complete data set as an MDL SD file and saves it to a local file in the Bioclipse workspace.
Lists the molecules in a data set.
Authenticates the user with OpenSSO and logs in on the OpenTox network.
Logs out from the OpenTox network.
Returns a security token when Bioclipse is logged in on the OpenTox network.
This use case shows nicely how the Bioclipse-OpenTox integration takes advantage of the fact that all graphical user interface (GUI) functionality in Bioclipse is matched by a scripted equivalent. Using the BSL directly allows interaction with the OpenTox network to be automated and combined with other Bioclipse functionality into larger workflows, and makes it easier to share procedures with others using social scientific sites like MyExperiment. An example BSL script for calculating molecular descriptors combines OpenTox functionality with cheminformatics functionality provided by the cdk script extensions (also available as Additional file 2):
//requires an unspecified Bioclipse development version
service = "http://apps.ideaconsult.net:8080/ambit2/";
serviceSPARQL = "http://apps.ideaconsult.net:8080/ontology/";
stringMat = opentox.listDescriptors(serviceSPARQL);
stringMat.getColumn("algo"); //returns the descriptor services
stringMat.getColumn("desc"); //returns the BO entries
descriptor = stringMat.get(1,1);
molecules = cdk.createMoleculeList();
//add the molecules to evaluate to the list here (the output below has two values)
descriptor + " - " +
  opentox.calculateDescriptor(service, descriptor, molecules);
//output:
//http://apps.ideaconsult.net:8080/ambit2/algorithm/org.openscience.cdk.qsar.descriptors.molecular.XLogPDescriptor - [0.11900000274181366, 2.2190001010894775]
Table 1 shows an overview of the available BSL commands for uploading data to and downloading data from OpenTox servers under the heading Data exchange.
The third demonstration of Bioclipse-OpenTox interoperability is the support for accessing protected resources within the OpenTox network. Despite the authors' preferences, we acknowledge that not all scientific data will be Open Data. As such, authentication and authorization (A&A) are important features of data access. OpenTox implements both aspects and provides web services for A&A, allowing users to log in to and out of OpenTox applications, accompanied by policy-based specification of OpenTox resource access permissions. Additionally, the same mechanism is used to restrict access to calculation procedures, making it possible to expose software with commercial licenses as protected OpenTox resources. Bioclipse was extended to support the OpenTox authentication, allowing the OpenTox servers to properly authorize user access to particular web services and data sets. The OpenTox account information is registered with Bioclipse's keyring system, which centralizes logging in to and out of remote services and provides the graphical user interface for adding a new OpenTox account and for logging in and out. The corresponding script commands for the authentication are given in the Authentication category in Table 1. Interested people can create a free account at http://www.opentox.org/join_form.
We have described here an interoperability advance, enabling users to interactively explore and evaluate the toxicity properties of molecules based on a semantic web approach to toxicology resources. The integration into Bioclipse makes various components of the OpenTox platform available to the user, both via the graphical user interface and via the Bioclipse Scripting Language. The Bioclipse-OpenTox plugin makes it possible to upload data sets to and download them from any OpenTox server, calculate molecular descriptors, and apply predictive toxicology models to molecular structures. All functionality supports user authentication using the OpenTox-adopted OpenSSO technology. Other components of OpenTox, like model building and validation, have not been added yet, as Bioclipse currently does not have a clear GUI for such functionality. This is being worked on, but is outside the scope of this report. The presented aspects make this integration fairly unique: the solution is capable of dynamically discovering new services in the OpenTox network when it starts, which differentiates it from specialized software like ToxTree and the OECD QSAR ToolBox. These tools aggregate several predictive models, but need to be updated manually by the developers for each new model. However, these tools could also be extended to support the OpenTox platform. An added value is that updates to computational modules are only done on the server side, so that the client software (Bioclipse) does not need to be updated, a feature shared with web-based solutions like ToxPredict. The scripting functionality makes it easy to automate data workflows, as do workflow applications such as Taverna and KNIME (http://knime.org), but the combination with the rich Bioclipse user interface makes it possible at the same time to work with OpenTox interactively.
The calculation results are cached by the OpenTox dataset service, which avoids time-consuming processing if the same calculation on the same dataset is requested more than once. Users of the integrated Bioclipse-OpenTox environment therefore do not need to worry about the performance of their own computer, though we are also exploring the options to have Bioclipse itself run an OpenTox server. The latter is technically possible, and would convert the integrated platform into a standalone application that does not require web access.
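The caching behaviour described above can be illustrated with a small memoization sketch in JavaScript. It is not the OpenTox dataset service implementation, which is not shown in this report; the function names, the callback and the urn:example identifiers are hypothetical and serve only to show the idea of keying a stored result on the pair of calculation and data set.

```javascript
// Minimal sketch of result caching keyed by (calculation, data set).
// `compute` stands in for an expensive remote calculation; it is a hypothetical callback.
function createCachedCalculator(compute) {
  const cache = new Map();
  let computeCalls = 0; // counts how often the expensive callback really ran
  return {
    calculate(calculationUri, datasetUri) {
      // JSON.stringify of the pair gives an unambiguous cache key.
      const key = JSON.stringify([calculationUri, datasetUri]);
      if (!cache.has(key)) {
        computeCalls += 1;
        cache.set(key, compute(calculationUri, datasetUri));
      }
      return cache.get(key);
    },
    get computeCalls() { return computeCalls; }
  };
}

// Usage with placeholder identifiers: the second request is served from the cache.
const calculator = createCachedCalculator(
  (calculationUri, datasetUri) => "result of " + calculationUri + " on " + datasetUri
);
const first = calculator.calculate("urn:example:descriptor", "urn:example:dataset-1");
const second = calculator.calculate("urn:example:descriptor", "urn:example:dataset-1");
console.log(first === second, calculator.computeCalls); // true 1
```

A server-side cache would additionally have to decide when a stored result becomes stale, for example after the data set changes; that is left out of this sketch.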
From a technological perspective, the Bioclipse-OpenTox integration relies on semantic web technologies, which are seeing significant adoption in other areas of the life sciences too, including drug discovery, text mining, and neurosciences [33–35]. The OpenTox platform demonstrates the provision of a simple but well-defined and consistent ontology for the interaction with its services, providing functionality for both service discovery and service invocation. The SADI framework is the only known semantic alternative, but it does not currently provide the same level of computational toxicology services as OpenTox does. However, while the integration is greatly simplified and semantically defines what services are available and what they do, the technologies used neither solve the problem of the chemical validity of the molecular structures that are sent around, nor semantically define and specify in detail how to interpret the computational results of toxicity predictions. The first problem is that even with explicit meaning we can make incorrect claims. For example, we can always define a triple stating that :water :isToxicAtLowConcentrationsTo :human, using ontologies for all aspects, but that would not make it true. Semantic technologies are not about correctness. Instead, they make it much easier to find inconsistencies between knowledge bases. The same argument applies to semantically marked up molecular structures and other data passed between Bioclipse and the OpenTox cloud (cf. Figure 1).
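How explicit semantics help to surface inconsistencies between knowledge bases can be sketched in a few lines of JavaScript. The sketch assumes that each knowledge base is a plain array of subject, predicate and object triples and that the predicates involved are single-valued, so that two different objects for the same subject and predicate count as a disagreement. The identifiers :isToxicToHumansAtLowConcentration, "true" and "false" are hypothetical and do not come from any OpenTox ontology.

```javascript
// Minimal sketch: find statements on which two knowledge bases disagree.
// Assumption: every predicate is single-valued, so differing objects are a conflict.
function findConflicts(kbA, kbB) {
  // Index knowledge base A by (subject, predicate).
  const index = new Map();
  for (const t of kbA) {
    const key = JSON.stringify([t.subject, t.predicate]);
    if (!index.has(key)) index.set(key, new Set());
    index.get(key).add(t.object);
  }
  // Report every triple in B whose object is unknown to A for the same key.
  const conflicts = [];
  for (const t of kbB) {
    const known = index.get(JSON.stringify([t.subject, t.predicate]));
    if (known && !known.has(t.object)) {
      conflicts.push({ subject: t.subject, predicate: t.predicate, inA: [...known], inB: t.object });
    }
  }
  return conflicts;
}

// Hypothetical example data: the two sources disagree about water, but agree on the second entry.
const sourceA = [
  { subject: ":water", predicate: ":isToxicToHumansAtLowConcentration", object: "false" },
  { subject: ":compoundX", predicate: ":isToxicToHumansAtLowConcentration", object: "true" }
];
const sourceB = [
  { subject: ":water", predicate: ":isToxicToHumansAtLowConcentration", object: "true" },
  { subject: ":compoundX", predicate: ":isToxicToHumansAtLowConcentration", object: "true" }
];
const conflicts = findConflicts(sourceA, sourceB);
console.log(conflicts.length, conflicts[0].subject); // 1 :water
```

The sketch only reports that the two sources disagree; deciding which statement is correct still requires domain knowledge, which is the point made in the text.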
An example of the second problem is that various services can indicate that a compound is mutagenic or carcinogenic, but express that statement in different ways. One service may return a binary yes/no answer, while another returns a more detailed answer, such as for which cell line or organism the prediction is made. Such semantic integration is currently outside the scope of this Bioclipse-OpenTox interoperability, but it is not a problem unique to our approach either.
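The difference between a binary answer and a more detailed answer can be made concrete with a small normalization sketch in JavaScript. Both response shapes below are hypothetical: they are not the formats returned by any particular OpenTox service, and the field names active, organism and cellLine are assumptions chosen for illustration.

```javascript
// Minimal sketch: map two hypothetical prediction formats onto one common record.
//   format 1: the strings "yes" or "no"
//   format 2: an object such as { active: true, organism: "..." } or { active: false, cellLine: "..." }
function normalizePrediction(raw) {
  if (typeof raw === "string") {
    const value = raw.trim().toLowerCase();
    if (value === "yes" || value === "no") {
      // A binary answer carries no information about the test system.
      return { active: value === "yes", context: null };
    }
    throw new Error("unrecognized answer: " + raw);
  }
  if (raw !== null && typeof raw === "object" && typeof raw.active === "boolean") {
    // Keep whichever test-system detail the service provided, if any.
    const context = raw.organism || raw.cellLine || null;
    return { active: raw.active, context: context };
  }
  throw new Error("unrecognized prediction format");
}

const fromBinary = normalizePrediction("Yes");
const fromDetailed = normalizePrediction({ active: false, organism: "Salmonella typhimurium" });
console.log(fromBinary, fromDetailed);
```

Even after such a mapping the two records are not equivalent, because the binary answer says nothing about the organism or cell line; a shared ontology would be needed to state what each prediction actually means.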
To address these issues, the community needs to develop better capabilities to link automatically and reliably the various concepts in toxicology, such as links between chemical names and structures and links to toxicities based on current biological knowledge on effects, targets and pathways. The platform is ready for such semantic integration, but the community needs to develop a common language, which will be enabled through the creation of a public set of linked, harmonized and interoperable ontologies satisfying the predictive toxicology use cases of the future, supporting an integrated data analysis.
Availability and requirements
Project name: Bioclipse-OpenTox
Project home page: http://www.bioclipse.net/opentox/
Operating system(s): Platform independent
Programming language: Java
Other requirements: Java 6 or higher
License: Eclipse Public License
Any restrictions to use by non-academics: None
List of abbreviations
A&A: Authorization and Authentication
API: Application Programming Interface
BSL: Bioclipse Scripting Language
CDK: Chemistry Development Kit
FP7: Seventh Framework Programme
GUI: Graphical User Interface
OECD: Organisation for Economic Co-operation and Development
QSAR: Quantitative Structure-Activity Relationship
RDF: Resource Description Framework
REACH: Registration, Evaluation, Authorisation and Restriction of Chemical substances
REST: Representational State Transfer
SPARQL: SPARQL Protocol and RDF Query Language
URI: Uniform Resource Identifier.
This research was funded by a KoF grant from Uppsala University (KoF 07), the Swedish VR-M (04X-05957), the Swedish Cancer and Allergy Fund, the Swedish Research Council, the Swedish Fund for Research without Animal Experiments, OpenTox through the EU Seventh Framework Programme HEALTH-2007-1.3-3 (Health-F5-2008-200787), COLIPA, and ToxBank through the EU Seventh Framework Programme HEALTH-2010-4.2.9 Alternative Testing Strategies (Health-F5-2010-267042). Jonathan Alvarsson is acknowledged for his Bioclipse keyring extension which is used for the OpenTox authentication integration.
- Diderichs R: Tools for Category Formation and Read-Across: Overview of the OECD (Q)SAR Application Toolbox. 2010, The Royal Society of Chemistry, 385-407. chap 16
- QSAR ToolBox. [http://www.qsartoolbox.org/]
- Patlewicz G, Jeliazkova N, Safford RJ, Worth AP, Aleksiev B: An evaluation of the implementation of the Cramer classification scheme in the Toxtree software. SAR and QSAR in environmental research. 2008, 19 (5-6): 495-524. 10.1080/10629360802083871.
- ToxTree. [http://toxtree.sourceforge.net/]
- Spjuth O, Helmus T, Willighagen E, Kuhn S, Eklund M, Wagener J, Rust PM, Steinbeck C, Wikberg J: Bioclipse: An open source workbench for chemo- and bioinformatics. BMC Bioinformatics. 2007, 8:
- Spjuth O, Alvarsson J, Berg A, Eklund M, Kuhn S, Mäsak C, Torrance G, Wagener J, Willighagen E, Steinbeck C, Wikberg J: Bioclipse 2: A scriptable integration platform for the life sciences. BMC Bioinformatics. 2009, 10: 397-10.1186/1471-2105-10-397.
- Spjuth O, Eklund M, Ahlberg Helgee E, Boyer S, Carlsson L: Integrated Decision Support for Assessing Chemical Liabilities. Journal of Chemical Information and Modeling. 2011, 51 (8): 1840-1847. 10.1021/ci200242c.
- Hardy B, Douglas N, Helma C, Rautenberg M, Jeliazkova N, Jeliazkov V, Nikolova I, Benigni R, Tcheremenskaia O, Kramer S, Girschick T, Buchwald F, Wicker J, Karwath A, Gutlein M, Maunz A, Sarimveis H, Melagraki G, Afantitis A, Sopasakis P, Gallagher D, Poroikov V, Filimonov D, Zakharov A, Lagunin A, Gloriozova Ta, Novikov S, Skvortsova N, Druzhilovsky D, Chawla S, Ghosh I, Ray S, Patel H, Escher S: Collaborative development of predictive toxicology applications. Journal of Cheminformatics. 2010, 2: 7-10.1186/1758-2946-2-7.
- Knudsen TB, Houck KA, Sipes NS, Singh AV, Judson RS, Martin MT, Weissman A, Kleinstreuer NC, Mortensen HM, Reif DM, Rabinowitz JR, Setzer RW, Richard AM, Dix DJ, Kavlock RJ: Activity profiles of 309 ToxCast™ chemicals evaluated across 292 biochemical targets. Toxicology. 2011, 282 (1-2): 1-15. 10.1016/j.tox.2010.12.010.
- Schmidt U, Struck S, Gruening B, Hossbach J, Jaeger IS, Parol R, Lindequist U, Teuscher E, Preissner R: SuperToxic: a comprehensive database of toxic compounds. Nucleic Acids Research. 2009, 37 (suppl 1): D295-D299.
- Kuhn M, Campillos M, Letunic I, Jensen LJJ, Bork P: A side effect resource to capture phenotypic effects of drugs. Molecular systems biology. 2010, 6 (343):
- Williams-DeVane CR, Wolf MA, Richard AM: DSSTox chemical-index files for exposure-related experiments in ArrayExpress and Gene Expression Omnibus: enabling toxico-chemogenomics data linkages. Bioinformatics. 2009, 25 (5): 692-694. 10.1093/bioinformatics/btp042.
- Huang DW, Sherman BT, Lempicki RA: Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nature protocols. 2009, 4: 44-57.
- Ogata H, Goto S, Sato K, Fujibuchi W, Bono H, Kanehisa M: KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic acids research. 1999, 27: 29-34. 10.1093/nar/27.1.29.
- Kelder T, Pico AR, Hanspers K, van Iersel MP, Evelo C, Conklin BR: Mining Biological Pathways Using WikiPathways Web Services. PLoS ONE. 2009, 4 (7): e6447.
- Rydberg P, Gloriam DE, Olsen L: The SMARTCyp cytochrome P450 metabolism prediction server. Bioinformatics. 2010, 26 (23): 2988-2989. 10.1093/bioinformatics/btq584.
- European Parliament C: Regulation (EC) No 1907/2006 of the European Parliament and of the Council of 18 December 2006 concerning the Registration, Evaluation, Authorisation and Restriction of Chemicals (REACH), establishing a European Chemicals Agency, amending Directive 1999/45/EC and repealing Council Regulation (EEC) No 793/93 and Commission Regulation (EC) No 1488/94 as well as Council Directive 76/769/EEC and Commission Directives 91/155/EEC, 93/67/EEC, 93/105/EC and 2000/21/EC. Tech rep. 2006, [http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=CELEX:32006R1907:en:NOT]
- Jeliazkova N, Jeliazkov V: AMBIT RESTful web services: an implementation of the OpenTox application programming interface. Journal of Cheminformatics. 2011, 3: 18-10.1186/1758-2946-3-18.
- Carroll JJ, Klyne G: Resource Description Framework (RDF): Concepts and Abstract Syntax. 2004, [http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/]
- Prud'hommeaux E, Seaborne A: SPARQL Query Language for RDF. Tech rep, World-Wide-Web Consortium. 2008, [http://www.w3.org/TR/rdf-sparql-query/]
- Willighagen E, Alvarsson J, Andersson A, Eklund M, Lampa S, Lapins M, Spjuth O, Wikberg J: Linking the Resource Description Framework to cheminformatics and proteochemometrics. Journal of Biomedical Semantics. 2011, 2 (Suppl 1): S6-10.1186/2041-1480-2-S1-S6.
- Spjuth O, Willighagen E, Guha R, Eklund M, Wikberg J: Towards interoperable and reproducible QSAR analyses: Exchange of datasets. Journal of Cheminformatics. 2010, 2: 5-10.1186/1758-2946-2-5.
- Bhagat J, Tanoh F, Nzuobontane E, Laurent T, Orlowski J, Roos M, Wolstencroft K, Aleksejevs S, Stevens R, Pettifer S, Lopez R, Goble CA: BioCatalogue: a universal catalogue of web services for the life sciences. Nucleic Acids Research. 2010, 38 (suppl 2): W689-W694. [http://dx.doi.org/10.1093/nar/gkq394]
- W3C OWL Working Group: OWL 2 Web Ontology Language Document Overview. Tech rep, W3C. 2009, [http://www.w3.org/TR/2009/REC-owl2-overview-20091027/]
- Steinbeck C, Hoppe C, Kuhn S, Floris M, Guha R, Willighagen EL: Recent developments of the chemistry development kit (CDK) - an open-source java library for chemo- and bioinformatics. Current pharmaceutical design. 2006, 12 (17): 2111-2120. 10.2174/138161206777585274.
- CC0 1.0 Universal Public Domain Dedication. [http://creativecommons.org/publicdomain/zero/1.0/]
- ODC Public Domain Dedication and Licence 1.0. [http://www.opendatacommons.org/licenses/pddl/1-0/]
- Open Database License 1.0. [http://opendatacommons.org/licenses/odbl/1.0/]
- Open Data Commons Attribution License 1.0. [http://opendatacommons.org/licenses/by/1.0/]
- Goble CA, Bhagat J, Aleksejevs S, Cruickshank D, Michaelides D, Newman D, Borkum M, Bechhofer S, Roos M, Li P, De Roure D: myExperiment: a repository and social network for the sharing of bioinformatics workflows. Nucleic Acids Res. 2010, 38 (Suppl): W677-82.
- Ideaconsult Ltd: ToxPredict. [http://toxpredict.org/]
- Oinn T, Addis M, Ferris J, Marvin D, Senger M, Greenwood M, Carver T, Glover K, Pocock MR, Wipat A, Li P: Taverna: a tool for the composition and enactment of bioinformatics workflows. Bioinformatics. 2004, 20 (17): 3045-3054. 10.1093/bioinformatics/bth361.
- Ruttenberg A, Clark T, Bug W, Samwald M, Bodenreider O, Chen H, Doherty D, Forsberg K, Gao Y, Kashyap V, Kinoshita J, Luciano J, Marshall MS, Ogbuji C, Rees J, Stephens S, Wong G, Wu E, Zaccagnini D, Hongsermeier T, Neumann E, Herman I, Cheung KH: Advancing translational research with the Semantic Web. BMC Bioinformatics. 2007, 8 (Suppl 3): S2-10.1186/1471-2105-8-S3-S2.
- Splendiani A, Burger A, Paschke A, Romano P, Marshall M: Biomedical semantics in the Semantic Web. Journal of Biomedical Semantics. 2011, 2 (Suppl 1): S1-10.1186/2041-1480-2-S1-S1.
- Willighagen EL, Brändle MP: Resource description framework technologies in chemistry. Journal of Cheminformatics. 2011, 3: 15-10.1186/1758-2946-3-15.
- Chepelev L, Dumontier M: Semantic Web integration of cheminformatics resources with the SADI framework. Journal of Cheminformatics. 2011, 3: 16-10.1186/1758-2946-3-16.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.