ProteinTracker: an application for managing protein production and purification
© Ponko and Bienvenue; licensee BioMed Central Ltd. 2012
Received: 30 January 2012
Accepted: 2 May 2012
Published: 10 May 2012
Laboratories that produce protein reagents for research and development face the challenge of deciding whether to track batch-related data using simple file-based storage mechanisms (e.g. spreadsheets and notebooks), or commit the time and effort to install, configure and maintain a more complex laboratory information management system (LIMS). Managing reagent data stored in files is challenging because files are often copied, moved, and reformatted. Furthermore, there is no simple way to query the data when questions arise. Commercial LIMS often include additional modules that are paid for but not actually used, and often require software expertise to customize them for a given environment.
This web-application allows small to medium-sized protein production groups to track data related to plasmid DNA, conditioned media samples (supes), cell lines used for expression, and purified protein information, including method of purification and quality control results. In addition, a request system was added that includes a means of prioritizing requests to help manage the high demand of protein production resources at most organizations. ProteinTracker makes extensive use of existing open-source libraries and is designed to track essential data related to the production and purification of proteins.
ProteinTracker is an open-source web-based application that provides organizations with the ability to track key data involved in the production and purification of proteins and may be modified to meet the specific needs of an organization. The source code and database setup script can be downloaded from http://sourceforge.net/projects/proteintracker. This site also contains installation instructions and a user guide. A demonstration version of the application can be viewed at http://www.proteintracker.org.
Keywords: Protein Production Purification Reagent Tracking Prioritization Web Application
Background and purpose
A challenge for any organization that produces protein reagents is tracking batch information in a format that is easily accessible to all users. Laboratory notebooks (traditional paper or electronic) should serve as the primary repository of that information. However, due to the multiple steps involved in generating a single purified protein (molecular biology, cell culture, purification), the batch data often resides in multiple places. This makes it difficult to access all of the relevant information quickly. Over time, even a relatively small group of 5–10 researchers can generate hundreds, if not thousands of pieces of information that may be used by internal or external collaborators. The current high-throughput screening approaches that are employed by proteomics labs magnify this issue.
One solution is to capture important batch information in some form of spreadsheet, which has obvious limitations in versioning, ability to query the data, and reporting. Spreadsheets typically become an intermediate step before moving to a commercial LIMS, which requires either in-house expertise or external resources to customize it. Regardless of the avenue that is selected, instituting a system for tracking this information early in the organization’s history is essential, as it only becomes more difficult as the size and complexity of the data set grows.
ProteinTracker provides smaller organizations with a LIMS solution that focuses solely on the key data required for protein production and purification, and therefore should not require the same level of site-specific customization as more complex commercial LIMS solutions. The most useful aspects of implementing ProteinTracker are traceability and transparency. The end users of the protein reagents have a wealth of information at their disposal without having to search through notebooks of 2–3 individuals to get the complete details of a particular batch. Users can make their own assessment of the batch quality. Changes in protein production methods are easily related to the activity, purity or yield of the final prep, since the “chain of custody” information from expression vector, conditioned media and purification are all linked and accessible.
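The "chain of custody" linkage can be pictured with a minimal Java sketch. The class names, field names and identifier strings below (ConstructRecord, SupeRecord, ProteinBatchRecord, "DNA-0001" and so on) are hypothetical and are not taken from the ProteinTracker source; the sketch only assumes that each protein batch refers to the conditioned media it was purified from, and that each conditioned media record refers to its expression construct.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical record types; names and fields are illustrative only.
class ConstructRecord {
    final String id;
    final String vectorName;

    ConstructRecord(String id, String vectorName) {
        this.id = id;
        this.vectorName = vectorName;
    }
}

class SupeRecord {
    final String id;
    final ConstructRecord construct; // expression construct used to generate the supe

    SupeRecord(String id, ConstructRecord construct) {
        this.id = id;
        this.construct = construct;
    }
}

class ProteinBatchRecord {
    final String id;
    final SupeRecord supe;           // conditioned media the batch was purified from
    final String purificationMethod;

    ProteinBatchRecord(String id, SupeRecord supe, String purificationMethod) {
        this.id = id;
        this.supe = supe;
        this.purificationMethod = purificationMethod;
    }

    // Walks the links back from the purified batch to the expression construct.
    List<String> chainOfCustody() {
        List<String> chain = new ArrayList<String>();
        chain.add("Protein batch " + id + " (" + purificationMethod + ")");
        chain.add("Conditioned media " + supe.id);
        chain.add("Construct " + supe.construct.id + " (" + supe.construct.vectorName + ")");
        return chain;
    }
}

class ChainOfCustodyDemo {
    public static void main(String[] args) {
        ConstructRecord construct = new ConstructRecord("DNA-0001", "pExample-Fc");
        SupeRecord supe = new SupeRecord("SUP-0001", construct);
        ProteinBatchRecord batch = new ProteinBatchRecord("PRO-0001", supe, "Protein A affinity");
        for (String step : batch.chainOfCustody()) {
            System.out.println(step);
        }
    }
}
```

Because every record holds a reference to its upstream record, a question such as "which vector produced this batch?" becomes a short traversal rather than a search through several notebooks.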
Additionally, ProteinTracker has been particularly useful in facilitating the work performed with external contract organizations and other collaborators. By design, samples (expression constructs, purified proteins, etc.) are assigned a unique identifier that can be used as a means of easily identifying and tracking these samples at external sites. In one situation, nearly one thousand samples of conditioned media were sent to a collaborator, and a report summarizing the information on these samples was quickly generated using the data captured in ProteinTracker. In the case of a therapeutic protein being developed for human clinical trials, ProteinTracker was used to organize nearly a hundred different expression vectors that were generated for this program, which included multiple variations of the lead molecule and associated controls for in vitro and in vivo testing. For small organizations or academic groups involved in the preclinical development of protein therapeutics, ProteinTracker can be used as an intermediate form of data organization that is typically provided by formal quality control and quality assurance groups at larger companies.
The application was primarily designed to manage the data associated with protein reagent production, but not the experimental data generated with these reagents. However, it may integrate with, or complement, pre-existing workflow systems that do capture experimental data.
One enhancement made to ProteinTracker was the addition of a reagent request and prioritization feature. This proved to be very useful as a means of managing the work flow within the reagent production group, as well as keeping ‘customers’ informed of the status of the proteins they had requested. This facilitated the planning of experiments, while minimizing the amount of time spent updating numerous lab staff on their particular proteins of interest.
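How a prioritized work list might be derived from submitted requests can be sketched as follows. The article does not describe the prioritization scheme itself, so everything here is an assumption made for illustration: the ReagentRequest type, a numeric priority in which 1 is most urgent, and the rule that ties go to the earlier submission.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical request type; fields are illustrative, not ProteinTracker's schema.
class ReagentRequest {
    final String protein;
    final int priority;      // assumption: 1 is the most urgent
    final long submittedAt;  // assumption: submission time in epoch milliseconds

    ReagentRequest(String protein, int priority, long submittedAt) {
        this.protein = protein;
        this.priority = priority;
        this.submittedAt = submittedAt;
    }
}

class RequestPrioritizer {
    // Orders by priority first; ties go to the request submitted earlier.
    static final Comparator<ReagentRequest> WORK_ORDER = new Comparator<ReagentRequest>() {
        @Override
        public int compare(ReagentRequest a, ReagentRequest b) {
            if (a.priority != b.priority) {
                return Integer.compare(a.priority, b.priority);
            }
            return Long.compare(a.submittedAt, b.submittedAt);
        }
    };

    // Returns a sorted copy so the caller's list is left untouched.
    static List<ReagentRequest> workOrder(List<ReagentRequest> requests) {
        List<ReagentRequest> sorted = new ArrayList<ReagentRequest>(requests);
        Collections.sort(sorted, WORK_ORDER);
        return sorted;
    }
}

class RequestPrioritizerDemo {
    public static void main(String[] args) {
        List<ReagentRequest> requests = new ArrayList<ReagentRequest>();
        requests.add(new ReagentRequest("Antigen-1", 2, 1000L));
        requests.add(new ReagentRequest("Antigen-2", 1, 3000L));
        requests.add(new ReagentRequest("Antigen-3", 1, 2000L));
        for (ReagentRequest r : RequestPrioritizer.workOrder(requests)) {
            System.out.println(r.priority + " " + r.protein);
        }
    }
}
```

Sorting a copy keeps the submitted order available for display, while the production group works from the prioritized view.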
ProteinTracker has been in continuous use at VLST for the past ~5 years and has become a valuable resource for both the reagent service group and the researchers that depend on it for key components for their experiments. The application currently manages over 9,000 records, consisting of numerous plasmids, conditioned media records, cell lines and protein batches. The extensive use of open-source libraries and the source code licensing allow academic and smaller industrial institutions to use and further enhance the system free of charge for their own specific situation.
Application design overview
The logic-tier consists of Java-based object components running within the Echo3 framework. These components receive client-browser events and process them to save, update, load, or delete model objects to and from the database, send events back to the client for updating the screen, send email notifications, validate data, authenticate users, handle application errors, and perform calculations. Relational database persistence is provided using the open-source Hibernate framework. The Hibernate framework maps object components to database tables using either annotations or XML configuration files, and supports caching and an object-based query language (HQL) that is modelled after SQL. HQL supports querying relational data using the logic-tier objects, which removes the need to keep explicit database entity references in the logic-tier.
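The idea behind annotation-based mapping can be shown with a small, self-contained Java sketch. It deliberately does not use Hibernate: the MappedTable and MappedColumn annotations, the Plasmid class, the table and column names, and the TinyMapper helper are all hand-rolled stand-ins invented for this illustration. The sketch shows only the general mechanism, in which metadata attached to a class is read by reflection to work out which table and columns the object corresponds to.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Stand-in annotations (NOT Hibernate's); they must be retained at runtime to be readable.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface MappedTable {
    String value();
}

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface MappedColumn {
    String value();
}

// Hypothetical model object; table and column names are made up.
@MappedTable("construct")
class Plasmid {
    @MappedColumn("construct_id")
    long id;

    @MappedColumn("vector_name")
    String vectorName;

    String scratchNote; // not annotated, so it is not mapped
}

class TinyMapper {
    // Builds a SELECT statement from the annotations found on the class.
    static String selectAll(Class<?> type) {
        MappedTable table = type.getAnnotation(MappedTable.class);
        if (table == null) {
            throw new IllegalArgumentException(type.getName() + " is not mapped to a table");
        }
        List<String> columns = new ArrayList<String>();
        for (Field field : type.getDeclaredFields()) {
            MappedColumn column = field.getAnnotation(MappedColumn.class);
            if (column != null) {
                columns.add(column.value());
            }
        }
        // Reflection does not guarantee field order, so sort for a stable result.
        Collections.sort(columns);
        StringBuilder sql = new StringBuilder("SELECT ");
        for (int i = 0; i < columns.size(); i++) {
            if (i > 0) {
                sql.append(", ");
            }
            sql.append(columns.get(i));
        }
        return sql.append(" FROM ").append(table.value()).toString();
    }
}

class TinyMapperDemo {
    public static void main(String[] args) {
        System.out.println(TinyMapper.selectAll(Plasmid.class));
    }
}
```

A full object-relational mapper adds much more (sessions, caching, relationships, a query language), but the separation is the same: the logic-tier code works with objects, and the table and column names live in mapping metadata.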
External applications may integrate with data in ProteinTracker at either the database level, or through the external links supported by ProteinTracker that allow other applications to link directly to either reagent or request records. The link format is described in the user documentation.
Entering reagent data
Users can view previously entered construct records by selecting the ‘View DNA’ link in the navigation panel. This displays a sortable, paged table of all construct records. The page size is selectable, and clicking on any construct record opens the record. It is also possible to page through the construct records one-by-one by selecting the ‘Previous’ and ‘Next’ buttons while viewing individual construct records. This holds true for cell line, supe and protein batch records as well. Selecting the ‘Search DNA’ link displays a search screen that exposes most of the construct attributes for searching. Search results are displayed using the same table format as that used by the ‘View DNA’ option.
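The paging behaviour described above amounts to slicing an ordered list of records by a selectable page size. The following Java sketch shows that arithmetic; the Pager class and its method names are hypothetical, and it operates on an in-memory list rather than on the database queries the application itself would use.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class Pager {
    // Returns the rows on a zero-based page; a page past the end yields an empty list.
    static <T> List<T> page(List<T> rows, int pageIndex, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        if (pageIndex < 0) {
            throw new IllegalArgumentException("pageIndex must not be negative");
        }
        long from = (long) pageIndex * pageSize; // long avoids int overflow
        if (from >= rows.size()) {
            return Collections.emptyList();
        }
        int to = (int) Math.min((long) rows.size(), from + pageSize);
        return new ArrayList<T>(rows.subList((int) from, to));
    }

    // Number of pages needed to show rowCount rows (rounds up).
    static int pageCount(int rowCount, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        return (rowCount + pageSize - 1) / pageSize;
    }
}

class PagerDemo {
    public static void main(String[] args) {
        List<Integer> recordIds = new ArrayList<Integer>();
        for (int i = 1; i <= 25; i++) {
            recordIds.add(i);
        }
        System.out.println("Pages: " + Pager.pageCount(recordIds.size(), 10));
        System.out.println("Last page: " + Pager.page(recordIds, 2, 10));
    }
}
```

The ‘Previous’ and ‘Next’ buttons on an individual record correspond to the same idea with a page size of one.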
A PDF report of the construct details can be displayed, saved, and printed using the print button displayed on the construct record. The cell line, supe and protein batch records are also printable, with the exception that when printing supes and protein batches, any related cell lines and constructs used to generate those reagents are included in the report.
Users may submit a request for supe(s) to be generated from a transient transfection or stable cell line, request aliquots from an existing batch of purified protein, request a new batch of purified protein or request the production of plasmid DNA. Each option is available in the application navigation panel.
Once it has been submitted into the system, a request may be viewed by selecting the appropriate type under ‘View Requests’. Each screen displays all requests that are Pending, Started or Fulfilled. All requests have a default status of ‘Pending’ after submission. Once lab staff begin to work on a request, they may log in and update the appropriate request status to ‘Started’. When all work for the request has been completed, the request status is updated to ‘Fulfilled’. Each status update will trigger an email to the request submitter notifying them of the status change that also provides a link to the request. Other fields in the requests may also be edited, and any request may be printed or saved as a PDF document.
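The request life cycle (a default of ‘Pending’, then ‘Started’, then ‘Fulfilled’, with a notification on every change) can be sketched in Java as an enum plus a listener. All names here are hypothetical: TrackedRequest, StatusListener and EmailNotifier are not ProteinTracker classes, the link format is invented, and the "outbox" list stands in for a real mail transport. The sketch also assumes that any status may be set at any time, since the article does not say whether the order of transitions is enforced.

```java
import java.util.ArrayList;
import java.util.List;

enum RequestStatus { PENDING, STARTED, FULFILLED }

// Callback invoked whenever a request's status actually changes.
interface StatusListener {
    void statusChanged(TrackedRequest request, RequestStatus oldStatus, RequestStatus newStatus);
}

// Hypothetical request object; not ProteinTracker's own class.
class TrackedRequest {
    final String id;
    final String submitterEmail;
    private RequestStatus status = RequestStatus.PENDING; // default after submission
    private final List<StatusListener> listeners = new ArrayList<StatusListener>();

    TrackedRequest(String id, String submitterEmail) {
        this.id = id;
        this.submitterEmail = submitterEmail;
    }

    RequestStatus getStatus() {
        return status;
    }

    void addListener(StatusListener listener) {
        listeners.add(listener);
    }

    // Notifies listeners only when the status really changes.
    void updateStatus(RequestStatus newStatus) {
        if (newStatus == status) {
            return;
        }
        RequestStatus oldStatus = status;
        status = newStatus;
        for (StatusListener listener : listeners) {
            listener.statusChanged(this, oldStatus, newStatus);
        }
    }
}

// Collects messages in memory; a real system would hand them to a mail server.
class EmailNotifier implements StatusListener {
    final List<String> outbox = new ArrayList<String>();
    private final String baseUrl; // hypothetical link prefix

    EmailNotifier(String baseUrl) {
        this.baseUrl = baseUrl;
    }

    @Override
    public void statusChanged(TrackedRequest request, RequestStatus oldStatus, RequestStatus newStatus) {
        outbox.add("To: " + request.submitterEmail
                + " | Request " + request.id + " changed from " + oldStatus + " to " + newStatus
                + " | " + baseUrl + request.id);
    }
}

class RequestWorkflowDemo {
    public static void main(String[] args) {
        EmailNotifier notifier = new EmailNotifier("http://example.org/requests/");
        TrackedRequest request = new TrackedRequest("REQ-42", "researcher@example.org");
        request.addListener(notifier);
        request.updateStatus(RequestStatus.STARTED);
        request.updateStatus(RequestStatus.FULFILLED);
        for (String message : notifier.outbox) {
            System.out.println(message);
        }
    }
}
```

Keeping the notification in a listener separates "what changed" from "who is told", so other reactions to a status change could be added without touching the request class.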
Functionality comparison and performance
ProteinTracker has some similarities to other open-source LIMS such as PiMS and OpenFreezer that support reagent and workflow tracking; however, it focuses specifically on the most critical data that is generated in the course of protein reagent production. It also adds the ability for users to request reagents and be notified of status changes for requested reagents. ProteinTracker does not manage experimental data or workflow around such data. Smaller laboratories with limited staff and/or funding may find it difficult to invest the time or resources in configuring and supporting a more complex LIMS that supports workflow management for experimental data. This is particularly true for smaller laboratories that happen to have multiple, or rapidly changing workflows. ProteinTracker therefore represents an intermediate solution that lies between tracking reagent data manually and a more complex LIMS that may require significant customization.
ProteinTracker runs as a web application within Apache Tomcat or any Servlet container supporting Java Servlet specification 2.4, and requires the installation of a PostgreSQL database. Installation from the source code distribution involves initial editing of the configuration files, compilation, and deployment to the Servlet container as a Web Application Archive (WAR) file. Per-session memory consumption is minimal but increases temporarily during marshalling of data for presentation to the client tier. Memory settings for an installation will depend on the maximum number of concurrent users and the amount of data accessed; however, a heap size setting of 1–2 GB for the Java virtual machine should be adequate for most installations. Smaller laboratories may require significantly less. For example, the public demonstration web site uses a heap size of 64 MB.
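The maximum heap of a Java virtual machine is set with the standard -Xmx option (for example -Xmx64m or -Xmx1g); with Tomcat this is commonly passed through the CATALINA_OPTS environment variable. As a quick check that a setting has taken effect, a small Java program can report the limit the running JVM sees. The HeapReport class below is a hypothetical helper written for this illustration and is not part of ProteinTracker.

```java
class HeapReport {
    // Maximum heap the JVM will attempt to use, in megabytes.
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap available to this JVM: " + maxHeapMegabytes() + " MB");
    }
}
```

Running it as `java -Xmx64m HeapReport` should report a value close to 64 MB; the exact figure can differ slightly between JVM implementations.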
Application development and testing
Testing of the application was performed using a combination of unit, functional, and user acceptance tests. A regression suite that includes the unit and functional test code, written using the JUnit test framework, is included in the source distribution. The unit and functional test code verifies that model objects, calculations, database access, authentication, and various screen components perform as expected when the application is modified.
The application was designed in discrete stages, with each stage designed to manage one additional type of protein production-related reagent, e.g. constructs, supes, cell lines, etc. At the completion of each stage, application users were given a chance to fully test the application in a test environment, with data cloned from the production database, to confirm that the functionality matched expectations before deploying updates to the production environment. No application changes were made without final user and management acceptance. The application was developed early during the company's ramp-up of protein production, so some existing data had to be backfilled and relational integrity established as each stage was developed. The benefit was that users were able to make use of the application early during the development-test-release cycle and were therefore more engaged in testing and providing feedback on desired functionality.
This application is to be used by service groups tasked with generating DNA or protein-related research reagents to support preclinical research programs. This provides the users of these reagents with a structure for both requesting new materials as well as accessing all the pertinent information relating to the manufacture of existing reagents.
At this time, no further development is planned. The open-source licensing allows other users to download the application source code and libraries, and modify or further enhance the source code for their own specific work flows.
Availability and requirements
The source code for ProteinTracker is open-source and freely available. Source code, installation instructions and a user manual are provided on the project home page listed below. Installation instructions and licensing information are also provided as part of the source code download.
Project name: ProteinTracker
Project home page: http://www.proteintracker.org
Operating system(s): Platform independent
Programming language: Java
Other requirements: Apache Tomcat or similar application server supporting Servlet specification 2.4, Java 1.6, PostgreSQL 8.3 or higher, and Apache Ant 1.7 for compilation.
License: GNU LGPL v3.0
Any restrictions to use by non-academics: none
Availability of supporting data
The data sets supporting the results of this article are included within the article (and its Additional file 1) and are also available in the SourceForge repository, http://sourceforge.net/projects/proteintracker.
Abbreviations
HQL: Hibernate Query Language
HTML: HyperText Markup Language
LIMS: Laboratory Information Management System
ORM: Object Relational Mapping
PDF: Portable Document Format
SQL: Structured Query Language
WAR: Web Application Archive
All work was funded and performed at VLST. We thank the Protein Sciences group at VLST for critical feedback on functionality and testing of ProteinTracker, and Daniel Van Atta for providing additional test code and refactoring during the migration from the Echo2 to Echo3 framework.
- Echo Web Framework. http://echo.nextapp.com/site/echo3
- Morris C, Pajon A, Griffiths SL, Daniel E, Savitsky M, Lin B, Diprose JM, da Silva AW, Pilicheva K, Troshin P, van Niekerk J, Isaacs N, Naismith J, Nave C, Blake R, Wilson KS, Stuart DI, Henrick K, Esnouf RM: The Protein Information Management System (PiMS): a generic tool for any structural biology research laboratory. Acta Crystallogr D Biol Crystallogr. 2011, D67: 249-260.
- Olhovsky M, Williton K, Dai AY, Pasculescu A, Lee JP, Goudreault M, Wells CD, Park JG, Gingras AC, Linding R, Pawson T, Colwill K: OpenFreezer: a reagent information management software system. Nat Methods. 2011, 8: 612-613. 10.1038/nmeth.1658.
- Apache Tomcat. http://tomcat.apache.org
- Apache Ant. http://ant.apache.org
- GNU Lesser Public License. http://www.gnu.org/licenses/lgpl.html
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.