A variety of high-throughput experimental techniques are available for studying how the protein complement of a sample (the proteome) changes under different cellular conditions, such as during disease processes. The changes observed in individual proteins, or groups of proteins, as experimental conditions vary allow researchers to begin understanding the underlying molecular mechanisms in the cell. Gel electrophoresis (GE) has been employed to study proteins for over four decades [1]. GE is frequently applied in two dimensions, whereby proteins are separated first by charge and then by molecular weight [2]. More recently, the difference in-gel electrophoresis (DIGE) technique [3] has improved the relative quantification of proteins on 2-D gels. In DIGE, the whole proteomes of different samples are labelled with different fluorescent dyes, mixed and applied to a single gel, thus reducing gel-to-gel variability in protein migration. Despite the relative age of gel-based proteomic techniques, and recent advances in liquid chromatography-mass spectrometry (LC-MS) for protein quantification, gel-based techniques are still commonly used. For all proteomic techniques, it has been widely documented that the protocols employed can influence the results, for example by introducing variability in the set of proteins detected or in the estimation of their individual abundances. It is thus important to capture and report a detailed set of information (termed metadata) about how experiments were performed and analysed, so that other groups can verify findings, employ similar protocols in their own labs or compare data sets generated in different experiments.
The Human Proteome Organisation - Proteomics Standards Initiative (HUPO-PSI, [4]) was created to help scientists share their data, deposit data sets in public databases and provide tools to assist other groups in performing large-scale analysis of public proteomic data sets. In 2007, the PSI published the Minimum Information About a Proteomics Experiment (MIAPE) specification [5]. From this root document, a set of MIAPE modules for proteomics techniques was delivered: gel electrophoresis [6], gel image informatics [7], mass spectrometry [8], mass spectrometry informatics [9], column chromatography [10], capillary electrophoresis [11] and protein-protein or molecular interactions [12]. Each MIAPE module contains a minimal checklist of items that should be reported for the given technique. The items can be reported in plain language, for example by describing specific points within the experimental protocols or the data analysis that has been performed, so that other groups can interpret the published results without ambiguity as to how they were generated. The PSI has also developed data exchange formats, typically represented in Extensible Markup Language (XML). One of these, GelML [13], captures the data related to gel electrophoresis experiments. There are a number of public databases storing protein identification data from proteomics, including PRIDE [14], PeptideAtlas [15], Peptidome [16] and the GPMDB [17], as well as the Swiss2DPAGE database [18], which stores GE experiments. However, the widely used protein identification repositories (PRIDE, PeptideAtlas etc.) are primarily focussed on LC-MS studies and historically have had either no GE data sets or no simple mechanism for deposition of data derived from gel-based experiments.
In this article we demonstrate how MIAPE GE (gel electrophoresis) and GI (gel informatics) compliant reports can be created easily in practice, through the MIAPE Generator tool [19], developed by ProteoRed - the Spanish network for proteomics. We have also developed a new tool, the PRIDESpotMapper, to work alongside the PRIDE Converter software [20] to enable GE studies to be captured in the PRIDE XML format and submitted to the public PRIDE repository. The provision of both the MIAPE report and the public PRIDE record enables other groups to download the complete data sets, including raw gel images, mass spectra and protein identifications, along with complete descriptions of the experimental protocols.
We have performed a study on the effects of salbutamol (an anabolic agent) on the proteome of rat muscle cells. Salbutamol is a beta2 adrenergic agonist that is known to cause hypertrophy in muscle, but the underlying molecular mechanisms are not well understood. The aims of the study are to use proteomic technologies to model changes in the development of skeletal muscle cells in vitro in the presence of salbutamol, and to identify novel proteins and pathways within these cells that interact with such agents and could therefore be potential targets for their action. DIGE was used to compare control and treated samples at 24 h and 96 h after addition of salbutamol. Gel spots with changed abundance were subjected to tandem mass spectrometry for protein identification. Bioinformatics analysis was performed using the Gene Ontology (GO) [21] and the DAVID tool [22] to determine functional categories that appear to be enriched at the different time points.
In the supplementary material [Additional file 1], we include the protocols employed in the DIGE study, as they would be reported in a standard journal article. We have also used the ProteoRed MIAPE Generator to create MIAPE GE and GI compliant reports (described in [19]) and we use these examples to demonstrate how a standard set of materials and methods map into the MIAPE reports generated, to act as a practical guide to MIAPE for proteome scientists. We have also deposited the MS data sets and identifications in PRIDE, using the PRIDESpotMapper and PRIDE Converter, for public access and review.
Software development
The PRIDE Converter software [20] enables conversion from a variety of mass spectra and search engine file formats into the PRIDE XML format, which can subsequently be used for uploading spectra and peptide/protein identifications into the PRIDE database. However, the PRIDE Converter has been designed primarily for "shotgun proteomics" experimental designs, where peptide to protein inference is performed across all input spectra, which is not well suited to gel-based studies. The software is capable of loading multiple identification files (e.g. Mascot .dat files or Sequest .out files), but in its internal processing, the resulting proteins are inferred from a combined list containing all the identified peptides. For gel-based studies, each identification file (say one Mascot .dat file) typically comes from a single gel spot, and its identified peptides should not be combined with those from other spots. The PRIDE Converter also has no mechanism for uploading gel image coordinates or additional information regarding protein quantification. To overcome these limitations, a custom version of the PRIDE Converter was developed by the PRIDE team, in which every identified peptide is annotated with the name of the source gel spot. In parallel, we developed a new application, the PRIDESpotMapper, as a complement to the PRIDE Converter for gel-based experiments. It is implemented in Java and modifies the PRIDE XML file generated by the custom PRIDE Converter, dividing the identified proteins according to the source identification file for each gel spot. Starting from a PRIDE XML file and either an XML or Excel spot map (see [Additional file 2] for the format specifications), the application ensures that a record is created for each identified protein, derived from the peptide identifications of each input file independently.
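The difference between pooled and per-spot protein records can be illustrated with a minimal Java sketch. This is not the PRIDESpotMapper source code: the class `SpotGrouping`, the type `PeptideHit`, its fields and the accessions and peptide sequences used are all hypothetical, and real protein inference is considerably more involved than the simple grouping shown here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only; names and data are hypothetical, not taken from the PRIDESpotMapper. */
final class SpotGrouping {

    /** One peptide identification, annotated with the name of its source gel spot. */
    static final class PeptideHit {
        final String spot;
        final String accession;
        final String sequence;

        PeptideHit(String spot, String accession, String sequence) {
            this.spot = spot;
            this.accession = accession;
            this.sequence = sequence;
        }
    }

    /** Shotgun-style pooling: one record per protein across all input files. */
    static Map<String, List<String>> pooled(List<PeptideHit> hits) {
        Map<String, List<String>> byProtein = new LinkedHashMap<>();
        for (PeptideHit h : hits) {
            byProtein.computeIfAbsent(h.accession, k -> new ArrayList<>()).add(h.sequence);
        }
        return byProtein;
    }

    /** Gel-style grouping: one record per (spot, protein) pair. */
    static Map<String, Map<String, List<String>>> groupBySpot(List<PeptideHit> hits) {
        Map<String, Map<String, List<String>>> result = new LinkedHashMap<>();
        for (PeptideHit h : hits) {
            result.computeIfAbsent(h.spot, k -> new LinkedHashMap<>())
                  .computeIfAbsent(h.accession, k -> new ArrayList<>())
                  .add(h.sequence);
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up example: the same protein is identified in two different spots.
        List<PeptideHit> hits = new ArrayList<>();
        hits.add(new PeptideHit("spot_1", "PROT_A", "AAAK"));
        hits.add(new PeptideHit("spot_1", "PROT_A", "BBBR"));
        hits.add(new PeptideHit("spot_2", "PROT_A", "CCCK"));
        hits.add(new PeptideHit("spot_2", "PROT_B", "DDDR"));

        System.out.println("Pooled:   " + pooled(hits));
        System.out.println("Per spot: " + groupBySpot(hits));
    }
}
```

In the pooled view, the peptides of PROT_A from both spots collapse into a single record, so the information about which spot each identification came from is lost. In the per-spot view, PROT_A appears once under each spot, which is the behaviour a gel-based record requires.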
Once all the result files from the search engine (Mascot in this version) have been combined into a single PRIDE XML file using the PRIDE Converter, running the PRIDESpotMapper is straightforward (Figure 1). First, either the XML or Excel spot map file is selected. Second, the gel image can be loaded from a local file or from a URI, for example if gel images have been loaded into the ProteoRed MIAPE Generator database [19]. Third, the previously created PRIDE XML file is selected. The application merges the two data files (the spot map file and the PRIDE XML file) to create a new PRIDE XML file (internally called the 2D PRIDE XML file), in which each spot is linked to only one protein, with its corresponding peptides, alongside gel spot coordinates and relative quantification data. The file is then saved on the local drive, ready for upload to the PRIDE database.
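The merge step can be pictured as a join between spot map rows and protein records on the spot name. The following Java sketch illustrates that idea only: the actual spot map format is specified in [Additional file 2], and the class `SpotMapMerge`, the types `SpotEntry`, `ProteinRecord` and `AnnotatedProtein`, their fields and the example values are hypothetical. Reading and writing of PRIDE XML is omitted.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only; names and fields are hypothetical, not the PRIDESpotMapper API. */
final class SpotMapMerge {

    /** One spot map row: position on the gel image plus a relative abundance ratio. */
    static final class SpotEntry {
        final String spotId;
        final double x;
        final double y;
        final double ratio;

        SpotEntry(String spotId, double x, double y, double ratio) {
            this.spotId = spotId;
            this.x = x;
            this.y = y;
            this.ratio = ratio;
        }
    }

    /** A protein identified from the identification file of one gel spot. */
    static final class ProteinRecord {
        final String spotId;
        final String accession;

        ProteinRecord(String spotId, String accession) {
            this.spotId = spotId;
            this.accession = accession;
        }
    }

    /** A protein record joined with the coordinates and quantification of its spot. */
    static final class AnnotatedProtein {
        final ProteinRecord protein;
        final SpotEntry spot;

        AnnotatedProtein(ProteinRecord protein, SpotEntry spot) {
            this.protein = protein;
            this.spot = spot;
        }

        @Override
        public String toString() {
            return protein.accession + " @ " + spot.spotId
                    + " (x=" + spot.x + ", y=" + spot.y + ", ratio=" + spot.ratio + ")";
        }
    }

    /** Joins protein records to spot map rows on the spot identifier. */
    static List<AnnotatedProtein> merge(List<SpotEntry> spotMap, List<ProteinRecord> proteins) {
        Map<String, SpotEntry> index = new HashMap<>();
        for (SpotEntry s : spotMap) {
            if (index.put(s.spotId, s) != null) {
                throw new IllegalArgumentException("Duplicate spot id: " + s.spotId);
            }
        }
        List<AnnotatedProtein> merged = new ArrayList<>();
        for (ProteinRecord p : proteins) {
            SpotEntry s = index.get(p.spotId);
            if (s == null) {
                throw new IllegalArgumentException("No spot map entry for: " + p.spotId);
            }
            merged.add(new AnnotatedProtein(p, s));
        }
        return merged;
    }

    public static void main(String[] args) {
        // Made-up coordinates and ratios, for illustration only.
        List<SpotEntry> spotMap = new ArrayList<>();
        spotMap.add(new SpotEntry("spot_1", 120.5, 340.0, 1.8));
        spotMap.add(new SpotEntry("spot_2", 410.0, 95.25, 0.6));

        List<ProteinRecord> proteins = new ArrayList<>();
        proteins.add(new ProteinRecord("spot_1", "PROT_A"));
        proteins.add(new ProteinRecord("spot_2", "PROT_B"));

        for (AnnotatedProtein a : merge(spotMap, proteins)) {
            System.out.println(a);
        }
    }
}
```

The sketch fails fast on duplicate or unmatched spot names instead of silently skipping them. That is a design assumption made here for clarity, not a documented behaviour of the tool.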