Algorithm
The following algorithm detects equal mass shifts in overlapping peptides:
1. Let E1 be the peptide mass list from an experiment using protease A, and E2 be the peptide mass list from an experiment using protease B.
2. Let T1 be the list of theoretical peptide masses resulting from an in silico digestion using protease A, and T2 be the list of theoretical peptide masses resulting from an in silico digestion using protease B.
3. Remove from E1 all peaks corresponding to unmodified peptides in T1 and all autolytic peaks from protease A.
4. Repeat Step 3 with mass lists E2 and T2 from protease B.
5. Compare each mass ei ∈ E1 to each mass tj ∈ T1, and each mass ek ∈ E2 to each mass tm ∈ T2. Store the mass shifts (ei - tj) for all i and j and the mass shifts (ek - tm) for all k and m, in two lists M1 and M2, which now contain all possible mass shifts between corresponding experimental and theoretical data.
6. Let pj and pm be the theoretical peptides corresponding to tj and tm respectively. Compare M1 and M2 and find all pairs such that:
a. |(ei - tj) - (ek - tm)| ≤ ω, where ω is the mass shift accuracy, and
b. |(ei - tj)| > ε and |(ek - tm)| > ε, where ε is the mass shift threshold, and
c. pj and pm overlap
The output is a list of overlapping peptides from E1 and E2 with equal mass shifts. The cause of a mass shift, i.e., one or more modifications or substitutions, must be located in the region where the two peptides overlap and must have a mass equal to the detected mass shift. The list should be cross-checked against a database of known modifications and substitutions (e.g., UniMod [6, 7]), and/or the listed peptides can be tested in additional experiments, e.g., by MALDI-TOF-TOF, to verify or reject the proposed modification or substitution.
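Steps 5 and 6 can be sketched in a few lines of Python. This is only an illustration and not the MassShiftFinder source (which is written in Java): the names Peptide, overlaps, mass_shifts and find_equal_shifts are hypothetical, the mass lists are assumed to have been filtered already as in Steps 3 and 4, and condition (a) is read as "the two shifts differ by at most ω".

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Peptide:
    """A theoretical peptide: residue positions (1-based, inclusive) and mass."""
    start: int
    end: int
    mass: float


def overlaps(p: Peptide, q: Peptide) -> bool:
    """True if the two peptides share at least one residue position."""
    return p.start <= q.end and q.start <= p.end


def mass_shifts(experimental, theoretical):
    """Step 5: every (experimental mass, theoretical peptide, shift) triple."""
    return [(e, t, e - t.mass) for e, t in product(experimental, theoretical)]


def find_equal_shifts(e1, t1, e2, t2, omega=0.2, epsilon=0.9):
    """Step 6: pairs of shifts that exceed epsilon in absolute value, agree
    within omega, and belong to overlapping theoretical peptides."""
    m1 = [s for s in mass_shifts(e1, t1) if abs(s[2]) > epsilon]
    m2 = [s for s in mass_shifts(e2, t2) if abs(s[2]) > epsilon]
    hits = []
    for (ei, pj, s1), (ek, pm, s2) in product(m1, m2):
        if abs(s1 - s2) <= omega and overlaps(pj, pm):
            hits.append((ei, pj, ek, pm, s1, s2))
    return hits
```

The sketch compares every shift in M1 with every shift in M2, exactly as the steps describe, so its cost grows with the product of the two list sizes. This is one reason why filtering the experimental lists beforehand matters.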
Implementation
The described algorithm is implemented in Java [8] and available as a software tool, MassShiftFinder, at http://www.bioinfo.no/software/massShiftFinder.
The main input to MassShiftFinder is the protein sequence and the experimental masses from two PMF experiments on the same protein using different proteases. Before running the algorithm it is recommended to remove all identified peptides from the PMFs, e.g., by using MassSorter [2]. Unmodified peptides, autolytic protease peaks and known noise/contaminating peaks (e.g., keratin) can be filtered within adjustable accuracy limits in the program. Using filters limits the number of unnecessary mass shift comparisons (see additional file 1, Fig. 1 (TheoreticalExamples.pdf)).
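The filtering step can be illustrated with a short Python sketch. The function names within_ppm and filter_peaks are hypothetical, and the sketch assumes that the tolerance is expressed in ppm relative to the known mass; how MassShiftFinder applies its accuracy limits internally is not described here.

```python
def within_ppm(observed: float, reference: float, ppm: float) -> bool:
    """True if observed lies within +/- ppm parts per million of reference."""
    return abs(observed - reference) <= reference * ppm * 1e-6


def filter_peaks(peaks, known_masses, ppm=50.0):
    """Drop every peak that matches any known mass within the ppm tolerance.

    known_masses may hold unmodified theoretical peptides, autolytic
    protease peaks and known contaminant peaks.
    """
    return [p for p in peaks
            if not any(within_ppm(p, k, ppm) for k in known_masses)]
```

With a 50 ppm tolerance, a known mass of 1000 Da removes peaks within 0.05 Da and a known mass of 2000 Da removes peaks within 0.1 Da, so the absolute window widens with mass.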
To reduce the search space and increase the chance of detecting real mass shifts, the following parameters should be set to reasonable values. (i) Mass Shift Threshold: mass shifts below this threshold are excluded to avoid spurious comparisons among very small mass shifts. We generally recommend setting this value to 0.9 Da so that deamidations are included. (ii) Mass Shift Boundaries: these determine the search limits for a mass shift to be considered a modification or substitution. They can be set to a narrower mass range, e.g., 79–81 Da to search for phosphorylations. (iii) Mass Shift Accuracy: two mass shifts are recognized as equal when the difference between them is within this accuracy (in Da or ppm). We generally recommend setting this parameter to 0.2 Da when a 25 ppm accuracy limit is used for the experimental peptides, and decreasing it if the instrument is more accurate. Note that this parameter refers to the inaccuracy of the potential modification mass as calculated from the comparison of the experimental data with the theoretical peptide sequence.
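How the three parameters interact can be shown with a small Python sketch. The class ShiftParameters and its methods are hypothetical names introduced for illustration, all values are taken to be in Da (the ppm option is not modelled), and the default boundaries of -200 to 200 Da are simply an assumed wide range.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShiftParameters:
    threshold: float = 0.9   # Mass Shift Threshold (Da)
    lower: float = -200.0    # Mass Shift Boundaries, lower limit (Da)
    upper: float = 200.0     # Mass Shift Boundaries, upper limit (Da)
    accuracy: float = 0.2    # Mass Shift Accuracy (Da)

    def accepts(self, shift: float) -> bool:
        """Keep a shift only if it exceeds the threshold and lies in bounds."""
        return abs(shift) > self.threshold and self.lower <= shift <= self.upper

    def equal(self, shift_a: float, shift_b: float) -> bool:
        """Two shifts count as equal if they differ by at most the accuracy."""
        return abs(shift_a - shift_b) <= self.accuracy


# Restricting the boundaries to 79-81 Da targets phosphorylation-sized shifts.
phospho_only = ShiftParameters(lower=79.0, upper=81.0)
```

With the defaults, a shift of 0.98 Da (deamidation-sized) passes the 0.9 Da threshold while a shift of 0.5 Da does not; with phospho_only, only shifts between 79 and 81 Da are kept.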
An example of the output is shown in Figure 2. By selecting a row, the overlapping peptides are indicated in the protein sequence. The detected mass shifts are searched against a local version of the UniMod database. To reduce the number of incorrect UniMod explanations, this search can be restricted by choosing the allowed modification types, e.g., amino acid substitutions, post-translational modifications, etc. Up to two modifications per peptide are supported. Note that changing the settings for the UniMod search only affects the number of suggested explanations for each mass shift, not the number of mass shifts. Unexplained mass shifts may correspond to unknown modifications or to more than two modifications per peptide. An example showing the detection of modifications in an artificial dataset is found in additional file 1 (TheoreticalExamples.pdf).
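The idea of explaining a mass shift by up to two modifications can be sketched in Python as follows. The MODIFICATIONS table is a tiny illustrative stand-in with a few well-known monoisotopic mass deltas, not the local UniMod copy used by the tool, and the function name explain_shift and its parameters are hypothetical.

```python
from itertools import combinations_with_replacement

# Illustrative monoisotopic mass deltas (Da); a stand-in for a UniMod table.
MODIFICATIONS = {
    "Deamidation": 0.98402,
    "Methylation": 14.01565,
    "Oxidation": 15.99491,
    "Acetylation": 42.01057,
    "Phosphorylation": 79.96633,
}


def explain_shift(shift, table=MODIFICATIONS, accuracy=0.1, max_mods=2):
    """Return every combination of up to max_mods modifications whose summed
    mass delta lies within `accuracy` Da of the observed shift."""
    explanations = []
    for n in range(1, max_mods + 1):
        # Combinations with replacement allow the same modification twice.
        for combo in combinations_with_replacement(sorted(table), n):
            total = sum(table[name] for name in combo)
            if abs(total - shift) <= accuracy:
                explanations.append(combo)
    return explanations
```

Restricting the allowed modification types corresponds to passing a smaller table. This changes only how many explanations are returned for a given shift, not which shifts were detected.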
Experimental Example
We compared connexin43 (Cx43) [9] from three species. The experimental peak lists of Cx43 from Syrian hamster, Chinese hamster and rat were collected in MassSorter [2] using the Syrian hamster sequence as the basis of comparison [10]. After removing autolytic protease peaks, peaks from the contaminating antibody and peaks in common with Syrian hamster, the remaining peaks were entered into MassShiftFinder using the following parameters: Filter Accuracy and Unmodified Peptide Accuracy, 50 ppm (found under Edit/Preferences); Mass Shift Accuracy, 0.2 Da; Mass Shift Threshold, 0.9 Da; Mass Shift Boundaries, -200 to 200 Da; UniMod Accuracy, 0.1 Da; Missed Cleavages, 1; and including only amino acid substitutions in the search.
For Chinese hamster, MassShiftFinder pointed out a potential substitution within the area 347-IAAGHELQPL-356 with a mass shift of 17.96 Da. This would correspond to a substitution from I or L to M. The rat data also indicated a potential substitution in the same sequence with a mass shift of -14.02 Da. This could correspond to a substitution from A to G, E to D, or I or L to V. The Chinese hamster and rat peptides with m/z 1748.91 and m/z 1716.84 (corresponding to mass shifts of 17.95 Da and -14.02 Da relative to the Syrian hamster peptide with m/z 1730.96) were targeted for TOF-TOF analysis (Fig. 3). The only possible substitution in Chinese hamster that is consistent with all data is a change in position 347 from I (Syrian hamster) to M (Chinese hamster). For rat, both I347 to V and A348 to G are consistent with these data. The former is the correct alternative. This example shows that our approach can be used to narrow the range of possibilities when detecting amino acid substitutions. For more examples and details, see additional file 2 (ExperimentalExamples.pdf).
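The substitution masses quoted above can be checked with simple arithmetic on monoisotopic residue masses. The Python sketch below is only an illustration: the names RESIDUE_MASS, substitution_shift and consistent are hypothetical, the table covers only the residues named in the text, and the 0.1 Da tolerance mirrors the UniMod Accuracy setting used in this example.

```python
# Monoisotopic residue masses (Da) for the residues named in the text.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "V": 99.06841, "L": 113.08406,
    "I": 113.08406, "D": 115.02694, "E": 129.04259, "M": 131.04049,
}


def substitution_shift(old: str, new: str) -> float:
    """Mass change (Da) when residue `old` is replaced by residue `new`."""
    return RESIDUE_MASS[new] - RESIDUE_MASS[old]


def consistent(observed, substitutions, accuracy=0.1):
    """Substitutions whose mass change matches the observed shift."""
    return [(a, b) for a, b in substitutions
            if abs(substitution_shift(a, b) - observed) <= accuracy]


candidates = [("I", "M"), ("L", "M"), ("A", "G"), ("E", "D"),
              ("I", "V"), ("L", "V")]
chinese_hamster = consistent(17.96, candidates)   # I or L to M
rat = consistent(-14.02, candidates)              # A to G, E to D, I or L to V
```

I to M and L to M both give about +17.956 Da, while A to G, E to D, I to V and L to V all give about -14.016 Da. This is why the mass shift alone cannot distinguish between these alternatives and the TOF-TOF data are needed.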