The web site provides two main applications: (i) common peak discovery across spectra using SSA; and (ii) differential analysis of the indexed peaks with FDR correction. Detailed instructions can be found on the web site. Both applications are suited to high-throughput MS data analysis with large sample sizes. The flowchart of the application is shown in Figure 1B. The spectra and the accompanying metadata are uploaded together as the raw MS input. After upload, the MS data are processed with SSA for peak detection.
SSA discovers peaks that are common across MS data sets. First, SSA maps all spectra onto a uniform m/z axis using linear interpolation. A composite spectrum is then generated in two steps: the spectra within each group are averaged, and the resulting group averages are averaged in turn, so that each group carries equal weight. To detect peak locations, an area under the curve (AUC) filter is applied to the composite spectrum. Each local maximum in the AUC-filtered composite spectrum is recorded as a peak location, and the corresponding peak edges are located as well. The AUC filter is then applied to each individual spectrum to find and quantify the peaks, using the peak edges determined from the composite spectrum. To remove noisy peaks (peaks that are poorly reproducible between replicates), an F-test is applied to the peak signal content, with a confidence threshold of 95-99%. The threshold can be relaxed to 80-90% for large-scale protein profiling. The detected peaks are normalized using scale factors determined by an expectation-maximization (EM) algorithm [6]. A chi-squared (χ²) statistic is calculated for each spectrum so that poor-quality spectra can be discarded. Several parameters, each with a default value, are available for fine-tuning the SSA results: min.peak.widths, max.peak.widths, peakWidthSteps, m.z.regions and m.z.step control how peaks are located, while F.test.threshold and chi.sq.threshold control the removal of noisy peaks and of poor-quality spectra, respectively. Tutorials on the parameters can be found online at the web site. SSA-discovered MS peaks are shown in Figure 2A, where the red dots on the composite graph indicate the peaks.
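The first two SSA steps (mapping spectra onto a uniform m/z axis and building an equally weighted composite spectrum) can be illustrated with a minimal Python sketch. This is not the SSA implementation: the function names `interpolate`, `mean_spectrum` and `composite_spectrum`, the data layout, and the choice to hold intensities constant outside a spectrum's measured m/z range are all assumptions made for illustration.

```python
from bisect import bisect_left


def interpolate(mz, intensity, grid):
    """Linearly interpolate one spectrum onto `grid`.

    `mz` must be sorted in ascending order. Grid points outside the
    measured range take the nearest end value (an assumption of this sketch).
    """
    out = []
    for x in grid:
        if x <= mz[0]:
            out.append(intensity[0])
        elif x >= mz[-1]:
            out.append(intensity[-1])
        else:
            j = bisect_left(mz, x)  # mz[j - 1] < x <= mz[j]
            frac = (x - mz[j - 1]) / (mz[j] - mz[j - 1])
            out.append(intensity[j - 1] + frac * (intensity[j] - intensity[j - 1]))
    return out


def mean_spectrum(spectra):
    """Point-wise mean of spectra that share the same m/z grid."""
    n = len(spectra)
    return [sum(column) / n for column in zip(*spectra)]


def composite_spectrum(spectra, groups, grid):
    """Average within each group first, then average the group means."""
    by_group = {}
    for (mz, intensity), group in zip(spectra, groups):
        by_group.setdefault(group, []).append(interpolate(mz, intensity, grid))
    group_means = [mean_spectrum(members) for members in by_group.values()]
    return mean_spectrum(group_means)


# Toy data (hypothetical): two case spectra and one control spectrum.
spectra = [
    ([100.0, 102.0], [0.0, 2.0]),
    ([100.0, 102.0], [2.0, 4.0]),
    ([100.0, 102.0], [10.0, 10.0]),
]
groups = ["case", "case", "control"]
grid = [100.0, 101.0, 102.0]
composite = composite_spectrum(spectra, groups, grid)
print(composite)
```

Averaging the group means, rather than pooling all spectra, keeps the smaller control group from being outweighed by the two case spectra in this toy example.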
The resulting peak list is then subjected to differential analysis using Student's t-test or the Mann-Whitney U-test, which assigns each MS peak a p-value for the comparison between groups. Multiple hypothesis testing across features (protein MS peaks) is addressed by the subsequent FDR analysis. The total discoveries are the features whose Student's t-test or Mann-Whitney U-test p-value is lower than a predefined threshold. The single-feature p-value threshold can be surveyed over its full range to obtain the total and false discoveries at each threshold, from which the gFDR is calculated. To rank the MS peaks, the lFDR assigns a significance measure to each feature. The analysis results are summarized as a graph (gFDR, Figure 2B) and a table (lFDR, Figure 2C). Users can obtain the FDR at different levels by working with the downloadable Excel files.
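The threshold survey described above can be sketched in Python as follows. The text does not state how the false discoveries are obtained, so this sketch assumes the simplest estimate: under the null hypothesis p-values are uniform, so among m features about m × threshold are expected to fall below the threshold by chance. The function name `gfdr_curve` and the toy p-values are hypothetical, and the web site may estimate false discoveries differently.

```python
def gfdr_curve(p_values, thresholds):
    """Survey p-value thresholds and estimate a global FDR at each one.

    Returns a list of (threshold, total_discoveries, gfdr) tuples.
    ASSUMPTION: expected false discoveries = m * threshold, i.e. null
    p-values are uniform and every feature is treated as potentially null.
    """
    m = len(p_values)
    rows = []
    for t in thresholds:
        # Total discoveries: features with p-value lower than the threshold.
        total = sum(1 for p in p_values if p < t)
        expected_false = m * t
        # The ratio is undefined when nothing is discovered at this threshold.
        gfdr = min(1.0, expected_false / total) if total else None
        rows.append((t, total, gfdr))
    return rows


# Toy p-values (hypothetical), one per MS peak.
p_values = [0.001, 0.004, 0.02, 0.03, 0.2, 0.5, 0.7, 0.9, 0.6, 0.4]
curve = gfdr_curve(p_values, [0.0005, 0.01, 0.05])
for threshold, total, gfdr in curve:
    print(threshold, total, gfdr)
```

Tabulating the total discoveries and the estimated gFDR side by side makes the trade-off explicit: a looser threshold admits more peaks at the cost of a larger expected fraction of false discoveries.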