Data processing, peak calling and differential peak calling
We tested RepViz with public data from GEO, using available tools for peak calling and differential peak calling. Details of the sequencing data used in the examples are provided in Additional file 1: Table S1, and details of the peak caller and differential peak callers are provided in Additional file 1: Table S2. The quality of the sequencing data was assessed with FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc), and the fastq files were aligned against the reference genome (mm10 or hg19, depending on the dataset) with Bowtie 2 (2.2.6) [15]. The peaks were called using MACS2 (2.1.1) [16] with the parameters --broad --nomodel -q 0.05. The differential peak callers can be roughly divided into two categories: the one-step methods (PePr [17], THOR [14] and diffReps [18]), which use their own peak callers, and the two-step method (DiffBind [19]), which requires an external peak caller. For DiffBind we used the peaks called with MACS2. The differential peak calling was done with the default settings of the software cited in Additional file 1: Table S2. To emphasize that the scope of this study is the visualization tool, the differential peak callers were randomly numbered in the examples.
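The alignment and peak calling steps described above can be sketched as command lines assembled in Python. This is an illustration only: the file names, the index prefix, the single-end `-U` input, the use of a control sample and the `-g mm` genome size are assumptions that are not taken from the study. Only the MACS2 options `--broad --nomodel -q 0.05` come from the text.

```python
import shlex


def bowtie2_cmd(index_prefix, fastq, sam_out):
    """Build a Bowtie 2 command line for single-end reads (assumed layout)."""
    return ["bowtie2", "-x", index_prefix, "-U", fastq, "-S", sam_out]


def macs2_cmd(treatment_bam, control_bam, name, genome_size="mm"):
    """Build a MACS2 callpeak command line with the parameters quoted in the text.

    The control file, output name and genome size are hypothetical placeholders.
    """
    return [
        "macs2", "callpeak",
        "-t", treatment_bam,
        "-c", control_bam,
        "-f", "BAM",
        "-g", genome_size,
        "-n", name,
        "--broad", "--nomodel", "-q", "0.05",
    ]


if __name__ == "__main__":
    # Print the commands instead of running them, so the sketch needs no installed tools.
    print(shlex.join(bowtie2_cmd("mm10_index", "sample1.fastq", "sample1.sam")))
    print(shlex.join(macs2_cmd("sample1.bam", "input1.bam", "sample1")))
```

Building the commands as argument lists keeps them easy to inspect and to pass to `subprocess.run` once the placeholder paths are replaced with real files.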
Results and discussion
Our R tool, RepViz, enables the user to take a snapshot of a defined genomic region with multiple data inputs and visualize it in an efficient manner. Unlike the commonly used visualization tools, it implements a replicate-driven approach, allowing user-friendly visualization of replicates within and between experimental conditions. Here we provide examples of how RepViz can aid the visual inspection involved in evaluating outlier behavior, normalization, differential peak calling and the combined analysis of multiple data types. Details of the sequencing data, peak calling and differential peak calling used in the examples are provided in Additional file 1.
The first function of RepViz visualizes BAM files by presenting all the replicates on the same scale together with their group-wise averages. This can be used to assess the similarity between the replicates within a given biological condition, or to check whether the average signal is affected by outliers (Fig. 1b). The replicate-driven visualization is also a useful confirmatory step for normalization, enabling, for instance, comparison of replicates after normalization at known house-keeping genes (Additional file 1: Fig. S1). With current genome browsers, this type of visualization can be a time-consuming task. For instance, IGV does not have an option to group tracks, so the replicates are stacked on top of each other, whereas Gviz has an option to group samples together but does not allow comparing groups with different numbers of replicates (see Fig. 2 for more details of the comparison).
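The idea of drawing replicates on a common scale next to their group-wise averages can be illustrated with a minimal Python sketch. It is not the RepViz implementation: the function names, the binned coverage values and the group labels are hypothetical, and the coverage is assumed to be already extracted from the BAM files into equal-length lists of per-bin read depth.

```python
from statistics import mean


def group_averages(coverage, groups):
    """Average the per-bin coverage of all replicates that share a group label.

    coverage: dict mapping replicate name -> list of per-bin read depth
    groups:   dict mapping replicate name -> group label
    Groups may contain different numbers of replicates.
    """
    by_group = {}
    for replicate, label in groups.items():
        by_group.setdefault(label, []).append(coverage[replicate])
    return {
        label: [mean(bins) for bins in zip(*tracks)]
        for label, tracks in by_group.items()
    }


def shared_ylim(coverage):
    """Upper y-axis limit that puts every replicate track on the same scale."""
    return max(max(track) for track in coverage.values())


# Hypothetical toy data: two control replicates and three treated replicates,
# one of which (treat_3) carries an outlier in the last bin.
coverage = {
    "ctrl_1": [1, 2, 3],
    "ctrl_2": [3, 4, 5],
    "treat_1": [2, 2, 2],
    "treat_2": [4, 6, 8],
    "treat_3": [0, 1, 20],
}
groups = {
    "ctrl_1": "control",
    "ctrl_2": "control",
    "treat_1": "treated",
    "treat_2": "treated",
    "treat_3": "treated",
}

averages = group_averages(coverage, groups)
ylim = shared_ylim(coverage)
```

In this toy example the treated average in the last bin is pulled up by a single replicate, which is the kind of outlier effect that is easy to miss when only the averaged signal is plotted.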
The second function of RepViz visualizes multiple BED files, which can help, for instance, to compare different peak calling software. By comparing the called peaks with the observed data for each replicate (BAM), the user can visually confirm the called features (Fig. 1b, Additional file 1: Fig. S2). For example, in ChIP-seq studies, differential peak calls can be easily inspected in the light of replicate behavior, and peak calls that are driven by outliers can be detected (Fig. 1b). Additionally, the tool allows a replicate-driven inspection of the length of the called peak. This is useful because several peak callers tend to merge clusters of sharp peaks into broader peaks [11, 12]. Finally, the third function of RepViz visualizes the gene track to display the genes in the region of interest, such as gene promoters or their vicinity.
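Comparing the peak calls of several tools within one genomic region amounts to selecting, from each BED file, the intervals that overlap the region. The following Python sketch shows that selection under stated assumptions: the function names and the example intervals are hypothetical, only the first three BED columns are used, and this is not how RepViz itself is implemented.

```python
def parse_bed(lines):
    """Parse BED lines into (chrom, start, end) tuples, skipping header lines."""
    peaks = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        peaks.append((chrom, int(start), int(end)))
    return peaks


def peaks_in_region(peaks, chrom, start, end):
    """Return the peaks overlapping chrom:start-end.

    BED intervals are 0-based and half-open, so two intervals overlap
    when each one starts before the other one ends.
    """
    return [p for p in peaks if p[0] == chrom and p[1] < end and p[2] > start]


# Hypothetical calls from two peak callers for the same sample.
caller_a = parse_bed(["chr1\t100\t200", "chr1\t500\t900", "chr2\t100\t200"])
caller_b = parse_bed(["track name=callerB", "chr1\t480\t620", "chr1\t700\t950"])

region = ("chr1", 450, 1000)
calls_a = peaks_in_region(caller_a, *region)
calls_b = peaks_in_region(caller_b, *region)
```

Here the first caller reports one broad peak in the region while the second reports two narrower ones, the kind of difference in peak length that the replicate-level signal can help to resolve.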
In addition to visualizing replicates within a particular data type, RepViz can visualize multiple data types (datasets) simultaneously by treating each dataset as a separate group in the input file. With multiple matched datasets, the replicate-driven visual inspection can be useful both for evaluating the quality of the samples and for assessing the performance of the differential peak calling methods between datasets with different dynamics (Additional file 1: Fig. S3). Moreover, a combined visualization of matched histone marker and ATAC-seq data can provide replicate-specific insights into the relationship between histone modification and open chromatin state (Fig. 3). Other potential applications of RepViz include, for example, the combination of chromatin marker or ATAC-seq data with eRNA [20] or non-coding RNA data to inspect replicate variability at the chromatin level together with RNA expression variability at specific genomic regions. RepViz will be actively maintained and further developed.