callPeaks — callPeaks • epiwraps

This is a R-based implementation of the general MACS2 strategy (Zhang et al., Genome Biology 2008), taking some freedom here and there in comparison to the original. The function is still under development, especially with respect to single-end reads, where some optimization might still be needed. For paired-end reads, the results are nearly identical with those of MACS2, with two main differences: 1) the p-values are more conservative (and arguably better calibrated) and 2) because the implementation does not rely on sliding windows, with default settings the peaks are narrower.

Usage

callPeaks(
  bam,
  ctrl = NULL,
  paired = FALSE,
  type = c("narrow", "broad"),
  nullH = c("local", "global.nb", "global.bin"),
  blacklist = NULL,
  binSize = 10L,
  fragLength = 200L,
  minPeakCount = 5L,
  minFoldChange = 1.3,
  pthres = 10^-3,
  maxSize = NULL,
  bgWindow = c(1, 5, 10) * 1000,
  pseudoCount = 1L,
  useStrand = TRUE,
  outFormat = c("custom", "narrowPeak"),
  verbose = TRUE,
  ...
)

Arguments

bam: A signal bam file
ctrl: An optional (but highly recommended) control bam file
paired: Logical, whether the reads are paired
type: The type of peaks to identify ('narrow' or 'broad').
blacklist: An optional `GRanges` of regions to be excluded (or the path to such a file). Since the blacklisted regions are removed from both the signal and control peaks, this also has an important impact on the empirical FDR (when `ctrl` is given).
binSize: Binsize used to estimate peak shift
fragLength: Fragment length. Ignored if `paired=TRUE`. This is only used for the initial candidate region identification, and sizes are adjusted after, so it doesn't need to be very precise.
minPeakCount: The minimum summit count for a region to be considered. Decreasing this can substantially increase the running time.
minFoldChange: The minimum fold-change for a region to be considered. Decreasing this can substantially increase the running time.
pthres: The p-value threshold to use
bgWindow: The windows to consider (in addition to the peak itself) for local background.
outFormat: The output format ('custom' or 'narrowPeak')
verbose: Logical; whether to output progress messages
...: Passed to bamChrChunkApply

Value

A `GRanges`

Details

`callPeaks` takes about twice as long to run as MACS2, and uses more memory. If dealing with very large files (or a very low memory system), consider increasing the number of processing chunks, for instance with `nChunks=10`.

The function uses bamChrChunkApply to obtain the coverages, and can accept any argument of that function. This means that the `mapqFilter` and bam `flgs` arguments can be used to restrict the reads used.