getCounts
getCounts.Rd
Creates a SummarizedExperiment of fragment (or insertion) counts from bam files that overlap given regions.
Usage
getCounts(
bam_files,
regions,
paired,
extend = 0L,
shift = 0L,
type = c("full", "center", "start", "end", "ends"),
ov.type = "any",
maxgap = -1L,
minoverlap = 1L,
ignore.strand = TRUE,
strandMode = 1,
includeDuplicates = TRUE,
includeSecondary = FALSE,
minMapq = 1L,
minFragLength = 1L,
maxFragLength = 5000L,
splitByChr = 3,
randomAcc = FALSE,
verbose = TRUE
)
Arguments
- bam_files
A vector of paths to the bam files.
- regions
A `GRanges` of regions in which to counts.
- paired
Logical; whether the data is paired (assumed unpaired by default). Use `paired="auto"` for automatic detection using the first bam file.
- extend
The amount *by* which to extend single-end reads (e.g. fragment length minus read length). If `paired=TRUE` and `type` is either 'ends' or 'center', then the extension will be applied after taking the (shifted) fragment ends or centers, resulting in ranges of width equal to `extend`.
- shift
Shift (from 3' to 5') by which reads/fragments will be shifted. If `shift` is an integer vector of length 2, the first value will represent the shift for the positive strand, and the second for the negative strand.
- type
Type of the coverage to compile. Either full (full read/fragment), start (count read/fragment start locations), end, center, or 'ends' (both ends of the read/fragment).
- ov.type
Overlap type. See the `type` argument of
link[GenomicRanges]{countOverlaps}
.- maxgap
Maximum gap allowed for overlaps (see the corresponding argument of
link[GenomicRanges]{countOverlaps}
).- minoverlap
Minimum overlap (see the corresponding argument of
link[GenomicRanges]{countOverlaps}
).- ignore.strand
Logical; whether to ignore strand for the purpose of counting overlaps (default TRUE).
- strandMode
The strandMode of the data (whether the strand is given by the first or second mate, which depends on the library prep protocol). See strandMode for more information. This parameter has no effect unless one of the `strand`, `extend` parameters or a strand-specific `shift` are used.
- includeDuplicates
Logical, whether to include reads flagged as duplicates.
- includeSecondary
Logical; whether to include secondary alignments
- minMapq
Minimum mapping quality (1 to 255)
- minFragLength
Minimum fragment length (ignored if `paired=FALSE`)
- maxFragLength
Maximum fragment length (ignored if `paired=FALSE`)
- splitByChr
Whether to process chromosomes separately, and if so by how many chunks. The should not affect the output, and is simply slightly slower and consumes less memory. Can be a logical value (in which case each chromosome is processed separately), but we instead recommend giving a positive integer indicating the number of chunks.
- randomAcc
Logical, whether to use random access. This is disabled by default because the overhead of random access to a lot of regions is typically worse than reading the entire file. However, if you need to get counts in few regions, enabling this will be faster. Note however that when using random access, the output object will not contain depth information.
- verbose
Logical; whether to print progress messages
Examples
# get an example bam file
bam <- system.file("extdata", "ex1.bam", package="Rsamtools")
# create regions of interest
peaks <- GRanges(c("seq1","seq1","seq2"), IRanges(c(400,900,500), width=100))
getCounts(bam, peaks, paired=FALSE)
#> Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'seqlevelsInUse': object 'peaks' not found