bam_read() is a multithreaded sequential BAM reader built on top of
ompBAM. The interface is designed to be familiar to users of
Rsamtools::scanBam(), GenomicAlignments::readGAlignments(), and
GenomicAlignments::readGAlignmentPairs().
Usage
bam_read(
  file,
  param = NULL,
  what = NULL,
  tag = NULL,
  as = c("DataFrame", "data.frame", "GAlignments", "GAlignmentPairs", "scanBam"),
  seqqual_mode = c("compatible", "compact"),
  threads = 1L,
  BPPARAM = NULL,
  auto_threads = FALSE,
  use.names = FALSE,
  with.which_label = FALSE,
  include_unmapped = TRUE
)
Arguments
- file
A BAM input. Supported values are:
a single BAM path (character(1)) or multiple BAM paths,
- param
Optional Rsamtools::ScanBamParam (or a compatible list for lightweight use). The following fields are honored: mapqFilter, flag, which, what, and tag.
- what
Character vector of fields to return, similar to scanBam(what=...). Supported fields are qname, flag, rname, strand, pos, qwidth, mapq, cigar, mrnm, mpos, isize, seq, qual.
- tag
Character vector of 2-letter tag names to extract.
- as
Output format:
"DataFrame": returns S4Vectors::DataFrame (default),
"data.frame": returns base data.frame,
"GAlignments": returns GenomicAlignments::GAlignments,
"GAlignmentPairs": returns GenomicAlignments::GAlignmentPairs,
"scanBam": returns a scanBam()-shaped list-of-lists.
- seqqual_mode
Controls representation of seq/qual when those fields are requested:
"compatible" (default): return character vectors matching scanBam-style expectations,
"compact": return raw list-columns for faster/lower-overhead extraction. This mode is currently supported for as = "data.frame" or as = "DataFrame".
- threads
Requested number of OpenMP threads used for reading/decompression. May be capped when auto_threads = TRUE.
- BPPARAM
Optional BiocParallel parameter used when file contains more than one BAM. If NULL, files are processed serially.
- auto_threads
Logical; when TRUE and BPPARAM has multiple workers, BamScale automatically caps per-file OpenMP threads to avoid oversubscription.
- use.names
Passed to alignment object conversion. When TRUE, read names (qname) are used as object names.
- with.which_label
Logical; if TRUE and param includes which, an extra which_label column is returned.
- include_unmapped
Logical; whether unmapped records are retained (subject to param$flag constraints).
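As an illustration of the param argument, the sketch below builds a ScanBamParam that uses four of the honored fields (flag, mapqFilter, what, tag) and passes it to bam_read(). It assumes that bam_read() is exported by a package named BamScale and that Rsamtools and ompBAM are installed; the filter values are arbitrary. Every step is guarded so the snippet does nothing when a package is missing.

```r
# Build the filter only if Rsamtools is available.
param <- NULL
if (requireNamespace("Rsamtools", quietly = TRUE)) {
  param <- Rsamtools::ScanBamParam(
    # Drop unmapped and duplicate records (FALSE means the bit must be unset).
    flag = Rsamtools::scanBamFlag(isUnmappedQuery = FALSE, isDuplicate = FALSE),
    mapqFilter = 20L,                                   # arbitrary example cutoff
    what = c("qname", "flag", "rname", "pos", "mapq", "cigar"),
    tag = "NM"
  )
}

# Assumption: bam_read() lives in a package called BamScale.
if (!is.null(param) &&
    requireNamespace("BamScale", quietly = TRUE) &&
    requireNamespace("ompBAM", quietly = TRUE)) {
  bam <- ompBAM::example_BAM("Unsorted")
  aln <- BamScale::bam_read(bam, param = param, as = "data.frame", threads = 2L)
}
```

Because what and tag can be given both inside param and as direct arguments, it is probably simplest to set them in one place only.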
Value
If file has length 1: one object in the format specified by as.
If file has length > 1 (or is a BamFileList): a named list of outputs, one per BAM file.
Details
bam_read() is intentionally column-compatible with common BAM fields used by
Bioconductor workflows and can be used as a fast drop-in reader before
conversion to downstream classes.
Parallelism model:
BPPARAM parallelizes across files (one file per BiocParallel worker).
threads parallelizes within each file via OpenMP.
Effective total concurrency is approximately min(length(file), BiocParallel::bpnworkers(BPPARAM)) * threads.
If auto_threads = TRUE and BPPARAM has multiple workers, per-file OpenMP threads are set to max(1, min(threads, floor(available_cores / workers_eff))), where workers_eff = min(length(file), BiocParallel::bpnworkers(BPPARAM)).
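The capping rule above can be worked through with plain numbers. The helper below is hypothetical (it is not part of the package) and only restates the documented formula in base R; how bam_read() determines available_cores is not stated here, so the value is passed in explicitly.

```r
# Hypothetical helper: reproduces the documented auto_threads formula.
effective_threads <- function(threads, n_files, workers, available_cores) {
  # Workers that can actually be busy: no more than one per file.
  workers_eff <- min(n_files, workers)
  # Cap per-file threads by an even share of the cores, never below 1.
  per_file <- max(1, min(threads, floor(available_cores / workers_eff)))
  list(
    workers_eff = as.integer(workers_eff),
    threads_per_file = as.integer(per_file),
    total_concurrency = as.integer(workers_eff * per_file)
  )
}

# 4 files, 4 workers, 8 requested threads, 16 cores:
# each file is capped to 4 threads, for 16 threads in total.
balanced <- effective_threads(threads = 8L, n_files = 4L,
                              workers = 4L, available_cores = 16L)

# Fewer cores than workers: floor(2 / 4) is 0, so the max(1, ...) guard applies.
starved <- effective_threads(threads = 8L, n_files = 4L,
                             workers = 4L, available_cores = 2L)
```

In the second case each worker still gets one thread, so four threads run on two cores; the formula bounds oversubscription but cannot remove it when workers outnumber cores.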
Compatibility notes:
Region filtering via param$which is supported as a sequential filter (not index-jump random access).
Flag filtering uses ScanBamFlag semantics by converting logical flag requirements into required-set and required-unset bit masks.
Tag values are returned as character columns. Scalar tags are scalar strings; B tags are comma-separated vectors.
seqqual_mode = "compact" is optimized for throughput-oriented benchmarking and returns raw list-columns for seq/qual.
"GAlignments" and "GAlignmentPairs" outputs exclude unmapped records.
as = "scanBam" returns a strict scan-like list-of-lists: without param$which, it returns one unnamed batch; with param$which, it returns one batch per range label (including empty ranges), with requested what fields and tag values under $tag. If Biostrings is installed, seq and qual are returned as DNAStringSet and PhredQuality for closer scanBam() compatibility.
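Two of the notes above can be illustrated in base R. The sketch is not the package's internal code: flag_masks(), flag_passes(), and parse_b_tag() are hypothetical helpers. The bit values come from the SAM specification and the names mirror the arguments of Rsamtools::scanBamFlag(). The B-tag parser assumes the returned string holds only the comma-separated numeric values, with no type prefix.

```r
# SAM flag bits, named after the scanBamFlag() arguments.
sam_flag_bits <- c(
  isPaired = 0x1L, isProperPair = 0x2L, isUnmappedQuery = 0x4L,
  hasUnmappedMate = 0x8L, isMinusStrand = 0x10L, isMateMinusStrand = 0x20L,
  isFirstMateRead = 0x40L, isSecondMateRead = 0x80L,
  isSecondaryAlignment = 0x100L, isNotPassingQualityControls = 0x200L,
  isDuplicate = 0x400L, isSupplementaryAlignment = 0x800L
)

# Turn named logical requirements into two masks:
# TRUE -> bit must be set, FALSE -> bit must be unset, NA -> ignored.
flag_masks <- function(requirements) {
  unknown <- setdiff(names(requirements), names(sam_flag_bits))
  if (length(unknown) > 0L) {
    stop("unknown flag name(s): ", paste(unknown, collapse = ", "))
  }
  keep <- !is.na(requirements)
  set_names <- names(requirements)[keep & requirements]
  unset_names <- names(requirements)[keep & !requirements]
  list(
    required_set = Reduce(bitwOr, sam_flag_bits[set_names], 0L),
    required_unset = Reduce(bitwOr, sam_flag_bits[unset_names], 0L)
  )
}

# A record passes when all required-set bits are present
# and no required-unset bit is present.
flag_passes <- function(flag, masks) {
  bitwAnd(flag, masks$required_set) == masks$required_set &
    bitwAnd(flag, masks$required_unset) == 0L
}

# Split comma-separated B-tag strings into numeric vectors (one per record).
parse_b_tag <- function(x) {
  lapply(strsplit(x, ",", fixed = TRUE), as.numeric)
}

masks <- flag_masks(c(isPaired = TRUE, isUnmappedQuery = FALSE))
passes <- flag_passes(c(99L, 77L, 0L), masks)
b_values <- parse_b_tag(c("1,2,3", "10"))
```

Flag 99 is paired and mapped, so it passes. Flag 77 has the unmapped bit set and flag 0 is not paired, so both are rejected.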
Examples
if (requireNamespace("ompBAM", quietly = TRUE)) {
  bam <- ompBAM::example_BAM("Unsorted")

  # Familiar scanBam-like field selection
  x <- bam_read(bam, what = c("qname", "flag", "rname", "pos", "cigar"))

  # Include sequence + quality
  y <- bam_read(bam, what = c("qname", "seq", "qual"), threads = 2)

  # scanBam-shaped output
  z <- bam_read(bam, what = c("qname", "flag"), tag = "NM", as = "scanBam")
}