
Convenience wrapper for fast_overlap_join(..., join = "left").

Usage

fast_left_overlap_join(
  query,
  subject,
  max_gap = -1L,
  min_overlap = 0L,
  type = c("any", "start", "end", "within", "equal"),
  ignore_strand = FALSE,
  threads = fast_default_threads(),
  deterministic = TRUE,
  query_prefix = "query_",
  subject_prefix = "subject_"
)

Arguments

query

An IRanges or GRanges query object.

subject

An IRanges/GRanges object or a fast_ranges_index. Use fast_build_index(subject) when the same subject is reused across many overlap queries.

max_gap

Integer scalar controlling how far apart two ranges may be and still count as a hit.

Use -1 to require a true overlap.

Use 0 to allow touching ranges for "any" and to keep Bioconductor's default tolerance behavior for the other overlap modes.

Use positive values when you want "nearby" ranges to count as matches even if they do not overlap directly.

Units are bases. The meaning is intentionally aligned with IRanges::findOverlaps() / GenomicRanges::findOverlaps().
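
The gap rule above can be sketched in base R. This is an illustration only: `range_gap` is a hypothetical helper, not a fastRanges function, and it assumes the IRanges convention that adjacent ranges have gap 0 and overlapping ranges have gap -1.

```r
# Hypothetical helper (not part of fastRanges): number of positions that
# separate two closed integer ranges [s1, e1] and [s2, e2].
# Assumed convention (borrowed from IRanges): adjacent ranges have gap 0,
# overlapping ranges have gap -1.
range_gap <- function(s1, e1, s2, e2) {
  pmax(-1L, pmax(s1, s2) - pmin(e1, e2) - 1L)
}

gap_overlapping <- range_gap(1L, 5L, 3L, 6L)   # ranges share positions 3..5
gap_adjacent    <- range_gap(1L, 5L, 6L, 10L)  # ranges touch
gap_nearby      <- range_gap(1L, 5L, 8L, 10L)  # positions 6 and 7 lie between

# For type = "any", a pair counts as a hit when its gap is <= max_gap.
gaps <- c(overlapping = gap_overlapping,
          adjacent    = gap_adjacent,
          nearby      = gap_nearby)
hits_by_max_gap <- sapply(c(-1L, 0L, 2L), function(max_gap) gaps <= max_gap)
colnames(hits_by_max_gap) <- c("max_gap=-1", "max_gap=0", "max_gap=2")
print(hits_by_max_gap)
```

Under these assumptions only the overlapping pair survives max_gap = -1, the touching pair joins at max_gap = 0, and the pair two bases apart needs max_gap = 2.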

min_overlap

Integer scalar minimum overlap width, in bases.

0 is the least strict setting.

Larger values require wider shared overlap and therefore return fewer hits.

This argument matters only when the selected type allows an actual overlap width to be measured.
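
To make the effect of min_overlap concrete, the sketch below computes overlap widths in base R for the three query/subject pairs used in the Examples section. `overlap_width` is a hypothetical helper written for this illustration, not a fastRanges function.

```r
# Hypothetical helper (not part of fastRanges): number of positions shared
# by two closed integer ranges; 0 when they do not overlap.
overlap_width <- function(s1, e1, s2, e2) {
  pmax(0L, pmin(e1, e2) - pmax(s1, s2) + 1L)
}

# Coordinates of the three query/subject pairs from the Examples section.
query_start   <- c(1L, 10L, 20L)
query_end     <- c(5L, 13L, 22L)
subject_start <- c(3L, 9L, 18L)
subject_end   <- c(6L, 14L, 22L)

widths <- overlap_width(query_start, query_end, subject_start, subject_end)

# All three pairs overlap, so they pass min_overlap = 0.
# Raising the threshold to 4 keeps only the pair that shares 4 bases.
keep_min0 <- widths >= 0L
keep_min4 <- widths >= 4L
print(data.frame(pair = 1:3, overlap_width = widths, keep_min0, keep_min4))
```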

type

Character scalar describing what "match" means.

"any" matches any overlap that satisfies max_gap / min_overlap.

"start" matches ranges with compatible start coordinates.

"end" matches ranges with compatible end coordinates.

"within" matches queries contained inside subjects.

"equal" matches queries and subjects with the same interval, or with start/end differences no larger than max_gap when tolerance is allowed.

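The five modes can be approximated by a small base R predicate. This is a simplified sketch, not the package's implementation: `matches_type` and its `tol` argument are hypothetical, "any" and "within" are shown without gap tolerance, and "compatible" coordinates are assumed to mean "differ by at most tol".

```r
# Hypothetical predicate (not part of fastRanges) for one query range [qs, qe]
# against one subject range [ss, se]. tol is the allowed endpoint difference;
# tol = 0 means exact agreement.
matches_type <- function(qs, qe, ss, se,
                         type = c("any", "start", "end", "within", "equal"),
                         tol = 0L) {
  type <- match.arg(type)
  switch(type,
    any    = qs <= se && ss <= qe,                        # share a position
    start  = abs(qs - ss) <= tol,                         # starts agree
    end    = abs(qe - se) <= tol,                         # ends agree
    within = qs >= ss && qe <= se,                        # query inside subject
    equal  = abs(qs - ss) <= tol && abs(qe - se) <= tol   # both ends agree
  )
}

types <- c("any", "start", "end", "within", "equal")

# Query [20, 22] against subject [18, 22]: same end, query inside subject.
result <- vapply(
  types,
  function(tp) matches_type(20L, 22L, 18L, 22L, tp),
  logical(1)
)
print(result)
```

For this pair the sketch reports a match for "any", "end", and "within", and no match for "start" or "equal".
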
ignore_strand

Logical scalar controlling strand handling for genomic ranges.

For GRanges, FALSE means "+", "-", and "*" are interpreted using standard Bioconductor strand rules.

TRUE means strand is ignored and only genomic coordinates are compared.

For IRanges, this argument has no effect because there is no strand.
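
The strand rule can be pictured with a short base R sketch. `strand_compatible` is a hypothetical helper for illustration, assuming the usual Bioconductor convention that "*" is compatible with any strand while "+" and "-" are compatible only with themselves or "*".

```r
# Hypothetical helper (not part of fastRanges): are two strand values
# compatible? Assumes "*" means "either strand" and therefore matches
# "+", "-", and "*".
strand_compatible <- function(query_strand, subject_strand,
                              ignore_strand = FALSE) {
  if (ignore_strand) {
    # Strand is not consulted at all; only coordinates would decide.
    return(rep(TRUE, length(query_strand)))
  }
  query_strand == "*" | subject_strand == "*" | query_strand == subject_strand
}

query_strand   <- c("+", "+", "*", "-")
subject_strand <- c("+", "-", "-", "*")

strict  <- strand_compatible(query_strand, subject_strand)
relaxed <- strand_compatible(query_strand, subject_strand,
                             ignore_strand = TRUE)
print(data.frame(query_strand, subject_strand, strict, relaxed))
```

Only the "+" versus "-" pair is rejected in strict mode; with ignore_strand = TRUE every pair is left to the coordinate comparison.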

threads

Integer scalar number of worker threads to use.

Use 1 for the most conservative behavior and easiest debugging.

Use larger values on multicore machines when throughput matters.

For repeated-query workloads, combine a prebuilt index from fast_build_index(subject) with a thread count chosen empirically on your hardware.

fastRanges is optimized for large and throughput-oriented workloads. For one-off or small jobs, Bioconductor's native overlap routines may be competitive.
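
One way to choose a thread count empirically is to time the same repeated-query workload at a few settings. The sketch below is a rough outline under stated assumptions: the data sizes and the candidate thread counts are arbitrary, and it only runs when fastRanges and IRanges are installed. It uses only the calls documented on this page, fast_build_index(subject) and the threads argument.

```r
# Candidate thread counts to compare (arbitrary choices for illustration).
thread_counts <- c(1L, 2L, 4L)

# Sketch only: skipped when fastRanges or IRanges is not installed.
if (requireNamespace("fastRanges", quietly = TRUE) &&
    requireNamespace("IRanges", quietly = TRUE)) {
  library(fastRanges)

  set.seed(1)
  subject <- IRanges::IRanges(start = sample.int(1e6, 1e5), width = 100L)
  queries <- replicate(
    3,
    IRanges::IRanges(start = sample.int(1e6, 1e4), width = 50L),
    simplify = FALSE
  )

  # Build the index once, then reuse it for every query batch.
  index <- fast_build_index(subject)

  # Time the same workload at each thread count.
  for (n_threads in thread_counts) {
    elapsed <- system.time(
      for (q in queries) fast_left_overlap_join(q, index, threads = n_threads)
    )[["elapsed"]]
    message("threads = ", n_threads, ": ", round(elapsed, 3), " s")
  }
}
```

Timings depend on the machine and the data, so treat a single run as a hint and repeat the measurement before settling on a value.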

deterministic

Logical scalar controlling output order.

TRUE returns a stable order, which is useful for testing, reproducible reports, and direct comparison across thread counts.

FALSE allows the implementation to return hits in an unspecified order, which can be noticeably faster for large multithreaded jobs because it avoids extra global ordering work.
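
When deterministic = FALSE is used, two runs can still be compared by sorting both results on their id columns first. The sketch below uses stand-in data frames rather than real output; `canonical_order` is a hypothetical helper, and the column names query_id and subject_id are taken from the example output on this page.

```r
# Hypothetical helper (not part of fastRanges): sort a join result by its id
# columns so that two runs can be compared regardless of row order.
canonical_order <- function(hits) {
  ordered <- hits[order(hits$query_id, hits$subject_id), , drop = FALSE]
  rownames(ordered) <- NULL
  ordered
}

# Stand-ins for two runs that found the same hits in different orders.
run_a <- data.frame(query_id = c(1L, 2L, 3L), subject_id = c(1L, 2L, 3L))
run_b <- run_a[c(3, 1, 2), ]

same_as_returned <- identical(run_a, run_b)
same_after_sort  <- identical(canonical_order(run_a), canonical_order(run_b))
print(c(as_returned = same_as_returned, after_sort = same_after_sort))
```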

query_prefix

Prefix added to columns derived from query. This helps you see which output columns came from the query object.

subject_prefix

Prefix added to columns derived from subject. This helps you see which output columns came from the subject object.
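
The role of the two prefixes can be shown with plain data frames. This is an illustration of the naming idea only, not fastRanges code; `add_prefix` is a hypothetical helper.

```r
# Illustration only (not fastRanges code): a prefix disambiguates columns
# that would otherwise share a name on both sides of a join.
query_cols   <- data.frame(start = c(1L, 10L), end = c(5L, 13L))
subject_cols <- data.frame(start = c(3L, 9L),  end = c(6L, 14L))

add_prefix <- function(df, prefix) {
  names(df) <- paste0(prefix, names(df))
  df
}

joined <- cbind(add_prefix(query_cols, "query_"),
                add_prefix(subject_cols, "subject_"))
print(names(joined))
```

Without the prefixes both sides would contribute columns called start and end, and it would be unclear which range each value describes.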

Value

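A data.frame holding the overlap join result. Columns that come from query and subject are distinguished by query_prefix and subject_prefix.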
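
This page does not spell out how queries without any hit are reported. As a point of reference, the conventional meaning of a left join can be shown with base R merge(). Whether fast_left_overlap_join() fills unmatched queries with NA in the same way is an assumption, not something this page confirms; the ids below are made up.

```r
# Illustration only, using base R merge(); the ids are made up.
queries <- data.frame(query_id = 1:3)
hits    <- data.frame(query_id   = c(1L, 1L, 3L),
                      subject_id = c(10L, 11L, 12L))

# all.x = TRUE keeps every query row and fills subject columns with NA
# when a query has no hit.
left_joined <- merge(queries, hits, by = "query_id", all.x = TRUE)
print(left_joined)
```

In this sketch query 1 appears twice because it has two hits, and query 2 appears once with a missing subject_id.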

Overlap semantics

query is the range set you ask about. subject is the range set you compare it against.

Core interval semantics: when two ranges do not overlap, the number of positions separating them is the gap. A hit is allowed when this gap is <= max_gap (for max_gap >= 0) and the overlap width is >= min_overlap.

Beginner-friendly interpretation:

type = "any" asks "do these ranges touch or overlap closely enough to count?"

type = "start" and type = "end" are useful when interval boundaries are biologically meaningful, for example transcription start or end sites.

type = "within" asks whether each query lies inside a subject interval.

type = "equal" asks whether query and subject describe the same interval, optionally with endpoint tolerance when max_gap >= 0.

This argument grammar is intentionally aligned with Bioconductor overlap APIs (IRanges / GenomicRanges).

Examples

q <- IRanges::IRanges(start = c(1, 10, 20), width = c(5, 4, 3))
s <- IRanges::IRanges(start = c(3, 9, 18), width = c(4, 6, 5))
fast_left_overlap_join(q, s)
#>   query_id subject_id query_start query_end query_width subject_start
#> 1        1          1           1         5           5             3
#> 2        2          2          10        13           4             9
#> 3        3          3          20        22           3            18
#>   subject_end subject_width
#> 1           6             4
#> 2          14             6
#> 3          22             5