Query lines from a tabixed bedfile
Usage
tabix(
bedfiles,
regions,
aligner = "biscuit",
col.names = NULL,
raw = FALSE,
nthreads = NULL
)
Arguments
- bedfiles
The bedfiles to be queried
- regions
A vector, data frame or GenomicRanges of genomic regions. See details.
- aligner
The aligner used to produce the BED files - one of "biscuit", "bismark", "bsbolt". Will set the result data.table's column names based on this argument.
- col.names
A vector of column names for the result data.table. Set if your bedfile is not from the supported aligners or is a general bedfile.
- raw
Set true to give a named list of raw strings from the regions in the style of
Rsamtools::scanTabix
instead of a data.table- nthreads
Set number of threads to use overriding the
"iscream.threads"
option. See?set_threads
for more information.
Details
Query method
'iscream has two methods to query records from BED files:
the tabix shell executable: fast since its output can be redirected to a file (which
data.table::fread()
can then read) instead of having to allocate memory and store it during the queryiscream's tabix implementation, based on the tabix executable using htslib, but slower on large queries since it stores the records as they are found instead of writing to a file. However it's able to store each regions records independently instead of in a single file and is used in
query_all()
andsummarize_regions()
.
When iscream is attached, it checks that the tabix executable is
available with Sys.which()
and, if available, sets options("tabix.method" = "shell")
. tabix()
then uses the tabix executable to make
queries, except if raw = TRUE
. If tabix is not found, iscream uses its
tabix implementation. To use only iscream's tabix implementation, set
options("tabix.method" = "htslib")
.
Input region formats
The input regions may be string vector in the form "chr:start-end", a
dataframe with "chr", "start" and "end" columns or a GRanges object. Input
regions must be 1-based. If the input is a GRanges, the output will also be
GRanges with any associated metadata columns (joined onto the result using
GenomicRanges::findOverlaps()
). When making GRanges
, the 0-based records
from BED-files will be converted to 1-based with
GenomicRanges::makeGRangesFromDataFrame()
. Bismark's coverage files will
not be converted as they are already 1-based and the ranges
slot will be
only one position.
Examples
bedfiles <- system.file("extdata", package = "iscream") |>
list.files(pattern = "[a|b|c|d].bed.gz$", full.names = TRUE)
regions <- c("chr1:1-6", "chr1:7-10", "chr1:11-14")
tabix(bedfiles[1], regions, col.names = c("chr", "start", "end", "beta", "coverage"))
#> chr start end beta coverage
#> <char> <int> <int> <num> <int>
#> 1: chr1 0 2 1.0 1
#> 2: chr1 2 4 1.0 1
#> 3: chr1 4 6 0.0 2
#> 4: chr1 6 8 0.0 1
#> 5: chr1 8 10 0.5 2
#> 6: chr1 10 12 1.0 2
#> 7: chr1 12 14 1.0 3