Skip to contents

Queries the provided regions and produces M/beta and Coverage matrices and their genomic positions. Parallelized across files using threads from the "iscream.threads" option. The output of make_bsseq_mat may be used to create a BSseq object: do.call(BSseq, make_bsseq_mat(...)).

Usage

make_bsseq_mat(
  bedfiles,
  regions,
  aligner = "biscuit",
  mval = TRUE,
  merged = TRUE,
  sparse = FALSE,
  prealloc = 10000,
  nthreads = NULL
)

Arguments

bedfiles

A vector of bedfile paths

regions

A vector, data frame or GenomicRanges of genomic regions. See details.

aligner

The aligner used to produce the BED files - one of "biscuit", "bismark", "bsbolt".

mval

Whether to return M-values or beta-values with the coverage matrix. Defaults to M-value. Set mval=FALSE to get beta value matrix.

merged

Whether the input strands have been merged/collapsed

sparse

Whether to return M and coverage matrices as sparse matrices ("dgCMatrix"). Set this TRUE only for scWGBS data

prealloc

The number of rows to initialize the matrices with. If the number of methyltion loci are approximately known, this can reduce runtime as fewer resizes need to be made.

nthreads

Set number of threads to use overriding the "iscream.threads" option. See ?set_threads for more information.

Value

A named list of

  • coverage and either a beta- or M-value matrix

  • a character vector of chromosomes and numeric vector of corresponding CpG base positions

  • a character vector of the input sample names

Details

The input regions may be string vector in the form "chr:start-end" or a GRanges object. If a data frame is provided, they must have "chr", "start", and "end" columns.

Bitpacking limits

If the coverage values exceed 32,767, the upper limit of a 16-bit signed integer, it will be capped at the limit. Beta values will also be capped similarly, but any such values would be a bug in the aligner that produced the data.

Examples

bedfiles <- system.file("extdata", package = "iscream") |>
  list.files(pattern = "[a|b|c|d].bed.gz$", full.names = TRUE)
# examine the bedfiles
colnames <- c("chr", "start", "end", "beta", "coverage")
lapply(bedfiles, function(i) knitr::kable(read.table(i, col.names = colnames)))
#> [[1]]
#> 
#> 
#> |chr  | start| end| beta| coverage|
#> |:----|-----:|---:|----:|--------:|
#> |chr1 |     0|   2|  1.0|        1|
#> |chr1 |     2|   4|  1.0|        1|
#> |chr1 |     4|   6|  0.0|        2|
#> |chr1 |     6|   8|  0.0|        1|
#> |chr1 |     8|  10|  0.5|        2|
#> |chr1 |    10|  12|  1.0|        2|
#> |chr1 |    12|  14|  1.0|        3|
#> 
#> [[2]]
#> 
#> 
#> |chr  | start| end| beta| coverage|
#> |:----|-----:|---:|----:|--------:|
#> |chr1 |     0|   2|    0|        2|
#> |chr1 |     4|   6|    1|        2|
#> |chr1 |     6|   8|    1|        1|
#> |chr1 |    10|  12|    0|        2|
#> |chr1 |    12|  14|    1|        1|
#> 
#> [[3]]
#> 
#> 
#> |chr  | start| end| beta| coverage|
#> |:----|-----:|---:|----:|--------:|
#> |chr1 |     2|   4|    1|        2|
#> |chr1 |     6|   8|    0|        2|
#> |chr1 |     8|  10|    1|        1|
#> 
#> [[4]]
#> 
#> 
#> |chr  | start| end| beta| coverage|
#> |:----|-----:|---:|----:|--------:|
#> |chr1 |     0|   2|  1.0|        1|
#> |chr1 |     2|   4|  1.0|        2|
#> |chr1 |     6|   8|  0.0|        1|
#> |chr1 |     8|  10|  0.5|        2|
#> |chr1 |    12|  14|  1.0|        1|
#> 

# make a vector of regions
regions <- c("chr1:1-6", "chr1:7-10", "chr1:11-14")
make_bsseq_mat(bedfiles, regions)
#> [17:51:25.663312] [iscream::query_all] [info] Querying 3 regions from 4 bedfiles
#> 
#> [17:51:25.663737] [iscream::query_all] [info] Creating metadata vectors
#> [17:51:25.663792] [iscream::query_all] [info] 7 loci found - 9993 extra rows allocated with 0 resizes
#> [17:51:25.663798] [iscream::query_all] [info] Creating dense matrix
#> $M
#>      a b c d
#> [1,] 1 0 0 1
#> [2,] 1 0 2 2
#> [3,] 0 2 0 0
#> [4,] 0 1 0 0
#> [5,] 1 0 1 1
#> [6,] 2 0 0 0
#> [7,] 3 1 0 1
#> 
#> $Cov
#>      a b c d
#> [1,] 1 2 0 1
#> [2,] 1 0 2 2
#> [3,] 2 2 0 0
#> [4,] 1 1 2 1
#> [5,] 2 0 1 2
#> [6,] 2 2 0 0
#> [7,] 3 1 0 1
#> 
#> $pos
#> [1]  0  2  4  6  8 10 12
#> 
#> $chr
#> [1] "chr1" "chr1" "chr1" "chr1" "chr1" "chr1" "chr1"
#> 
#> $sampleNames
#> [1] "a" "b" "c" "d"
#> 
# for BSseq object run
if (FALSE) { # \dontrun{
library(bsseq)
do.call(BSseq, make_bsseq_mat(bedfiles, regions))
} # }