Run a function on the CpGs in bedfiles across genomic regions. Currently
supported functions are aggregate and average. Parallelized across files
using threads from the "iscream.threads"
option.
Usage
region_map(
bedfiles,
regions,
fun = "aggregate",
bismark = FALSE,
mval = TRUE,
set_region_rownames = FALSE,
nthreads = NULL
)
Arguments
- bedfiles
A vector of bedfile paths
- regions
A vector of genomic regions strings. If a named vector is provided, the names will be used in the feature column instead of the genomic regions string
- fun
Function to apply over the region. See details.
- bismark
hether the input is a bismark coverage (or BSbolt Bedgraph) file
- mval
Whether to calculate the M value (coverage \(\times \beta\)) or use the beta value when applying the function.
- set_region_rownames
Use the region strings as the returned data frame's rownames. Can be useful if you have a named vector of regions and want both the rownames and the feature names.
- nthreads
Set number of threads to use overriding the
"iscream.threads"
option. See?set_threads
for more information.
Details
Available functions:
"aggregate"
sums the values in the region with aggregated beta values ifmval =
FALSE and aggregated M values ifmval =
TRUE"average"
averages the values in the region with average beta values ifmval =
FALSE and average M values ifmval =
TRUE
Examples
bedfiles <- system.file("extdata", package = "iscream") |>
list.files(pattern = "[a|b|c|d].bed.gz$", full.names = TRUE)
# examine the bedfiles
colnames <- c("chr", "start", "end", "beta", "coverage")
lapply(bedfiles, function(i) knitr::kable(read.table(i, col.names = colnames)))
#> [[1]]
#>
#>
#> |chr | start| end| beta| coverage|
#> |:----|-----:|---:|----:|--------:|
#> |chr1 | 1| 2| 1.0| 1|
#> |chr1 | 3| 4| 1.0| 1|
#> |chr1 | 5| 6| 0.0| 2|
#> |chr1 | 7| 8| 0.0| 1|
#> |chr1 | 9| 10| 0.5| 2|
#> |chr1 | 11| 12| 1.0| 2|
#> |chr1 | 13| 14| 1.0| 3|
#>
#> [[2]]
#>
#>
#> |chr | start| end| beta| coverage|
#> |:----|-----:|---:|----:|--------:|
#> |chr1 | 1| 2| 0| 2|
#> |chr1 | 5| 6| 1| 2|
#> |chr1 | 7| 8| 1| 1|
#> |chr1 | 11| 12| 0| 2|
#> |chr1 | 13| 14| 1| 1|
#>
#> [[3]]
#>
#>
#> |chr | start| end| beta| coverage|
#> |:----|-----:|---:|----:|--------:|
#> |chr1 | 3| 4| 1| 2|
#> |chr1 | 7| 8| 0| 2|
#> |chr1 | 9| 10| 1| 1|
#>
#> [[4]]
#>
#>
#> |chr | start| end| beta| coverage|
#> |:----|-----:|---:|----:|--------:|
#> |chr1 | 1| 2| 1.0| 1|
#> |chr1 | 3| 4| 1.0| 2|
#> |chr1 | 7| 8| 0.0| 1|
#> |chr1 | 9| 10| 0.5| 2|
#> |chr1 | 13| 14| 1.0| 1|
#>
# make a vector of regions
regions <- c("chr1:1-6", "chr1:7-10", "chr1:11-14")
region_map(bedfiles, regions)
#> [16:50:07.825190] [iscream::region_map] [info] Aggregating 3 regions from 4 bedfiles
#>
#> Feature Cell coverage M
#> 1 chr1:1-6 a 4 2
#> 2 chr1:7-10 a 3 1
#> 3 chr1:11-14 a 5 5
#> 4 chr1:1-6 b 4 2
#> 5 chr1:7-10 b 1 1
#> 6 chr1:11-14 b 3 1
#> 7 chr1:1-6 c 2 2
#> 8 chr1:7-10 c 3 1
#> 9 chr1:11-14 c 0 0
#> 10 chr1:1-6 d 3 3
#> 11 chr1:7-10 d 3 1
#> 12 chr1:11-14 d 1 1
region_map(bedfiles, regions, mval = FALSE)
#> [16:50:07.827519] [iscream::region_map] [info] Aggregating 3 regions from 4 bedfiles
#>
#> Feature Cell coverage beta
#> 1 chr1:1-6 a 4 2.0
#> 2 chr1:7-10 a 3 0.5
#> 3 chr1:11-14 a 5 2.0
#> 4 chr1:1-6 b 4 1.0
#> 5 chr1:7-10 b 1 1.0
#> 6 chr1:11-14 b 3 1.0
#> 7 chr1:1-6 c 2 1.0
#> 8 chr1:7-10 c 3 1.0
#> 9 chr1:11-14 c 0 0.0
#> 10 chr1:1-6 d 3 2.0
#> 11 chr1:7-10 d 3 0.5
#> 12 chr1:11-14 d 1 1.0
region_map(bedfiles, regions, fun = "average")
#> [16:50:07.829643] [iscream::region_map] [info] Averaging 3 regions from 4 bedfiles
#>
#> Feature Cell coverage M
#> 1 chr1:1-6 a 1.333333 0.6666667
#> 2 chr1:7-10 a 1.500000 0.5000000
#> 3 chr1:11-14 a 2.500000 2.5000000
#> 4 chr1:1-6 b 2.000000 1.0000000
#> 5 chr1:7-10 b 1.000000 1.0000000
#> 6 chr1:11-14 b 1.500000 0.5000000
#> 7 chr1:1-6 c 2.000000 2.0000000
#> 8 chr1:7-10 c 1.500000 0.5000000
#> 9 chr1:11-14 c 0.000000 0.0000000
#> 10 chr1:1-6 d 1.500000 1.5000000
#> 11 chr1:7-10 d 1.500000 0.5000000
#> 12 chr1:11-14 d 1.000000 1.0000000