Skip to contents

The examples here show how iscream output can be converted into other data structures for further analysis.

## iscream using 1 thread by default but parallelly::availableCores() detects 4 possibly available threads. See `?set_threads` for information on multithreading before trying to use more.
data_dir <- system.file("extdata", package = "iscream")
bedfiles <- list.files(data_dir, pattern = "[a|b|c|d].bed.gz$", full.names = TRUE)
regions <- c(A = "chr1:1-6", B = "chr1:7-10", C = "chr1:11-14")

GRanges

GRanges can be used as the input to all of iscream’s querying functions. The output of tabix() and query_all() can also be turned into GRanges.

From tabix queries

tabix(bedfiles[1], regions) |> GRanges()
## GRanges object with 7 ranges and 2 metadata columns:
##       seqnames    ranges strand |      beta  coverage
##          <Rle> <IRanges>  <Rle> | <numeric> <numeric>
##   [1]     chr1       1-2      * |       1.0         1
##   [2]     chr1       3-4      * |       1.0         1
##   [3]     chr1       5-6      * |       0.0         2
##   [4]     chr1       7-8      * |       0.0         1
##   [5]     chr1      9-10      * |       0.5         2
##   [6]     chr1     11-12      * |       1.0         2
##   [7]     chr1     13-14      * |       1.0         3
##   -------
##   seqinfo: 1 sequence from an unspecified genome; no seqlengths

If the input is already a GRanges object, tabix() will also return a GRanges object along with any attached metadata. tabix() uses the input findOverlaps() to join input metadata with the queried data.

gr <- GRanges(regions)
values(gr) <- DataFrame(
    gene = c("gene1", "gene2", "gene3"),
    some_metadata = c("s1", "s2", "s3")
)
gr
## GRanges object with 3 ranges and 2 metadata columns:
##     seqnames    ranges strand |        gene some_metadata
##        <Rle> <IRanges>  <Rle> | <character>   <character>
##   A     chr1       1-6      * |       gene1            s1
##   B     chr1      7-10      * |       gene2            s2
##   C     chr1     11-14      * |       gene3            s3
##   -------
##   seqinfo: 1 sequence from an unspecified genome; no seqlengths
tabix(bedfiles, gr)
## GRanges object with 20 ranges and 5 metadata columns:
##        seqnames    ranges strand |      beta  coverage      sample        gene
##           <Rle> <IRanges>  <Rle> | <numeric> <numeric> <character> <character>
##    [1]     chr1       1-2      * |       1.0         1           a       gene1
##    [2]     chr1       3-4      * |       1.0         1           a       gene1
##    [3]     chr1       5-6      * |       0.0         2           a       gene1
##    [4]     chr1       7-8      * |       0.0         1           a       gene2
##    [5]     chr1      9-10      * |       0.5         2           a       gene2
##    ...      ...       ...    ... .       ...       ...         ...         ...
##   [16]     chr1       1-2      * |       1.0         1           d       gene1
##   [17]     chr1       3-4      * |       1.0         2           d       gene1
##   [18]     chr1       7-8      * |       0.0         1           d       gene2
##   [19]     chr1      9-10      * |       0.5         2           d       gene2
##   [20]     chr1     13-14      * |       1.0         1           d       gene3
##        some_metadata
##          <character>
##    [1]            s1
##    [2]            s1
##    [3]            s1
##    [4]            s2
##    [5]            s2
##    ...           ...
##   [16]            s1
##   [17]            s1
##   [18]            s2
##   [19]            s2
##   [20]            s3
##   -------
##   seqinfo: 1 sequence from an unspecified genome; no seqlengths

From summarize_regions

With a named set of regions, you’ll need to set_region_rownames as the input to create the GRanges object:

(summary <- summarize_regions(
  bedfiles,
  regions,
  set_region_rownames = TRUE,
  fun = c("sum", "mean"))
)
## [18:39:33.637460] [iscream::summarize_regions] [info] Summarizing 3 regions from 4 bedfiles
## [18:39:33.637535] [iscream::summarize_regions] [info] using sum, mean
##            Feature Sample coverage.sum M.sum coverage.mean    M.mean
## chr1:1-6         A      a            4     2      1.333333 0.6666667
## chr1:7-10        B      a            3     1      1.500000 0.5000000
## chr1:11-14       C      a            5     5      2.500000 2.5000000
## chr1:1-6         A      b            4     2      2.000000 1.0000000
## chr1:7-10        B      b            1     1      1.000000 1.0000000
## chr1:11-14       C      b            3     1      1.500000 0.5000000
## chr1:1-6         A      c            2     2      2.000000 2.0000000
## chr1:7-10        B      c            3     1      1.500000 0.5000000
## chr1:11-14       C      c           NA    NA            NA        NA
## chr1:1-6         A      d            3     3      1.500000 1.5000000
## chr1:7-10        B      d            3     1      1.500000 0.5000000
## chr1:11-14       C      d            1     1      1.000000 1.0000000
GRanges(rownames(summary), summary = summary)
## GRanges object with 12 ranges and 6 metadata columns:
##        seqnames    ranges strand | summary.Feature summary.Sample
##           <Rle> <IRanges>  <Rle> |     <character>    <character>
##    [1]     chr1       1-6      * |               A              a
##    [2]     chr1      7-10      * |               B              a
##    [3]     chr1     11-14      * |               C              a
##    [4]     chr1       1-6      * |               A              b
##    [5]     chr1      7-10      * |               B              b
##    ...      ...       ...    ... .             ...            ...
##    [8]     chr1      7-10      * |               B              c
##    [9]     chr1     11-14      * |               C              c
##   [10]     chr1       1-6      * |               A              d
##   [11]     chr1      7-10      * |               B              d
##   [12]     chr1     11-14      * |               C              d
##        summary.coverage.sum summary.M.sum summary.coverage.mean summary.M.mean
##                   <numeric>     <numeric>             <numeric>      <numeric>
##    [1]                    4             2               1.33333       0.666667
##    [2]                    3             1               1.50000       0.500000
##    [3]                    5             5               2.50000       2.500000
##    [4]                    4             2               2.00000       1.000000
##    [5]                    1             1               1.00000       1.000000
##    ...                  ...           ...                   ...            ...
##    [8]                    3             1                   1.5            0.5
##    [9]                   NA            NA                    NA             NA
##   [10]                    3             3                   1.5            1.5
##   [11]                    3             1                   1.5            0.5
##   [12]                    1             1                   1.0            1.0
##   -------
##   seqinfo: 1 sequence from an unspecified genome; no seqlengths

If the regions vector is not named, pass the Feature column as the GRanges input regions:

(summary <- summarize_regions(
  bedfiles,
  unname(regions),
  fun = c("sum", "mean"))
)
## [18:39:33.758934] [iscream::summarize_regions] [info] Summarizing 3 regions from 4 bedfiles
## [18:39:33.758964] [iscream::summarize_regions] [info] using sum, mean
##       Feature Sample coverage.sum M.sum coverage.mean    M.mean
## 1    chr1:1-6      a            4     2      1.333333 0.6666667
## 2   chr1:7-10      a            3     1      1.500000 0.5000000
## 3  chr1:11-14      a            5     5      2.500000 2.5000000
## 4    chr1:1-6      b            4     2      2.000000 1.0000000
## 5   chr1:7-10      b            1     1      1.000000 1.0000000
## 6  chr1:11-14      b            3     1      1.500000 0.5000000
## 7    chr1:1-6      c            2     2      2.000000 2.0000000
## 8   chr1:7-10      c            3     1      1.500000 0.5000000
## 9  chr1:11-14      c           NA    NA            NA        NA
## 10   chr1:1-6      d            3     3      1.500000 1.5000000
## 11  chr1:7-10      d            3     1      1.500000 0.5000000
## 12 chr1:11-14      d            1     1      1.000000 1.0000000
GRanges(summary$Feature, summary = summary[, -1])
## GRanges object with 12 ranges and 5 metadata columns:
##        seqnames    ranges strand | summary.Sample summary.coverage.sum
##           <Rle> <IRanges>  <Rle> |    <character>            <numeric>
##    [1]     chr1       1-6      * |              a                    4
##    [2]     chr1      7-10      * |              a                    3
##    [3]     chr1     11-14      * |              a                    5
##    [4]     chr1       1-6      * |              b                    4
##    [5]     chr1      7-10      * |              b                    1
##    ...      ...       ...    ... .            ...                  ...
##    [8]     chr1      7-10      * |              c                    3
##    [9]     chr1     11-14      * |              c                   NA
##   [10]     chr1       1-6      * |              d                    3
##   [11]     chr1      7-10      * |              d                    3
##   [12]     chr1     11-14      * |              d                    1
##        summary.M.sum summary.coverage.mean summary.M.mean
##            <numeric>             <numeric>      <numeric>
##    [1]             2               1.33333       0.666667
##    [2]             1               1.50000       0.500000
##    [3]             5               2.50000       2.500000
##    [4]             2               2.00000       1.000000
##    [5]             1               1.00000       1.000000
##    ...           ...                   ...            ...
##    [8]             1                   1.5            0.5
##    [9]            NA                    NA             NA
##   [10]             3                   1.5            1.5
##   [11]             1                   1.5            0.5
##   [12]             1                   1.0            1.0
##   -------
##   seqinfo: 1 sequence from an unspecified genome; no seqlengths

Making SummarizedExperiment objects

From query_all matrices

(mats <- query_all(bedfiles, regions))
## [18:39:36.173591] [iscream::query_all] [info] Querying 3 regions from 4 bedfiles
## 
## [18:39:36.174088] [iscream::query_all] [info] Creating metadata vectors
## [18:39:36.174137] [iscream::query_all] [info] nrows 10000 - 9993 extra rows allocated with 0 resizes
## [18:39:36.174143] [iscream::query_all] [info] Creating dense matrix
## $M
##      a b c d
## [1,] 1 0 0 1
## [2,] 1 0 2 2
## [3,] 0 2 0 0
## [4,] 0 1 0 0
## [5,] 1 0 1 1
## [6,] 2 0 0 0
## [7,] 3 1 0 1
## 
## $Cov
##      a b c d
## [1,] 1 2 0 1
## [2,] 1 0 2 2
## [3,] 2 2 0 0
## [4,] 1 1 2 1
## [5,] 2 0 1 2
## [6,] 2 2 0 0
## [7,] 3 1 0 1
## 
## $pos
## [1]  1  3  5  7  9 11 13
## 
## $chr
## [1] "chr1" "chr1" "chr1" "chr1" "chr1" "chr1" "chr1"
## 
## $sampleNames
## [1] "a" "b" "c" "d"
(mats.sparse <- query_all(bedfiles, regions, sparse = TRUE))
## [18:39:36.175598] [iscream::query_all] [info] Querying 3 regions from 4 bedfiles
## 
## [18:39:36.176029] [iscream::query_all] [info] Creating metadata vectors
## [18:39:36.176053] [iscream::query_all] [info] nrows 10000 - 9993 extra rows allocated with 0 resizes
## [18:39:36.176064] [iscream::query_all] [info] Creating sparse matrix
## $M
## 7 x 4 sparse Matrix of class "dgCMatrix"
##      a b c d
## [1,] 1 . . 1
## [2,] 1 . 2 2
## [3,] . 2 . .
## [4,] . 1 . .
## [5,] 1 . 1 1
## [6,] 2 . . .
## [7,] 3 1 . 1
## 
## $Cov
## 7 x 4 sparse Matrix of class "dgCMatrix"
##      a b c d
## [1,] 1 2 . 1
## [2,] 1 . 2 2
## [3,] 2 2 . .
## [4,] 1 1 2 1
## [5,] 2 . 1 2
## [6,] 2 2 . .
## [7,] 3 1 . 1
## 
## $pos
## [1]  1  3  5  7  9 11 13
## 
## $chr
## [1] "chr1" "chr1" "chr1" "chr1" "chr1" "chr1" "chr1"
## 
## $sampleNames
## [1] "a" "b" "c" "d"
gr <- GRanges(mats$chr, mats$pos)

# dense
SummarizedExperiment(assays = list(M = mats$M, Cov = mats$Cov), rowRanges = gr)
## class: RangedSummarizedExperiment 
## dim: 7 4 
## metadata(0):
## assays(2): M Cov
## rownames: NULL
## rowData names(0):
## colnames(4): a b c d
## colData names(0):
# sparse
SummarizedExperiment(
  assays = list(M = mats.sparse$M, Cov = mats.sparse$Cov),
  rowRanges = gr
)
## class: RangedSummarizedExperiment 
## dim: 7 4 
## metadata(0):
## assays(2): M Cov
## rownames: NULL
## rowData names(0):
## colnames(4): a b c d
## colData names(0):

Making BSseq objects

A BSseq object is a type of SummarizedExperiment, but it cannot handle sparse matrices:

library(bsseq)
do.call(BSseq, mats)

Session info

## R version 4.4.2 (2024-10-31)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
##  [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
##  [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
## [10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   
## 
## time zone: UTC
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] SummarizedExperiment_1.36.0 Biobase_2.66.0             
##  [3] MatrixGenerics_1.18.1       matrixStats_1.5.0          
##  [5] GenomicRanges_1.58.0        GenomeInfoDb_1.42.3        
##  [7] IRanges_2.40.1              S4Vectors_0.44.0           
##  [9] BiocGenerics_0.52.0         iscream_0.0.0.9000         
## 
## loaded via a namespace (and not attached):
##  [1] sass_0.4.9              SparseArray_1.6.1       lattice_0.22-6         
##  [4] digest_0.6.37           evaluate_1.0.3          grid_4.4.2             
##  [7] fastmap_1.2.0           jsonlite_1.9.0          Matrix_1.7-1           
## [10] httr_1.4.7              UCSC.utils_1.2.0        textshaping_1.0.0      
## [13] jquerylib_0.1.4         abind_1.4-8             cli_3.6.4              
## [16] rlang_1.1.5             crayon_1.5.3            XVector_0.46.0         
## [19] parallelly_1.42.0       cachem_1.1.0            DelayedArray_0.32.0    
## [22] yaml_2.3.10             S4Arrays_1.6.0          tools_4.4.2            
## [25] parallel_4.4.2          GenomeInfoDbData_1.2.13 R6_2.6.1               
## [28] lifecycle_1.0.4         zlibbioc_1.52.0         fs_1.6.5               
## [31] stringfish_0.16.0       ragg_1.3.3              desc_1.4.3             
## [34] pkgdown_2.1.1           RcppParallel_5.1.10     bslib_0.9.0            
## [37] data.table_1.16.4       Rcpp_1.0.14             systemfonts_1.2.1      
## [40] xfun_0.51               knitr_1.49              htmltools_0.5.8.1      
## [43] rmarkdown_2.29          compiler_4.4.2