BISCUIT - Understand Sequencing Data with Bisulfite Conversion



BISulfite-seq CUI Toolkit (BISCUIT) is a utility suite for analyzing bulk and single-cell sodium bisulfite- or enzyme-based DNA methylation/modification data, such as WGBS, capture bisulfite sequencing, RRBS, NOMe-seq, and EM-seq. It was written to perform read alignment, DNA methylation and mutation calling, and allele-specific methylation analysis from bisulfite or bisulfite-like sequencing data.

BISCUIT was developed by Wanding Zhou while he was a member of the Shen Lab at Van Andel Institute. He now holds a faculty position at the University of Pennsylvania and Children’s Hospital of Philadelphia. BISCUIT is currently maintained by Jacob Morrison (who also developed the User’s Guide website) in the Shen Lab. Current versions of BISCUIT are available at https://github.com/huishenlab/biscuit, while legacy versions are located at https://github.com/zhou-lab/biscuit.

Quick Start

To get started with BISCUIT, download a precompiled binary from the BISCUIT release page. Note that binaries are only available for Linux and macOS (see Download and Install for more information about downloading and installing BISCUIT).

The basic workflow to align and extract methylation information using BISCUIT is:

  1. Create an index of the reference genome (only needs to be done once for each reference).
  2. Align sequencing reads to the reference.
  3. Create a pileup VCF of DNA methylation and genetic information.
  4. Extract DNA methylation into BED format.

Practically, the commands to run are:

# Create index of the reference genome (only needs to be run once for each reference)
# Gzipped FASTA references can also be used
biscuit index my_reference.fa

# Align sequencing reads to the reference
# Gzipped FASTQ files can also be used
biscuit align -@ NTHREADS -R "my_rg" /path/to/my_reference.fa read1.fastq read2.fastq |
    dupsifter /path/to/my_reference.fa | samtools sort -@ NTHREADS -o my_output.bam -O BAM -
samtools index my_output.bam

# Create a pileup VCF of DNA methylation and genetic information
# Also compresses and indexes the VCF
biscuit pileup -@ NTHREADS -o my_pileup.vcf /path/to/my_reference.fa my_output.bam
bgzip -@ NTHREADS my_pileup.vcf
tabix -p vcf my_pileup.vcf.gz

# Extract DNA methylation into BED format
# Also compresses and indexes the BED
biscuit vcf2bed my_pileup.vcf.gz > my_methylation_data.bed
bgzip my_methylation_data.bed
tabix -p bed my_methylation_data.bed.gz
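Steps 2 to 4 above can be wrapped in a single bash function so that a failure anywhere in the align | dupsifter | samtools pipe stops the run instead of silently producing a truncated BAM. This is a minimal sketch, not part of BISCUIT: the function name run_biscuit_quickstart, the PREFIX-based output names, and the default of 4 threads are hypothetical, the index from step 1 is assumed to exist already, and "my_rg" is kept as the same read-group placeholder used above.

```shell
# Hypothetical wrapper around Quick Start steps 2-4 (the index from step 1 is
# assumed to exist already). The body is a subshell "( ... )", so the strict
# options below apply only inside the function, not to the calling shell.
run_biscuit_quickstart() (
  set -euo pipefail
  if [ "$#" -lt 4 ]; then
    echo "usage: run_biscuit_quickstart REF.fa R1.fastq R2.fastq PREFIX [THREADS]" >&2
    exit 2
  fi
  ref=$1 r1=$2 r2=$3 prefix=$4 threads=${5:-4}   # default of 4 threads is arbitrary

  # Align, mark duplicates with dupsifter, and sort. With pipefail, a failure
  # in biscuit or dupsifter fails this step, not only a samtools failure.
  biscuit align -@ "$threads" -R "my_rg" "$ref" "$r1" "$r2" |
    dupsifter "$ref" |
    samtools sort -@ "$threads" -o "$prefix.bam" -O BAM -
  samtools index "$prefix.bam"

  # Pileup VCF of methylation and genetic information, compressed and indexed.
  biscuit pileup -@ "$threads" -o "$prefix.vcf" "$ref" "$prefix.bam"
  bgzip -@ "$threads" "$prefix.vcf"
  tabix -p vcf "$prefix.vcf.gz"

  # Methylation BED, compressed and indexed.
  biscuit vcf2bed "$prefix.vcf.gz" > "$prefix.bed"
  bgzip "$prefix.bed"
  tabix -p bed "$prefix.bed.gz"
)
```

For example, run_biscuit_quickstart /path/to/my_reference.fa read1.fastq read2.fastq my_sample 8 would write my_sample.bam, my_sample.vcf.gz, and my_sample.bed.gz along with their indexes.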

This basic order of commands will produce all the files needed to read the data into R using the R/Bioconductor companion package, biscuiteer.
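Before loading the BED into R, a quick sanity check from the shell can catch an empty or unexpected output file. The sketch below assumes the default vcf2bed layout puts the beta value in column 4 and the coverage in column 5; that layout is an assumption here, so check a few lines with gzip -dc my_methylation_data.bed.gz | head first. The helper name summarize_methylation_bed is hypothetical.

```shell
# Hypothetical helper: summarize a vcf2bed BED read on stdin.
# ASSUMPTION: column 4 = beta value (0-1), column 5 = coverage.
summarize_methylation_bed() {
  awk -F '\t' '
    $5 > 0 {
      n++                 # sites with at least one covering read
      cov += $5           # total coverage across sites
      meth += $4 * $5     # approximate methylated read count
    }
    END {
      if (n == 0) { print "no covered sites"; exit 1 }
      printf "sites=%d mean_cov=%.2f weighted_beta=%.4f\n", n, cov / n, meth / cov
    }'
}
```

Usage: gzip -dc my_methylation_data.bed.gz | summarize_methylation_bed. The weighted beta gives sites with more reads proportionally more influence than a plain mean of column 4 would.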

An overview of all available functionalities can be found below in the Overview of Functionalities section.

Download and Install

BISCUIT is available as a precompiled binary (for macOS and Linux), as source code for compilation on your own machine, as a conda recipe, or as a Docker container.

Download Precompiled Binaries

Precompiled binaries can be found on the latest release page on GitHub. Currently, there are only precompiled binaries for the latest versions of Linux and macOS. You can also download the binaries directly from the terminal using the following one-liner:

On macOS,

curl -OL $(curl -s https://api.github.com/repos/huishenlab/biscuit/releases/latest |
    grep browser_download_url | grep darwin_amd64 | cut -d '"' -f 4)
mv biscuit_* biscuit
chmod +x biscuit

On Linux,

curl -OL $(curl -s https://api.github.com/repos/huishenlab/biscuit/releases/latest |
    grep browser_download_url | grep linux_amd64 | cut -d '"' -f 4)
mv biscuit_* biscuit
chmod +x biscuit
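The macOS and Linux commands above differ only in the asset name they search for (darwin_amd64 or linux_amd64). A minimal sketch that picks the name from uname -s and fails loudly when no matching asset is found is shown below. The function names are hypothetical, and like the commands above it assumes only these two amd64 builds are published.

```shell
# Map an OS name (default: output of `uname -s`) to the asset-name fragment
# used in the commands above. Only the two amd64 builds are assumed to exist.
biscuit_asset_suffix() {
  local os=${1:-$(uname -s)}
  case "$os" in
    Darwin) echo darwin_amd64 ;;
    Linux)  echo linux_amd64 ;;
    *)      echo "no precompiled BISCUIT binary for: $os" >&2; return 1 ;;
  esac
}

# Download the latest release binary for this platform and name it "biscuit".
download_biscuit_binary() {
  local suffix url
  suffix=$(biscuit_asset_suffix) || return 1
  url=$(curl -fsS https://api.github.com/repos/huishenlab/biscuit/releases/latest |
        grep browser_download_url | grep "$suffix" | cut -d '"' -f 4 | head -n 1)
  if [ -z "$url" ]; then
    echo "could not find a $suffix asset in the latest release" >&2
    return 1
  fi
  curl -fL -o biscuit "$url" && chmod +x biscuit
}
```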

To download the scripts to generate the QC asset files, generate QC files, and flip PBAT strands post-alignment, run:

# QC asset build
curl -OL $(curl -s https://api.github.com/repos/huishenlab/biscuit/releases/latest |
    grep browser_download_url | grep build_biscuit_QC_assets.pl | cut -d '"' -f 4)

# QC bash script
curl -OL $(curl -s https://api.github.com/repos/huishenlab/biscuit/releases/latest |
    grep browser_download_url | grep QC.sh | cut -d '"' -f 4)

# Flip PBAT strands script
curl -OL $(curl -s https://api.github.com/repos/huishenlab/biscuit/releases/latest |
    grep browser_download_url | grep flip_pbat_strands.sh | cut -d '"' -f 4)

These commands work on both macOS and Linux.
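The three script downloads repeat the same GitHub API lookup. A minimal sketch that fetches the release metadata once and downloads each script is shown below; the function names are hypothetical. It uses grep -F so that a name such as QC.sh is matched literally (a plain grep treats the "." as "any character").

```shell
# Print download URLs of release assets whose URL contains NAME, reading
# GitHub release JSON on stdin. grep -F matches NAME as a literal string.
release_asset_urls() {
  grep browser_download_url | grep -F "$1" | cut -d '"' -f 4
}

# Fetch the latest release JSON once, then download each named script.
download_biscuit_scripts() {
  local json name url
  json=$(curl -fsS https://api.github.com/repos/huishenlab/biscuit/releases/latest) || return 1
  for name in build_biscuit_QC_assets.pl QC.sh flip_pbat_strands.sh; do
    url=$(printf '%s\n' "$json" | release_asset_urls "$name" | head -n 1)
    if [ -z "$url" ]; then
      echo "asset not found: $name" >&2
      return 1
    fi
    curl -fLO "$url" || return 1
  done
}
```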

Download Source Code and Compile

Version 1.4.0 and Newer

As of version 1.4.0, BISCUIT uses a CMake-based build system. Regardless of whether you use git or curl to download the source code, you will need cmake (minimum version 3.21), zlib, ncurses, pthread, and curl installed to build BISCUIT.
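Because the build needs cmake 3.21 or newer, it can help to check the installed version before configuring. Version strings cannot be compared as plain text (as text, "3.9" sorts after "3.21"), so the minimal sketch below compares dot-separated fields numerically. The function names are hypothetical, and only cmake is checked; zlib, ncurses, pthread, and curl are left to cmake's configure step.

```shell
# Return 0 if version $1 >= version $2, comparing up to three numeric
# dot-separated fields; non-numeric suffixes such as "-rc1" are ignored.
version_ge() {
  local IFS=.
  local -a have=($1) need=($2)
  local i x y
  for i in 0 1 2; do
    x=${have[i]:-0}; y=${need[i]:-0}
    x=${x%%[!0-9]*}; y=${y%%[!0-9]*}   # keep only the leading digits
    x=${x:-0}; y=${y:-0}
    if (( 10#$x > 10#$y )); then return 0; fi
    if (( 10#$x < 10#$y )); then return 1; fi
  done
  return 0
}

# Check that cmake is on PATH and is at least version 3.21.
check_cmake_version() {
  local v
  command -v cmake >/dev/null 2>&1 || { echo "cmake not found" >&2; return 1; }
  v=$(cmake --version | awk 'NR == 1 { print $3 }')   # "cmake version X.Y.Z"
  version_ge "$v" 3.21 || { echo "cmake $v found; BISCUIT needs >= 3.21" >&2; return 1; }
}
```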

The source can be retrieved with either of these two commands:

# git
git clone --recursive git@github.com:huishenlab/biscuit.git
cd biscuit

# curl
curl -OL $(curl -s https://api.github.com/repos/huishenlab/biscuit/releases/latest |
    grep browser_download_url | grep release-source.zip | cut -d '"' -f 4)
unzip release-source.zip
cd biscuit-release

After retrieving the source code (regardless of retrieval method), building BISCUIT proceeds as follows:

mkdir build && cd build
cmake -DCMAKE_INSTALL_PREFIX=../ ../
make && make install

This will create a directory called bin in the top-level directory of BISCUIT, containing the biscuit binary and the QC, asset creator, and strand-flipping scripts. To install your files elsewhere, replace -DCMAKE_INSTALL_PREFIX=../ with -DCMAKE_INSTALL_PREFIX=/path/to/your/other/location. If you don’t include the -DCMAKE_INSTALL_PREFIX option, you can instead specify the install location at install time by running cmake --install . --prefix /path/to/your/install/location from the build directory. If you don’t run the install commands, the BISCUIT binary can be found in build/src/biscuit (relative to the top-level directory of BISCUIT) and the scripts can be found in the scripts/ directory.
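The configure, build, and install steps above can also be run without changing directories by using cmake's -S (source) and -B (build directory) options, which are available in every cmake version that meets the 3.21 minimum. The minimal sketch below is a hypothetical helper, not a script shipped with BISCUIT; it places the build directory inside the source tree as the commands above do.

```shell
# Hypothetical helper: configure, build, and install BISCUIT from SRC_DIR
# into PREFIX. The subshell body keeps "set -eu" local to this function.
build_biscuit() (
  set -eu
  src=${1:?usage: build_biscuit SRC_DIR PREFIX}
  prefix=${2:?usage: build_biscuit SRC_DIR PREFIX}
  cmake -S "$src" -B "$src/build" -DCMAKE_INSTALL_PREFIX="$prefix"
  cmake --build "$src/build"     # runs the generator's build tool (make by default)
  cmake --install "$src/build"   # installs into PREFIX, creating PREFIX/bin
)
```

For example, build_biscuit "$PWD" "$HOME/.local" run from the top-level BISCUIT directory should leave the binary in $HOME/.local/bin.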

Version 1.3.0 and Earlier

The source code for BISCUIT version 1.3.0 and earlier can be downloaded from the GitHub releases page, specifically the release-source.zip file. Compilation requires that zlib and ncurses are installed.

unzip release-source.zip
cd biscuit-release
make

The QC, asset creator, and strand-flipping scripts can be found in the scripts/ directory.

Download with Conda

Note that this requires conda to be installed. To download with conda, run:

conda install -c bioconda biscuit

This will also install both QC.sh and build_biscuit_QC_assets.pl.
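The Quick Start pipeline also calls dupsifter, samtools, bgzip, and tabix, and this page does not say whether the conda package brings them in. After installing by any method, a small check (hypothetical helper name) can report which commands are missing from PATH.

```shell
# Report which of the given commands are missing from PATH.
# Returns 0 when all are found, 1 otherwise.
check_tools() {
  local tool missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: check_tools biscuit dupsifter samtools bgzip tabix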

Download the Docker Container

The Docker container can be downloaded from GitHub via:

git clone git@github.com:huishenlab/sv_calling_docker.git

For more information about the Docker container, see Structural Variant Calling.

Overview of Functionalities

The following list provides an overview of the subcommands and functionality provided by BISCUIT. Much of this can also be found by typing biscuit in the terminal. Help for each subcommand can be found on the BISCUIT Subcommands page or by typing biscuit followed by the subcommand name (e.g., biscuit align) in the terminal.
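To keep a local copy of the help text described above (for example, to compare options between BISCUIT versions), each subcommand can be run without arguments and its output saved. This is a minimal sketch with a hypothetical helper name; it assumes, as stated above, that running a subcommand without arguments prints its usage, and it ignores the exit status in case usage is printed with a non-zero status.

```shell
# Hypothetical helper: save each subcommand's usage text to DIR/<subcommand>.txt.
# With no subcommands given, a default list taken from the overview below is used.
save_biscuit_help() {
  if [ "$#" -lt 1 ]; then
    echo "usage: save_biscuit_help DIR [SUBCOMMAND...]" >&2
    return 2
  fi
  local outdir=$1 sub
  shift
  if [ "$#" -eq 0 ]; then
    set -- index align tview bsstrand bsconv cinread pileup vcf2bed qc bc
  fi
  mkdir -p "$outdir" || return 1
  for sub in "$@"; do
    # stdin is /dev/null so no subcommand waits for input; status is ignored.
    biscuit "$sub" </dev/null >"$outdir/$sub.txt" 2>&1 || true
  done
}
```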

Read Mapping

  • index Index reference genome (see Read Mapping)
  • align Map bisulfite converted short reads to reference (see Read Mapping)

BAM Operation

  • tview View read mapping in terminal with bisulfite coloring (see Visualization under the Read Mapping tab)
  • bsstrand Investigate bisulfite conversion strand label (see Quality Control under the Read Mapping tab)
  • bsconv Investigate bisulfite conversion rate (see Quality Control under the Read Mapping tab)
  • cinread Print cytosine-read pair in a long form (see Quality Control under the Read Mapping tab)

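Methylation and SNP Extraction

  • pileup Pileup cytosine and mutations
  • vcf2bed Convert VCF to BED file
  • mergecg Merge C and G in CpG context

Epi-read & Epi-allele

  • epiread Convert BAM to epiread format
  • asm Test allele-specific methylation
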
Other

  • version Print biscuit and library versions
  • qc Generate QC files from BAM (see Quality Control)
  • bc Extract cell barcodes from reads (see Extract Barcodes)

About the project

This package was developed at Van Andel Institute, building on prior open-source code bases (see Acknowledgement).

Acknowledgement

  • lib/aln was adapted from Heng Li’s BWA-MEM code.
  • lib/htslib was submoduled from the htslib library.
  • lib/klib was submoduled from Heng Li’s klib.
  • This work is supported by NIH/NCI R37CA230748.

Reference

In preparation