BISCUIT - Understand Sequencing Data with Bisulfite Conversion
BISulfite-seq CUI Toolkit (BISCUIT) is a utility suite for analyzing bulk and single-cell sodium bisulfite- or enzyme-based DNA methylation/modification data, such as WGBS, capture bisulfite sequencing, RRBS, NOMe-seq, and EM-seq. It was written to perform read alignment, DNA methylation and mutation calling, and allele-specific methylation analysis from bisulfite or bisulfite-like sequencing data.
BISCUIT was developed by Wanding Zhou while he was a member of the Shen Lab at Van Andel Institute. He now holds a faculty position at University of Pennsylvania and Children’s Hospital of Philadelphia. BISCUIT is currently maintained by Jacob Morrison (who also developed the User’s Guide website) in the Shen Lab. Current versions of BISCUIT are available at https://github.com/huishenlab/biscuit, while legacy versions are located at https://github.com/zhou-lab/biscuit.
Reference
Wanding Zhou, Benjamin K Johnson, Jacob Morrison, et al., BISCUIT: an efficient, standards-compliant tool suite for simultaneous genetic and epigenetic inference in bulk and single-cell studies, Nucleic Acids Research, Volume 52, Issue 6, 12 April 2024, Page e32, https://doi.org/10.1093/nar/gkae097
Quick Start
To get started with BISCUIT, download a precompiled binary from the BISCUIT release page. Note that binaries are only available for Linux and macOS (see Download and Install for more information about downloading and installing BISCUIT).
The basic workflow to align and extract methylation information using BISCUIT is:
- Create an index of the reference genome (only needs to be done once for each reference).
- Align sequencing reads to the reference.
- Create a pileup VCF of DNA methylation and genetic information.
- Extract DNA methylation into BED format.
Practically, the commands to run are:
# Create index of the reference genome (only needs to be run once for each reference)
# Gzipped FASTA references can also be used
biscuit index my_reference.fa
# Align sequencing reads to the reference
# Gzipped FASTQ files can also be used
biscuit align -@ NTHREADS -R "my_rg" /path/to/my_reference.fa read1.fastq read2.fastq |
dupsifter /path/to/my_reference.fa | samtools sort -@ NTHREADS -o my_output.bam -O BAM -
samtools index my_output.bam
# Create a pileup VCF of DNA methylation and genetic information
# Also compresses and indexes the VCF
biscuit pileup -@ NTHREADS -o my_pileup.vcf /path/to/my_reference.fa my_output.bam
bgzip -@ NTHREADS my_pileup.vcf
tabix -p vcf my_pileup.vcf.gz
# Extract DNA methylation into BED format
# Also compresses and indexes the BED
biscuit vcf2bed my_pileup.vcf.gz > my_methylation_data.bed
bgzip my_methylation_data.bed
tabix -p bed my_methylation_data.bed.gz
Run in this order, these commands produce all the files needed to read data into R using the R/Bioconductor companion package, biscuiteer.
An overview of all available functionalities can be found below in the Overview of Functionalities section.
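The quick-start commands above call several external programs (biscuit, dupsifter, samtools, bgzip, and tabix). As a minimal sketch, assuming all of them are expected on your PATH, the check below reports any that are missing before you start a long alignment. The function name missing_tools is hypothetical and not part of BISCUIT.

```shell
# Print the name of every listed tool that is not found on PATH.
# (missing_tools is an illustrative helper, not a BISCUIT command.)
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Tools invoked by the quick-start commands
missing=$(missing_tools biscuit dupsifter samtools bgzip tabix)
if [ -n "$missing" ]; then
    # Left unquoted on purpose so the names print on one line
    echo "Missing tools:" $missing >&2
else
    echo "All quick-start tools found"
fi
```

The check only confirms that each program can be found; it does not verify versions.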
Download and Install
BISCUIT is available as a precompiled binary (for macOS and Linux), as source code for compilation on your own machine, as a conda recipe, or as a Docker container.
Download Precompiled Binaries
Precompiled binaries can be found on the latest release page on GitHub. Currently, there are only precompiled binaries for the latest versions of Linux and macOS. You can also download the binaries directly from the terminal using the following one-liner:
On macOS,
# -O is dropped so that --output controls the name of the saved file
curl -L $(curl -s https://api.github.com/repos/huishenlab/biscuit/releases/latest |
grep browser_download_url | grep darwin_amd64 | cut -d '"' -f 4) --output biscuit
chmod +x biscuit
On Linux,
# -O is dropped so that --output controls the name of the saved file
curl -L $(curl -s https://api.github.com/repos/huishenlab/biscuit/releases/latest |
grep browser_download_url | grep linux_amd64 | cut -d '"' -f 4) --output biscuit
chmod +x biscuit
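The two one-liners differ only in the asset suffix they grep for (darwin_amd64 or linux_amd64). As a rough sketch, a script could choose the suffix from the output of uname -s. The function name biscuit_asset_suffix is hypothetical, and the mapping covers only the two suffixes named above.

```shell
# Map a kernel name (as printed by `uname -s`) to the release asset suffix
# used in the download commands. Illustrative helper, not part of BISCUIT.
biscuit_asset_suffix() {
    case "$1" in
        Darwin) echo "darwin_amd64" ;;
        Linux)  echo "linux_amd64" ;;
        *)      echo "unsupported platform: $1" >&2; return 1 ;;
    esac
}

# Report the suffix for the machine this script runs on
suffix=$(biscuit_asset_suffix "$(uname -s)") && echo "Asset suffix: $suffix"
```

The resulting value can then take the place of the hard-coded suffix in the grep step of the download command.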
To download the scripts to generate the QC asset files, generate QC files, and flip PBAT strands post-alignment, run
# QC asset build
curl -OL $(curl -s https://api.github.com/repos/huishenlab/biscuit/releases/latest |
grep browser_download_url | grep build_biscuit_QC_assets.pl | cut -d '"' -f 4)
# QC bash script
curl -OL $(curl -s https://api.github.com/repos/huishenlab/biscuit/releases/latest |
grep browser_download_url | grep QC.sh | cut -d '"' -f 4)
# Flip PBAT strands script
curl -OL $(curl -s https://api.github.com/repos/huishenlab/biscuit/releases/latest |
grep browser_download_url | grep flip_pbat_strands.sh | cut -d '"' -f 4)
These commands work on both macOS and Linux.
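All of the download commands share the same grep and cut pipeline for pulling a URL out of the release JSON. The sketch below wraps that pipeline in a function so it can be tried offline. The name asset_url and the sample JSON (including its example.invalid URLs) are made up for illustration. It assumes the API response keeps each browser_download_url on its own line, which is what the original pipeline also relies on.

```shell
# Read GitHub release JSON on stdin and print the download URL of the
# first asset whose URL contains the given fixed string.
# (asset_url is an illustrative helper, not part of BISCUIT.)
asset_url() {
    grep browser_download_url | grep -F "$1" | cut -d '"' -f 4 | head -n 1
}

# Hypothetical, abbreviated response used only to exercise the function
sample_json='{
  "assets": [
    {
      "name": "QC.sh",
      "browser_download_url": "https://example.invalid/releases/QC.sh"
    },
    {
      "name": "flip_pbat_strands.sh",
      "browser_download_url": "https://example.invalid/releases/flip_pbat_strands.sh"
    }
  ]
}'

printf '%s\n' "$sample_json" | asset_url flip_pbat_strands.sh
```

Against the real API, the same function would be fed by curl -s https://api.github.com/repos/huishenlab/biscuit/releases/latest in place of the sample text.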
Download Source Code and Compile
Version 1.4.0 and Newer
As of version 1.4.0, BISCUIT uses a CMake-based build system. Regardless of whether you use git or curl to download the source code, you will need cmake (minimum version 3.21), zlib, ncurses, pthread, and curl installed to build BISCUIT.
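Because the build needs cmake 3.21 or newer, it can help to check the installed version first. This is a minimal sketch, assuming a sort that supports the -V (version sort) option, as GNU coreutils does. The helper name version_ge is hypothetical.

```shell
# Return success when dotted version $1 is at least dotted version $2.
# Relies on `sort -V`, which orders version strings numerically by field.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

if command -v cmake >/dev/null 2>&1; then
    # First line of `cmake --version` looks like "cmake version X.Y.Z"
    found=$(cmake --version | head -n 1 | awk '{print $3}')
    if version_ge "$found" 3.21; then
        echo "cmake $found is new enough"
    else
        echo "cmake $found is older than 3.21" >&2
    fi
else
    echo "cmake not found on PATH" >&2
fi
```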
The source can be retrieved with either of these two commands:
# git
git clone git@github.com:huishenlab/biscuit.git
cd biscuit
# curl
curl -OL $(curl -s https://api.github.com/repos/huishenlab/biscuit/releases/latest |
grep browser_download_url | grep release-source.zip | cut -d '"' -f 4)
unzip release-source.zip
cd biscuit-release
After retrieving the source code (regardless of retrieval method), building BISCUIT proceeds as follows:
mkdir build && cd build
cmake -DCMAKE_INSTALL_PREFIX=../ ../
make && make install
This will create a directory called bin in the top-level directory of BISCUIT, where the biscuit binary and the QC, asset creator, and strand-flipping scripts can be found. You can also specify a different install directory by replacing -DCMAKE_INSTALL_PREFIX=../ with -DCMAKE_INSTALL_PREFIX=/path/to/your/other/location. If you don’t include the -DCMAKE_INSTALL_PREFIX option, you can specify the install location at install time, for example by running cmake --install . --prefix /path/to/your/install/location from the build directory. If you don’t run the install commands, the BISCUIT binary can be found in build/src/biscuit (relative to the top-level directory of BISCUIT) and the scripts can be found in the scripts/ directory.
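Since the binary can end up in either bin/ (after make install with the prefix shown above) or build/src/ (when the install step is skipped), a small lookup can save some guessing. This is a sketch under those two assumptions only. The function name find_biscuit_binary is hypothetical, and a custom install prefix would need its own path added.

```shell
# Print the path of the biscuit binary under a source tree, preferring the
# installed copy in bin/ over the uninstalled one in build/src/.
# (find_biscuit_binary is an illustrative helper, not part of BISCUIT.)
find_biscuit_binary() {
    for candidate in "$1/bin/biscuit" "$1/build/src/biscuit"; do
        if [ -x "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Look under the current directory, assumed here to be the top-level
# directory of the BISCUIT source
find_biscuit_binary . || echo "no biscuit binary found under $(pwd)" >&2
```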
Version 1.3.0 and Earlier
The source code for BISCUIT version 1.3.0 and earlier can be downloaded from the GitHub releases page, specifically the release-source.zip file. Compilation requires that zlib and ncurses are installed.
unzip release-source.zip
cd biscuit-release
make
The QC, asset creator, and strand-flipping scripts can be found in the scripts/ directory.
Download with Conda
Note, this requires that conda has been installed. To download with conda, run:
conda install -c bioconda biscuit
This will also install both QC.sh and build_biscuit_QC_assets.pl.
Download the Docker Container
The Docker container can be downloaded from GitHub via:
git clone git@github.com:huishenlab/sv_calling_docker.git
For more information about the docker container, see Structural Variant Calling.
Overview of Functionalities
The following list provides an overview of the different subcommands and the various functionalities provided by biscuit. You can also find much of this by typing biscuit in the terminal. Help for each subcommand can be found on the BISCUIT Subcommands page or by typing biscuit (subcommand) in the terminal.
Read Mapping
- index: Index reference genome (see Read Mapping)
- align: Map bisulfite converted short reads to reference (see Read Mapping)
BAM Operation
- tview: View read mapping in terminal with bisulfite coloring (see Visualization under the Read Mapping tab)
- bsstrand: Investigate bisulfite conversion strand label (see Quality Control under the Read Mapping tab)
- bsconv: Investigate bisulfite conversion rate (see Quality Control under the Read Mapping tab)
- cinread: Print cytosine-read pair in a long form (see Quality Control under the Read Mapping tab)
Methylation and SNP Extraction
- pileup: Generate standard-compliant VCF (see Read Pileup)
- vcf2bed: Extract mutation or methylation from VCF (see Extracting Methylation and Mutation Information)
- mergecg: Merge neighboring C and G in CpG context (see Extracting Methylation and Mutation Information)
Epi-read & Epi-allele
- epiread: Convert BAM to epibed format (see Epireads and the epiBED Format)
- rectangle: Convert epiread format to rectangle format (see Epireads and the epiBED Format)
- asm: Test allele-specific methylation (see Allele-specific Methylation)
Other
- version: Print biscuit and library versions
- help: Print usage and exit
- qc: Generate QC files from BAM (see Quality Control)
- bc: Extract cell barcodes from reads (see Extract Barcodes)
About the project
This package is made by the folks at Van Andel Institute, with help from existing code bases found on the internet.
Acknowledgement
- lib/aln was adapted from Heng Li’s BWA-mem code.
- This work is supported by NIH/NCI R37CA230748.