BISulfite-seq CUI Toolkit (BISCUIT) is a utility suite for analyzing bulk and single-cell sodium bisulfite- or enzyme-based DNA methylation/modification data, such as WGBS, capture bisulfite sequencing, RRBS, NOMe-seq, and EM-seq. It was written to perform read alignment, DNA methylation and mutation calling, and allele-specific methylation analysis on bisulfite or bisulfite-like sequencing data.
BISCUIT was developed by Wanding Zhou while he was a member of the Shen Lab at Van Andel Institute. He now holds a faculty position at University of Pennsylvania and Children’s Hospital of Philadelphia. BISCUIT is currently maintained by Jacob Morrison (who also developed the User’s Guide website) in the Shen Lab. Current versions of BISCUIT are available at https://github.com/huishenlab/biscuit, while legacy versions are located at https://github.com/zhou-lab/biscuit.
To get started with BISCUIT, download a precompiled binary from the BISCUIT release page. Note that binaries are only available for Linux and macOS (see Download and Install for more information about downloading and installing BISCUIT).
The basic workflow to align and extract methylation information using BISCUIT is:
- Create an index of the reference genome (only needs to be done once for each reference).
- Align sequencing reads to the reference.
- Create a pileup VCF of DNA methylation and genetic information.
- Extract DNA methylation into BED format.
Practically, the commands to run are:
    # Create index of the reference genome (only needs to be run once for each reference)
    # Gzipped FASTA references can also be used
    biscuit index my_reference.fa

    # Align sequencing reads to the reference
    # Gzipped FASTQ files can also be used
    biscuit align -@ NTHREADS -R "my_rg" /path/to/my_reference.fa read1.fastq read2.fastq |
        dupsifter /path/to/my_reference.fa |
        samtools sort -@ NTHREADS -o my_output.bam -O BAM -
    samtools index my_output.bam

    # Create a pileup VCF of DNA methylation and genetic information
    # Also compresses and indexes the VCF
    biscuit pileup -@ NTHREADS -o my_pileup.vcf /path/to/my_reference.fa my_output.bam
    bgzip -@ NTHREADS my_pileup.vcf
    tabix -p vcf my_pileup.vcf.gz

    # Extract DNA methylation into BED format
    # Also compresses and indexes the BED
    biscuit vcf2bed my_pileup.vcf.gz > my_methylation_data.bed
    bgzip my_methylation_data.bed
    tabix -p bed my_methylation_data.bed.gz
Run in this order, the commands produce all the files needed to read data into R using the R/Bioconductor companion package, biscuiteer.
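The workflow above calls several external programs (biscuit, dupsifter, samtools, bgzip, and tabix), so it can help to confirm they are all on the PATH before starting a long run. The following is a minimal sketch of such a check; missing_tools is a hypothetical helper written for illustration and is not part of BISCUIT.

```shell
# Hypothetical helper (not part of BISCUIT): print the name of every
# tool passed as an argument that cannot be found on the PATH.
missing_tools() {
    for tool in "$@"; do
        # command -v exits non-zero when the command is not found
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Check the programs used by the basic workflow
missing=$(missing_tools biscuit dupsifter samtools bgzip tabix)
if [ -n "$missing" ]; then
    echo "Missing tools:" $missing >&2
else
    echo "All workflow tools were found" >&2
fi
```

The check only reports whether each program can be found; it does not verify versions.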
An overview of all available functionalities can be found below in the Overview of Functionalities section.
Precompiled binaries can be found on the latest release page on GitHub. Currently, there are only precompiled binaries for the latest versions of Linux and macOS. You can also download the binaries directly from the terminal using the following one-liner:
    # macOS (matches the darwin_amd64 release asset)
    curl -OL $(curl -s https://api.github.com/repos/huishenlab/biscuit/releases/latest | grep browser_download_url | grep darwin_amd64 | cut -d '"' -f 4)
    mv biscuit_* biscuit
    chmod +x biscuit

    # Linux (matches the linux_amd64 release asset)
    curl -OL $(curl -s https://api.github.com/repos/huishenlab/biscuit/releases/latest | grep browser_download_url | grep linux_amd64 | cut -d '"' -f 4)
    mv biscuit_* biscuit
    chmod +x biscuit
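The two download commands differ only in the pattern passed to grep (darwin_amd64 or linux_amd64). As a rough sketch, the pattern could be chosen from the output of uname -s. The release_pattern function below is a hypothetical helper, not part of BISCUIT, and it does not check the CPU architecture.

```shell
# Hypothetical helper (not part of BISCUIT): map a kernel name, as printed
# by `uname -s`, to the grep pattern used in the download commands above.
release_pattern() {
    case "$1" in
        Darwin) echo "darwin_amd64" ;;
        Linux)  echo "linux_amd64" ;;
        *)      echo "no precompiled binary pattern known for: $1" >&2
                return 1 ;;
    esac
}

# Report the pattern for the current machine, if one is known
if pattern=$(release_pattern "$(uname -s)"); then
    echo "Release asset pattern for this machine: $pattern"
fi
```

The resulting value of pattern could then replace the hard-coded grep argument in either download command.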
To download the scripts to generate the QC asset files, generate QC files, and flip PBAT strands post-alignment, run
    # QC asset build
    curl -OL $(curl -s https://api.github.com/repos/huishenlab/biscuit/releases/latest | grep browser_download_url | grep build_biscuit_QC_assets.pl | cut -d '"' -f 4)

    # QC bash script
    curl -OL $(curl -s https://api.github.com/repos/huishenlab/biscuit/releases/latest | grep browser_download_url | grep QC.sh | cut -d '"' -f 4)

    # Flip PBAT strands script
    curl -OL $(curl -s https://api.github.com/repos/huishenlab/biscuit/releases/latest | grep browser_download_url | grep flip_pbat_strands.sh | cut -d '"' -f 4)
These commands work on both macOS and Linux.
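Each download command relies on the same grep and cut pipeline to pull one download URL out of the release metadata. The sketch below isolates that step so it can be tried without network access. The asset_url function is a hypothetical helper, and the sample text is a made-up fragment with example.org URLs, shaped like the lines the pipeline matches; it is not real API output.

```shell
# Hypothetical helper (not part of BISCUIT): read release metadata on stdin
# and print the download URL from each line that also matches the pattern $1.
asset_url() {
    grep browser_download_url | grep -- "$1" | cut -d '"' -f 4
}

# Made-up sample input (assumption: one "browser_download_url" key per line)
sample='  "browser_download_url": "https://example.org/downloads/QC.sh"
  "browser_download_url": "https://example.org/downloads/flip_pbat_strands.sh"'

# The URL is the fourth field when each line is split on double quotes
printf '%s\n' "$sample" | asset_url QC.sh
```

Because the pattern is passed to grep as a regular expression, a pattern that matches more than one asset name would print more than one URL.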
The source code for BISCUIT can be downloaded using either git or curl. Compilation requires ncurses to be installed.
    git clone --recursive git@github.com:huishenlab/biscuit.git
    cd biscuit
    make
Note, after v0.2.0, if downloading via git, make sure to use the --recursive flag to get the submodules. If an SSH key has not been set up and you receive a “permission denied” error, replace the first line with

    git clone --recursive https://github.com/huishenlab/biscuit.git
    curl -OL $(curl -s https://api.github.com/repos/huishenlab/biscuit/releases/latest | grep browser_download_url | grep release-source.zip | cut -d '"' -f 4)
    unzip release-source.zip
    cd biscuit-release
    make
The QC, asset creator, and strand-flipping scripts can be found in the
Note, this requires that conda has been installed. To download with conda, run:

    conda install -c bioconda biscuit
This will also install both
The Docker container can be downloaded from GitHub via:
    git clone git@github.com:huishenlab/sv_calling_docker.git
For more information about the Docker container, see Structural Variant Calling.
The following list provides an overview of the different subcommands and the various functionalities provided by biscuit. You can also find much of this by typing biscuit in the terminal. Help for each subcommand can be found on the BISCUIT Subcommands page or by typing biscuit (subcommand) in the terminal.
- index: Index reference genome (see Read Mapping)
- align: Map bisulfite converted short reads to reference (see Read Mapping)
- tview: View read mapping in terminal with bisulfite coloring (see Visualization under the Read Mapping tab)
- bsstrand: Investigate bisulfite conversion strand label (see Quality Control under the Read Mapping tab)
- bsconv: Investigate bisulfite conversion rate (see Quality Control under the Read Mapping tab)
- cinread: Print cytosine-read pair in a long form (see Quality Control under the Read Mapping tab)
- pileup: Generate standard-compliant VCF (see Read Pileup)
- vcf2bed: Extract mutation or methylation from VCF (see Extracting Methylation and Mutation Information)
- mergecg: Merge neighboring C and G in CpG context (see Extracting Methylation and Mutation Information)
- epiread: Convert BAM to epibed format (see Epireads and the epiBED Format)
- rectangle: Convert epiread format to rectangle format (see Epireads and the epiBED Format)
- asm: Test allele-specific methylation (see Allele-specific Methylation)
- biscuit and library versions
- qc: Generate QC files from BAM (see Quality Control)
- bc: Extract cell barcodes from reads (see Extract Barcodes)
This package was made by the folks at Van Andel Institute, with help from existing code bases available on the internet.
- lib/aln was adapted from Heng Li’s BWA-mem code.
- lib/htslib was submoduled from the htslib library.
- lib/klib was submoduled from Heng Li’s klib.
- This work is supported by NIH/NCI R37CA230748.