Quality Control

Generate QC Files for MultiQC

BISCUIT provides two methods for generating QC files that can be picked up by MultiQC. The first, a subcommand (biscuit qc), outputs alignment-related, bisulfite strand, and some conversion-related QC metrics. The second, a bash script (QC.sh), runs the subcommand, while also generating coverage and base-averaged conversion metrics.

Usage for biscuit qc:

biscuit qc [-s] /path/to/my_reference.fa input.bam sample_name

Additional help can be found by running biscuit qc on the command line or visiting the qc help page.

Usage for QC.sh:

QC.sh \
    [--single-end] \
    [--vcf input.vcf] \
    [--outdir output_dir] \
    [--keep-tmp-files] \
    [--no-cov-qc] \
    /path/to/reference_assets \
    /path/to/my_reference.fa \
    sample_name \
    input.bam

Additional help for QC.sh can be found by running QC.sh --help.

The bash script is available in the scripts/ directory if the source code has been downloaded, otherwise it can be downloaded from the latest release page on GitHub. Additionally, zip files with supplementary QC asset files (/path/to/reference_assets) on the release page. These asset files are provided for hg19, hg38, and mm10. Asset files for other genome builds will need to be generated by the user. The build_biscuit_QC_assets.pl perl script can create the files from a provided reference genome, while step-by-step instructions can be found on the FAQ page.

Filter Reads by Bisulfite Conversion

For some library preparations, incomplete conversions are enriched in a subset of reads which need to be filtered. The bsconv subcommand transforms an input BAM into one that contains a ZN tag (like ZN:Z:CA_R0C11,CC_R1C14,CG_R0C2,CT_R1C5). This tag summarizes counts of retention and conversion for four different cytosine contexts CpA, CpC, CpG and CpT. The -p flag outputs the counts in a table, instead of as a tag in the BAM file. Additionally, biscuit bsconv can be used to filter reads with user-defined levels of cytosine retention in CpH, CpY, CpA, CpC, or CpT contexts.

biscuit bsconv /path/to/my_reference.fa input.bam

For help on available flags, run biscuit bsconv on the command line or visiting the bsconv help page.

Validate Bisulfite Conversion Label

Sometimes, the bisulfite conversion labels in a given alignment are inaccurate, conflicting, or ambiguous. The bsstrand subcommand summarizes these labels, given the number of C→T and G→A substitutions. As an option, it can also correct inaccurate labels.

biscuit bsstrand /path/to/my_reference.fa input.bam

returns something similar to:

Mapped reads: 2865
Unmapped reads: 44
Corrected reads: 0 (0.00%)

Strand Distribution:
strand\BS      BSW (f)      BSC (r)
     R1 (f):   67           528
     R1 (r):   692          47
     R2 (f):   765          92
     R2 (r):   107          567


R1 mapped to OT/OB:   114
R1 mapped to CTOT/CTOB: 1220
R2 mapped to OT/OB:   1332
R2 mapped to CTOT/CTOB: 199

Confusion counts:
orig\infer      BSW (f)      BSC (r)      Conflict (c) Unknown (u)
     BSW (f):   1483         48           11           89
     BSC (r):   27           1084         6            117
Conflict (c):   0            0            0            0
 Unknown (u):   0            0            0            0

If unfamiliar with the OT/OB/CTOT/CTOB definitions, see the Glossary

The inferred YD tag gives the following

  • f: OT/CTOT (BSW) strand
  • r: OB/CTOB (BSC) strand
  • c: conflicting strand information
  • u: unintelligible strand source (unknown)

YD is inferred based on the number of C→T (nCT) and G→A (nGA) observations in each read according to the following rules:

  • If both nCT and nGA are zero, YD = u
  • If nCT > nGA and (nGA == 0 or min(nGA,nCT) / max(nGA,nCT) <= 0.5), then YD = f
  • If nCT <= nGA and (nCT == 0 or min(nGA,nCT) / max(nGA,nCT) <= 0.5), then YD = r
  • All other scenarios give YD = c.

The flag -y appends nCT (YC tag) and nGA (YG tag) in the output BAM file.

For more help with bsstrand, run biscuit bsstrand in the terminal or visit the bsstrand help page.

When -b 1 Was Specified in Mapping

-b 1 forces the mapping to be conducted in a stranded manner. Using bsstrand, the user can validate whether this enforcement was successful. For example,

biscuit bsstrand /path/to/my_reference.fa stranded.bam

returns

Mapped reads: 2918
Unmapped reads: 93
Corrected reads: 0 (0.00%)

Strand Distribution:
strand\BS      BSW (f)      BSC (r)
     R1 (f):   840          0
     R1 (r):   0            551
     R2 (f):   0            565
     R2 (r):   962          0


R1 mapped to OT/OB:   1391
R1 mapped to CTOT/CTOB: 0
R2 mapped to OT/OB:   0
R2 mapped to CTOT/CTOB: 1527

Confusion counts:
orig\infer      BSW (f)      BSC (r)      Conflict (c) Unknown (u)
     BSW (f):   715          916          9            162
     BSC (r):   499          500          12           105
Conflict (c):   0            0            0            0
 Unknown (u):   0            0            0            0

In this case, read 1 always mapped to the OT or OB strand and read 2 always mapped to the CTOT or CTOB strand.

Investigate Cytosine-Read Pair in Tabular Form

The cinread subcommand can print information about cytosines in different contexts in a tab-separated format. By default the read name, whether the read comes from read 1 or read 2, the bisulfite strand, reference base, and base found in the read are printed. The printed information can be chosen using the -p flag.

biscuit cinread /path/to/my_reference.fa input.bam

For more information about the available options, run biscuit cinread in the terminal or visit the cinread help page.