Quality Control
Generate QC Files for MultiQC
BISCUIT provides two methods for generating QC files that can be picked up by MultiQC. The first, a subcommand (biscuit qc
), outputs alignment-related, bisulfite strand, and some conversion-related QC metrics. The second, a bash script (QC.sh
), runs the subcommand, while also generating coverage and base-averaged conversion metrics.
Usage for biscuit qc
:
biscuit qc [-s] /path/to/my_reference.fa input.bam sample_name
Additional help can be found by running biscuit qc
on the command line or visiting the qc help page.
Usage for QC.sh
:
QC.sh \
[--single-end] \
[--vcf input.vcf] \
[--outdir output_dir] \
[--keep-tmp-files] \
[--no-cov-qc] \
/path/to/reference_assets \
/path/to/my_reference.fa \
sample_name \
input.bam
Additional help for QC.sh
can be found by running QC.sh --help
.
The bash script is available in the scripts/
directory if the source code has been downloaded, otherwise it can be downloaded from the latest release page on GitHub. Additionally, zip files with supplementary QC asset files (/path/to/reference_assets
) on the release page. These asset files are provided for hg19, hg38, and mm10. Asset files for other genome builds will need to be generated by the user. The build_biscuit_QC_assets.pl
perl script can create the files from a provided reference genome, while step-by-step instructions can be found on the FAQ page.
Filter Reads by Bisulfite Conversion
For some library preparations, incomplete conversions are enriched in a subset of reads which need to be filtered. The bsconv
subcommand transforms an input BAM into one that contains a ZN
tag (like ZN:Z:CA_R0C11,CC_R1C14,CG_R0C2,CT_R1C5
). This tag summarizes counts of retention and conversion for four different cytosine contexts CpA
, CpC
, CpG
and CpT
. The -p
flag outputs the counts in a table, instead of as a tag in the BAM file. Additionally, biscuit bsconv
can be used to filter reads with user-defined levels of cytosine retention in CpH, CpY, CpA, CpC, or CpT contexts.
biscuit bsconv /path/to/my_reference.fa input.bam
For help on available flags, run biscuit bsconv
on the command line or visiting the bsconv help page.
Validate Bisulfite Conversion Label
Sometimes, the bisulfite conversion labels in a given alignment are inaccurate, conflicting, or ambiguous. The bsstrand
subcommand summarizes these labels, given the number of C→T and G→A substitutions. As an option, it can also correct inaccurate labels.
biscuit bsstrand /path/to/my_reference.fa input.bam
returns something similar to:
Mapped reads: 2865
Unmapped reads: 44
Corrected reads: 0 (0.00%)
Strand Distribution:
strand\BS BSW (f) BSC (r)
R1 (f): 67 528
R1 (r): 692 47
R2 (f): 765 92
R2 (r): 107 567
R1 mapped to OT/OB: 114
R1 mapped to CTOT/CTOB: 1220
R2 mapped to OT/OB: 1332
R2 mapped to CTOT/CTOB: 199
Confusion counts:
orig\infer BSW (f) BSC (r) Conflict (c) Unknown (u)
BSW (f): 1483 48 11 89
BSC (r): 27 1084 6 117
Conflict (c): 0 0 0 0
Unknown (u): 0 0 0 0
If unfamiliar with the OT/OB/CTOT/CTOB definitions, see the Glossary
The inferred YD
tag gives the following
- f: OT/CTOT (BSW) strand
- r: OB/CTOB (BSC) strand
- c: conflicting strand information
- u: unintelligible strand source (unknown)
YD
is inferred based on the number of C→T (nCT
) and G→A (nGA
) observations in each read according to the following rules:
- If both
nCT
andnGA
are zero,YD = u
- If
nCT > nGA
and (nGA == 0
ormin(nGA,nCT) / max(nGA,nCT) <= 0.5
), thenYD = f
- If
nCT <= nGA
and (nCT == 0
ormin(nGA,nCT) / max(nGA,nCT) <= 0.5
), thenYD = r
- All other scenarios give
YD = c
.
The flag -y
appends nCT
(YC
tag) and nGA
(YG
tag) in the output BAM file.
For more help with bsstrand
, run biscuit bsstrand
in the terminal or visit the bsstrand help page.
When -b 1
Was Specified in Mapping
-b 1
forces the mapping to be conducted in a stranded manner. Using bsstrand
, the user can validate whether this enforcement was successful. For example,
biscuit bsstrand /path/to/my_reference.fa stranded.bam
returns
Mapped reads: 2918
Unmapped reads: 93
Corrected reads: 0 (0.00%)
Strand Distribution:
strand\BS BSW (f) BSC (r)
R1 (f): 840 0
R1 (r): 0 551
R2 (f): 0 565
R2 (r): 962 0
R1 mapped to OT/OB: 1391
R1 mapped to CTOT/CTOB: 0
R2 mapped to OT/OB: 0
R2 mapped to CTOT/CTOB: 1527
Confusion counts:
orig\infer BSW (f) BSC (r) Conflict (c) Unknown (u)
BSW (f): 715 916 9 162
BSC (r): 499 500 12 105
Conflict (c): 0 0 0 0
Unknown (u): 0 0 0 0
In this case, read 1 always mapped to the OT or OB strand and read 2 always mapped to the CTOT or CTOB strand.
Investigate Cytosine-Read Pair in Tabular Form
The cinread
subcommand can print information about cytosines in different contexts in a tab-separated format. By default the read name, whether the read comes from read 1 or read 2, the bisulfite strand, reference base, and base found in the read are printed. The printed information can be chosen using the -p
flag.
biscuit cinread /path/to/my_reference.fa input.bam
For more information about the available options, run biscuit cinread
in the terminal or visit the cinread help page.