Understanding the BISCUIT Output BAM
BISCUIT follows the SAM/BAM specification for the SAM/BAM output from biscuit align. Definitions for optional tags can be found below, along with how BISCUIT defines the TLEN column (which differs from how BWA defines it).
Useful Tags to Know
NM
Number of non-cytosine-conversion mismatches. Note, this does not match exactly with the hts-spec for the NM tag. To recreate the exact NM tag as defined by the spec, add the values in the NM and ZC tags.
MD
Location of mismatches, following samtools conventions.
ZC
Number of cytosine conversions.
ZR
Number of cytosine retentions.
AS
Best alignment score.
XS
Suboptimal alignment score. This is usually equal to or less than AS. In rare cases, pairing could cause XS to be greater than AS.
RG
Read group.
SA
Other parts of a chimeric primary mapping.
PA
Ratio of best score to alternate score (AS/XS). The higher the ratio, the more accurate the position.
XL
Read length excluding adapter.
XA
Location of suboptimal alignments.
XB
Integer pair. The first integer indicates the number of suboptimal mappings in the primary/non-decoy chromosomes. The second integer in the pair indicates the number of suboptimal mappings in the ALT/decoy chromosomes. For example, 10,5 means ten suboptimal alignments exist on primary/non-decoy chromosomes and five exist on ALT/decoy chromosomes.
XR
Reference/chromosome annotation.
YD
Bisulfite conversion strand label.
f for OT/CTOT strands (C→T from IGV)
r for OB/CTOB strands (G→A from IGV)
MC
CIGAR string for mate/next segment.
MQ
Mapping quality for mate/next segment.
ZN
Cytosine retention and conversion. Not included by default, but can be added by running biscuit bsconv. See Quality Control for more details.
YC
Number of C→T observations. Not included by default, but can be added by running biscuit bsstrand. See Quality Control for more details.
YG
Number of G→A observations. Not included by default, but can be added by running biscuit bsstrand. See Quality Control for more details.
CB
Extracted cell barcode. See Barcode Extraction for more details.
RX
Extracted unique molecular index (UMI). See Barcode Extraction for more details.
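As a rough illustration of two of the tag definitions above, the following Python sketch recreates the spec-style NM value by adding NM and ZC, and splits an XB value into its two counts. The helper names (parse_optional_tags, spec_nm, parse_xb) and the example record are invented for this illustration and are not part of BISCUIT. The sketch also assumes that NM and ZC are written as integer fields and that XB is written as a comma-separated string; check your own BAM if the types differ.

```python
def parse_optional_tags(sam_line):
    """Return {tag: value} for the optional fields (column 12 onward) of one SAM record."""
    fields = sam_line.rstrip("\n").split("\t")
    tags = {}
    for field in fields[11:]:
        # Optional fields have the form TAG:TYPE:VALUE; the value may itself contain ':'.
        tag, type_code, value = field.split(":", 2)
        if type_code == "i":
            value = int(value)
        elif type_code == "f":
            value = float(value)
        tags[tag] = value
    return tags


def spec_nm(tags):
    """NM as defined by the SAM spec: BISCUIT's NM plus the cytosine conversions in ZC."""
    return tags["NM"] + tags["ZC"]


def parse_xb(value):
    """Split an XB value such as '10,5' into (primary/non-decoy, ALT/decoy) counts."""
    primary, alt = (int(part) for part in value.split(","))
    return primary, alt


# Hypothetical record, made up for this example (not real BISCUIT output).
record = "\t".join([
    "read1", "99", "chr1", "100", "60", "50M", "=", "200", "149", "*", "*",
    "NM:i:1", "ZC:i:7", "ZR:i:2", "AS:i:48", "XS:i:20", "XB:Z:10,5",
])
tags = parse_optional_tags(record)
nm_per_spec = spec_nm(tags)               # 1 + 7 = 8
primary_hits, alt_hits = parse_xb(tags["XB"])
```

For real BAM files, a library such as pysam would normally handle the tag parsing; the plain-text version here only keeps the example self-contained.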
Insert Size
BISCUIT computes the insert size (the TLEN column) differently from BWA.
The insert size as defined by BISCUIT:
(right-most coordinate of reverse-mate read) - (left-most coordinate of forward-mate read)
The insert size as defined by BWA:
-(p0 - p1 + offset)
where
p0     = { left-most coordinate of read   if read on forward strand }
         { right-most coordinate of read  if read on reverse strand }
p1     = { left-most coordinate of mate   if mate on forward strand }
         { right-most coordinate of mate  if mate on reverse strand }
offset = { +1  if p0 > p1 }
         {  0  if p0 = p1 }
         { -1  if p0 < p1 }
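The two definitions can be compared side by side with a small Python sketch. The function names and the example coordinates are hypothetical, and the sketch assumes 1-based, inclusive left-most and right-most coordinates for both formulas, which the text above does not state explicitly. It transcribes the formulas as written and is not taken from either tool's source code.

```python
def biscuit_tlen(forward_leftmost, reverse_rightmost):
    """Insert size as described for BISCUIT:
    right-most coordinate of the reverse mate minus left-most coordinate of the forward mate."""
    return reverse_rightmost - forward_leftmost


def bwa_tlen(read_left, read_right, read_is_reverse,
             mate_left, mate_right, mate_is_reverse):
    """Insert size as described for BWA: -(p0 - p1 + offset)."""
    # p0 and p1 are the left-most coordinate on the forward strand,
    # or the right-most coordinate on the reverse strand.
    p0 = read_right if read_is_reverse else read_left
    p1 = mate_right if mate_is_reverse else mate_left
    if p0 > p1:
        offset = 1
    elif p0 < p1:
        offset = -1
    else:
        offset = 0
    return -(p0 - p1 + offset)


# Hypothetical pair: forward read covers 100-149, reverse mate covers 200-249.
biscuit_value = biscuit_tlen(100, 249)                          # 249 - 100
bwa_forward = bwa_tlen(100, 149, False, 200, 249, True)         # seen from the forward read
bwa_reverse = bwa_tlen(200, 249, True, 100, 149, False)         # seen from the reverse mate
```

Under the inclusive-coordinate assumption, this pair gives 149 for the BISCUIT formula and 150 (or -150 from the reverse mate's side) for the BWA formula, so the two values differ by one in magnitude here.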