Understanding the Output BISCUIT BAM
The SAM/BAM output from biscuit align follows the SAM/BAM specification. The optional tags are defined below, along with how BISCUIT defines the TLEN column (which differs from how BWA defines it).
Useful Tags to Know
NM: Number of non-cytosine-conversion mismatches. Note that this does not exactly match the hts-specs definition of the NM tag. To recreate the NM tag as defined by the spec, add the values in the NM and ZC tags.
MD: Location of mismatches, following samtools conventions.
ZC: Number of conversions (C→T for OT/CTOT reads or G→A for OB/CTOB).
ZR: Number of retentions (retained C’s for OT/CTOT reads or G’s for OB/CTOB).
AS: Best alignment score.
XS: Suboptimal alignment score. This is usually equal to or less than AS. In rare cases, pairing could cause XS to be greater than AS.
RG: Read group.
SA: Other parts of a chimeric primary mapping.
PA: Ratio of best score to alternate score (AS/XS). The higher the ratio, the more accurate the position.
XL: Read length excluding adapter.
XA: Location of suboptimal alignments.
XB: Integer pair. The first integer indicates the number of suboptimal mappings in the primary/non-decoy chromosomes. The second integer indicates the number of suboptimal mappings in the ALT/decoy chromosomes. For example, 10,5 means ten suboptimal alignments exist on primary/non-decoy chromosomes and five exist on ALT/decoy chromosomes.
XR: Reference/chromosome annotation.
YD: Bisulfite conversion strand label.
    f for OT/CTOT strands (C→T from IGV)
    r for OB/CTOB strands (G→A from IGV)
MC: CIGAR string for mate/next segment.
MQ: Mapping quality for mate/next segment.
ZN: Cytosine retention and conversion. Not included by default, but can be added by running biscuit bsconv. See Quality Control for more details.
YC: Number of C→T observations. Not included by default, but can be added by running biscuit bsstrand. See Quality Control for more details.
YG: Number of G→A observations. Not included by default, but can be added by running biscuit bsstrand. See Quality Control for more details.
CB: Extracted cell barcode. See Barcode Extraction for more details.
RX: Extracted unique molecular index (UMI). See Barcode Extraction for more details.
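As an illustration of the NM and XB definitions, the following is a minimal Python sketch that reads the optional fields of one SAM text record, recreates the spec-defined NM value as NM + ZC, and splits the XB integer pair. The record itself is invented for the example, the helper names (parse_tags, spec_nm, parse_xb) are hypothetical, and storing XB as a string-typed (Z) tag is an assumption.

```python
def parse_tags(sam_line):
    """Return the optional fields (columns 12+) of a SAM record as a dict."""
    fields = sam_line.rstrip("\n").split("\t")
    tags = {}
    for field in fields[11:]:
        tag, typ, value = field.split(":", 2)
        if typ == "i":        # integer-typed tag
            value = int(value)
        elif typ == "f":      # float-typed tag
            value = float(value)
        tags[tag] = value
    return tags


def spec_nm(tags):
    """Recreate the spec-defined NM by adding BISCUIT's NM and ZC tags."""
    return tags["NM"] + tags["ZC"]


def parse_xb(value):
    """Split an XB value such as '10,5' into (primary, alt_decoy) counts."""
    primary, alt_decoy = value.split(",")
    return int(primary), int(alt_decoy)


# Invented record for illustration only (assumes XB has type Z).
record = "\t".join([
    "read1", "0", "chr1", "100", "60", "50M", "*", "0", "0", "*", "*",
    "NM:i:1", "ZC:i:7", "ZR:i:2", "AS:i:45", "XS:i:20", "XB:Z:10,5", "YD:A:f",
])

tags = parse_tags(record)
print(spec_nm(tags))          # 1 non-conversion mismatch + 7 conversions = 8
print(parse_xb(tags["XB"]))   # (10, 5)
```

For BAM input, a library such as pysam would normally handle the tag decoding; the text parsing here only keeps the sketch self-contained.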
Insert Size
BISCUIT reports the insert size (the TLEN column) differently than BWA does.
The insert size as defined by BISCUIT:
(right-most coordinate of reverse-mate read) - (left-most coordinate of forward-mate read)
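Read literally, the BISCUIT definition above is a single subtraction. The short Python sketch below applies it to an invented pair. The function name and the coordinates are hypothetical, and the coordinate convention (for example, whether the right-most coordinate is inclusive) is not stated here, so the inputs are used exactly as given.

```python
def biscuit_insert_size(forward_leftmost, reverse_rightmost):
    """Insert size per the definition above: reverse right-most minus forward left-most."""
    return reverse_rightmost - forward_leftmost


# Invented pair: forward mate starts at 100, reverse mate ends at 249.
print(biscuit_insert_size(100, 249))  # 149
```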
The insert size as defined by BWA:
-(p0 - p1 + offset)
where
p0     = { left-most coordinate of read    if read on forward strand }
         { right-most coordinate of read   if read on reverse strand }
p1     = { left-most coordinate of mate    if mate on forward strand }
         { right-most coordinate of mate   if mate on reverse strand }
offset = { +1   if p0 > p1 }
         {  0   if p0 = p1 }
         { -1   if p0 < p1 }
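The BWA definition above can be traced with a small Python sketch. It is a direct transcription of the formula as written on this page, not BWA's source code. The function and parameter names are hypothetical and the example coordinates are invented.

```python
def bwa_tlen(read_left, read_right, read_is_reverse,
             mate_left, mate_right, mate_is_reverse):
    """TLEN for one read, following the BWA definition given above."""
    # p0/p1: left-most coordinate on the forward strand, right-most on the reverse.
    p0 = read_right if read_is_reverse else read_left
    p1 = mate_right if mate_is_reverse else mate_left
    if p0 > p1:
        offset = 1
    elif p0 < p1:
        offset = -1
    else:
        offset = 0
    return -(p0 - p1 + offset)


# Invented pair: forward read spans 100-149, reverse mate spans 200-249.
print(bwa_tlen(100, 149, False, 200, 249, True))   # -(100 - 249 - 1) = 150
print(bwa_tlen(200, 249, True, 100, 149, False))   # -(249 - 100 + 1) = -150
```

The two calls describe the same pair from each mate's point of view, which is why the values have equal magnitude and opposite sign.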