Understanding the BISCUIT Output BAM

The SAM/BAM output of biscuit align follows the SAM/BAM specification. Definitions of the optional tags are given below, along with how BISCUIT defines the TLEN column, which differs from BWA's definition.

Useful Tags to Know

  • NM Number of mismatches that are not cytosine conversions. Note that this does not exactly match the hts-specs definition of the NM tag. To recreate the spec-compliant NM value, add the values of the NM and ZC tags.
  • MD Location of mismatches, following samtools conventions.
  • ZC Number of cytosine conversions.
  • ZR Number of cytosine retentions.
  • AS Best alignment score.
  • XS Suboptimal alignment score. This is usually less than or equal to AS, although in rare cases pairing can cause XS to exceed AS.
  • RG Read group.
  • SA Other parts of a chimeric primary mapping.
  • PA Ratio of the best alignment score to the suboptimal alignment score (AS/XS). The higher the ratio, the more confident the placement at the reported position.
  • XL Read length excluding adapter.
  • XA Location of suboptimal alignments.
  • XB Integer pair. The first integer indicates the number of suboptimal mappings in the primary/non-decoy chromosomes. The second integer in the pair indicates the number of suboptimal mappings in the ALT/decoy chromosomes. For example, 10,5 means ten suboptimal alignments exist on primary/non-decoy chromosomes and five exist on ALT/decoy chromosomes.
  • XR Reference/chromosome annotation.
  • YD Bisulfite conversion strand label.
    • f for OT/CTOT strands (C→T as seen in IGV)
    • r for OB/CTOB strands (G→A as seen in IGV)
  • MC CIGAR string for the mate/next segment.
  • MQ Mapping quality of the mate/next segment.
  • ZN Cytosine retention and conversion. Not included by default, but can be added by running biscuit bsconv. See Quality Control for more details.
  • YC Number of C→T observations. Not included by default, but can be added by running biscuit bsstrand. See Quality Control for more details.
  • YG Number of G→A observations. Not included by default, but can be added by running biscuit bsstrand. See Quality Control for more details.
  • CB Extracted cell barcode. See Barcode Extraction for more details.
  • RX Extracted unique molecular index (UMI). See Barcode Extraction for more details.
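As an illustration of how these tags can be read programmatically, the following Python sketch parses the optional TAG:TYPE:VALUE columns of a single SAM text line (column 12 onward, per the SAM specification) and rebuilds the spec-style NM value as NM + ZC, as described for the NM tag above. The function names `parse_optional_fields` and `spec_nm` and the toy record are hypothetical and not part of BISCUIT; treating a missing ZC tag as 0 is also an assumption. For BAM input, a library such as pysam would normally be used instead of hand parsing.

```python
def parse_optional_fields(sam_line: str) -> dict:
    # Columns 1-11 are mandatory SAM fields; optional TAG:TYPE:VALUE fields follow.
    tags = {}
    for field in sam_line.rstrip("\n").split("\t")[11:]:
        tag, typ, value = field.split(":", 2)
        if typ == "i":
            tags[tag] = int(value)
        elif typ == "f":
            tags[tag] = float(value)
        else:
            # A, Z, H and B values are kept as raw strings in this sketch.
            tags[tag] = value
    return tags


def spec_nm(tags: dict) -> int:
    # hts-specs-style NM = BISCUIT NM (non-conversion mismatches) + ZC (cytosine conversions).
    # Treating a missing ZC tag as 0 is an assumption of this sketch.
    return tags["NM"] + tags.get("ZC", 0)


# Hypothetical toy record, for illustration only.
record = "\t".join([
    "read1", "99", "chr1", "100", "60", "10M", "=", "250", "159",
    "ACGTACGTAC", "IIIIIIIIII",
    "NM:i:1", "ZC:i:3", "ZR:i:0", "YD:Z:f",
])
tags = parse_optional_fields(record)
print(tags["NM"], spec_nm(tags), tags["YD"])  # 1 4 f
```

Here the read has one non-conversion mismatch and three cytosine conversions, so the spec-style NM is 4.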

Insert Size

BISCUIT defines the insert size (TLEN column) differently from BWA.

The insert size as defined by BISCUIT:

(right-most coordinate of reverse-mate read) - (left-most coordinate of forward-mate read)
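A minimal Python sketch of the BISCUIT formula above, assuming 1-based, inclusive SAM POS values and taking the right-most coordinate of the reverse mate as its POS plus the number of reference bases its CIGAR consumes (M, D, N, = and X), minus one. The helper names `ref_length` and `biscuit_insert_size` are hypothetical; the sketch only evaluates the formula as written and does not cover which sign BISCUIT assigns to each mate's TLEN.

```python
import re

# CIGAR operations; only M, D, N, = and X consume reference bases.
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def ref_length(cigar: str) -> int:
    # Number of reference bases spanned by the alignment.
    return sum(int(n) for n, op in _CIGAR_OP.findall(cigar) if op in "MDN=X")


def biscuit_insert_size(fwd_pos: int, rev_pos: int, rev_cigar: str) -> int:
    # Right-most reference base covered by the reverse mate (1-based, inclusive).
    rev_rightmost = rev_pos + ref_length(rev_cigar) - 1
    # (right-most coordinate of reverse mate) - (left-most coordinate of forward mate)
    return rev_rightmost - fwd_pos


print(biscuit_insert_size(100, 250, "100M"))       # 249
print(biscuit_insert_size(100, 250, "5S95M"))      # 244: soft clips span no reference
print(biscuit_insert_size(100, 250, "50M10D50M"))  # 259: deletions span reference
```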

The insert size as defined by BWA:

-(p0 - p1 + offset)

where

p0     = { left-most  coordinate of read if read on forward strand }
         { right-most coordinate of read if read on reverse strand }

p1     = { left-most  coordinate of mate if mate on forward strand }
         { right-most coordinate of mate if mate on reverse strand }

         { +1 if p0 > p1 }
offset = {  0 if p0 = p1 }
         { -1 if p0 < p1 }
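For comparison, the BWA definition can be sketched in Python as below. It mirrors the p0/p1/offset formula above, assuming both mates are mapped to the same reference sequence; other cases are outside this sketch. As with any illustration here, the function names are hypothetical and this is not BWA's own code.

```python
import re

# CIGAR operations; only M, D, N, = and X consume reference bases.
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def ref_length(cigar: str) -> int:
    # Number of reference bases spanned by the alignment.
    return sum(int(n) for n, op in _CIGAR_OP.findall(cigar) if op in "MDN=X")


def strand_end(pos: int, cigar: str, is_reverse: bool) -> int:
    # Left-most coordinate if on the forward strand, right-most if on the reverse strand.
    return pos + ref_length(cigar) - 1 if is_reverse else pos


def bwa_tlen(pos: int, cigar: str, is_rev: bool,
             mate_pos: int, mate_cigar: str, mate_is_rev: bool) -> int:
    p0 = strand_end(pos, cigar, is_rev)
    p1 = strand_end(mate_pos, mate_cigar, mate_is_rev)
    if p0 > p1:
        offset = 1
    elif p0 < p1:
        offset = -1
    else:
        offset = 0
    return -(p0 - p1 + offset)


print(bwa_tlen(100, "100M", False, 250, "100M", True))  # 250 (forward read)
print(bwa_tlen(250, "100M", True, 100, "100M", False))  # -250 (reverse read)
```

With a forward read at POS 100 (100M) and its reverse mate at POS 250 (100M), this gives 250 and -250, whereas the BISCUIT formula gives 349 - 100 = 249 for the same pair.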