QC Metrics
The qc-metrics command generates a comprehensive quality control report from annotation outputs and, optionally, from aligned BAM files and gene count matrices. The output is an interactive HTML dashboard with all metrics and plots, plus individual TSV files compatible with MultiQC.
Available Metrics
The metrics fall into three tiers, depending on what input files you provide:
Tier 1: Annotation-Based Metrics (Always Available)
These require only the annotation parquet files:
- Read counts: total, valid, invalid, and validity rate
- Demux summary: demultiplexed vs ambiguous read counts (if barcode-corrected)
- Read-length distributions: for all, valid, invalid, demuxed, and ambiguous reads
- Segment length distributions: per-segment box plots
- PolyA/T tail-length distribution
- Read orientation balance: forward vs reverse complement
- Cell-barcode knee plot: log-log rank-count curve for cell barcodes
- Per-cell read counts
- Barcode edit distance distributions: per barcode type
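The knee plot listed above is a rank-count curve: barcodes are sorted by read count and each count is plotted against its rank on log-log axes. The following is a minimal Python sketch of that transformation, assuming per-barcode read counts are already available as a dictionary. The function name and toy data are hypothetical, and this is not presented as tranquillyzer's own implementation.

```python
def rank_count_curve(reads_per_barcode):
    """Return (rank, count) pairs, ordered from the most to the least abundant barcode."""
    counts = sorted(reads_per_barcode.values(), reverse=True)
    # Zero counts are dropped because they cannot be drawn on a log axis.
    return [(rank, count) for rank, count in enumerate(counts, start=1) if count > 0]


# Hypothetical toy data: three cell-like barcodes and two low-count background barcodes.
toy = {"AAAC": 950, "AAAG": 900, "AACT": 870, "CCGT": 4, "GGTA": 2}
for rank, count in rank_count_curve(toy):
    print(rank, count)
```

On real data, the sharp drop between high-count and low-count barcodes (the "knee") is what separates likely cells from background.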
Tier 2: BAM-Based Metrics (Requires --bam)
Providing an aligned BAM file (with CB and UB tags) enables:
- Sequencing saturation curve: UMI saturation as a function of sequencing depth
- Unique UMIs per cell
- Mapping rate per cell
- Duplicate rate per cell
- Gene body coverage (also requires --gene-body-bed)
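This page does not state the formula tranquillyzer uses for saturation. One common definition is 1 - (unique molecules / total reads), evaluated at several subsampled depths. The sketch below applies that definition to (cell barcode, UMI) pairs, as would be read from the CB and UB tags. The function names, the subsampling fractions, and the choice to ignore the gene when defining a molecule are all assumptions made for illustration.

```python
import random


def saturation(reads):
    """reads: list of (cell_barcode, umi) tuples. Returns 1 - unique / total."""
    if not reads:
        return 0.0
    return 1.0 - len(set(reads)) / len(reads)


def saturation_curve(reads, fractions=(0.25, 0.5, 0.75, 1.0), seed=0):
    """Return (depth, saturation) points for random subsamples of the reads."""
    shuffled = list(reads)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the curve reproducible
    curve = []
    for fraction in fractions:
        depth = int(len(shuffled) * fraction)
        curve.append((depth, saturation(shuffled[:depth])))
    return curve


# Hypothetical toy input: four reads from one cell, two distinct UMIs.
toy_reads = [("cellA", "umi1")] * 3 + [("cellA", "umi2")]
print(saturation_curve(toy_reads, fractions=(0.5, 1.0)))
```

A curve that flattens near 1.0 suggests that additional sequencing would mostly re-sample molecules already observed.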
Tier 3: Gene Quantification Metrics (Requires --counts-matrix and --gtf)
Providing a featureCounts counts matrix and GTF annotation enables:
- Genes detected per cell
- UMIs per cell
- Mitochondrial read fraction per cell
- Ribosomal read fraction per cell
- Library complexity (genes vs UMIs colored by mitochondrial %)
- Top expressed genes
- Gene biotype breakdown
- featureCounts assignment summary
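As an illustration of one of the per-cell metrics above, the sketch below computes a mitochondrial fraction from a gene-by-cell count table. It assumes mitochondrial genes can be recognised by an "MT-" gene-name prefix (the human gene symbol convention; mouse symbols use "mt-", which the case-insensitive match also covers). How tranquillyzer identifies mitochondrial genes from the GTF is not described here, so treat the prefix rule, the function name, and the data layout as assumptions.

```python
def mito_fraction_per_cell(counts, mito_prefix="MT-"):
    """counts: {gene_name: {cell_barcode: count}}. Returns {cell_barcode: fraction}."""
    total = {}
    mito = {}
    for gene, per_cell in counts.items():
        # Assumption: a name prefix is enough to flag mitochondrial genes.
        is_mito = gene.upper().startswith(mito_prefix)
        for cell, n in per_cell.items():
            total[cell] = total.get(cell, 0) + n
            if is_mito:
                mito[cell] = mito.get(cell, 0) + n
    # Cells with no counts at all get a fraction of 0.0 instead of a division error.
    return {cell: (mito.get(cell, 0) / t if t else 0.0) for cell, t in total.items()}


# Hypothetical toy matrix with one mitochondrial and one nuclear gene.
toy_counts = {
    "MT-CO1": {"cellA": 2, "cellB": 0},
    "ACTB": {"cellA": 8, "cellB": 5},
}
print(mito_fraction_per_cell(toy_counts))
```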
Usage
Basic (Annotation Only)
    tranquillyzer qc-metrics \
        --threads 4 \
        INPUT_DIR

With BAM and Gene Counts
    tranquillyzer qc-metrics \
        --threads 4 \
        --bam aligned_files/dup_marked.bam \
        --counts-matrix counts_matrix.tsv \
        --gtf gencode.v44.annotation.gtf \
        --gene-body-bed hg38.HouseKeepingGenes.bed.gz \
        INPUT_DIR

Gene Body Coverage
The gene body coverage plot follows the RSeQC convention: the user supplies a curated BED12 (one transcript per line) via --gene-body-bed, and each line is treated independently in the pileup. No transcript is auto-picked from the GTF — supplying a curated BED keeps the percentile axis aligned with the canonical mRNA species and avoids artifacts from extended-UTR or retained-intron isoforms. Both plain and gzipped BED12 are accepted.
Common sources for the BED:
- RSeQC housekeeping BEDs (SourceForge): pre-curated single-isoform-per-gene sets for hg38, hg19, mm10, etc. The fastest path to a clean curve.
- MANE_Select export: for human, the ~19,000 MANE_Select transcripts converted to BED12 (e.g. via UCSC gtfToGenePred followed by genePredToBed).
- Custom curated set: any BED12 the user trusts (e.g. APPRIS principal isoforms, project-specific genes of interest).
Without --gene-body-bed, the gene body coverage plot is omitted from the report (other QC sections are unaffected).
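Because each BED line is used independently, it can be worth checking a custom or converted file before running the report. The sketch below checks that each record has 12 tab-separated columns and that blockCount agrees with blockSizes and blockStarts, following the standard BED12 layout. The helper names are hypothetical, and the check is independent of any validation tranquillyzer may perform itself.

```python
import gzip


def open_bed(path):
    """Open a plain or gzipped BED file as text, chosen by the .gz suffix."""
    return gzip.open(path, "rt") if str(path).endswith(".gz") else open(path, "r")


def check_bed12(lines):
    """Return (line_number, problem) tuples for records that are not valid BED12."""
    problems = []
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        # Skip blank lines, comments, and UCSC track/browser header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) != 12:
            problems.append((lineno, f"expected 12 columns, found {len(fields)}"))
            continue
        try:
            block_count = int(fields[9])
            # blockSizes and blockStarts are comma-separated, often with a trailing comma.
            sizes = [int(x) for x in fields[10].rstrip(",").split(",")]
            starts = [int(x) for x in fields[11].rstrip(",").split(",")]
        except ValueError:
            problems.append((lineno, "non-integer block fields"))
            continue
        if len(sizes) != block_count or len(starts) != block_count:
            problems.append((lineno, "blockCount does not match blockSizes/blockStarts"))
    return problems


# Usage (not executed here, since it needs a real file):
# with open_bed("hg38.HouseKeepingGenes.bed.gz") as handle:
#     print(check_bed12(handle))
```

An empty list means every record passed these structural checks. It says nothing about whether the chosen transcripts are the right ones for the coverage curve.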
Command Line Options
| Option | Default | Description | When to change |
|---|---|---|---|
| INPUT_DIR | required | Directory containing annotation_metadata/ | |
| --output-dir | INPUT_DIR/qc_metrics | Where to write the report | |
| --threads | 4 | Threads for parallel metric computation | Increase for faster report generation |
| --sample-name | directory name | Label used in the report | Set for clearer report titles |
| --valid-file | auto-detect | Path to valid annotations parquet | Only if non-standard location |
| --invalid-file | auto-detect | Path to invalid annotations parquet | Only if non-standard location |
| --bam | None | Coordinate-sorted BAM with CB/UB tags | Provide for saturation and alignment metrics |
| --counts-matrix | None | featureCounts counts matrix TSV | Provide for gene-level QC |
| --gtf | None | GTF annotation file | Required with --counts-matrix |
| --gene-body-bed | None | BED12 file (plain or .gz) for gene body coverage, RSeQC-style | Provide a curated single-transcript-per-gene BED (e.g. RSeQC HouseKeepingGenes.bed.gz, MANE_Select export) to enable the gene body coverage plot |
| --read-len-bin-width | 100 | Bin width for read-length histograms | Decrease for finer resolution |
Output
All outputs are written to the QC metrics directory:
- report.html: self-contained interactive HTML dashboard (Plotly-based; zoomable, pannable, with hover tooltips). This is the primary QC artifact.
- plot_data/*.tsv: individual metric TSV files for MultiQC integration:
  - barcode_assignment_mqc.tsv
  - read_architecture_mqc.tsv
  - edit_distance_mqc.tsv
  - read_length_dist_mqc.tsv
  - knee_plot_mqc.tsv
  - saturation_curve_mqc.tsv (if BAM provided)
  - genes_per_cell_mqc.tsv (if counts matrix provided)
  - and others
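The TSV files can also be inspected outside MultiQC. The sketch below reads a tab-separated table into a list of dictionaries and skips lines starting with "#", which MultiQC custom-content files may use for header configuration. The column names in the example are invented for illustration; the actual columns of each *_mqc.tsv file are not documented here.

```python
import csv
import io


def read_mqc_tsv(handle):
    """Parse a tab-separated table, skipping blank and '#' comment lines."""
    data_lines = (line for line in handle if line.strip() and not line.startswith("#"))
    return list(csv.DictReader(data_lines, delimiter="\t"))


# Hypothetical content; the real columns in knee_plot_mqc.tsv may differ.
example = io.StringIO("# id: knee_plot\nrank\tcount\n1\t950\n2\t900\n")
rows = read_mqc_tsv(example)
print(rows[0]["rank"], rows[0]["count"])

# Usage with a real file (not executed here):
# with open("qc_metrics/plot_data/knee_plot_mqc.tsv") as handle:
#     rows = read_mqc_tsv(handle)
```

All values are returned as strings, so convert them to numbers before plotting or summarising.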
Recommendations
- The QC report can be generated at any point after annotation. You do not need to wait for alignment or deduplication. Run it early with just annotations, then re-run with --bam and --counts-matrix for the full picture.
- Metrics are computed independently in parallel, so increasing --threads directly speeds up report generation.
- The HTML report is fully self-contained and can be shared or viewed without a web server.
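The early-then-full pattern can be scripted. The sketch below assembles the qc-metrics argument list from the options documented above and checks two dependencies stated on this page: --counts-matrix needs --gtf, and gene body coverage needs both --bam and --gene-body-bed. The function name is hypothetical, and whether the CLI enforces these checks itself is not stated here.

```python
def build_qc_command(input_dir, threads=4, bam=None, counts_matrix=None,
                     gtf=None, gene_body_bed=None):
    """Assemble a tranquillyzer qc-metrics argument list from the documented options."""
    if counts_matrix and not gtf:
        raise ValueError("--counts-matrix requires --gtf")
    if gene_body_bed and not bam:
        raise ValueError("--gene-body-bed is only useful together with --bam")
    cmd = ["tranquillyzer", "qc-metrics", "--threads", str(threads)]
    optional = (("--bam", bam), ("--counts-matrix", counts_matrix),
                ("--gtf", gtf), ("--gene-body-bed", gene_body_bed))
    for flag, value in optional:
        if value:
            cmd += [flag, str(value)]
    cmd.append(input_dir)  # the positional INPUT_DIR goes last, as in the usage examples
    return cmd


# Stage 1: annotation-only report, available right after annotation.
early = build_qc_command("INPUT_DIR")
# Stage 2: full report once alignment and gene counting have finished.
full = build_qc_command(
    "INPUT_DIR",
    bam="aligned_files/dup_marked.bam",
    counts_matrix="counts_matrix.tsv",
    gtf="gencode.v44.annotation.gtf",
)
print(" ".join(early))
print(" ".join(full))
```

Either list can be passed to subprocess.run(cmd, check=True) to launch the run. Building the command as a list avoids shell quoting problems with paths that contain spaces.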