Barcode Correction and Demultiplexing
After annotation, barcode sequences need to be corrected (matched to known barcodes) and reads need to be assigned to cells (demultiplexed). Tranquillyzer offers two approaches depending on whether you have a barcode whitelist.
Whitelist-Based Barcode Correction
If you have a barcode whitelist (e.g., from 10x Genomics), barcode correction can happen either integrated with annotation or as a standalone step.
Integrated Correction (Recommended)
This is the fastest approach: barcode correction and demultiplexing run concurrently with annotation. See the Annotation page for details:
tranquillyzer annotate-reads \
--run-barcode-correction \
--run-demux \
--gpu-mem 48 \
OUTPUT_DIR \
WHITELIST_FILE
Standalone Correction
Use barcode-correct when you have already annotated reads and want to correct barcodes separately, for example to try a different whitelist or adjust the Levenshtein threshold without re-running annotation:
tranquillyzer barcode-correct \
--bc-lv-threshold 2 \
--threads 12 \
INPUT_DIR \
WHITELIST_FILE
Correction Strategy
For each read, the corrected barcode is determined by:
- Exact match: check if the extracted barcode is directly in the whitelist (fastest path).
- Reverse complement: check if the reverse complement matches (handles strand ambiguity).
- Fuzzy matching: find the closest whitelist barcode within a Levenshtein edit distance threshold (--bc-lv-threshold, default 2).
If no match is found within the threshold, the barcode is labeled NMF (No Match Found). If multiple whitelist barcodes tie at the minimum distance, the read is marked as ambiguous.
For multi-barcode protocols (e.g., combinatorial indexing with CBC + i5 + i7), all barcode columns are corrected independently, and a cell ID is assigned via product matching against the whitelist.
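The lookup order described above can be sketched in a few lines of Python. This is a minimal illustration, not Tranquillyzer's implementation: the function names (reverse_complement, levenshtein, correct_barcode), the status labels, and the brute-force scan over the whole whitelist are assumptions made for clarity, and only a single barcode column is handled.

```python
from typing import Iterable, Optional, Tuple

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution or match
        prev = curr
    return prev[-1]


def correct_barcode(barcode: str, whitelist: Iterable[str],
                    threshold: int = 2) -> Tuple[Optional[str], str]:
    """Return (corrected_barcode, status); labels are illustrative only."""
    known = set(whitelist)
    if barcode in known:                      # 1. exact match
        return barcode, "exact"
    rc = reverse_complement(barcode)
    if rc in known:                           # 2. reverse complement
        return rc, "revcomp"
    # 3. fuzzy match: brute-force scan (hypothetical; a real tool would index)
    distances = {w: levenshtein(barcode, w) for w in known}
    best = min(distances.values(), default=threshold + 1)
    if best > threshold:
        return None, "NMF"                    # no match within the threshold
    winners = [w for w, d in distances.items() if d == best]
    if len(winners) > 1:
        return None, "ambiguous"              # tie at the minimum distance
    return winners[0], "fuzzy"
```

For example, with a whitelist of AAAAAAAA, CCCCCCCC, and ACACACAC, the barcode AAAAAAAC would be resolved to AAAAAAAA by the fuzzy step, while TTTTTTTT would be resolved to AAAAAAAA by the reverse-complement step.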
Concurrent Demultiplexing
Add --run-demux to export demultiplexed FASTA/FASTQ files at the same time as correction:
tranquillyzer barcode-correct \
--run-demux \
--output-fmt fasta \
--threads 12 \
INPUT_DIR \
WHITELIST_FILE
Standalone Correction Options
| Option | Default | Description | When to change |
|---|---|---|---|
| INPUT_DIR | required | Directory containing annotation_metadata/ | |
| WHITELIST_FILE | required | TSV with barcode columns | |
| --output-dir | INPUT_DIR | Where to write corrected output | Set if you want separate output |
| --model-name | 10x3p_sc_ont_013 | Model for barcode column resolution | Match your annotation model |
| --bc-lv-threshold | 2 | Max Levenshtein distance for fuzzy matching | Increase for noisier barcodes |
| --run-demux | off | Demultiplex concurrently | Enable to get FASTA/FASTQ output |
| --output-fmt | fasta | Demux format | Use fastq if quality scores available |
| --include-barcode-quals | off | Include barcode quality scores in headers | Enable for downstream QC |
| --include-polya | off | Append polyA/T tail to demuxed reads | Enable if needed downstream |
| --chunk-size | 100,000 | Rows per processing chunk | Rarely needs changing |
| --threads | 12 | CPU threads | Match your available cores |
| --resume / --no-resume | --resume | Resume from checkpoint | Disable to force full re-run |
Whitelist-Free Barcode Discovery
If you do not have a barcode whitelist, Tranquillyzer can discover cell barcodes directly from the annotation output using count-based knee-point detection.
Annotation Without a Whitelist
tranquillyzer annotate-reads \
--model-name 10x3p_sc_ont_013 \
--gpu-mem 48 \
OUTPUT_DIR
Barcode Discovery
tranquillyzer generate-whitelist \
--model-name 10x3p_sc_ont_013 \
--expected-cells 5000 \
OUTPUT_DIR
Discovery Process
- Count barcodes: scan all valid annotations and count occurrences of each unique barcode (or barcode tuple for multi-barcode protocols). Barcodes are canonicalized via reverse complement (lexicographic minimum of sequence and its RC) to collapse strand-ambiguous duplicates.
- Knee-point detection: sort barcodes by count (descending) and find the “knee” in the log-log rank-count curve where real cell barcodes transition to background noise. If --expected-cells is provided, it guides the detection; otherwise, the knee is found automatically using the kneedle algorithm.
- Near-duplicate merging: barcodes within edit distance 1 of a higher-count barcode are merged into it. This uses deletion neighborhood hashing for efficient O(K*L) computation instead of O(K^2) pairwise comparison.
- Output whitelist: the surviving barcodes are written as a whitelist TSV, ready for barcode-correct.
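The counting and near-duplicate merging steps can be illustrated with a short Python sketch. It is written under stated assumptions and is not Tranquillyzer's code: the names canonical, count_barcodes, and merge_near_duplicates are hypothetical, barcodes are plain strings (no multi-barcode tuples), and a barcode is only merged into a strictly higher-count survivor. Each barcode is hashed under its single-base deletion variants, so candidates at edit distance 1 are found by dictionary lookup instead of by comparing every pair.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def canonical(barcode: str) -> str:
    """Lexicographic minimum of a barcode and its reverse complement."""
    return min(barcode, barcode.translate(_COMPLEMENT)[::-1])


def count_barcodes(barcodes: Iterable[str]) -> Counter:
    """Count canonicalized barcodes so strand-flipped copies collapse."""
    return Counter(canonical(bc) for bc in barcodes)


def merge_near_duplicates(
    counts: Dict[str, int],
) -> Tuple[Dict[str, int], Dict[str, str]]:
    """Fold barcodes into a higher-count survivor at edit distance 1.

    Returns (merged_counts, merge_map), where merge_map maps each
    absorbed barcode to the survivor that received its reads.
    """
    kept: Dict[str, int] = {}  # survivor -> accumulated count
    sub_index = {}             # (position, barcode minus that base) -> survivor
    del_index = {}             # survivor minus one base -> survivor
    merge_map: Dict[str, str] = {}

    # Visit barcodes from most to least abundant (ties broken by sequence).
    for bc, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        variants = [(i, bc[:i] + bc[i + 1:]) for i in range(len(bc))]
        candidates = set()
        if bc in del_index:                 # bc is a survivor minus one base
            candidates.add(del_index[bc])
        for i, v in variants:
            if (i, v) in sub_index:         # one substitution at position i
                candidates.add(sub_index[(i, v)])
            if v in kept:                   # bc is a survivor plus one base
                candidates.add(v)
        higher = [c for c in candidates if counts[c] > n]
        if higher:
            parent = min(higher, key=lambda c: (-counts[c], c))
            kept[parent] += n
            merge_map[bc] = parent
            continue
        kept[bc] = n                        # bc survives; index its variants
        for i, v in variants:
            sub_index.setdefault((i, v), bc)
            del_index.setdefault(v, bc)
    return kept, merge_map
```

Because absorbed barcodes are never indexed, merges in this sketch do not chain: a low-count barcode is only ever folded into a survivor, not into another absorbed barcode.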
Discovery Options
| Option | Default | Description | When to change |
|---|---|---|---|
| OUTPUT_DIR | required | Annotation output directory | |
| --model-name | 10x3p_sc_ont_013 | Model for barcode column resolution | Match your annotation model |
| --expected-cells | None (auto) | Hint for expected number of cells | Provide if known for better knee detection |
| --min-cell-ratio | 0.50 | Knee threshold as fraction of cliff-top count | Lower to include more cells; raise for stricter filtering |
| --min-reads-per-barcode | 3 | Minimum reads for a barcode to be considered | Increase for noisier data |
| --barcode-columns | auto (from model) | Comma-separated barcode column names | Only if model config is unavailable |
| --chunk-size | 100,000 | Rows per streaming chunk | Rarely needs changing |
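To give a feel for what an automatic knee search on the log-log rank-count curve does, here is a simplified, kneedle-style sketch in Python. It is a stand-in for illustration only: it omits the smoothing and sensitivity handling of the real kneedle algorithm, it does not model --expected-cells or --min-cell-ratio, and the function name knee_rank is hypothetical. The idea is to normalize both log axes to [0, 1] and pick the rank where the curve rises highest above the straight line joining its first and last points.

```python
import math
from typing import Sequence


def knee_rank(counts: Sequence[int]) -> int:
    """Return how many top-ranked barcodes lie at or above the knee."""
    ordered = sorted((c for c in counts if c > 0), reverse=True)
    # Too few points or a perfectly flat curve: there is no knee to find.
    if len(ordered) < 3 or ordered[0] == ordered[-1]:
        return len(ordered)

    xs = [math.log10(rank) for rank in range(1, len(ordered) + 1)]
    ys = [math.log10(c) for c in ordered]
    x_span = xs[-1] - xs[0]
    y_span = ys[0] - ys[-1]

    best_idx, best_gap = 0, float("-inf")
    for i, (x, y) in enumerate(zip(xs, ys)):
        x_norm = (x - xs[0]) / x_span
        y_norm = (y - ys[-1]) / y_span
        # Height of the curve above the chord from (0, 1) to (1, 0).
        gap = y_norm - (1.0 - x_norm)
        if gap > best_gap:
            best_idx, best_gap = i, gap
    return best_idx + 1
```

With 50 barcodes at 1000 reads each and 950 background barcodes at 2 reads each, this sketch places the knee after rank 50; the count at that rank would then serve as the read-count cutoff.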
Discovery Output
- annotation_metadata/discovered_whitelist.tsv: discovered barcode whitelist (use as input to barcode-correct)
- annotation_metadata/barcode_discovery_stats.json: summary statistics (unique counts, knee threshold, merge mapping)
- annotation_metadata/barcode_counts.tsv: all observed barcodes with counts and above/below-knee status
- annotation_metadata/barcode_rank_plot.png: log-log rank plot showing the knee threshold
Correction and Demultiplexing
Use the discovered whitelist with barcode-correct:
tranquillyzer barcode-correct \
--run-demux \
--output-fmt fasta \
--threads 12 \
OUTPUT_DIR \
OUTPUT_DIR/annotation_metadata/discovered_whitelist.tsv
Standalone Demultiplexing
If barcode correction has already been run and you just need to re-export FASTA/FASTQ files (e.g., in a different format), use demux-reads:
tranquillyzer demux-reads \
--output-fmt fastq \
INPUT_DIR
This reads the corrected annotation parquet and exports demultiplexed reads without re-running correction.
| Option | Default | Description |
|---|---|---|
| INPUT_DIR | required | Directory containing corrected annotations |
| --output-dir | INPUT_DIR | Where to write demux output |
| --output-fmt | fasta | Output format: fasta or fastq |
Demux Output
- demuxed_fasta/demuxed.fasta.gz: reads assigned to cells (gzipped)
- demuxed_fasta/ambiguous.fasta.gz: reads with ambiguous barcode matches (gzipped)
FASTA/FASTQ headers include cell ID, corrected barcodes, UMI, and orientation:
>read_001_42_ACGTACGT cell_id:42|Barcodes:CBC:ATCGATCGATCGATCG|UMI:ACGTACGTACGT|orientation:+
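For downstream scripts, a header in this layout can be split into its fields with a few lines of Python. The sketch below assumes the format is exactly as in the example above (a read name, a space, then pipe-separated key:value fields); the function name parse_demux_header is hypothetical, and the Barcodes value is left as a raw string because the layout for multi-barcode protocols is not shown here.

```python
from typing import Dict, Tuple


def parse_demux_header(line: str) -> Tuple[str, Dict[str, str]]:
    """Split a demuxed FASTA/FASTQ header into (read_name, fields)."""
    # Drop the leading '>' (FASTA) or '@' (FASTQ) and any trailing newline.
    text = line.rstrip("\n").lstrip(">@")
    name, _, description = text.partition(" ")
    fields: Dict[str, str] = {}
    for item in description.split("|"):
        if not item:
            continue
        # Split on the first ':' only, so 'Barcodes:CBC:ACGT' keeps 'CBC:ACGT'.
        key, _, value = item.partition(":")
        fields[key] = value
    return name, fields


header = (">read_001_42_ACGTACGT cell_id:42|Barcodes:CBC:ATCGATCGATCGATCG"
          "|UMI:ACGTACGTACGT|orientation:+")
name, fields = parse_demux_header(header)
print(name, fields["cell_id"], fields["UMI"], fields["orientation"])
```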