## simulate-data CLI

### Input Parameters to simulate-data
Tuning the parameters below, particularly the mismatch, insertion, deletion, and polyA/polyT error rates, lets the simulated reads approximate the empirical error profile of a specific sequencing platform or chemistry.
| CLI Option | Type | What It Controls | Default |
|---|---|---|---|
| model_name (positional) | TEXT | Library model key defining segment order and motifs | required |
| output_dir (positional) | TEXT | Output directory for simulated data (simulated_data/) | required |
| --training-seq-orders-file | TEXT | Path to training_seq_orders.tsv defining segment order and sequences | utils/training_seq_orders.tsv |
| --num-reads | INT | Number of primary reads to simulate (before reverse-complement doubling) | 50000 |
| --mismatch-rate | FLOAT | Base substitution probability during training read simulation | 0.05 |
| --insertion-rate | FLOAT | Base insertion probability during training read simulation | 0.05 |
| --deletion-rate | FLOAT | Base deletion probability during training read simulation | 0.06 |
| --min-cdna | INT | Minimum cDNA length used in training read simulation | 100 |
| --max-cdna | INT | Maximum cDNA length used in training read simulation | 500 |
| --polyt-error-rate | FLOAT | Error rate within polyA/polyT segments during training simulation | 0.02 |
| --max-insertions | INT | Maximum number of insertions allowed after a single base | 1 |
| --threads | INT | CPU threads used for training read simulation | 2 |
| --rc / --no-rc | FLAG | Include reverse-complemented reads (doubles the training set size) | --rc |
| --transcriptome | TEXT | Transcriptome FASTA used for cDNA generation; if omitted, random transcripts are used | None |
| --invalid-fraction | FLOAT | Fraction of training reads generated as invalid/artifactual | 0.3 |
| --help | FLAG | Show help message | n/a |
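Before launching a large simulation, it can help to sanity-check argument values. The sketch below is a hypothetical helper, not part of tranquillyzer: `check_sim_args` and `in_unit_interval` are illustrative names, and the checks assume only what the table implies, namely that the rate and fraction options are probabilities in [0, 1] and that `--min-cdna` should not exceed `--max-cdna`.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of tranquillyzer). Assumes, from the
# option descriptions, that rates/fractions are probabilities in [0, 1] and
# that --min-cdna must not exceed --max-cdna.

# in_unit_interval VALUE: succeed if VALUE is a plain decimal in [0, 1].
in_unit_interval() {
  local num_re='^([0-9]+\.?[0-9]*|\.[0-9]+)$'
  [[ "$1" =~ $num_re ]] || return 1
  # awk does the floating-point comparison that bash arithmetic cannot.
  awk -v x="$1" 'BEGIN { exit !(x + 0 <= 1) }'
}

# check_sim_args MISMATCH INSERTION DELETION POLYT INVALID MIN_CDNA MAX_CDNA
check_sim_args() {
  local -a names=(mismatch-rate insertion-rate deletion-rate
                  polyt-error-rate invalid-fraction)
  local -a values=("$1" "$2" "$3" "$4" "$5")
  local i
  for i in "${!names[@]}"; do
    if ! in_unit_interval "${values[$i]}"; then
      echo "error: --${names[$i]} must be in [0, 1], got '${values[$i]}'" >&2
      return 1
    fi
  done

  local int_re='^[0-9]+$'
  if ! [[ "$6" =~ $int_re && "$7" =~ $int_re ]]; then
    echo "error: --min-cdna and --max-cdna must be non-negative integers" >&2
    return 1
  fi
  # 10# forces base-10 so values like 0100 are not read as octal.
  if (( 10#$6 > 10#$7 )); then
    echo "error: --min-cdna ($6) exceeds --max-cdna ($7)" >&2
    return 1
  fi
  return 0
}

# Example: the documented defaults pass.
check_sim_args 0.05 0.05 0.06 0.02 0.3 100 500 && echo "arguments OK"
```

Call it before `tranquillyzer simulate-data` and abort on a non-zero status. It checks only value ranges; it cannot tell whether tranquillyzer itself accepts a given combination.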
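### Example Use Case

The command below simulates 50,000 primary reads for the 10x3p_sc_ont model into training_out. Every option is set to its default value except --transcriptome, which supplies GENCODE v44 transcripts for cDNA generation.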
```bash
tranquillyzer simulate-data \
  10x3p_sc_ont \
  training_out \
  --num-reads 50000 \
  --mismatch-rate 0.05 \
  --insertion-rate 0.05 \
  --deletion-rate 0.06 \
  --min-cdna 100 \
  --max-cdna 500 \
  --polyt-error-rate 0.02 \
  --max-insertions 1 \
  --invalid-fraction 0.3 \
  --rc \
  --threads 2 \
  --transcriptome gencode.v44.transcripts.fa
```
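To compare candidate error profiles against a target platform, the example above can be wrapped in a loop that writes one training set per profile. This is a hypothetical sketch: the profile names and rate values are placeholders, not measured rates for any platform, and `sweep`, `PROFILES`, and `DRY_RUN` are illustrative names. By default it only prints the commands it would run.

```bash
#!/usr/bin/env bash
# Hypothetical sweep (not part of tranquillyzer): one simulate-data run per
# candidate error profile. Rate values are placeholders, not measured rates.
set -euo pipefail

MODEL="10x3p_sc_ont"

# Each entry is name:mismatch:insertion:deletion
PROFILES=(
  "low:0.02:0.02:0.03"
  "default:0.05:0.05:0.06"
  "high:0.08:0.08:0.10"
)

sweep() {
  # DRY_RUN=1 (default) prints each command; DRY_RUN=0 executes it.
  local dry_run="${DRY_RUN:-1}"
  local profile name mm ins del
  for profile in "${PROFILES[@]}"; do
    IFS=: read -r name mm ins del <<< "$profile"
    local -a cmd=(tranquillyzer simulate-data "$MODEL" "training_${name}"
      --num-reads 50000
      --mismatch-rate "$mm"
      --insertion-rate "$ins"
      --deletion-rate "$del"
      --threads 2)
    if [[ "$dry_run" == 1 ]]; then
      printf '%s\n' "${cmd[*]}"
    else
      "${cmd[@]}"
    fi
  done
}

sweep
```

Review the printed commands first, then rerun with `DRY_RUN=0` in the environment to execute them. Each profile writes to its own output directory (for example `training_high`), so the resulting training sets can be compared side by side.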