# Quick Start

## Available Models
| Parameter \ Model Name | 10x5p_sc_ont | 10x3p_sc_ont |
|---|---|---|
| Batch Size | 128 | 128 |
| Training Fraction | 0.8 | 0.8 |
| Vocab Size | 5 | 5 |
| Embedding Dimension | 128 | 128 |
| Convolutional Layers | 4 | 3 |
| Convolutional Filters | 128 | 128 |
| Convolutional Kernel Size | 25 | 25 |
| LSTM Layers | 1 | 1 |
| LSTM Units | 96 | 96 |
| Bidirectional | True | True |
| CRF Layer | True | True |
| Attention Heads | 0 | 0 |
| Dropout Rate | 0.35 | 0.35 |
| Regularization | 0.01 | 0.01 |
| Learning Rate | 0.001 | 0.001 |
| # Epochs | 5 | 1 |
## Software Dependencies
- It is suggested to use Docker (or another containerization tool like Singularity or Apptainer). This handles all dependencies for you and provides easier portability across systems.
- If you are building Tranquillyzer yourself, you will need either `mamba` or `conda`. `pip` is also used during the installation process, though this will be installed via `mamba`/`conda` and does not need to be installed on its own.
- Required dependencies for Tranquillyzer are provided in the `environment.yml` file at the top level of the Tranquillyzer repository (see the sketch after this list).
- TensorFlow has its own requirements to run. Those can be found on TensorFlow's documentation site.
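If you are building the environment yourself, the sketch below shows one way to create it from a local clone of the repository; the environment name is a placeholder, so use the `name:` field defined in `environment.yml`.

```bash
# Minimal sketch, run from the top level of a local Tranquillyzer clone.
# "tranquillyzer" is a placeholder environment name; use the name set in
# environment.yml (or override it with -n).
mamba env create -f environment.yml   # or: conda env create -f environment.yml
mamba activate tranquillyzer
```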
## Important Notes for Running Tranquillyzer
Useful notes on running Tranquillyzer will be added here as they are encountered.
## Basic Pipeline for Running Tranquillyzer
More details about each of the commands can be found on the [Usage](usage.qmd) page.
```bash
# Preprocessing raw FASTQs
# Notes:
#   Can be run on a CPU-only computer
#   Best to run with >1 thread, though only 1 thread will be used if only
#   1 file is input
# User-defined elements:
#   N_THREADS  - number of threads to use when processing
#   DATA_DIR   - path to your raw FASTAs or FASTQs
#   OUTPUT_DIR - path to where the preprocessing output will be written
tranquillyzer preprocess \
    --threads N_THREADS \
    DATA_DIR \
    OUTPUT_DIR
```
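For example, a hypothetical preprocessing run with 8 threads (the paths below are placeholders, not defaults):

```bash
# Hypothetical values for illustration only
tranquillyzer preprocess \
    --threads 8 \
    /data/run1/raw_fastqs \
    /data/run1/tranquillyzer_out
```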
```bash
# Create read length distribution plot
# Notes:
#   Can be run on a CPU-only computer
#   Only one (1) thread is needed
# User-defined elements:
#   OUTPUT_DIR - same OUTPUT_DIR from the preprocessing step
tranquillyzer readlengthdist \
    OUTPUT_DIR
```
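Continuing the hypothetical example, the plot is generated from the same output directory:

```bash
# Hypothetical path for illustration only
tranquillyzer readlengthdist \
    /data/run1/tranquillyzer_out
```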
```bash
# Annotate reads
# Notes:
#   Best run on a GPU-enabled computer
#   Also utilizes CPU multiprocessing, so run with >1 thread
# User-defined elements:
#   MODEL_NAME     - base name of the neural net model to use
#   MODEL_TYPE     - type of model to use
#   GPU_MEM        - amount of vRAM in your GPU(s)
#   OUTPUT_FORMAT  - whether to output data as FASTA or FASTQ
#   SEQ_ORDER_FILE - path to the sequence order file used to define valid
#                    reads
#   N_THREADS      - number of CPU threads to use
#   OUTPUT_DIR     - same OUTPUT_DIR from preprocessing
#   WHITELIST_FILE - TSV file with sequences that define each cell (see
#                    [Usage](usage.qmd) for more details)
tranquillyzer annotate-reads \
    --model-name MODEL_NAME \
    --model-type MODEL_TYPE \
    --gpu-mem GPU_MEM \
    --output-fmt OUTPUT_FORMAT \
    --seq-order-file SEQ_ORDER_FILE \
    --threads N_THREADS \
    OUTPUT_DIR \
    WHITELIST_FILE
```
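As a filled-in sketch using the 10x3p_sc_ont model from the table above; the paths and the lowercase `fastq` value are illustrative guesses, and MODEL_TYPE / GPU_MEM are left as placeholders since their accepted values are described on the [Usage](usage.qmd) page:

```bash
# Hypothetical values for illustration only; MODEL_TYPE and GPU_MEM are left
# as placeholders (see the Usage page for their accepted values).
tranquillyzer annotate-reads \
    --model-name 10x3p_sc_ont \
    --model-type MODEL_TYPE \
    --gpu-mem GPU_MEM \
    --output-fmt fastq \
    --seq-order-file /data/run1/seq_order.tsv \
    --threads 8 \
    /data/run1/tranquillyzer_out \
    /data/run1/whitelist.tsv
```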
```bash
# Align annotated reads
# Notes:
#   Needs samtools and minimap2 to run (should be installed with tranquillyzer)
#   Can be run on a CPU-only computer
#   Best to run with >1 thread
# User-defined elements:
#   N_THREADS  - number of CPU threads to use
#   OUTPUT_DIR - same OUTPUT_DIR from preprocessing (NOTE: pass the same
#                directory twice, as shown)
#   REFERENCE  - reference FASTA for minimap2
tranquillyzer align \
    --threads N_THREADS \
    OUTPUT_DIR \
    REFERENCE \
    OUTPUT_DIR
```
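A hypothetical alignment call continuing the same example; the reference path is a placeholder, and the output directory is passed twice as required:

```bash
# Hypothetical values for illustration only; OUTPUT_DIR is passed twice by design
tranquillyzer align \
    --threads 8 \
    /data/run1/tranquillyzer_out \
    /refs/GRCh38.fa \
    /data/run1/tranquillyzer_out
```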
```bash
# Deduplicate aligned BAM
# Notes:
#   Uses samtools
#   Can be run on a CPU-only computer
#   Best to run with >1 thread
# User-defined elements:
#   N_THREADS  - number of CPU threads to use
#   OUTPUT_DIR - same OUTPUT_DIR from preprocessing
tranquillyzer dedup \
    --threads N_THREADS \
    OUTPUT_DIR
```
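And a final hypothetical call to deduplicate the same example run:

```bash
# Hypothetical values for illustration only
tranquillyzer dedup \
    --threads 8 \
    /data/run1/tranquillyzer_out
```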