Quick Start

Available Models

| Parameter                 | 10x5p_sc_ont | 10x3p_sc_ont |
|---------------------------|--------------|--------------|
| Batch Size                | 128          | 128          |
| Training Fraction         | 0.8          | 0.8          |
| Vocab Size                | 5            | 5            |
| Embedding Dimension       | 128          | 128          |
| Convolutional Layers      | 4            | 3            |
| Convolutional Filters     | 128          | 128          |
| Convolutional Kernel Size | 25           | 25           |
| LSTM Layers               | 1            | 1            |
| LSTM Units                | 96           | 96           |
| Bidirectional             | True         | True         |
| CRF Layer                 | True         | True         |
| Attention Heads           | 0            | 0            |
| Dropout Rate              | 0.35         | 0.35         |
| Regularization            | 0.01         | 0.01         |
| Learning Rate             | 0.001        | 0.001        |
| # Epochs                  | 5            | 1            |

Software Dependencies

  • We suggest using Docker (or another containerization tool such as Singularity or Apptainer), which handles all dependencies for you and makes it easier to move between systems.
  • If you are building Tranquillyzer yourself, you will need either mamba or conda. pip is also used during the installation process, but it is installed via mamba/conda and does not need to be installed separately.
  • Required dependencies for Tranquillyzer are provided in the environment.yml file at the top level of the Tranquillyzer repository (see the example after this list).
  • TensorFlow has its own requirements to run. Those can be found on TensorFlow’s documentation site.
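
A minimal sketch of a from-source setup using the provided environment.yml. The repository URL and environment name below are placeholders, not the real values; check the repository README for the exact ones.

# Build and activate the conda/mamba environment from environment.yml
# (repository URL and environment name are placeholders)
git clone https://github.com/<org>/Tranquillyzer.git
cd Tranquillyzer
mamba env create -f environment.yml
conda activate tranquillyzer   # use whatever name environment.yml defines
tranquillyzer --help           # quick sanity check (assuming a --help flag exists)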

Important Notes for Running Tranquillyzer

Notes will be added here as useful tips for running Tranquillyzer come up.

Basic Pipeline for Running Tranquillyzer

More details about each of the commands can be found on the [Usage](usage.qmd) page.

# Preprocessing raw FASTQs
# Notes:
#     Can be run on a CPU-only computer
#     Best to run with >1 thread, though only 1 thread will be used if 1 file is
#         input
# User-defined elements:
#     N_THREADS  - number of threads to use when processing
#     DATA_DIR   - path to your raw FASTAs or FASTQs
#     OUTPUT_DIR - path to where the preprocessing output will be written
tranquillyzer preprocess \
    --threads N_THREADS \
    DATA_DIR \
    OUTPUT_DIR
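
# Illustrative example (paths and thread count are placeholders of our choosing,
#     not defaults):
#     tranquillyzer preprocess --threads 8 raw_fastqs/ results/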

# Create read length distribution plot
# Notes:
#     Can be run on a CPU-only computer
#     Only one (1) thread is needed
# User-defined elements:
#     OUTPUT_DIR - same OUTPUT_DIR directory from preprocessing step
tranquillyzer readlengthdist \
    OUTPUT_DIR
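
# Illustrative example (same results/ directory as in the preprocessing step):
#     tranquillyzer readlengthdist results/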

# Annotate reads
# Notes:
#     Best run on a GPU-enabled computer
#     Also utilizes CPU multiprocessing, so run with >1 thread
# User-defined elements:
#     N_THREADS      - number of CPU threads to use
#     MODEL_NAME     - base name of neural net model to use
#     MODEL_TYPE     - type of model to use
#     GPU_MEM        - amount of vRAM in your GPU(s)
#     OUTPUT_FORMAT  - whether to output data as FASTA or FASTQ
#     SEQ_ORDER_FILE - path to the sequence order file used to define valid
#                      reads
#     OUTPUT_DIR     - same OUTPUT_DIR from preprocessing
#     WHITELIST_FILE - TSV file with sequences that define each cell (see
#                      [Usage](usage.qmd) for more details)
tranquillyzer annotate-reads \
    --model-name MODEL_NAME \
    --model-type MODEL_TYPE \
    --gpu-mem GPU_MEM \
    --output-fmt OUTPUT_FORMAT \
    --seq-order-file SEQ_ORDER_FILE \
    --threads N_THREADS \
    OUTPUT_DIR \
    WHITELIST_FILE
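
# Illustrative example (the model name comes from the table above; the remaining
#     uppercase placeholders are kept because their accepted values depend on
#     your data and are described on the Usage page; barcodes.tsv is a made-up
#     whitelist filename):
#     tranquillyzer annotate-reads --model-name 10x5p_sc_ont \
#         --model-type MODEL_TYPE --gpu-mem GPU_MEM --output-fmt OUTPUT_FORMAT \
#         --seq-order-file SEQ_ORDER_FILE --threads 8 results/ barcodes.tsv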

# Align annotated reads
# Notes:
#     Needs samtools and minimap2 to run (should be installed with tranquillyzer)
#     Can be run on a CPU-only computer
#     Best to run with >1 thread
# User-defined elements:
#     N_THREADS  - number of CPU threads to use
#     OUTPUT_DIR - same OUTPUT_DIR from preprocessing (NOTE: use the same
#                  directory twice as shown)
#     REFERENCE  - reference FASTA for minimap2
tranquillyzer align \
    --threads N_THREADS \
    OUTPUT_DIR \
    REFERENCE \
    OUTPUT_DIR
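
# Illustrative example (genome.fa is a placeholder reference FASTA; note the
#     same results/ directory is given twice, as described above):
#     tranquillyzer align --threads 8 results/ genome.fa results/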

# Deduplicate aligned BAM
# Notes:
#     Uses samtools
#     Can be run on a CPU-only computer
#     Best to run with >1 thread
# User-defined elements:
#     N_THREADS  - number of CPU threads to use
#     OUTPUT_DIR - same OUTPUT_DIR from preprocessing
tranquillyzer dedup \
    --threads N_THREADS \
    OUTPUT_DIR
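
# Illustrative example (same results/ directory as in the earlier steps):
#     tranquillyzer dedup --threads 8 results/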