This is a Nextflow implementation of our cellbender repository.
There are two branches:
main: contains the script for running cellbender on the FARM from the Nextflow command line
nextflow-tower: contains the script for running cellbender on the FARM using Nextflow Tower
main.nf: the Nextflow pipeline that executes cellbender.
nextflow.config: the configuration script that allows the processes to be submitted to IBM LSF on Sanger's HPC and ensures the correct environment is set via a singularity container (this is an absolute path). Global default parameters are also set in this file.
examples/sample_table.csv: example CSV file with sample IDs and local filesystem paths to CellRanger output directories for each sample
examples/sample_table_irods.csv: example CSV file with sample IDs and iRODS catalog paths to STARsolo output directories for each sample (.h5 files excluded)
examples/sample_table_preset.csv: example CSV file designed for use with the --mapper_preset option to automatically estimate cell/droplet parameters from CellRanger/STARsolo output (.mtx and .h5 files excluded)
examples/sample_table_exclude_features.csv: example CSV file for demonstrating feature exclusion workflows with different CellBender versions
examples/run_cellbender.sh: example bash script to run this pipeline
docker/Dockerfile_v2: a Dockerfile with an image for cellbender version 0.2.2
docker/Dockerfile_v3: a Dockerfile with an image for cellbender version 0.3.2
Running Cellbender version 0.2 using local data
nextflow run main.nf --version "0.2" --sample_table examples/sample_table.csv --cells <val> --droplets <val>
Running Cellbender version 0.3 (used by default) using data on iRODS
nextflow run main.nf --sample_table examples/sample_table_irods.csv --on_irods
Running Cellbender version 0.2. The parameter --mapper_preset is applied for version 0.2 by default
nextflow run main.nf --sample_table examples/sample_table_preset.csv --version "0.2" --on_irods
Running Cellbender version 0.3 with --mapper_preset
nextflow run main.nf --sample_table examples/sample_table_preset.csv --on_irods --mapper_preset --version "0.3"
Excluding features with version 0.2. Only "All" is available for this version
nextflow run main.nf --version "0.2" --sample_table examples/sample_table_exclude_features.csv --on_irods --exclude_features "All"
Excluding features with version 0.3. Specify a comma-separated list of the features you want to exclude
nextflow run main.nf --version "0.3" --sample_table examples/sample_table_exclude_features.csv --on_irods --exclude_features "Peaks,Multiplexing Capture,CRISPR Guide Capture" --mapper_preset
Use the mapper preset (applied by default for version 0.2) with feature exclusion. Load the data from iRODS (do not load .bam and .bz2 files) and change the name of the output directory to my-cellbender-v2-results
nextflow run main.nf --version "0.2" --sample_table examples/sample_table_exclude_features.csv --on_irods --exclude_features "All" --ignore_extensions "bam,bz2" --output_dir "my-cellbender-v2-results"
Use the mapper preset with feature exclusion for version 0.3. Load the data from iRODS (do not load .bam and .bz2 files) and change the name of the output directory to my-cellbender-v3-results
nextflow run main.nf --version "0.3" --sample_table examples/sample_table_exclude_features.csv --on_irods --exclude_features "Peaks,Multiplexing Capture,CRISPR Guide Capture" --mapper_preset --ignore_extensions "bam,bz2" --output_dir "my-cellbender-v3-results"
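The commands above differ only in a handful of flags, so a small wrapper can assemble them from a few settings. The following is a minimal sketch, not the contents of examples/run_cellbender.sh: build_cellbender_cmd is a hypothetical helper, and it assumes you always want the mapper preset (which version 0.2 applies by default and version 0.3 enables with --mapper_preset). It only prints the command, so it can be inspected before anything is submitted.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repository): assemble a nextflow
# command line from a few settings and print it without running it.
build_cellbender_cmd() {
    version=$1        # "0.2" or "0.3"
    sample_table=$2   # path to the sample table CSV
    output_dir=$3     # value passed to --output_dir
    on_irods=$4       # "yes" adds --on_irods, anything else omits it

    cmd="nextflow run main.nf --version \"$version\" --sample_table $sample_table"
    if [ "$on_irods" = "yes" ]; then
        cmd="$cmd --on_irods"
    fi
    # Version 0.2 applies the mapper preset by default; 0.3 needs the flag.
    if [ "$version" = "0.3" ]; then
        cmd="$cmd --mapper_preset"
    fi
    cmd="$cmd --output_dir \"$output_dir\""
    printf '%s\n' "$cmd"
}

# Print the command for a version 0.3 run on iRODS data.
build_cellbender_cmd "0.3" examples/sample_table_preset.csv my-cellbender-v3-results yes
```

Once the printed command looks right, it can be copied into a terminal or a job script.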
--sample_table: Path to a .csv file containing a list of sample IDs and paths to one of the following: CellRanger/STARsolo output directory (works for all flavors of CellRanger), .h5 file, .mtx directory. For more details see the examples/sample_table.csv file.
--cells: Number of cells. Required for version 0.2 when an .h5 file or .mtx file is provided in --sample_table. Otherwise --mapper_preset is used for version 0.2 or CellBender's parameter estimation is used for version 0.3.
--droplets: Number of droplets. Required for version 0.2 when an .h5 file or .mtx file is provided in --sample_table. Otherwise --mapper_preset is used for version 0.2 or CellBender's parameter estimation is used for version 0.3.
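The rule for --cells and --droplets can be checked before a run is submitted. The sketch below is illustrative only: check_cell_args is a hypothetical helper, and the input kind ("dir", "h5" or "mtx") is an assumed label that the caller supplies for what the sample table path points to. The pipeline itself may perform this check differently.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not part of the pipeline).
# Arguments: version, input kind ("dir", "h5" or "mtx"; an assumed label),
# value for --cells, value for --droplets (empty string if not given).
check_cell_args() {
    version=$1
    input_kind=$2
    cells=$3
    droplets=$4
    case "$input_kind" in
        h5|mtx)
            # Only version 0.2 needs explicit values for .h5 / .mtx inputs.
            if [ "$version" = "0.2" ] && { [ -z "$cells" ] || [ -z "$droplets" ]; }; then
                echo "error: version 0.2 with a .$input_kind input needs --cells and --droplets"
                return 1
            fi
            ;;
    esac
    echo "ok"
}

# A version 0.2 run on an .h5 file with both values supplied passes the check.
check_cell_args "0.2" h5 5000 15000
```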
--help: Display this help message
--on_irods: Set this flag if the path in the --sample_table file points to the iRODS catalog
--ignore_extensions: Specify file extensions to drop those files during catalog loading from iRODS (default: "bam,cram,fastq,fq,fastq.gz,fq.gz,fastq.bz2,fq.bz2,fastq.xz,fq.xz,fastq.lz4,fq.lz4,mate1.bz2,mate2.bz2")
--mapper_preset: Use CellRanger's or STARsolo's output to estimate the --cells, --droplets and --min_umi parameters. Works only if the whole output directory is specified as the path in --sample_table
--starsolo_mapper: Specify STARsolo's output type to use for CellBender (default: "GeneFull")
--exclude_features: Specify a list of features to exclude. Available options include:
  "Antibody Capture": only available for version 0.3 of cellbender
  "CRISPR Guide Capture": only available for version 0.3 of cellbender
  "Custom": only available for version 0.3 of cellbender
  "Peaks": only available for version 0.3 of cellbender
  "Multiplexing Capture": only available for version 0.3 of cellbender
  "VDJ": only available for version 0.3 of cellbender
  "VDJ-T": only available for version 0.3 of cellbender
  "VDJ-T-GD": only available for version 0.3 of cellbender
  "VDJ-B": only available for version 0.3 of cellbender
  "Antigen Capture": only available for version 0.3 of cellbender
  "All": only available for version 0.2 of cellbender
--epochs: Number of epochs (default: "")
--fpr: False positive rate (default: "")
--lr: Learning rate (default: "")
--min_umi: Lower bound for empty-droplet UMI count (default: "")
--force_empty_umi_prior: Upper bound for empty-droplet UMI count (default: "")
--estimator: Estimator used for posterior generation (default: "mckp")
--version: Cellbender version (available: 0.2, 0.3; default: 0.3)
--qc_mode: Quality control mode (default: 3)
--output_dir: Output directory (default: results)
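Because the accepted --exclude_features values depend on --version, a mismatch is easy to make. The following sketch checks a comma-separated value against the options listed for --exclude_features. validate_exclude_features is a hypothetical helper and not part of the pipeline; it assumes there is no whitespace around the commas.

```shell
#!/bin/sh
# Hypothetical helper (not part of the pipeline): check that every feature in
# a comma-separated list is available for the given cellbender version.
validate_exclude_features() {
    version=$1
    features=$2   # e.g. "Peaks,Multiplexing Capture"
    status=0
    old_ifs=$IFS
    IFS=,         # split on commas only, so names containing spaces stay whole
    set -f        # disable globbing while the list is split
    for feature in $features; do
        case "$version:$feature" in
            "0.2:All") ;;
            "0.3:Antibody Capture"|"0.3:CRISPR Guide Capture"|"0.3:Custom") ;;
            "0.3:Peaks"|"0.3:Multiplexing Capture"|"0.3:Antigen Capture") ;;
            "0.3:VDJ"|"0.3:VDJ-T"|"0.3:VDJ-T-GD"|"0.3:VDJ-B") ;;
            *)
                echo "not available for version $version: $feature"
                status=1
                ;;
        esac
    done
    set +f
    IFS=$old_ifs
    return $status
}

# The feature list from the version 0.3 example passes silently.
validate_exclude_features "0.3" "Peaks,Multiplexing Capture,CRISPR Guide Capture"
```

The function prints one line per rejected feature and returns a non-zero status if any feature was rejected, so it can guard a nextflow run call in a script.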
The image is based on
FROM nvidia/cuda:12.1.1-cudnn8-runtime-ubuntu22.04
and includes installations of cellbender and R 4.4.2. The up-to-date image can be pulled from the quay repository
Run tests
mkdir -p logs
N=26
bsub -J "test-cellbender[1-$N]" -env "all, N=$N" < tests/scripts/run_tests.bsub
Count successful runs
echo "PASSED: $(grep -l "PASSED" logs/*Output*.log | wc -l), FAILED: $(grep -l "FAILURE" logs/*Output*.log | wc -l), RUNNING $(grep -L "Your job looked like:" logs/*Output*.log | wc -l)"
echo "FAILED TEST LIST:"; grep -l "FAILURE" logs/*Output*.log
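The same counts can be produced by a small function that classifies each log file once. This is a sketch under assumptions: summarise_logs is a hypothetical helper, and it relies on the same log markers as the grep commands above ("PASSED", "FAILURE", "Your job looked like:"). Unlike those commands, it gives FAILURE precedence, so a log is never counted in more than one category.

```shell
#!/bin/sh
# Hypothetical helper (not part of the test scripts): classify every
# *Output*.log file in a directory as passed, failed, or still running.
summarise_logs() {
    log_dir=$1
    passed=0
    failed=0
    running=0
    for log in "$log_dir"/*Output*.log; do
        [ -e "$log" ] || continue   # glob matched nothing
        if grep -q "FAILURE" "$log"; then
            failed=$((failed + 1))
        elif grep -q "PASSED" "$log"; then
            passed=$((passed + 1))
        elif ! grep -q "Your job looked like:" "$log"; then
            # No LSF job summary yet, so the job has not finished.
            running=$((running + 1))
        fi
    done
    echo "PASSED: $passed, FAILED: $failed, RUNNING: $running"
}

# Summarise the logs written by the test job array.
summarise_logs logs
```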