Clique aims to be a high-performance, unified package for processing and analyzing lineage data from Illumina and long-read technologies such as Nanopore. It collapses unique molecular identifiers (UMIs) and static identifiers (SIs) and generates consensus sequences. The package then aligns these consensus sequences to the amplicon sequences and matches each one to its best candidate, using alignment strategies such as affine gap alignment, convex gap alignment, and Hidden Markov Models (HMMs).
Clique supports a broad range of single-cell sequencing technologies and reports statistics on read recovery, UMI complexity, alignment quality, and single-cell alignment. It can also build lineage trees and their associated statistics, which helps researchers understand the genetic relationships between individual cells.
clique is a Rust + Python toolkit for high‑accuracy, amplicon‑based variant calling. It takes raw long‑read data (ONT / PacBio), collapses identical molecules, filters background contaminants, then emits per‑molecule consensus calls and a final VCF—all in a single command chain.
Below is the minimum “happy‑path” workflow. You only need the Rust toolchain and Python ≥ 3.9.
Define every constant and variable segment of your amplicon in a YAML file.
Variable regions are marked with successive integers ({1}, {2}, …) as described in the wiki.
name: MY_AMPLICON
pieces:
- seq: "GGGCGT" # left flank – fixed
- len: 12 # variable segment {1}
- seq: "AACCGGTT" # internal anchor – fixed
- len: 8 # variable segment {2}
- seq: "AGCT" # right flank – fixed
Full spec: https://github.com/mckennalab/clique/wiki/YAML-input-file.
Save the file as my_amplicon.yaml.
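To make the layout of the YAML concrete, here is a minimal Python sketch of how the piece list maps onto a read. It is only an illustration: the `amplicon` dict is a hand-written stand-in for what a YAML parser would return, `build_pattern` and `expected_length` are hypothetical helpers, and exact regex matching is a simplification (clique itself aligns reads rather than matching them literally).

```python
import re

# Hypothetical in-memory mirror of my_amplicon.yaml.
amplicon = {
    "name": "MY_AMPLICON",
    "pieces": [
        {"seq": "GGGCGT"},    # left flank, fixed
        {"len": 12},          # variable segment {1}
        {"seq": "AACCGGTT"},  # internal anchor, fixed
        {"len": 8},           # variable segment {2}
        {"seq": "AGCT"},      # right flank, fixed
    ],
}


def build_pattern(pieces):
    """Build a regex with one capture group per variable segment."""
    parts = []
    for piece in pieces:
        if "seq" in piece:
            parts.append(re.escape(piece["seq"]))
        else:
            parts.append("([ACGTN]{%d})" % piece["len"])
    return re.compile("".join(parts))


def expected_length(pieces):
    """Total amplicon length: fixed bases plus variable-segment lengths."""
    return sum(len(p["seq"]) if "seq" in p else p["len"] for p in pieces)


pattern = build_pattern(amplicon["pieces"])

# An error-free toy read that follows the layout exactly.
read = "GGGCGT" + "ACGTACGTACGT" + "AACCGGTT" + "TTTTCCCC" + "AGCT"
segments = pattern.fullmatch(read).groups()

print(expected_length(amplicon["pieces"]))  # 38
print(segments)  # ('ACGTACGTACGT', 'TTTTCCCC')
```

The two captured strings correspond to variable segments {1} and {2}.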
If the run is noisy (e.g. whole‑cell prep), add off‑target genomes or transcriptomes so clique can subtract reads that align elsewhere.
# Bundle your amplicon with hg38 + plasmid FASTA
python scripts/make_bg_reference.py --target my_amplicon.fa --genome hg38.fa --output bg_reference.fa
Skip this step for clean, single‑amplicon runs.
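The idea behind background subtraction can be sketched in a few lines of Python. This is not clique's code: the `(read_id, reference_name, score)` tuples and the `on_target_reads` helper are assumptions made for illustration. A read is kept only when its best-scoring hit is the target amplicon rather than a background sequence.

```python
def on_target_reads(hits, target_name):
    """Return IDs of reads whose best-scoring hit is the target amplicon.

    `hits` is an iterable of (read_id, reference_name, score) tuples,
    a hypothetical stand-in for parsed alignment records.
    """
    best = {}
    for read_id, ref_name, score in hits:
        # Keep the highest-scoring hit per read (first one wins on ties).
        if read_id not in best or score > best[read_id][1]:
            best[read_id] = (ref_name, score)
    return sorted(r for r, (ref, _) in best.items() if ref == target_name)


hits = [
    ("read1", "MY_AMPLICON", 60),
    ("read1", "chr7", 12),
    ("read2", "chr1", 55),         # aligns better to the genome: dropped
    ("read2", "MY_AMPLICON", 20),
    ("read3", "MY_AMPLICON", 48),
]
kept = on_target_reads(hits, "MY_AMPLICON")
print(kept)  # ['read1', 'read3']
```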
You can point clique at any SAM/BAM/CRAM produced by minimap2, but the repo ships two ultra‑fast Rust aligners (clique-align for noisy ONT; clique-align-hifi for PacBio‑HiFi):
# Rust aligner (fastest for ONT reads)
clique-align -r my_amplicon.fa -i reads.fastq -o aligned.sam
# or classic aligner
minimap2 -ax map-ont my_amplicon.fa reads.fastq > aligned.sam
clique-collapse --input aligned.sam --yaml my_amplicon.yaml --out collapsed.bam --report collapse_metrics.json
clique-collapse groups reads by tag, removes PCR/sequencing duplicates, and writes QC metrics (coverage per variable site, duplication rate, read‑length histograms, etc.).
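As a rough mental model of the collapse step, the sketch below groups reads by tag and computes a duplication rate. The `(tag, sequence)` input, both helper names, and the definition of duplication rate as the fraction of reads beyond the first in each tag group are assumptions for illustration. They are not taken from clique's implementation or its report format.

```python
from collections import defaultdict


def collapse_by_tag(reads):
    """Group read sequences by their tag (e.g. a UMI)."""
    groups = defaultdict(list)
    for tag, seq in reads:
        groups[tag].append(seq)
    return dict(groups)


def duplication_rate(groups):
    """Fraction of reads that are duplicates of an already-seen tag."""
    total = sum(len(seqs) for seqs in groups.values())
    if total == 0:
        return 0.0
    return (total - len(groups)) / total


reads = [
    ("AAGT", "GGGCGTACGT"),
    ("AAGT", "GGGCGTACGT"),
    ("CCTA", "GGGCGTTTGT"),
    ("AAGT", "GGGCGTACGA"),
    ("GGAC", "GGGCGTACGT"),
]
groups = collapse_by_tag(reads)
print(len(groups), duplication_rate(groups))  # 3 unique tags, 2 of 5 reads are duplicates
```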
clique-call --bam collapsed.bam --yaml my_amplicon.yaml --output variants.vcf
The caller walks each molecule, builds a consensus for every {n} variable region, and outputs a standard VCF plus optional per‑site CSV summaries.
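One simple way to picture per-region consensus building is a column-wise majority vote over the reads of a single molecule. The sketch below assumes the segments are already aligned and of equal length, and it emits `N` on ties. The voting rule and the `column_consensus` name are illustrative assumptions, so the actual caller may weigh base qualities or use a different model.

```python
from collections import Counter


def column_consensus(segments):
    """Majority base at each position; 'N' when the top two counts tie."""
    consensus = []
    for column in zip(*segments):
        ranked = Counter(column).most_common(2)
        tied = len(ranked) > 1 and ranked[0][1] == ranked[1][1]
        consensus.append("N" if tied else ranked[0][0])
    return "".join(consensus)


# Three reads of the same molecule covering variable region {2};
# the second read carries a sequencing error in the last base.
region_reads = ["ACGTACGT", "ACGTACGA", "ACGTACGT"]
print(column_consensus(region_reads))  # ACGTACGT
```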
- Inspect collapse_metrics.json: low per‑site coverage often signals primer failure or barcode dropout.
- Tune caller thresholds with --min-depth and --min-freq for ultra‑rare‑variant detection.
- For multi‑amplicon panels, feed a panel YAML (array of amplicons) to the same commands; clique auto‑routes reads.
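To see how depth and frequency thresholds interact, here is a small Python sketch. It assumes that `--min-depth` is a minimum number of molecules covering a site and that `--min-freq` is a minimum alternate-allele fraction. Those semantics, the default values, and the `passes_thresholds` helper are all assumptions for illustration, so check the wiki for the caller's actual behavior.

```python
def passes_thresholds(alt_count, depth, min_depth=10, min_freq=0.01):
    """Return True if a candidate variant clears both thresholds.

    The defaults are hypothetical and are not clique's defaults.
    """
    if depth < min_depth:
        return False  # too few molecules to trust any frequency estimate
    return alt_count / depth >= min_freq


# A variant seen in 3 of 1000 molecules (0.3%):
print(passes_thresholds(3, 1000, min_depth=100, min_freq=0.001))  # True
print(passes_thresholds(3, 1000, min_depth=100, min_freq=0.01))   # False
```

Lowering the frequency threshold only makes sense when depth is high enough that a handful of supporting molecules is unlikely to be noise, which is why the two settings are usually tuned together.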
For deeper dives—error‑modeling flags, GPU options, and interactive notebooks—see the project wiki and examples/ directory.
Happy collapsing!
