This repository contains scripts for data processing, analysis and figure generation using data from phase 1 of the TenK10K project for our paper:
Cuomo et al., Impact of rare and common genetic variation on cell type-specific gene expression, medRxiv, 2025.
This includes:
- scRNA-seq processing
- cell-state abundance QTLs
- main and supplementary manuscript figures
- Create a file summarising sequencing libraries included in the latest run (by running this)
- New pool-donor info (for vireo) may not be in the files used here, but it will be in the shared data tracking spreadsheet, so download the updated list from there.
- Ambient RNA
- CellBender runner, qsub script to run CellBender for each sequencing library
- Doublet Detection + Demultiplexing
- scds runner, qsub script to run scds for each sequencing library using the Demuxafy image
- scDblFinder runner, qsub script to run scDblFinder for each sequencing library using the Demuxafy image
- vireo runner, qsub script to run vireo for each sequencing library using the Demuxafy image (requires CellBender results and genotype info). Note that running vireo requires knowing which individuals we expect in each pool, as detailed in the scripts in this folder
- Demuxafy combiner script, qsub script to run the Demuxafy combiner for each sequencing library (requires scds, scDblFinder and vireo results)
- Cell Typing
- Consortium WG2 (scPred + Azimuth)
- make Seurat objects scripts, qsub script running the R script that builds Seurat objects for each sequencing library prior to cell typing with Azimuth / scPred
- Azimuth cell typing runner, qsub script to perform cell typing using Azimuth for each sequencing library using the sceQTLGen WG2 image (requires Seurat objects)
- hierarchical scPred cell typing runner, qsub script to perform cell typing using hierarchical scPred for each sequencing library using the sceQTLGen WG2 image (requires Seurat objects)
- Consortium WG2 cell typing combiner script, qsub script running the R script that combines the cell type predictions obtained using Azimuth and scPred for each sequencing library (requires Azimuth and hierarchical scPred results)
- CellTypist
- CellTypist runner, qsub script running the Python script that performs cell typing using CellTypist for each sequencing library
- Scanpy data wrangling & data integration (run AFTER all the steps above)
- Add info runner, qsub script running the Python script that adds all metadata to the Scanpy object for each sequencing library (requires results from CellBender, the Demuxafy combiner, the WG2 cell typing combiner and CellTypist) and performs initial QC
- Combine the per-library results into a single AnnData object (concatenate) and add gene and donor info (Python script)
- Make AnnData objects for each cell type + chromosome combination. Python script, qsub runner.
- Make TSV files containing expression PCs for each cell type. Python script, qsub runner.
- TO DO: Python script performing QC & processing on combined AnnData + plotting.
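The "requires" notes in the steps above imply a run order. As a minimal sketch (assuming Python 3.9+ for the standard-library `graphlib` module), that order could be derived as below. The step names are hypothetical short labels rather than the actual script names in this repository, and only the dependencies stated above are encoded.

```python
from graphlib import TopologicalSorter

# Each key is a step; its value is the set of steps whose results it requires.
# Names are illustrative labels only, not the repository's script names.
REQUIRES = {
    "cellbender": set(),
    "scds": set(),
    "scdblfinder": set(),
    "vireo": {"cellbender"},
    "demuxafy_combiner": {"scds", "scdblfinder", "vireo"},
    "seurat_objects": set(),
    "azimuth": {"seurat_objects"},
    "hier_scpred": {"seurat_objects"},
    "wg2_combiner": {"azimuth", "hier_scpred"},
    "celltypist": set(),
    "add_info": {"cellbender", "demuxafy_combiner", "wg2_combiner", "celltypist"},
}


def run_order(requires):
    """Return one valid order in which every step follows its requirements."""
    return list(TopologicalSorter(requires).static_order())


if __name__ == "__main__":
    for step in run_order(REQUIRES):
        print(step)
```

Several orders are valid (for example, scds and scDblFinder can run in either order), so this prints one of them. Steps with no unmet requirements can also be submitted in parallel.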
- CellBender for ambient RNA detection
- Demuxafy for demultiplexing and doublet detection, specifically:
- vireo for demultiplexing, as it lets us specify the number of donors expected per pool, regardless of whether we have genotype data for them
- majority voting of vireo, scds and scDblFinder for doublet detection
- QC & normalisation using Scanpy
- batch correction / integration using Harmony
- cell typing using scPred, Azimuth, and CellTypist
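
The majority voting used for doublet detection can be illustrated with a small sketch. This is not the Demuxafy combiner itself: the input format (one barcode-to-label dictionary per tool), the label strings and the barcodes are assumptions made for the example.

```python
from collections import Counter


def majority_vote(calls):
    """Return the label given by more than half of the tools, else 'unassigned'."""
    label, count = Counter(calls).most_common(1)[0]
    return label if count > len(calls) / 2 else "unassigned"


def combine_doublet_calls(per_tool):
    """Combine per-tool calls into one call per barcode.

    per_tool maps a tool name to {barcode: label}; only barcodes that
    every tool has called are kept.
    """
    shared = set.intersection(*(set(calls) for calls in per_tool.values()))
    return {
        barcode: majority_vote([per_tool[tool][barcode] for tool in per_tool])
        for barcode in sorted(shared)
    }


if __name__ == "__main__":
    # Hypothetical barcodes and calls, for illustration only.
    example = {
        "vireo": {"AAAC-1": "singlet", "AAAG-1": "doublet", "AAAT-1": "unassigned"},
        "scds": {"AAAC-1": "singlet", "AAAG-1": "doublet", "AAAT-1": "singlet"},
        "scDblFinder": {"AAAC-1": "doublet", "AAAG-1": "doublet", "AAAT-1": "doublet"},
    }
    for barcode, call in combine_doublet_calls(example).items():
        print(barcode, call)
```

With three tools, a barcode is called a doublet only when at least two of them agree. A barcode on which all three disagree is left as `unassigned` in this sketch.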
