powellgenomicslab/tenk10k_phase1

Impact of rare and common genetic variation on cell type-specific gene expression

This repository contains scripts for data processing, analysis and figure generation using data from phase 1 of the TenK10K project for our paper:

Cuomo et al., Impact of rare and common genetic variation on cell type-specific gene expression, medRxiv, 2025.

This includes:

scRNA-seq processing pipeline

  • Create a file summarising sequencing libraries included in the latest run (by running this)
  • New pool-donor info (for vireo) may be missing from the files used here, but it will be in the shared data tracking spreadsheet, so download the updated list from there.
  • Ambient RNA
  • Doublet Detection + Demultiplexing
    • scds runner: qsub script that runs scds for each sequencing library using the Demuxafy image
    • scDblFinder runner: qsub script that runs scDblFinder for each sequencing library using the Demuxafy image
    • vireo runner: qsub script that runs vireo for each sequencing library using the Demuxafy image (requires CellBender results and genotype info). Note that running vireo requires knowing which individuals are expected in each pool; this is detailed in the scripts in this folder.
    • Demuxafy combiner script: qsub script that runs the Demuxafy combiner for each sequencing library (requires scds, scDblFinder and vireo results)
  • Cell Typing
  • Scanpy data wrangling & data integration (run AFTER all the steps above)
    • Add info runner: qsub script that runs the Python script adding all metadata to the Scanpy object for each sequencing library and performing initial QC (requires results from CellBender, the Demuxafy combiner, the WG2 cell typing combiner and CellTypist)
    • Python script that concatenates the results into a single AnnData object and adds gene and donor info
    • Python script (with qsub runner) that makes an AnnData object for each cell type + chromosome combination
    • Python script (with qsub runner) that makes a TSV file of expression PCs for each cell type
    • TO DO: Python script performing QC & processing on combined AnnData + plotting.
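
The "requires" notes in the list above imply a run order. As a rough sketch (not part of the repository), that order can be derived with Python's standard `graphlib` module. The step names are illustrative labels rather than the actual script names, steps with an empty set simply have no prerequisite listed in this README, and the `concatenate` dependency on `add_info` is an assumption.

```python
from graphlib import TopologicalSorter

# Map each step to the steps whose results it requires.
# Labels are hypothetical; they are not the repository's script names.
REQUIRES = {
    "cellbender": set(),    # no prerequisite listed in this README
    "scds": set(),          # no prerequisite listed in this README
    "scdblfinder": set(),   # no prerequisite listed in this README
    "cell_typing": set(),   # no prerequisite listed in this README
    "vireo": {"cellbender"},
    "demuxafy_combiner": {"scds", "scdblfinder", "vireo"},
    "add_info": {"cellbender", "demuxafy_combiner", "cell_typing"},
    "concatenate": {"add_info"},  # assumption: runs once add_info is done
}


def run_order(requires):
    """Return one valid execution order in which prerequisites come first."""
    return list(TopologicalSorter(requires).static_order())


order = run_order(REQUIRES)
print(order)
```

Any order returned this way places CellBender before vireo and the Demuxafy combiner after all three of its inputs; several valid orders exist, so the exact sequence printed may vary.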

Pre-processing tools used

  • CellBender for ambient RNA detection
  • Demuxafy for demultiplexing and doublet detection, specifically:
    • vireo for demultiplexing, as it allows the expected number of donors per pool to be specified, regardless of whether we have genotype data for them
    • majority voting of vireo, scds and scDblFinder for doublet detection
  • QC & normalisation using Scanpy
  • batch correction / integration using Harmony
  • cell typing using scPred, Azimuth, and CellTypist
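
To illustrate the majority-voting idea used for doublet detection, here is a minimal sketch in plain Python. It is not the Demuxafy combiner's code: the function name, the dictionary layout and the `"doublet"`/`"singlet"` label strings are assumptions made for this example.

```python
from collections import Counter


def majority_call(calls):
    """Call a droplet a doublet if more than half of the tools say so.

    `calls` maps a tool name to its call for one droplet. Any call other
    than "doublet" (e.g. "singlet" or an unassigned label) counts against
    the doublet vote in this sketch.
    """
    counts = Counter(calls.values())
    return "doublet" if counts["doublet"] > len(calls) / 2 else "singlet"


# Hypothetical calls for a single droplet from the three tools
cell_calls = {"vireo": "doublet", "scds": "singlet", "scDblFinder": "doublet"}
print(majority_call(cell_calls))  # prints "doublet" (2 of 3 tools agree)
```

With three tools, a droplet is flagged as a doublet when at least two of them agree, so a single tool's false positive does not remove a cell.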

Processing pipeline data flow diagram

[Figure: single-cell pipeline data flow]

About

Repo for analyses of the pilot phase of TenK10K
