Impact of rare and common genetic variation on cell type-specific gene expression

This repository contains scripts for data processing, analysis and figure generation using data from phase 1 of the TenK10K project for our paper:

Cuomo et al., Impact of rare and common genetic variation on cell type-specific gene expression, medRxiv, 2025.

This includes:

scRNA-seq processing pipeline

Before running the pipeline, create a file summarising the sequencing libraries included in the latest run (by running the corresponding script in this repository).
New pool-donor information (needed for vireo) may not yet be in the files used here, but it will be in the shared data tracking spreadsheet, so download the updated list from there.
Ambient RNA
- CellBender runner, qsub script to run CellBender for each sequencing library
Doublet Detection + Demultiplexing
- scds runner, qsub script to run scds for each sequencing library using the Demuxafy image
- scDblFinder runner, qsub script to run scDblFinder for each sequencing library using the Demuxafy image
- vireo runner, qsub script to run vireo for each sequencing library using the Demuxafy image (requires CellBender results and genotype information). Running vireo requires knowing which individuals are expected in each pool, as detailed in the scripts in this folder.
- Demuxafy combiner script, qsub script to run the Demuxafy combiner for each sequencing library (requires scds, scDblFinder and vireo results)
Cell Typing
- Consortium WG2 (scPred + Azimuth)
  - Seurat object creation scripts, qsub script running the R script that builds Seurat objects for each sequencing library prior to cell typing with Azimuth / scPred
  - Azimuth cell typing runner, qsub script to perform cell typing using Azimuth for each sequencing library using the sceQTLGen WG2 image (requires Seurat objects)
  - hierarchical scPred cell typing runner, qsub script to perform cell typing using hierarchical scPred for each sequencing library using the sceQTLGen WG2 image (requires Seurat objects)
  - Consortium WG2 cell typing combiner script, qsub script running the R script that combines the cell type predictions obtained with Azimuth and scPred for each sequencing library (requires Azimuth and hierarchical scPred results)
- CellTypist
  - CellTypist runner, qsub script running the Python script that performs cell typing using CellTypist for each sequencing library
Scanpy data wrangling & data integration (run AFTER all of the steps above)
- Add info runner, qsub script running the Python script that adds all metadata to the Scanpy object for each sequencing library and performs initial QC (requires results from CellBender, the Demuxafy combiner, the WG2 cell typing combiner and CellTypist)
- Python script concatenating the per-library results into a single AnnData object and adding gene and donor info
- Python script making AnnData objects for each cell type + chromosome combination (Python script, qsub runner)
- Python script making TSV files containing expression PCs for each cell type (Python script, qsub runner)
- TO DO: Python script performing QC & processing on the combined AnnData object, plus plotting
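
The "requires" notes in the list above imply an order in which the per-library steps can be submitted. The following is a minimal sketch of that ordering, not part of the repository: the step names are shorthand for the runners described above, only the dependencies stated in this README are encoded, and external inputs such as genotype information are left out because they are data rather than steps.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Map each step to the set of steps whose results it requires.
# Only dependencies stated in the README are listed; steps with no stated
# requirement get an empty set (an assumption, they may need other inputs).
requires = {
    "CellBender": set(),
    "scds": set(),
    "scDblFinder": set(),
    "vireo": {"CellBender"},
    "Demuxafy combiner": {"scds", "scDblFinder", "vireo"},
    "Seurat objects": set(),
    "Azimuth": {"Seurat objects"},
    "hierarchical scPred": {"Seurat objects"},
    "WG2 combiner": {"Azimuth", "hierarchical scPred"},
    "CellTypist": set(),
    "add info": {"CellBender", "Demuxafy combiner", "WG2 combiner", "CellTypist"},
}

# static_order() yields every step after all of the steps it depends on.
order = list(TopologicalSorter(requires).static_order())

for position, step in enumerate(order, start=1):
    print(f"{position:2d}. {step}")
```

Any order produced this way places the "add info" step last, which matches the note that the Scanpy wrangling is done after everything else. Steps with no dependency between them (for example scds and scDblFinder) could be submitted in parallel.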

Pre-processing tools used

CellBender for ambient RNA detection
Demuxafy for demultiplexing and doublet detection, specifically:
- vireo for demultiplexing, as it allows the number of donors expected per pool to be specified, regardless of whether we have genotype data for them
- majority voting of vireo, scds and scDblFinder for doublet detection
QC & normalisation using Scanpy
Batch correction / integration using Harmony
Cell typing using scPred, Azimuth, and CellTypist
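
To illustrate the majority voting used for doublet detection, here is a minimal sketch in plain Python. It is not the Demuxafy combiner itself: the barcodes, the call labels and the `majority_vote` helper are hypothetical, and treating any call other than "doublet" (for example an unassigned droplet) as a vote against a doublet is an assumption made for this example.

```python
from collections import Counter


def majority_vote(calls):
    """Return 'doublet' if more than half of the tools call the droplet a doublet.

    `calls` maps tool name -> call string. Any call other than 'doublet'
    counts as a vote against a doublet (an assumption of this sketch).
    """
    counts = Counter(calls.values())
    return "doublet" if counts["doublet"] > len(calls) / 2 else "singlet"


# Hypothetical per-droplet calls from the three tools.
droplets = {
    "AAAC-1": {"vireo": "singlet", "scds": "singlet", "scDblFinder": "doublet"},
    "AAAG-1": {"vireo": "doublet", "scds": "doublet", "scDblFinder": "singlet"},
    "AAAT-1": {"vireo": "unassigned", "scds": "doublet", "scDblFinder": "doublet"},
}

# Consensus call per droplet barcode.
consensus = {barcode: majority_vote(calls) for barcode, calls in droplets.items()}

for barcode, call in consensus.items():
    print(barcode, call)
```

With three tools, a droplet is flagged as a doublet when at least two of them agree, so a single tool's false positive does not remove a cell on its own.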

Repository contents:

- CellBender
- Celltyping
- Demuxafy
- Scanpy
- cell_cycle_scoring
- cell_state_abundance_qtl/GeNA
- images
- plotting_notebooks
- .gitignore
- LICENSE
- README.md

powellgenomicslab/tenk10k_phase1

Folders and files

Latest commit

History

Repository files navigation

Impact of rare and common genetic variation on cell type-specific gene expression

scRNA-seq processing pipeline

Pre-processing tools used

Processing pipeline data flow diagram

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Uh oh!

Languages

Packages