coreSC

coreSC is the master pipeline for multi-sample preprocessing, normalization, dimensional reduction, integration, and clustering of 10x single-cell RNA-seq data using the Seurat framework. This documentation assumes a working understanding of Seurat.

Features

Dependencies

coreSC is run with the Singularity container execution framework.

Setup

First Time

# clone the repo
git clone https://github.com/ChoBioLab/coreSC.git --recurse-submodules

# put config templates in-place
cd coreSC/config && cp templates/* .

Config

Required configuration
- env
- params.csv
- samples.csv
- ann
Confirm parallel memory use with future
- The future.mem param sets the RAM per thread, and each individual task needs enough RAM to complete its work. In addition, future.mem * future.workers gives the total memory allocation, which must stay below the system RAM available to the job as a whole. If either condition is not met, the run will fail!
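
The total-allocation rule above is simple arithmetic, so it can be checked before launching a run. The following is a minimal sketch only: check_future_mem is a hypothetical helper that is not part of coreSC, and the values passed to it are made up for illustration. It covers only the future.mem * future.workers total; whether a single thread has enough RAM for its task depends on the data and is not checked here.

```shell
# Hypothetical helper (not part of coreSC): compare the total allocation
# future.mem * future.workers against the RAM available to the job.
# Arguments are whole numbers of GB: future.mem, future.workers, available RAM.
check_future_mem() {
    cfm_total=$(( $1 * $2 ))
    if [ "$cfm_total" -lt "$3" ]; then
        echo "ok: ${cfm_total} GB requested, $3 GB available"
    else
        echo "too high: ${cfm_total} GB requested, $3 GB available"
        return 1
    fi
}

# Illustrative values only.
check_future_mem 8 4 64            # 8 GB x 4 workers on a 64 GB job
check_future_mem 16 8 64 || true   # 16 GB x 8 workers exceeds 64 GB
```

The helper returns a non-zero status when the total is not below the available RAM, so it could be used to guard a call to ./run in a wrapper script.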

Usage

Execution can be carried out with the run script.

./run

-v version [REQUIRED]
-h harmonize [NULL]
-a atac-multi subroutine [NULL]
-c cite-seq subroutine [NULL]
Executing run with only the required -v flag performs a standard, single-modal regularization and, if relevant, integration.
Specifying a Seurat container version to use is required. Versions can be found at https://gallery.ecr.aws/chobiolab

# examples

./run -v v4-r2           # regularize and integrate with SCTransform method

./run -v v4-r2 -h TRUE   # use harmony in place of the default integration method

./run -v v4-r2 -a TRUE   # alternatively, pass samples through atac-multiome routine

Output

coreSC/output/output_2022-12-21_18.59.26
├── all_markers.csv                 # output of FindAllMarkers()
├── combined_dimplot_red.pdf
├── individual_clustered.RDS        # list object of all individual objects (regularized & clustered)
├── integrated.RDS                  # single, integrated object (regularized & clustered)
├── log.txt
├── params.csv                      # input config params.csv
├── sample1_individual_dimplot.pdf
├── sample1_unfilt_scatter.pdf
├── sample1_unfilt_vln.pdf
├── sample1_var_features.pdf
├── sample2_individual_dimplot.pdf
├── sample2_unfilt_scatter.pdf
├── sample2_unfilt_vln.pdf
├── sample2_var_features.pdf
└── samples.csv                     # input config samples.csv

coreSC/output/output_2023-07-06_19.32.36
├── all_markers.csv                     # output of FindAllMarkers() unannotated
├── annotation                          # CELLTYPIST OUTPUT
│   ├── annd_all_markers.csv            # output of FindAllMarkers() annotated
│   ├── celltypist-log.txt
│   ├── decision_matrix.csv             # celltypist output
│   ├── integrated-annd_2023-07-06.RDS  # ANNOTATED OBJECT
│   ├── predicted_labels.csv            # celltypist output
│   ├── probability_matrix.csv          # celltypist output
│   └── qc.csv                          # qc metrics for annotation
├── combined_dimplot_red.pdf
├── individual_clustered.RDS            # list of individual, norm'd, clustered objects
├── integrated.h5seurat                 # integrated hdf5 object unannotated
├── integrated.RDS                      # integrated seurat object unannotated
├── log.txt
├── params.csv                          # copy of process params
├── sample1_individual_dimplot.pdf
├── sample1_unfilt_scatter.pdf
├── sample1_unfilt_vln.pdf
├── sample1_var_features.pdf
└── samples.csv                         # copy of sample sheet

1 directory, 19 files

Reference

File Tree

coreSC/
├── annotation
│   ├── README.md
│   ├── run                     # annotation execution script
│   └── src
│       ├── apply-ann.R         # process to apply annotation back to input
│       ├── getopts             # annotation run arguments
│       ├── matrix-convert.R    # seurat to sparse matrix conversion
│       └── qc.R                # qc metrics generation process
├── config                      # COPY CONFIG FILES HERE
│   └── templates
│       ├── ann                 # env vars for annotation
│       ├── env                 # general env vars
│       ├── params.csv          # pipeline parameters
│       └── samples.csv         # sample details
├── LICENSE
├── main                        # orchestration script
├── output
├── README.md
├── run                         # EXECUTION SCRIPT
└── scripts
    ├── atac-multi-wnn.R        # full atac-multiome routine
    ├── cite-wnn.R              # full cite-seq multiome routine
    ├── create-multi-norm.R     # create object routine for multiomic samples
    ├── create-object-norm.R    # create object routine with classical normalization
    ├── create-object-sct.R     # create object routine with SCTransform normalization
    ├── getopts                 # run script arguments
    ├── harmonize.R             # harmony integration subroutine
    ├── integrate.R             # standard seurat integration subroutine
    ├── main.R                  # legacy
    ├── mixscape.R              # method for crispr preprocessing
    └── preamble.R              # place setting subroutine

6 directories, 26 files

params.csv

Var	Description
min.cells	minimum number of cells
min.count.rna	minimum number of rna molecules per cell
max.count.rna	max number of rna molecules per cell
min.count.atac	minimum number of atac fragments per cell
max.count.atac	max number of atac fragments per cell
min.features	minimum number of genes per cell
max.features	max number of genes per cell
max.percent.mt	max threshold for mitochondrial content
min.percent.mt	minimum threshold for mitochondrial content
pct.reads.peaks	proportion of reads found in peaks
nucleosome	nucleosome score
tss.score	transcription start site score
dims	number of dimensions to include
res	clustering resolution
future.mem	memory allocation per thread in GB
future.workers	number of parallel threads

samples.csv

Var	Description
name	sample name (must be unique, cannot be integer)
dir	path to directory containing matrices
project	project variable name (applied as object metadata)
group	group label (applied as object metadata)
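
The naming rule for samples (unique, not an integer) can be checked before a run. The sketch below is an assumption-laden illustration: check_sample_names is a hypothetical helper that is not part of coreSC, and it assumes samples.csv is comma-separated with a header row and the sample name in the first column. The rows in the demo file are invented placeholders.

```shell
# Hypothetical check (not part of coreSC). Assumes a comma-separated file
# with a header row and the sample name in column 1.
check_sample_names() {
    awk -F',' '
        NR == 1 { next }                  # skip the header row
        NF == 0 { next }                  # skip blank lines
        $1 ~ /^[0-9]+$/ { printf "integer name: %s\n", $1; bad = 1 }
        seen[$1]++      { printf "duplicate name: %s\n", $1; bad = 1 }
        END { exit bad + 0 }              # non-zero status if any name failed
    ' "$1"
}

# Demo on a throwaway file with made-up rows.
tmp=$(mktemp)
printf '%s\n' 'name,dir,project,group' \
    'ctrl_a,/path/to/ctrl_a,demo,control' \
    '2,/path/to/two,demo,treated' \
    'ctrl_a,/path/to/other,demo,control' > "$tmp"
check_sample_names "$tmp" || echo "samples.csv needs fixing"
rm -f "$tmp"
```

In the demo, the second data row is flagged for an integer name and the third for repeating a name already used.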
