scClassify2

We proposed scClassify2, a novel framework for adjacent cell state identification with a transferable model across platforms with three key innovations:

(1) we use a dual layer architecture to enable joint learning from both the expression information of single cells and big-data-based gene co-expression patterns. We achieved this through a message passing neural network which captures two layers of information through edges and nodes representation,

(2) we adopt ordinal regression as the classifier of the network with a conditional training procedure specifically designed to identify adjacent cell state transition,

(3) we utilise log-ratio of expression values to capture the relationship between two genes. The log-ratio is shown to be more stable and hence more generalisable across datasets compared to individual gene expression.

The network architecture is:

Installation

To correctly use scClassify2 via your local device, we suggest first create a conda environment by:

conda create -n scclassify2 python=3.9
conda activate scclassify2
conda install -c conda-forge anndata
conda install -c conda-forge scanpy
conda install pytorch torchvision torchaudio pytorch-cuda -c pytorch -c nvidia
pip install wandb
pip install torchviz
pip install --user scikit-misc

Note, as we tested on different platforms, sometimes you may need to reinstall h5py package by:

pip uninstall h5py
pip install h5py

Once success, you have the right environment to use scClassify2.

Usage

Before actual training process, you need to change some paths of input files. You can either go into the main.py file to change arguments in lines

parser.add_argument('--trainset_path', type=str, default='./train_set.h5ad')
parser.add_argument('--label_path', type=str, default='./label.json')

according to comments of corresponding codes, or directly pass these paths when invoking the program by

python main.py --trainset_path <YourPath> --label_path <YourPath>

scClassify2 also keep the hierarchical identification property of scClassify by using HOPACK tree, details could be found in the folder "code_with_HOPACK".

1.try with demo data

If you want to have a quick experience of scClassify2 traning, then directly use this:

conda activate scclassify2  
python main.py --trainset_path ./train_set.h5ad --label_path ./label.json

2.try with demo custom data

If you want to try your own dataset, then you must prepare you dataset first.

a. Convert your data into H5AD format. The H5AD format is the HDF5 disk representation commonly used to share single-cell datasets. Both R and Python have many packages for users to easily transfer their data into H5AD format. During the process, you must store the cell state lables into "CellState" column. Name the converted file as input.h5ad and copy it into current path (the scClassiy2 folder).

b. Prepare cell state labels as a list and stored it into a JSON file (please check ./label.json file as template). Name the JSON file as input.json and copy it into current path..

The use this:

conda activate scclassify2  
python main.py --trainset_path ./input.h5ad --label_path ./input.json

WebAPP

To provide a resource for the scientific community, we developed a user-friendly web server called scClassify-catalogue, offering a comprehensive catalogue of scClassify2 models trained from datasets covering almost 1,000 cell types and 30 tissue types.

scClassify-catalogue directly delivers annotation results for the selected tissue within a relatively short time. The input of scClassify-catalogue is flexible and allows either the direct output from 10x cellranger software, or a csv file or h5ad file containing the gene expression matrix. Once the job is submitted, a job ID will be made available. A file containing the predicted outcome as well as an HTML file will be emailed to the email address entered by the user once the job finishes. The HTML file visualises the predicted cell type such that the user can easily check the prediction result.

Citation

If you find our codes useful, please consider citing our work:

@article{scClassify2,
  title={A Message Passing Framework for Precise Cell State Identification with scClassify2},
  author={Wenze Ding, Yue Cao, Helen Fu, , Marni Torkel and Jean Yang},
  journal={},
  year={2024},
}

Name		Name	Last commit message	Last commit date
Latest commit History 18 Commits
code_with_HOPACH		code_with_HOPACH
README.md		README.md
Server.png		Server.png
data.py		data.py
environment.yaml		environment.yaml
gene2vec_dim_200_iter_9.txt		gene2vec_dim_200_iter_9.txt
label.json		label.json
loss_function.py		loss_function.py
main.py		main.py
model.py		model.py
network_architecture.png		network_architecture.png
scClassify2.png		scClassify2.png
train_set.h5ad		train_set.h5ad
utils.py		utils.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

scClassify2

Table of Contents

Installation

Usage

1.try with demo data

2.try with demo custom data

WebAPP

Citation

About

Releases

Packages

Languages

SydneyBioX/scClassify2

Folders and files

Latest commit

History

Repository files navigation

scClassify2

Table of Contents

Installation

Usage

1.try with demo data

2.try with demo custom data

WebAPP

Citation

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages