A GCN and pathology image large model-based method to predict single-cell resolution spatial gene expression by integrating multimodal information.

wenwenmin/scstGCN

scstGCN

Overview

Ideal spatial transcriptomics (ST) data should have single-cell resolution and cover the entire tissue surface, but generating such data with existing platforms remains challenging. scstGCN is a GCN-based method that uses a weakly supervised learning framework to integrate multimodal information and infer super-resolution gene expression at the single-cell level. It first extracts high-resolution multimodal feature maps, namely a histological feature map, a positional feature map, and an RGB feature map, and then uses a GCN module to predict super-resolution gene expression from these feature maps within the weakly supervised framework. scstGCN predicts super-resolution gene expression accurately and helps researchers discover biologically meaningful differentially expressed genes and pathways. It can also predict expression outside the measured spots and in external tissue sections.

Figure: Overview of the scstGCN framework (Overview.png).
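The overview describes predicting superpixel-level expression from multimodal feature maps with a GCN that is trained only against spot-level counts. The NumPy sketch below illustrates that general idea under explicit assumptions: superpixels are the graph nodes, the three feature maps are simply concatenated, each layer uses the standard Kipf & Welling normalization, and the weak supervision is an MSE between each spot's measured counts and the sum of predictions over the superpixels it covers. All function names are hypothetical; this is not the scstGCN implementation, whose actual architecture and loss are given in the repository code and the paper.

```python
import numpy as np


def normalized_adjacency(adj):
    """GCN normalization D^-1/2 (A + I) D^-1/2 (Kipf & Welling, 2017)."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_forward(features, adj, weights):
    """Apply one GCN layer per weight matrix: H <- ReLU(A_norm @ H @ W).
    The final ReLU keeps predicted expression non-negative (a sketch choice)."""
    a_norm = normalized_adjacency(adj)
    h = features
    for w in weights:
        h = np.maximum(a_norm @ h @ w, 0.0)
    return h


def weak_supervision_loss(pred, spot_members, spot_counts):
    """MSE between measured spot counts and the summed predictions of the
    superpixels that fall inside each spot."""
    agg = np.stack([pred[idx].sum(axis=0) for idx in spot_members])
    return float(np.mean((agg - spot_counts) ** 2))


# Toy example: 4 superpixels connected in a chain, 2 spots, 3 genes.
rng = np.random.default_rng(0)
hist = rng.normal(size=(4, 8))   # stand-in for histological (ViT) features
pos = rng.normal(size=(4, 2))    # stand-in for positional features
rgb = rng.uniform(size=(4, 3))   # stand-in for RGB features
features = np.concatenate([hist, pos, rgb], axis=1)  # shape (4, 13)
adj = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], float)
weights = [rng.normal(size=(13, 16)), rng.normal(size=(16, 3))]
pred = gcn_forward(features, adj, weights)           # shape (4, 3)
loss = weak_supervision_loss(pred, [[0, 1], [2, 3]], np.ones((2, 3)))
```

Only spot-level sums enter the loss, so no single-cell ground truth is needed during training; that is what makes the supervision "weak".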

Installation

  • NVIDIA GPU (e.g., a single NVIDIA GeForce RTX 3090)
  • pip install -r requiremnt.txt

Data

All datasets used in the paper can be downloaded from https://zenodo.org/records/12800375.

Data format

  • he-raw.jpg: The original histological image.
  • cnts.csv: Spot-based gene expression data, where each row represents a spot and each column represents a gene.
  • locs-raw.csv: Two-dimensional coordinates of all spots, where each row represents the spot in the corresponding row of cnts.csv. The first and second columns of this file are the x-coordinate and y-coordinate, respectively, in pixels of the histological image.
  • pixel-size-raw.txt: The actual physical size corresponding to each pixel in the histological image, measured in micrometers.
  • radius-raw.txt: The number of pixels in histological image corresponding to the radius of a spot.
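A minimal standard-library sketch for reading the tabular files of one sample and checking that they agree is shown below. It assumes, since the format description does not say, that cnts.csv and locs-raw.csv each begin with a header row and contain no spot-ID column; adjust the parsing if your files differ. The helper names are hypothetical. The derived `radius_um` simply multiplies the spot radius in pixels by the pixel size in micrometers.

```python
import csv
from pathlib import Path


def read_float(path):
    """Read a single number from a one-value text file (e.g. radius-raw.txt)."""
    return float(Path(path).read_text().strip())


def load_spot_data(data_dir):
    """Load cnts.csv, locs-raw.csv, pixel-size-raw.txt and radius-raw.txt.

    Assumption (not stated in the README): both CSV files start with a
    header row and have no spot-ID column.
    """
    data_dir = Path(data_dir)
    with open(data_dir / "cnts.csv", newline="") as f:
        rows = list(csv.reader(f))
    genes = rows[0]  # header row: one gene name per column
    counts = [[float(v) for v in r] for r in rows[1:]]  # one row per spot

    with open(data_dir / "locs-raw.csv", newline="") as f:
        rows = list(csv.reader(f))
    # First column = x, second column = y, in histology-image pixels.
    locs = [(float(r[0]), float(r[1])) for r in rows[1:]]

    if len(counts) != len(locs):
        raise ValueError("cnts.csv and locs-raw.csv must have one row per spot")

    pixel_size_um = read_float(data_dir / "pixel-size-raw.txt")
    radius_px = read_float(data_dir / "radius-raw.txt")
    return {
        "genes": genes,
        "counts": counts,
        "locs": locs,
        "pixel_size_um": pixel_size_um,
        "radius_px": radius_px,
        # Physical spot radius = radius in pixels x micrometers per pixel.
        "radius_um": radius_px * pixel_size_um,
    }
```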

Data preprocessing

If you want to experiment with Visium HD data at single-cell resolution, first run the following scripts to obtain spot-based Pseudo-ST data:

  • get_pseudo_loc.py: Generates the coordinates of the spot-based Pseudo-ST data. Pseudo-spots cover the entire detected tissue region. Depending on the characteristics of your data, you may need to adjust the diameter variable in this script.
  • pixel_to_spot.py: Generates the spatial gene expression of the spot-based Pseudo-ST data. The expression of each gene is summed over all superpixels covered by a pseudo-spot to give that spot's expression.
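The two preprocessing steps can be illustrated with a pure-Python sketch. It assumes the Visium HD data have already been binned into a grid of superpixels, places pseudo-spot centers on a square grid (the layout actually used by get_pseudo_loc.py may differ, e.g. hexagonal), keeps only centers that land on tissue, and then sums the expression of every superpixel whose center lies within the pseudo-spot circle. The function names echo the script names only for readability; they are hypothetical and are not the scripts' actual interfaces.

```python
import math


def pseudo_spot_centers(tissue_mask, diameter):
    """Place pseudo-spot centers on a square grid and keep those on tissue.

    tissue_mask: 2-D list of booleans, one entry per superpixel (True = tissue).
    diameter: pseudo-spot diameter in superpixel units (cf. the `diameter`
    variable mentioned for get_pseudo_loc.py). Superpixel (x, y) spans
    [x, x + 1) x [y, y + 1) in these continuous coordinates.
    """
    r = diameter / 2
    centers = []
    y = r
    while y < len(tissue_mask):
        x = r
        while x < len(tissue_mask[0]):
            if tissue_mask[int(y)][int(x)]:
                centers.append((x, y))
            x += diameter
        y += diameter
    return centers


def pixel_to_spot(expr, centers, diameter):
    """Sum superpixel expression inside each pseudo-spot.

    expr: dict mapping (x, y) superpixel indices to a list of gene counts.
    Returns one summed expression vector per pseudo-spot center.
    """
    r = diameter / 2
    n_genes = len(next(iter(expr.values())))
    spots = []
    for cx, cy in centers:
        total = [0.0] * n_genes
        for (px, py), counts in expr.items():
            # A superpixel counts as covered if its center is inside the circle.
            if math.hypot(px + 0.5 - cx, py + 0.5 - cy) <= r:
                total = [t + c for t, c in zip(total, counts)]
        spots.append(total)
    return spots


# Usage: a 4 x 4 all-tissue grid with diameter 2 gives four pseudo-spots,
# each covering a 2 x 2 block of superpixels.
mask = [[True] * 4 for _ in range(4)]
expr = {(x, y): [1.0] for y in range(4) for x in range(4)}
centers = pseudo_spot_centers(mask, 2)
spots = pixel_to_spot(expr, centers, 2)
```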

Getting access

In our multimodal feature map extractor, the ViT architecture uses UNI, a self-supervised pretrained pathology foundation model. You need to request access to the model weights on the Hugging Face model page: https://huggingface.co/mahmoodlab/UNI. Once access is granted, log in to Hugging Face and replace the login credentials in demo.ipynb with your own.
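For reference, the UNI model card loads the weights through timm after logging in with a Hugging Face User Access Token. A minimal sketch along those lines follows. The `load_uni` wrapper and the `HF_TOKEN` environment-variable fallback are illustrative assumptions, and the corresponding cell in demo.ipynb may be organized differently.

```python
import os


def load_uni(token=None):
    """Load the UNI ViT-L/16 encoder from the Hugging Face Hub.

    Requires approved access to the UNI weights and a User Access Token.
    Falls back to the HF_TOKEN environment variable (an illustrative choice);
    if neither is set, huggingface_hub prompts for a token interactively.
    """
    # Imported lazily so this file can be imported without timm installed.
    import timm
    from huggingface_hub import login

    login(token=token or os.environ.get("HF_TOKEN"))
    model = timm.create_model(
        "hf-hub:MahmoodLab/uni",
        pretrained=True,
        init_values=1e-5,
        dynamic_img_size=True,
    )
    model.eval()  # inference mode for feature extraction
    return model
```

Passing the token as an argument or environment variable, rather than hard-coding it in the notebook, keeps credentials out of shared copies of demo.ipynb.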

Running demo

We provide an example of predicting super-resolution gene expression for 10X Visium human dorsolateral prefrontal cortex tissue; see demo.ipynb.

Baselines

Below are some representative baselines. We thank their authors for generously sharing their code.

  • iStar predicts super-resolution gene expression from hierarchical histological features using a feedforward neural network.
  • XFuse integrates spatial transcriptomics (ST) data and histology images using a deep generative model to infer super-resolution gene expression profiles.
  • TESLA generates high-resolution gene expression profiles using a Euclidean distance metric that accounts for both physical proximity and histology-image feature similarity between superpixels and measured spots.
  • STAGE generates gene expression data for unmeasured spots or points in spatial transcriptomics with a spatial location-supervised Auto-encoder GEnerator that integrates spatial information and gene expression data.
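To make the distance-based idea described for TESLA more concrete, here is a generic sketch: a superpixel's expression is imputed as an inverse-distance-weighted average of its k nearest measured spots, where distance is Euclidean over location plus a scaled scalar histology feature. The single histology value per point, the weighting scheme, and the parameters `alpha` and `k` are assumptions for illustration; this is not TESLA's published formula or code.

```python
import math


def combined_distance(a, b, alpha):
    """Euclidean distance over (x, y, alpha * h), where h is a scalar
    histology feature; alpha trades off image similarity against location."""
    (xa, ya, ha), (xb, yb, hb) = a, b
    return math.sqrt((xa - xb) ** 2 + (ya - yb) ** 2 + (alpha * (ha - hb)) ** 2)


def impute_superpixel(superpixel, spots, spot_expr, alpha=1.0, k=3):
    """Inverse-distance-weighted average of the k nearest measured spots.

    superpixel, spots: (x, y, h) tuples; spot_expr: one count list per spot.
    """
    nearest = sorted(
        (combined_distance(superpixel, s, alpha), i) for i, s in enumerate(spots)
    )[:k]
    weights = [1.0 / (d + 1e-8) for d, _ in nearest]  # epsilon avoids 1/0
    total = sum(weights)
    n_genes = len(spot_expr[0])
    return [
        sum(w * spot_expr[i][g] for w, (_, i) in zip(weights, nearest)) / total
        for g in range(n_genes)
    ]
```

With a large `alpha`, a spot whose histology differs from the superpixel is pushed far away and contributes little, even if it is physically close.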

Acknowledgements

Parts of the code in this repository, such as the PyTorch Lightning-based training framework and the image-masking method, are adapted from iStar. The Vision Transformer in this repository uses weights pretrained by UNI. We are grateful to the authors for their excellent work.

Contact details

If you have any questions, please contact [email protected].

Citing

The corresponding BibTeX citation is given below:

@article{xue2025inferring,
  title={Inferring single-cell resolution spatial gene expression via fusing spot-based spatial transcriptomics, location, and histology using GCN},
  author={Xue, Shuailin and Zhu, Fangfang and Chen, Jinyu and Min, Wenwen},
  journal={Briefings in Bioinformatics},
  volume={26},
  number={1},
  pages={bbae630},
  year={2025},
  publisher={Oxford University Press}
}

Article link

https://doi.org/10.1093/bib/bbae630
