Questions on Spatial Coordinates and Large ST Data in gsMap

Dear Dr. Song,
First, I would like to express my sincere respect for your work. 
In my view, gsMap is a major methodological advance that elegantly overcomes several intrinsic limitations of traditional GWAS interpretation. I truly appreciate this contribution to the field.
While applying gsMap to large-scale spatial transcriptomics (ST) datasets, I ran into several conceptual and practical questions. I would be very grateful for any clarification or suggestions:

1. How is “spatially connected” defined in gsMap? Does it rely strictly on XY coordinates from the ST file?
The gsMap Supplementary Note states:

> “we leverage the graph attention (GAT) layer to aggregate information from spatially connected spots.”

May I confirm whether “spatially connected” refers specifically to adjacency computed from the XY coordinates in the original ST dataset? This matters because some ST datasets, such as the human dataset published in Science ([https://www.science.org/doi/10.1126/science.add7046](https://www.science.org/doi/10.1126/science.add7046)), provide pseudo-layout coordinates for visualization rather than true physical XY coordinates. If gsMap builds the graph strictly from these coordinates, would the aggregation step or the resulting GSS/PCC values become inaccurate?
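To make the concern concrete, here is a minimal toy sketch. It assumes the spatial graph is a k-nearest-neighbor graph on Euclidean distance, which may not match how gsMap actually builds it; `knn_graph` and both coordinate lists are hypothetical and not part of gsMap.

```python
import math

def knn_graph(coords, k):
    """Return {i: set of the k nearest other points by Euclidean distance}."""
    graph = {}
    for i, p in enumerate(coords):
        others = sorted(
            (j for j in range(len(coords)) if j != i),
            key=lambda j: math.dist(p, coords[j]),
        )
        graph[i] = set(others[:k])
    return graph

# Hypothetical toy data: four spots along a line in true tissue space.
true_xy = [(0.0, 0.0), (1.0, 0.0), (2.5, 0.0), (4.5, 0.0)]
# The same four spots in a made-up visualization layout that reorders them.
layout_xy = [(0.0, 0.0), (4.5, 0.0), (1.0, 0.0), (2.5, 0.0)]

true_graph = knn_graph(true_xy, k=1)
layout_graph = knn_graph(layout_xy, k=1)

# Spots whose nearest neighbor differs between the two coordinate systems.
changed = [i for i in true_graph if true_graph[i] != layout_graph[i]]
print(changed)
```

In this toy case three of the four spots end up with a different neighbor under the layout coordinates, so any neighbor-based aggregation would mix different spots.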

2. How should gsMap handle datasets with true 3D coordinates (XYZ)?
For example, the ST dataset from Nature ([https://www.nature.com/articles/s41586-023-06812-z](https://www.nature.com/articles/s41586-023-06812-z)) provides 3D spatial coordinates (XYZ), but gsMap currently takes only XY coordinates. If I discard the Z dimension, will the loss of 3D neighborhood information distort the spatial graph? Would you recommend projecting XYZ into 2D, or is there another way to preserve the true spatial relationships?
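The toy sketch below illustrates the distortion I am worried about. The coordinates are hypothetical and `nearest` is an illustrative helper, not a gsMap function; it only shows that dropping Z can make cells from distant sections look adjacent.

```python
import math

def nearest(points, i):
    """Index of the point closest to points[i], excluding i itself."""
    return min(
        (j for j in range(len(points)) if j != i),
        key=lambda j: math.dist(points[i], points[j]),
    )

# Hypothetical cells as (x, y, z). Cells 0 and 1 share a section (z = 0);
# cell 2 lies almost directly above cell 0 in a distant section (z = 50).
xyz = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.5, 0.0, 50.0)]
xy = [(x, y) for x, y, _ in xyz]  # discard the Z dimension

nn_3d = [nearest(xyz, i) for i in range(len(xyz))]
nn_2d = [nearest(xy, i) for i in range(len(xy))]
print(nn_3d, nn_2d)
```

In 3D, cells 0 and 1 are each other's nearest neighbors. After Z is dropped, both pick cell 2 from the distant section instead.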

3. The Nature dataset ([https://www.nature.com/articles/s41586-023-06812-z](https://www.nature.com/articles/s41586-023-06812-z)) contains ~4 million cells, and generating the GSS requires an extremely large amount of memory. To work around this, I considered splitting the dataset into many parts and running gsMap on each part separately. However, two problems arise:

    3a. Splitting by XY coordinates risks losing entire brain regions in some partitions. Would splitting by cell type be better? Or is there a recommended or optimal splitting strategy for gsMap?

    3b. After splitting, GSS and SNP annotations will be computed separately for each part, so the same gene may receive different annotations in different parts even though it is biologically the same gene. How should one deal with this issue? Should annotation consistency be enforced across all parts?
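For reference, one XY splitting scheme I have considered is sketched below: square tiles, each padded with a margin ("halo") so that cells near a tile border still see their neighbors from the adjacent tile. This is only an assumption about what might work; `split_with_halo`, `tile_size`, and `halo` are hypothetical names, and the scheme addresses neither the missing-region concern in 3a nor the consistency concern in 3b.

```python
from collections import defaultdict

def split_with_halo(coords, tile_size, halo):
    """Split cells into square tiles padded by `halo` on every side.

    Returns (core, padded):
      core[tile]   = cells whose own position falls inside the tile
      padded[tile] = core cells plus cells within `halo` of the tile border
    Only tiles that contain at least one core cell are created.
    """
    core = defaultdict(list)
    for idx, (x, y) in enumerate(coords):
        core[(int(x // tile_size), int(y // tile_size))].append(idx)

    padded = {tile: [] for tile in core}
    for idx, (x, y) in enumerate(coords):
        tx_lo, tx_hi = int((x - halo) // tile_size), int((x + halo) // tile_size)
        ty_lo, ty_hi = int((y - halo) // tile_size), int((y + halo) // tile_size)
        for tx in range(tx_lo, tx_hi + 1):
            for ty in range(ty_lo, ty_hi + 1):
                if (tx, ty) in padded:
                    padded[(tx, ty)].append(idx)
    return dict(core), padded

# Hypothetical toy coordinates: four cells in a row, tiles 10 units wide.
coords = [(1.0, 1.0), (9.0, 1.0), (11.0, 1.0), (19.0, 1.0)]
core, padded = split_with_halo(coords, tile_size=10.0, halo=2.0)
print(core)
print(padded)
```

Each tile would be run on its padded cell list, and results would be kept only for its core cells, so every cell is reported exactly once.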

4. I plan to integrate PCC results from several ST datasets. However, different datasets may assign different spatial or cell-type GSS patterns to the same gene (e.g., a gene appears glutamatergic-specific in dataset A but GABAergic-specific in dataset B). How should such discrepancies be interpreted or handled?
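As a concrete picture of the discrepancy, the sketch below flags genes whose top-scoring cell type differs between two datasets. The gene names, cell-type labels, and scores are hypothetical placeholders rather than real gsMap output, and `discordant_genes` is an illustrative helper.

```python
def top_label(scores):
    """Cell type with the highest score in a {cell_type: score} mapping."""
    return max(scores, key=scores.get)

def discordant_genes(dataset_a, dataset_b):
    """Genes present in both datasets whose top-scoring cell type differs."""
    shared = sorted(set(dataset_a) & set(dataset_b))
    return [g for g in shared if top_label(dataset_a[g]) != top_label(dataset_b[g])]

# Hypothetical per-gene, per-cell-type specificity scores (placeholders).
dataset_a = {
    "GENE1": {"glutamatergic": 0.9, "GABAergic": 0.2},
    "GENE2": {"glutamatergic": 0.1, "GABAergic": 0.8},
}
dataset_b = {
    "GENE1": {"glutamatergic": 0.3, "GABAergic": 0.7},
    "GENE2": {"glutamatergic": 0.2, "GABAergic": 0.9},
}

flagged = discordant_genes(dataset_a, dataset_b)
print(flagged)
```

Here only `GENE1` is flagged, because its top cell type switches from glutamatergic in dataset A to GABAergic in dataset B.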

I am not sure whether these concerns make sense within the gsMap framework. If they do, would you recommend alternative approaches to address them?

Thank you very much for your time and for developing such an impactful tool. I would deeply appreciate any insight, clarification, or recommendations you could provide.

Best regards,
Yubin

Questions on Spatial Coordinates and Large ST Data in gsMap #107
