Function to subset the entire spatialdata object #1007

timtreis · 2025-05-26T15:22:23Z

A little while ago I started PR #967, which introduces a function that allows users to subset their entire SpatialData object by certain criteria. The larger goal is to emulate this Scanpy notebook and to make Squidpy the biologist-friendly interface to SpatialData. Subsetting your object will be the first step in that journey.

For this, there are several considerations:

- A given SpatialData object can contain 0-n AnnData objects.
- These AnnData objects can annotate 0-n other objects, e.g. segmentation masks, shapes (like for Visium), ROIs or even points.
- A given subsetting step on the AnnData object needs to find, in all other elements, every instance annotated by these soon-to-be-gone observations and deal with it accordingly:
  - segmentation masks -> set to 0 (background)
    - potentially: remove transcript locations falling into these segmentation masks
  - shapes -> remove
  - points -> remove
  - etc.
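The propagation rules above can be sketched on plain NumPy arrays. This is only an illustration of the intended behaviour, not the SpatialData or Squidpy API: `drop_instances` and all variable names are hypothetical, and it assumes every element refers to instances through a shared integer ID, with 0 meaning background or "not assigned".

```python
import numpy as np


def drop_instances(labels, shape_ids, point_instance, removed_ids):
    """Propagate removed instance IDs to labels, shapes and points.

    Hypothetical helper: assumes all elements share integer instance IDs
    and that 0 means background / not assigned to any instance.
    """
    removed = np.asarray(sorted(removed_ids), dtype=labels.dtype)
    # Segmentation mask: pixels of removed instances become background (0).
    new_labels = np.where(np.isin(labels, removed), 0, labels)
    # Shapes: boolean mask of the shapes whose instance survives.
    keep_shapes = ~np.isin(shape_ids, removed)
    # Points (e.g. transcripts): drop those falling into removed instances.
    keep_points = ~np.isin(point_instance, removed)
    return new_labels, keep_shapes, keep_points


labels = np.array([[0, 1, 1],
                   [2, 2, 0],
                   [3, 0, 3]])
shape_ids = np.array([1, 2, 3])             # one shape per instance
point_instance = np.array([1, 3, 0, 2, 2])  # instance each point falls into

new_labels, keep_shapes, keep_points = drop_instances(
    labels, shape_ids, point_instance, removed_ids={2}
)
```

Returning boolean masks for shapes and points, rather than filtered copies, leaves it to the caller to decide how the filtering is applied to the actual elements.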

However, there are additional constraints and open questions that are important for the implementation.

- We can e.g. store segmentation masks as DataTrees with different scales. Is it faster to subset the original resolution and then regenerate the tree, or to subset all scales individually?
- How do we handle inplace True vs. False? Returning a copy can easily mean doubling a 500 GB object.

Some other edge cases might only really show up once we get there.

Generally, the goal should be to identify relevant subfunctions and push these upstream to SpatialData. Some might already exist there and just need to be found (realistically by asking @LucaMarconato, there are quite a few functions only he really knows about); others might need to be written and pushed upstream. Ideally, Squidpy then chains these functions together into something with good UX.


LucaMarconato · 2025-05-28T20:18:37Z

> We can e.g. store segmentation masks as DataTrees with different scales. Is it faster to subset the original resolution and then regenerate the tree, or to subset all scales individually?

I'd try benchmarking this. It's all lazy anyway until things are written to disk. One important consideration is whether we want to subset and then access the data in memory. In that case, if we compute all the scales from the first one, we need to write the result and reread it (or call .persist() from dask); otherwise the computation of the lower scales from the first scale is re-performed every time! This would favor slicing every scale instead of computing them from the largest.
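Before benchmarking, it may help to check that the two strategies give the same result. The following is a minimal sketch in plain NumPy, not the SpatialData multiscale implementation: `build_pyramid` and `mask_ids` are hypothetical helpers, and the pyramid is assumed to be built by nearest-neighbour striding (keeping every second pixel). Under that assumption, masking the full resolution and regenerating the pyramid matches masking every scale individually; with other downsampling methods this equivalence is not guaranteed.

```python
import numpy as np


def build_pyramid(labels, n_scales):
    """Hypothetical nearest-neighbour pyramid: each level keeps every 2nd pixel."""
    levels = [labels]
    for _ in range(n_scales - 1):
        levels.append(levels[-1][::2, ::2])
    return levels


def mask_ids(labels, removed_ids):
    """Set pixels belonging to removed instances to background (0)."""
    return np.where(np.isin(labels, removed_ids), 0, labels)


rng = np.random.default_rng(0)
full_res = rng.integers(0, 6, size=(16, 16))  # toy label image, IDs 0..5
removed_ids = np.array([2, 5])

# Strategy A: subset the original resolution, then regenerate the pyramid.
regenerated = build_pyramid(mask_ids(full_res, removed_ids), n_scales=3)

# Strategy B: subset every existing scale individually.
per_scale = [mask_ids(level, removed_ids)
             for level in build_pyramid(full_res, n_scales=3)]

same = all(np.array_equal(a, b) for a, b in zip(regenerated, per_scale))
```

This only checks correctness on eager arrays. With lazy dask arrays, strategy A additionally carries the recomputation cost described above unless the result is written out or persisted.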

LucaMarconato · 2025-05-28T20:19:19Z

> How do we handle inplace True vs. False? Returning a copy can easily mean doubling a 500 GB object.

No copy is made because the heavy data (images and labels) are lazy. A copy only materializes on disk, so this should not be a problem.

LucaMarconato · 2025-05-28T20:21:18Z

> Generally, the goal should be to identify relevant subfunctions and push these upstream to SpatialData. Some might already exist there and just need to be found (realistically by asking @LucaMarconato, there are quite a few functions only he really knows about); others might need to be written and pushed upstream. Ideally, Squidpy then chains these functions together into something with good UX.

For table-based subsetting this is the most up-to-date code we have: scverse/spatialdata#894

timtreis added the enhancement ✨ (New feature or request) and squidpy2.0 (Everything related to a Squidpy 2.0 release) labels May 26, 2025

timtreis assigned selmanozleyen May 26, 2025

timtreis linked a pull request May 26, 2025 that will close this issue

Add filter function that subsets the entire object #967

Draft
