I started working a little on PR #967, which introduces a function that lets users subset their entire SpatialData objects by certain criteria. The larger goal is to emulate this Scanpy notebook and to make Squidpy basically the biologist-friendly interface to SpatialData. Subsetting your object will be the first step in that journey.
For this, there are several considerations:
- A given SpatialData object can contain 0-n AnnData objects.
- These AnnData objects can annotate 0-n other elements, e.g. segmentation masks, shapes (like for Visium), ROIs or even points.
- A subsetting step on an AnnData object needs to find, in all other elements, every instance annotated by the soon-to-be-removed observations and deal with it accordingly:
  - segmentation masks -> set to 0 (background)
    - potentially: also remove transcript locations falling into these segmentation masks
  - shapes -> remove
  - points -> remove
  - etc.
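To make these rules concrete, here is a minimal numpy sketch. It assumes a 2D labels array, a flat array of shape instance ids, and point coordinates already in the labels' pixel space; `keep_ids` stands for the instance ids still present in the table after subsetting. The function names are hypothetical and not part of the SpatialData or Squidpy API.

```python
import numpy as np


def subset_labels(labels: np.ndarray, keep_ids: np.ndarray) -> np.ndarray:
    """Return a copy of `labels` where every cell not in `keep_ids` is set to 0 (background)."""
    # cells not in keep_ids become 0; background pixels are already 0
    return np.where(np.isin(labels, keep_ids), labels, 0)


def subset_shapes(shape_ids: np.ndarray, keep_ids: np.ndarray) -> np.ndarray:
    """Boolean row mask: True for shapes (e.g. Visium spots) whose instance id is kept."""
    return np.isin(shape_ids, keep_ids)


def keep_points(points_yx: np.ndarray, labels: np.ndarray, keep_ids: np.ndarray) -> np.ndarray:
    """Boolean mask: False for points (e.g. transcripts) that fall inside a removed cell.

    Assumes the (y, x) coordinates are already in the pixel space of `labels`.
    """
    rows = np.clip(np.floor(points_yx[:, 0]).astype(int), 0, labels.shape[0] - 1)
    cols = np.clip(np.floor(points_yx[:, 1]).astype(int), 0, labels.shape[1] - 1)
    hit = labels[rows, cols]  # label under each point, 0 = background
    return (hit == 0) | np.isin(hit, keep_ids)


# usage: cell 2 was filtered out of the table
labels = np.array([[0, 1, 1], [2, 2, 0], [3, 0, 0]])
keep_ids = np.array([1, 3])
masked = subset_labels(labels, keep_ids)
shape_rows = subset_shapes(np.array([1, 2, 3]), keep_ids)
```

Points that fall on background are kept here; whether they should also be dropped is one of the choices the real implementation would have to make.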
However, there are additional constraints and open questions that are important for the implementation.
- For example, we can store segmentation masks as DataTrees with multiple scales. Is it faster to subset the original resolution and then regenerate the tree, or to subset every scale individually?
- How do we handle inplace `True` vs `False`? Returning a copy can easily mean doubling a 500 GB object.
- Some other edge cases will probably only show up once we get there.
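The memory cost of the two modes can be illustrated on an in-memory labels array. This is a sketch with a hypothetical `mask_labels` function and `inplace` flag, similar in spirit to the copy/inplace flags of many scanpy functions. It only covers numpy arrays: for lazily loaded, disk-backed elements, an in-place update would effectively mean rewriting data on disk, which this sketch does not address.

```python
from typing import Optional

import numpy as np


def mask_labels(
    labels: np.ndarray, keep_ids: np.ndarray, inplace: bool = True
) -> Optional[np.ndarray]:
    """Zero out every cell whose id is not in `keep_ids` (hypothetical helper).

    inplace=True  -> modify `labels` directly; only a boolean mask is allocated; returns None
    inplace=False -> leave `labels` untouched and return a modified copy (memory is doubled)
    """
    # pixels belonging to cells that were filtered out of the table
    drop = ~np.isin(labels, keep_ids) & (labels != 0)
    if inplace:
        labels[drop] = 0
        return None
    out = labels.copy()
    out[drop] = 0
    return out
```

Even the in-place path allocates a boolean mask of one byte per pixel, so "in place" is cheaper than a copy but not free.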
Generally, the goal should be to identify the relevant subfunctions and push them upstream to SpatialData. Some might already exist there and just need to be found (realistically by asking @LucaMarconato; there are quite a few functions only he really knows about), others might need to be written and pushed upstream. Ideally, Squidpy then chains these functions together into something with good UX.
> We can f.e. store segmentation masks as DataTrees with different scales - is it faster to subset the original resolution and to then regenerate the tree or subset all scales individually?
I'd try to benchmark this. It's all lazy anyway until things are written to disk. One important consideration is whether we want to subset and then access the data in memory. In that case, if we compute all the lower scales from the first one, we need to write the result and reread it (or call `.persist()` from dask); otherwise the computation of the lower scales from the first scale is re-performed every time! This would favor slicing every scale instead of computing them from the largest.
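A toy check of why slicing every scale is safe for labels: if the pyramid is built by picking existing pixel values (here plain strided subsampling, a hypothetical stand-in for nearest-neighbour downsampling; how SpatialData builds its label pyramids is not confirmed here), then an elementwise mask commutes with downsampling. Masking each scale individually therefore yields exactly the same pyramid as regenerating it from the masked full resolution, without re-deriving the lower scales from scale 0. The helper names are hypothetical.

```python
import numpy as np


def build_pyramid(labels: np.ndarray, n_scales: int) -> list[np.ndarray]:
    """Hypothetical label pyramid: each scale keeps every 2nd pixel of the previous one."""
    scales = [labels]
    for _ in range(n_scales - 1):
        scales.append(scales[-1][::2, ::2])
    return scales


def mask(labels: np.ndarray, keep_ids: np.ndarray) -> np.ndarray:
    """Elementwise: cells not in keep_ids become background (0)."""
    return np.where(np.isin(labels, keep_ids), labels, 0)


def subset_regenerate(pyramid: list[np.ndarray], keep_ids: np.ndarray) -> list[np.ndarray]:
    """Option 1: mask full resolution, then rebuild all lower scales from it."""
    return build_pyramid(mask(pyramid[0], keep_ids), len(pyramid))


def subset_per_scale(pyramid: list[np.ndarray], keep_ids: np.ndarray) -> list[np.ndarray]:
    """Option 2: mask every existing scale independently (no dependency on scale 0)."""
    return [mask(scale, keep_ids) for scale in pyramid]
```

This equivalence would not hold for averaging-based downsampling, but averaging is not meaningful for integer label ids anyway. Which option is faster with lazy dask arrays still needs the benchmark.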
> Generally, the goal should be to identify relevant subfunctions and push these upstream to SpatialData, some might already exist there and just need to be found (realistically by asking @LucaMarconato, there's quite a few functions only he really knows about), other might need to be written and pushed upstream. Ideally Squidpy then chains together these functions into something with good UX.
For table-based subsetting, this is the most up-to-date code we have: scverse/spatialdata#894