-
I am not an expert on geohashes or timehashes, but my rudimentary understanding of the applications of this extension is that it can be used:
Are these some of the applications you see for this extension, or are you thinking of any others? I would love to hear your thoughts on potential applications.

Other

I can imagine this extension being used to create an offline browser application of [1] (not affiliated), with no server required.

[1] https://www.pickyourplace.app/explore#13.75/49.22717/-123.14209
-
I love the idea of adding a convention for generating sortable ids that improve query performance 👏 How would you recommend setting Or, is this feature more applicable to static snapshots of STAC items?
-
I like this idea a lot! Is there an assumption that the data is evenly distributed such that the partitions will be similar sizes for a given
-
Possibly related to this: I've been working on a new geospatial bounding box that uses healpix cells to encode spatial extent coverage. All of our data is at the poles, and the bounding 'box' specification is really just two kitty-corner coordinate pairs rather than all four corner coordinates of the box. This is miserable for Antarctica, so I've been working on a 'morton bounding box' to replace the two lat/lon coordinate pairs, with an eye towards using these for fast spatial searches via STAC catalogs.

A morton index identifies a healpix cell (which is a geospatial hash). Since healpix cells are multi-resolution, a single morton index (which is a single 64-bit integer) will encode (a) how large your cell is, and (b) where that cell is located on the globe. Of course, once you have geospatial collections of data, you begin to have a fairly good chance of that data crossing cell boundaries. The idea with the morton bounding box is to account for this by selecting the four morton indices that make up the minimum spanning tree of cells that cover the input dataset. Since each of the four indices can be a different size of coverage, this ends up being reasonably compact. Here's a real example, showing the data points for a STAC catalog, along with the morton box spanning tree for flight line data over Antarctica (i.e., tree for span = 4):
That's for the top-level STAC catalog metadata; each STAC item in the catalog would have its own morton bounding box. The actual morton indices look like this:

I'm using 'bounding box' because it's a defined space in the STAC metadata schema to define four 64-bit numbers... but you could get a decent 'polygon' that fits much tighter if you allow for more cells. Here's what it looks like for the same example using 12 instead of 4:
But again, 4 values work for most cases... and parquet is most happy when it's parsing a data structure of fixed rather than variable length. There are 12 total base cells that cover the full globe, so if your spatial coverage is under a third of the planet, you can likely generate a decent coverage 'box' using the coordinates. For individual STAC items (rather than the catalog), you very rarely need even four coordinates to get a compact footprint, but having space allows for cases where you bump up against the base cell boundaries (i.e., intersecting over the north or south pole).

We'll roll out a version of this to solve this duplicate geospatial item issue in the coming week for our users internally, and post more details there if folks are interested.
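For readers unfamiliar with multi-resolution cell ids, here is a minimal sketch of how one integer can carry both the size and the location of a healpix cell. The exact integer layout used in the comment above isn't shown, so this uses the UNIQ-style packing (`4 * 4**order + ipix`) for nested-scheme pixels as a stand-in, and it finds only a single common ancestor cell rather than a four-cell cover. The function names are hypothetical.

```python
def pack_cell(order, ipix):
    """Pack a nested-scheme cell (resolution order, pixel index) into one integer."""
    if not 0 <= order <= 29:
        raise ValueError("order must be in 0..29 to fit in 64 bits")
    if not 0 <= ipix < 12 * 4**order:
        raise ValueError("pixel index out of range for this order")
    return 4 * 4**order + ipix


def unpack_cell(packed):
    """Recover (order, ipix); the position of the top bit tells us the order."""
    order = (packed.bit_length() - 3) // 2
    return order, packed - 4 * 4**order


def common_ancestor(order, pixels):
    """Smallest single cell containing all same-order pixels.

    In the nested scheme the parent of pixel p is p >> 2. Returns None when
    the pixels fall in different base cells (order 0), where no single cell
    can cover them.
    """
    pixels = list(pixels)
    if not pixels:
        raise ValueError("need at least one pixel")
    while order > 0 and len(set(pixels)) > 1:
        pixels = [p >> 2 for p in pixels]
        order -= 1
    if len(set(pixels)) > 1:
        return None
    return order, pixels[0]
```

The `None` case is the situation described above where data bumps up against base cell boundaries, which is one reason to allow several cells per footprint instead of one.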
-
As we're working on stac-geoparquet, we're finding that there are significant performance improvements when the data are sorted and partitioned.¹ ² As space and time are the two "first-class" attributes of STAC items, it follows that a good sort key should include both. The idea of combining a geohash with a timehash is not particularly novel, but this author hasn't found a good, portable implementation, so I'm experimenting with some ideas over in rustac (PR). I'm thinking it may make sense to extract the code to its own repository and provide Python+WASM bindings, so folks don't need to use rustac to use the hasher.
Currently, my implementation has three inputs to its hashing function:

- spatial precision in degrees
- temporal precision as a duration (e.g. one hour or one day)
- temporal extent

There should be an optional fourth input³, spatial extent, but for simplicity I'm currently using the entire WGS84 domain. If you know your temporal extent, and are happy with the global domain, the algorithm can calculate the spatial and temporal precisions for you and apply the hash to a stream of items (no need to load them all into memory). Currently, I'm using the hash as a prefix for item ids, which should enable faster needle-in-a-haystack (aka search-by-id) queries against stac-geoparquet; one can imagine other uses for a hash (partitioning, filtering, etc).

Very simple usage looks like this:
Because you can only compare hash values if they were generated with the same input parameters, we'll need to store those parameters on the items. It probably also makes sense to store them on a collection, so that a collection can "advertise" what hasher params should be used for its items. All this leads us to propose a new STAC extension, and I'd love to hear your thoughts and feedback.
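To make the inputs listed above concrete, here is a minimal sketch of one way a space-time hash could be built: quantize longitude, latitude, and time into grid cells, then interleave the bits of the three cell indices (a Z-order curve). This is an illustration under stated assumptions, not the algorithm in rustac; the names `interleave3` and `spacetime_hash` and the 21-bits-per-axis layout are hypothetical.

```python
import math
from datetime import datetime, timedelta, timezone


def interleave3(x, y, t, bits):
    """Interleave the low `bits` bits of x, y and t into one Z-order integer."""
    out = 0
    for i in range(bits):
        out |= ((x >> i) & 1) << (3 * i)
        out |= ((y >> i) & 1) << (3 * i + 1)
        out |= ((t >> i) & 1) << (3 * i + 2)
    return out


def spacetime_hash(lon, lat, when, *, spatial_precision, temporal_precision,
                   temporal_extent, bits=21):
    """Hypothetical hasher: 3 axes x 21 bits = 63 bits, which fits in a uint64."""
    start, end = temporal_extent
    if not start <= when <= end:
        raise ValueError("datetime falls outside the temporal extent")
    # Quantize each axis into a non-negative cell index over the global domain.
    x = math.floor((lon + 180.0) / spatial_precision)
    y = math.floor((lat + 90.0) / spatial_precision)
    t = (when - start) // temporal_precision  # timedelta // timedelta gives an int
    if min(x, y, t) < 0 or max(x, y, t) >= (1 << bits):
        raise ValueError("coordinate out of range, or precision too fine for `bits`")
    return interleave3(x, y, t, bits)


# Assumed example parameters: one-degree cells, one-day bins, a one-year extent.
params = dict(
    spatial_precision=1.0,
    temporal_precision=timedelta(days=1),
    temporal_extent=(
        datetime(2024, 1, 1, tzinfo=timezone.utc),
        datetime(2025, 1, 1, tzinfo=timezone.utc),
    ),
)
a = spacetime_hash(-105.1, 40.2, datetime(2024, 1, 2, 12, tzinfo=timezone.utc), **params)
b = spacetime_hash(-105.9, 40.9, datetime(2024, 1, 2, 23, tzinfo=timezone.utc), **params)
```

Here `a` and `b` land in the same one-degree cell on the same day, so they share a hash value. Changing any parameter changes the grid itself, which is why values are only comparable when the parameters match.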
Extension proposal
hash
uint32 or uint64 (default uint64)
base64 or base16 (default base16)

Note
The extension will also need to define the hashing algorithm, so that others can implement it.
Note
I'm not quite sure how the "prefix the item id with the hash" stuff fits in. Should that be part of the extension spec?
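One consideration for the id-prefix question is whether the encoded string sorts the same way as the underlying integer. A minimal sketch, assuming a zero-padded fixed-width hex string for base16 and the standard RFC 4648 alphabet over big-endian bytes for base64 (the proposal does not pin down either detail, and the helper names and the `-` separator are hypothetical):

```python
import base64


def hash_to_base16(value, bits=64):
    """Zero-padded hex, so string order matches numeric order."""
    if not 0 <= value < (1 << bits):
        raise ValueError("value does not fit in the requested width")
    return f"{value:0{bits // 4}x}"


def hash_to_base64(value, bits=64):
    """Standard-alphabet base64 of the big-endian bytes."""
    return base64.b64encode(value.to_bytes(bits // 8, "big")).decode("ascii")


def prefixed_id(value, item_id, bits=64):
    """Prefix an item id with its hash so ids sort by hash first."""
    return f"{hash_to_base16(value, bits)}-{item_id}"


hashes = [5, 300, 70000]
ids = [prefixed_id(h, f"item-{i}") for i, h in enumerate(hashes)]
ids_are_sorted = ids == sorted(ids)

# With the standard alphabet, digits encode higher values than letters but
# sort before them as text, so string order can disagree with numeric order.
base64_order_breaks = hash_to_base64(13) < hash_to_base64(12)
```

Under these assumptions the base16 prefix keeps ids in hash order, while the standard base64 alphabet does not guarantee it, which may matter if the prefix is meant to act as a sort key.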
Footnotes
1. https://www.gadom.ski/posts/stac-geoparquet-organization/ ↩
2. https://github.com/opengeospatial/geoparquet/blob/main/format-specs/distributing-geoparquet.md#spatial-partitioning ↩
3. And really another fifth one to set the precision of the output hash (u32 or u64, or maybe more?) ↩