Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove Arrow mod, point to geoarrow crate #186

Merged
merged 3 commits into from
Feb 12, 2024

Conversation

kylebarron
Copy link
Member

This is marked as WIP until 0.1 release of the geoarrow crate, expected to be this week (I need arrow-rs to release a new version).

  • This removes the arrow mod in favor of the https://github.com/geoarrow/geoarrow-rs crate, which implements geozero conversions to and from each of the array types.
  • arrow2 was an alternate implementation of the Arrow spec, but has stopped being maintained, so I think the ecosystem (except polars) is focusing on arrow-rs. I don't think there's much benefit in keeping the existing code here.

@michaelkirk
Copy link
Member

which implements geozero conversions to and from each of the array types.

In light of #187, do you think it's desirable to have the geozero integrations live externally? TBH I haven't really figured out which way is "least worst".

@kylebarron
Copy link
Member Author

I agree it's a difficult question. My first thought is: the dependency should go "from the crate that is changing more to the crate that is changing less". Right now, geoarrow is changing heavily while geozero is relatively stable, so it seems like having the geozero implementation in the geoarrow crate makes for easier and faster development iteration.

In relation to #187 and shapefile, maybe it's the other way around; if the core part of parsing shapefiles is more stable than geozero's release cycle, then maybe geozero itself should depend on geozero-shp's public APIs?

Just my initial thoughts; what do you think?

@michaelkirk
Copy link
Member

from the crate that is changing more to the crate that is changing less

It's unfortunately not a very objective or stable criteria to apply, but I do think it's a pragmatic approach. 👍

One small thing to clarify: I don't think we care about non-breaking changes, only semver incompatible changes that can complicate cargo's dependency resolution.

In that sense, the shapefile code might have a better chance of being stable, just by virtue of the format itself being stable. But I suppose that's not necessarily so.

@kylebarron
Copy link
Member Author

kylebarron commented Jan 10, 2024

It's unfortunately not a very objective or stable criteria to apply, but I do think it's a pragmatic approach. 👍

I agree it's not objective, but it feels like Rust's strict version resolution pushes towards this in practice.

With geoarrow specifically, the geozero bindings are pretty significant in size. This entire directory is just for geozero bindings. It seems like it would be more invasive to add all of that to geozero and for maintenance to happen here.

Edit: one more note is that geoarrow's geozero bindings are implemented using geo-traits (e.g. here), which isn't yet stable. Maybe in a future where geo-traits become stable, geozero can have a source that reads from Iterator<Item = impl GeometryTrait<T=f64>>, and then that can decouple a bit from geoarrow

@kylebarron
Copy link
Member Author

FWIW: I published v0.1 of geoarrow and its geozero integration is here: https://docs.rs/geoarrow/0.1.0/geoarrow/io/geozero/index.html

@kylebarron kylebarron marked this pull request as ready for review February 9, 2024 03:44
@pka
Copy link
Member

pka commented Feb 9, 2024

I'm +1 for merging. If there is an external arrow based implementation, we can drop the arrow2 based implementation.

@kylebarron
Copy link
Member Author

Is this good to merge?

@kylebarron kylebarron mentioned this pull request Feb 12, 2024
@michaelkirk
Copy link
Member

Just to make sure I understand, the change notes for this might read like:

Moved the geozero/geoarrow integration from the geozero crate to the geoarrow crate

And the motivation for the change is because the geo-arrow crate is undergoing more frequent breaking changes than geozero (currently anyway).

If so, it's all fine with me.

@kylebarron
Copy link
Member Author

kylebarron commented Feb 12, 2024

Moved the geozero/geoarrow integration from the geozero crate to the geoarrow crate

And the motivation for the change is because the geo-arrow crate is undergoing more frequent breaking changes than geozero (currently anyway).

Moved, yes, but more importantly updated and expanded the geozero/geoarrow integration:

  • There are two different Rust arrow implementations, arrow and arrow2. geozero's existing integration used arrow2, which is now defunct. The geoarrow crate uses the maintained arrow crate.
  • The GeoArrow spec is much broader than what was implemented here in geozero. In particular, GeoArrow defines both "serialized" and "native" encodings. WKB and WKT are serialized encodings because they're O(N) to decode; native encodings are always O(1) access to any coordinate. The existing geozero code supported only a tiny fraction of that: read-only from the WKB serialized encoding. The current geoarrow crate supports read-write from/to a GeoArrow WKB array but also to/from all the native encodings.

The geoarrow crate is indeed likely to change more frequently; I hope to support 3D and 4D coordinates (even if they can't be interop'ed with geo-types) which would likely incur a bunch of breaking changes.

I am considering breaking geoarrow into sub-crates, where the geozero integration would live in geoarrow-io. Maybe at that point the geoarrow-io sub-crate would be stable enough for geozero to depend on it 🤷 .

Another possible consideration for "in which repo should code live" is the current flatgeobuf situation where IIUC there's a circular dependency between the flatgeobuf and geozero crates which makes it hard to make new releases? I'm not sure the right

@kylebarron kylebarron added this pull request to the merge queue Feb 12, 2024
@kylebarron
Copy link
Member Author

It's my first time trying the merge queue and I'm excited!

Merged via the queue into georust:main with commit c1e7b30 Feb 12, 2024
1 check passed
@kylebarron kylebarron deleted the kyle/remove-arrow branch February 12, 2024 21:53
@michaelkirk
Copy link
Member

Thanks for the explanation - I feel like you've explained it before, but it's kind of complicated for someone not steeped in the day to day of arrow, so I appreciate the recap. 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants