This adds basic implementation as an arrow-go extension#2
Conversation
Includes point, polygon, WKB types with builders, geometry conversion utilities (gogeom), metadata handling, and test coverage.
96202b3 to
b9ba6a9
Compare
paleolimbot
left a comment
There was a problem hiding this comment.
Cool!
I took a pass but I'm ignorant of the Go-specific details 🙂. You're welcome to merge this and fix things incrementally but there's a sufficient number of potential issues here that I think we'll be able to produce something higher quality faster with smaller more polished PRs (e.g., perhaps containing documentation unless it's idiomatic in the Go world to omit it?)
I think the tests would benefit from a geoarrow-data submodule (or copy of the example/ files). You can set up a test that roundtrips iterating over values and building new arrays and asserting equal storage and an LLM sort out any issues that arise. You could do the same for WKT and JSON (and for WKT the values are provided in the file so you can check parsing or generating those).
There was a problem hiding this comment.
Should we nest these into a subdirectory so that we can add other go packages to this repo? (e.g., I see this is how https://github.com/apache/arrow-go/ is set up)
There was a problem hiding this comment.
geometryArray and valueBuilder are internal generic types that I believe should be unexported. This does not prohibit users from implementing other geometry packages, but if it becomes an issue, it's easier to export them since that doesn't require any decrease in API surface compared to the other way around. arrow-go needs to export them so that users outside of the package can access them. In our case, they do not since we have type aliases for the concrete types (which themselves are really just containers for Arrow values).
There was a problem hiding this comment.
I think I just meant that if you put all your .go files at the root of the repo, it will be a breaking change for importers if you want to add another package that also lives in this repo (so maybe starting with one level of nesting would be a good plan).
There was a problem hiding this comment.
There are arguments either way, but typically go libraries structure it this way. iceberg-go does, actually. Adding in other packages here isn't a problem really, they'd just get different input paths. Keeping the geoarrow package at the root is the cleanest option for someone that just wants to use the geoarrow extension types. All they need to do is:
import _ "githhub.com/geoarrow/geoarrow-go"
whereas with nesting it would be
import _ "github.com/geoarrow/geoarrow-go/geoarrow"
Not a big deal either way. I'll let @zeroshade way in on this since it is intended to be a sibling to arrow-go.
| for i, r := range e.meta.CRS { | ||
| if r != other.meta.CRS[i] { | ||
| return false | ||
| } | ||
| } |
There was a problem hiding this comment.
Does this work for all of omitted / string / PROJJSON cases?
There was a problem hiding this comment.
No - I'll create an issue for safer CRS handling. I expect any earlier users that would be messing around in this experimental repo would likely just be using EPSG:4326 anyway! :)
There was a problem hiding this comment.
Perhaps you could check serialized bytes of the lhs and rhs as a safe but too strict default until you have time to circle back?
There was a problem hiding this comment.
that's what this is doing already, isn't it? This just checks raw byte equality for rhs and lhs.
gogeom/convert.go
Outdated
| // PointsToGeom converts a geoarrow PointArray to a slice of go-geom Points. | ||
| // Null entries produce nil pointers. The conversion reads directly from the | ||
| // underlying Arrow struct storage arrays without intermediate copies. | ||
| func PointsToGeom(arr *geoarrow.PointArray) []*geom.Point { | ||
| structArr := arr.Storage().(*array.Struct) |
There was a problem hiding this comment.
What happens if this is an interleaved point?
gogeom/convert.go
Outdated
|
|
||
| st := typ.StorageType().(*arrow.StructType) | ||
| nFields := st.NumFields() | ||
|
|
There was a problem hiding this comment.
What happens if this is an interleaved point?
gogeom/convert.go
Outdated
| // Pre-extract the typed field arrays once | ||
| fields := make([]*array.Float64, nFields) | ||
| for i := range nFields { | ||
| fields[i] = structArr.Field(i).(*array.Float64) | ||
| } |
There was a problem hiding this comment.
Does this handle the case where one or more of the child arrays has a non-zero offset?
gogeom/convert.go
Outdated
| nFields := coordStruct.NumFields() | ||
| dim := geoarrow.XY | ||
| switch nFields { | ||
| case 3: | ||
| if coordStruct.Field(2).Name == "z" { | ||
| dim = geoarrow.XYZ |
There was a problem hiding this comment.
I think there are a few copies of this with subtly different heruistics
| ringStart, ringEnd := outerList.ValueOffsets(i) | ||
| nRings := int(ringEnd - ringStart) |
There was a problem hiding this comment.
Does this handle the case where either the out list or any of the inner lists have non-zero offsets?
For sure - I just wanted to get something semi-functional in place - I'll go ahead and merge and create issues for these comments. As far as the major version pinning - yes, that's going to be a problem and I don't know a great away around it - We'll need to tag this to track with arrow-go. Go uses /v{major version} in the import path to avoid any chance of a major version causing breaking changes. For now we'll need to go 0ver until we can release major versions in lock step with arrow-go. I'll address low hanging fruit comments here and then merge, create issues for the rest. |
|
I've intentionally been doing whatever I can to avoid major version releases of arrow-go for this reason. Don't worry, you'll have plenty of notice if there's gonna be a major version update for arrow-go, but it'll only happen if there's no choice to but have a breaking change. I'll give this a review look over in the next day or so |
|
@paleolimbot - I think I addressed most of your comments.
Big changeset, just wanted to get something "real" we can work with. |
|
@paleolimbot and @zeroshade - you good with me merging this as the starting point? It's reasonably functional, just missing a bunch of the native types. I should be able to fill this in pretty quickly and we can work off issues before we give this a tagged release. |
|
Yes, go for it! We can review after the fact if it seems like a good changeset to review. |
This PR migrates the contents of apache/arrow-go#713 into this repo
This does not yet address @paleolimbot's comments such as on CRS equality and will be covered in a future PR. This just puts something here for us to work on, but consider it rough clay.