Skip to content

This adds basic implementation as an arrow-go extension#2

Merged
dwilson1988 merged 2 commits intomainfrom
feat/initial-baseline
Apr 3, 2026
Merged

This adds basic implementation as an arrow-go extension#2
dwilson1988 merged 2 commits intomainfrom
feat/initial-baseline

Conversation

@dwilson1988
Copy link
Copy Markdown
Collaborator

This PR migrates the contents of apache/arrow-go#713 into this repo

This does not yet address @paleolimbot's comments such as on CRS equality and will be covered in a future PR. This just puts something here for us to work on, but consider it rough clay.

@dwilson1988 dwilson1988 requested a review from paleolimbot March 23, 2026 20:42
Includes point, polygon, WKB types with builders, geometry conversion
utilities (gogeom), metadata handling, and test coverage.
Copy link
Copy Markdown
Contributor

@paleolimbot paleolimbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool!

I took a pass but I'm ignorant of the Go-specific details 🙂. You're welcome to merge this and fix things incrementally but there's a sufficient number of potential issues here that I think we'll be able to produce something higher quality faster with smaller more polished PRs (e.g., perhaps containing documentation unless it's idiomatic in the Go world to omit it?)

I think the tests would benefit from a geoarrow-data submodule (or copy of the example/ files). You can set up a test that roundtrips iterating over values and building new arrays and asserting equal storage and an LLM sort out any issues that arise. You could do the same for WKT and JSON (and for WKT the values are provided in the file so you can check parsing or generating those).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we nest these into a subdirectory so that we can add other go packages to this repo? (e.g., I see this is how https://github.com/apache/arrow-go/ is set up)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

geometryArray and valueBuilder are internal generic types that I believe should be unexported. This does not prohibit users from implementing other geometry packages, but if it becomes an issue, it's easier to export them since that doesn't require any decrease in API surface compared to the other way around. arrow-go needs to export them so that users outside of the package can access them. In our case, they do not since we have type aliases for the concrete types (which themselves are really just containers for Arrow values).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I just meant that if you put all your .go files at the root of the repo, it will be a breaking change for importers if you want to add another package that also lives in this repo (so maybe starting with one level of nesting would be a good plan).

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are arguments either way, but typically go libraries structure it this way. iceberg-go does, actually. Adding in other packages here isn't a problem really, they'd just get different input paths. Keeping the geoarrow package at the root is the cleanest option for someone that just wants to use the geoarrow extension types. All they need to do is:
import _ "githhub.com/geoarrow/geoarrow-go"

whereas with nesting it would be
import _ "github.com/geoarrow/geoarrow-go/geoarrow"

Not a big deal either way. I'll let @zeroshade way in on this since it is intended to be a sibling to arrow-go.

Comment on lines +98 to +102
for i, r := range e.meta.CRS {
if r != other.meta.CRS[i] {
return false
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this work for all of omitted / string / PROJJSON cases?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No - I'll create an issue for safer CRS handling. I expect any earlier users that would be messing around in this experimental repo would likely just be using EPSG:4326 anyway! :)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps you could check serialized bytes of the lhs and rhs as a safe but too strict default until you have time to circle back?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's what this is doing already, isn't it? This just checks raw byte equality for rhs and lhs.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I trust you!

Comment on lines +29 to +33
// PointsToGeom converts a geoarrow PointArray to a slice of go-geom Points.
// Null entries produce nil pointers. The conversion reads directly from the
// underlying Arrow struct storage arrays without intermediate copies.
func PointsToGeom(arr *geoarrow.PointArray) []*geom.Point {
structArr := arr.Storage().(*array.Struct)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if this is an interleaved point?


st := typ.StorageType().(*arrow.StructType)
nFields := st.NumFields()

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if this is an interleaved point?

Comment on lines +37 to +41
// Pre-extract the typed field arrays once
fields := make([]*array.Float64, nFields)
for i := range nFields {
fields[i] = structArr.Field(i).(*array.Float64)
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this handle the case where one or more of the child arrays has a non-zero offset?

Comment on lines +122 to +127
nFields := coordStruct.NumFields()
dim := geoarrow.XY
switch nFields {
case 3:
if coordStruct.Field(2).Name == "z" {
dim = geoarrow.XYZ
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there are a few copies of this with subtly different heruistics

Comment on lines +149 to +150
ringStart, ringEnd := outerList.ValueOffsets(i)
nRings := int(ringEnd - ringStart)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this handle the case where either the out list or any of the inner lists have non-zero offsets?

@dwilson1988
Copy link
Copy Markdown
Collaborator Author

Cool!

I took a pass but I'm ignorant of the Go-specific details 🙂. You're welcome to merge this and fix things incrementally but there's a sufficient number of potential issues here that I think we'll be able to produce something higher quality faster with smaller more polished PRs (e.g., perhaps containing documentation unless it's idiomatic in the Go world to omit it?)

I think the tests would benefit from a geoarrow-data submodule (or copy of the example/ files). You can set up a test that roundtrips iterating over values and building new arrays and asserting equal storage and an LLM sort out any issues that arise. You could do the same for WKT and JSON (and for WKT the values are provided in the file so you can check parsing or generating those).

For sure - I just wanted to get something semi-functional in place - I'll go ahead and merge and create issues for these comments.

As far as the major version pinning - yes, that's going to be a problem and I don't know a great away around it - We'll need to tag this to track with arrow-go. Go uses /v{major version} in the import path to avoid any chance of a major version causing breaking changes. For now we'll need to go 0ver until we can release major versions in lock step with arrow-go.

I'll address low hanging fruit comments here and then merge, create issues for the rest.

@zeroshade
Copy link
Copy Markdown

I've intentionally been doing whatever I can to avoid major version releases of arrow-go for this reason. Don't worry, you'll have plenty of notice if there's gonna be a major version update for arrow-go, but it'll only happen if there's no choice to but have a breaking change.

I'll give this a review look over in the next day or so

@dwilson1988
Copy link
Copy Markdown
Collaborator Author

@paleolimbot - I think I addressed most of your comments.

  • Added interleaved for points, will create an issue for polygons.
  • Will open issue on CRS stuff
  • Still need to decide repo structure
  • Fixed slice offset issue
  • More tests (mostly for interleaved)
  • removed duplicates, tried to reduce repetition, but probably needs another pass.

Big changeset, just wanted to get something "real" we can work with.

@dwilson1988 dwilson1988 changed the title This adds basic implementation as an arrow-go extenion This adds basic implementation as an arrow-go extension Mar 26, 2026
@dwilson1988
Copy link
Copy Markdown
Collaborator Author

@paleolimbot and @zeroshade - you good with me merging this as the starting point? It's reasonably functional, just missing a bunch of the native types. I should be able to fill this in pretty quickly and we can work off issues before we give this a tagged release.

@paleolimbot
Copy link
Copy Markdown
Contributor

Yes, go for it! We can review after the fact if it seems like a good changeset to review.

@dwilson1988 dwilson1988 merged commit f54751c into main Apr 3, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants