Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle nested properties in GeoJSON reader and writer #208

Merged
merged 3 commits into from
May 14, 2024

Conversation

kylebarron
Copy link
Member

@kylebarron kylebarron commented Apr 12, 2024

The GeoJSONReader currently silently omits any nested property. This serializes the Value for an Array or Object back to a string to store within a ColumnValue::Json.

The GeoJSONWriter currently silently omits any ColumnValue::Json or ColumnValue::Binary. This PR writes the contained JSON string to the output writer. It leaves writing binary to a future PR (which should probably at least have the option of converting to hex?)

I'm not sure why but I was unable to run tests locally because they failed with

  = note: ld: library 'pq' not found

I'm guessing this has something to do with Postgres' libpq, but brew install libpq says it's already installed.

@@ -208,6 +208,12 @@ fn write_str_prop<W: Write>(mut out: W, colname: &str, v: &str) -> Result<()> {
Ok(())
}

fn write_json_prop<W: Write>(mut out: W, colname: &str, v: &str) -> Result<()> {
let colname = colname.replace('\"', "\\\"");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's this escaping about? Is this captured in a test somewhere?

Copy link
Member Author

@kylebarron kylebarron Apr 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's identical to the other write* methods

fn write_num_prop<W: Write>(mut out: W, colname: &str, v: &dyn Display) -> Result<()> {
let colname = colname.replace('\"', "\\\"");

fn write_str_prop<W: Write>(mut out: W, colname: &str, v: &str) -> Result<()> {
let colname = colname.replace('\"', "\\\"");

Any embedded " in the column name must be escaped with a \" or else the JSON will be invalid. I'm not sure if there's an existing test for this, but it would be separate from this change at hand.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I created #209 as a followup task.

Comment on lines +210 to +219
JsonValue::Number(v) => {
if v.is_f64() {
processor.property(i, key, &ColumnValue::Double(v.as_f64().unwrap()))?
} else if v.is_i64() {
processor.property(i, key, &ColumnValue::Long(v.as_i64().unwrap()))?
} else if v.is_u64() {
processor.property(i, key, &ColumnValue::ULong(v.as_u64().unwrap()))?
} else {
unreachable!()
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was refactored so that the outer match value could be exhaustive.

kylebarron added a commit to geoarrow/geoarrow-rs that referenced this pull request May 8, 2024
### Change list

- Support writing Struct, Array, Datetime columns from GeoArrow through
geozero.
- Vendor geozero GeoJSON updates
(georust/geozero#208 and
georust/geozero#206)
- Handle null attribute values when writing to geozero.
- add `writeGeoJSON` to JS binding
- 

Closes #588. 

~~I'm still getting issues with the GeoJSONWriter excluding fields. In
particular, writing Overture data to GeoJSON completely omits list and
struct fields, giving invalid GeoJSON with ", ,"~~

---- 

Note: tested this via the Python API
@kylebarron
Copy link
Member Author

Anything else I can do to push this along?

Copy link
Member

@michaelkirk michaelkirk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@michaelkirk michaelkirk added this pull request to the merge queue May 14, 2024
Merged via the queue into georust:main with commit 05daf37 May 14, 2024
1 check passed
// For null values omit the property
JsonValue::Null => false,
// Array(Vec<Value>), Object(Map<String, Value>)
_ => processor.property(i, key, &ColumnValue::String(&value.to_string()))?,
Copy link
Member

@michaelkirk michaelkirk May 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kylebarron - just working on some changelog notes... your PR description says

The GeoJSONReader currently silently omits any nested property. This serializes the Value for an Array or Object back to a string to store within a ColumnValue::Json.

But it looks like we were not omitting them, rather we were previously decoding them as ColumnValue::String, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm maybe I conflated the reading and writing parts in the PR description. I don't remember exactly right now.

I think the primary change is that we change from the String variant to Json. I don't know what was the behavior of value.to_string(). I assume that was not guaranteed to be the same as serde_json::to_string?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the output is the same. I still think your change to the Reader makes sense - if we have JSON we might as well put it in a ColumnValue::JSON rather than a ColumnValue::String. I just wanted to get the changelog notes correct.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants