WKB module: proposed plan #46
Following up on the discussion in #44 and #45.
Context
The parser on kyle/wkb (src/io/wkb.ts) covers Point, LineString, Polygon using @loaders.gl/wkt for WKB decoding. The caller passes geometry type and dimension explicitly. This plan proposes replacing that with direct DataView parsing.
| | kyle/wkb | Proposed |
|---|---|---|
| Dependencies | @loaders.gl/wkt, @loaders.gl/schema | None |
| Geometry types | Point, LineString, Polygon | All 6 OGC types |
| Type/dim detection | Caller-supplied | Auto-detected from WKB header |
| EWKB (PostGIS) | No | SRID, Z/M flags |
| Null handling | No | Arrow validity bitmap |
| LargeWKB | No | Int64 offsets (new scope, not in #45 prototype) |
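
The auto-detection in the Proposed column comes down to reading the first five bytes of each geometry (nine when an SRID is present) through a DataView. A minimal sketch, assuming the ISO WKB convention of adding 1000/2000/3000 to the type code for Z/M/ZM and the PostGIS EWKB flag bits in the high byte of the type word. WkbHeader, parseHeader and hexToView are hypothetical names for illustration, not the code from #45:

```ts
// Hypothetical sketch of header auto-detection; all names are illustrative.
enum WkbType {
  Point = 1,
  LineString,
  Polygon,
  MultiPoint,
  MultiLineString,
  MultiPolygon,
}

type Dimension = "XY" | "XYZ" | "XYM" | "XYZM";

interface WkbHeader {
  littleEndian: boolean;
  type: WkbType;
  dimension: Dimension;
  srid: number | null; // only present in EWKB
  bodyOffset: number; // first byte after the header
}

// EWKB (PostGIS) flags stored in the high bits of the 32-bit type word.
const EWKB_Z = 0x80000000;
const EWKB_M = 0x40000000;
const EWKB_SRID = 0x20000000;

function parseHeader(view: DataView, offset = 0): WkbHeader {
  const littleEndian = view.getUint8(offset) === 1;
  const raw = view.getUint32(offset + 1, littleEndian);
  let bodyOffset = offset + 5;

  // Strip the EWKB flag bits, then split an ISO code such as 1002 into 1 and 2.
  const code = raw & 0x1fffffff;
  const isoDim = Math.floor(code / 1000);
  const base = code % 1000;
  if (base < 1 || base > 6 || isoDim > 3) {
    throw new Error(`unsupported WKB type code ${raw}`);
  }

  const hasZ = (raw & EWKB_Z) !== 0 || isoDim === 1 || isoDim === 3;
  const hasM = (raw & EWKB_M) !== 0 || isoDim === 2 || isoDim === 3;

  let srid: number | null = null;
  if ((raw & EWKB_SRID) !== 0) {
    srid = view.getInt32(bodyOffset, littleEndian);
    bodyOffset += 4;
  }

  const dimension: Dimension = hasZ ? (hasM ? "XYZM" : "XYZ") : hasM ? "XYM" : "XY";
  return { littleEndian, type: base as WkbType, dimension, srid, bodyOffset };
}

// Test helper: turn a hex string into a DataView.
function hexToView(hex: string): DataView {
  const bytes = new Uint8Array(hex.length / 2);
  for (let i = 0; i < bytes.length; i++) {
    bytes[i] = parseInt(hex.slice(i * 2, i * 2 + 2), 16);
  }
  return new DataView(bytes.buffer);
}
```

With this shape the caller no longer passes type or dimension: parseHeader(hexToView("0101000020E6100000" + ...)) reports a Point with SRID 4326 and a body starting at byte 9.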
Rust reference
The actual WKB byte parsing in the Rust ecosystem lives in georust/wkb. geoarrow-rs calls into it from cast.rs. The geoarrow-rs modules relevant to the TypeScript side:
| geoarrow-rs | geoarrow-js | What |
|---|---|---|
| datatypes/dimension.rs | wkb/types.ts | WkbType enum (1-6), Dimension enum, coordSize() |
| capacity/{point,...}.rs | wkb/capacity.ts | Pre-scan buffer sizes without parsing coords |
| cast.rs | wkb/reader.ts | Two-pass scan+fill, parseWkb() entry point |
| array/{point,...}.rs | wkb/reader.ts (fill fns) | Build Arrow arrays from scanned WKB |
| builder/ | TBD | Incremental array construction. Arrow.js makeData requires full pre-allocated arrays, so builders need a TypeScript equivalent |
| (no equivalent) | wkb/header.ts | ISO WKB + EWKB header parsing |
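
To make the two-pass scan+fill row concrete, here is a rough sketch for the LineString case only. It assumes ISO XY input with no EWKB flags, represents nulls as null entries, and stops at raw typed arrays rather than wrapping them in Arrow.js data. scanLineStrings and fillLineStrings are hypothetical names, not the planned API:

```ts
// Hypothetical two-pass sketch: pass 1 sizes the buffers, pass 2 fills them.
// Assumes ISO WKB, XY only, LineString only.
type WkbItem = Uint8Array | null;

function viewOf(bytes: Uint8Array): DataView {
  return new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
}

// Pass 1: read only the point counts, never the coordinates.
function scanLineStrings(items: WkbItem[]): { coordCapacity: number; geomCapacity: number } {
  let coordCapacity = 0;
  for (const item of items) {
    if (item === null) continue;
    const view = viewOf(item);
    const littleEndian = view.getUint8(0) === 1;
    if (view.getUint32(1, littleEndian) !== 2) {
      throw new Error("expected an XY LineString");
    }
    coordCapacity += view.getUint32(5, littleEndian);
  }
  return { coordCapacity, geomCapacity: items.length };
}

// Pass 2: allocate once, then copy coordinates and record offsets.
function fillLineStrings(items: WkbItem[]) {
  const { coordCapacity, geomCapacity } = scanLineStrings(items);
  const coords = new Float64Array(coordCapacity * 2); // interleaved x, y
  const geomOffsets = new Int32Array(geomCapacity + 1);
  // Arrow validity bitmap: one bit per geometry, least significant bit first,
  // where a set bit means "valid".
  const validity = new Uint8Array(Math.ceil(geomCapacity / 8));

  let coordIndex = 0;
  items.forEach((item, i) => {
    if (item !== null) {
      validity[i >> 3] |= 1 << (i & 7);
      const view = viewOf(item);
      const littleEndian = view.getUint8(0) === 1;
      const pointCount = view.getUint32(5, littleEndian);
      for (let j = 0; j < pointCount * 2; j++) {
        coords[coordIndex * 2 + j] = view.getFloat64(9 + j * 8, littleEndian);
      }
      coordIndex += pointCount;
    }
    // Null rows repeat the previous offset, giving a zero-length slot.
    geomOffsets[i + 1] = coordIndex;
  });

  return { coords, geomOffsets, validity };
}
```

Because every buffer is sized before any coordinate is read, this shape fits the constraint noted in the builder/ row: nothing has to grow incrementally.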
Not porting: mixed.rs, geometrycollection.rs (would need a Mixed array type that doesn't exist in geoarrow-js yet), geozero/ (no intermediate repr needed), scalar/, rect.rs, wkb_view.rs, wkt.rs.
Type design: discriminated Capacity union
The as casts flagged on #45 (capacity as PointCapacity, etc.) can be eliminated with a discriminated union keyed on WkbType:

    type PointCapacity = { type: WkbType.Point; geomCapacity: number };
    type LineStringCapacity = { type: WkbType.LineString; coordCapacity: number; geomCapacity: number };
    type PolygonCapacity = { type: WkbType.Polygon; coordCapacity: number; ringCapacity: number; geomCapacity: number };
    type MultiPointCapacity = { type: WkbType.MultiPoint; coordCapacity: number; partCapacity: number; geomCapacity: number };
    type MultiLineStringCapacity = { type: WkbType.MultiLineString; coordCapacity: number; ringCapacity: number; partCapacity: number; geomCapacity: number };
    type MultiPolygonCapacity = { type: WkbType.MultiPolygon; coordCapacity: number; ringCapacity: number; polygonCapacity: number; geomCapacity: number };
    type Capacity =
      | PointCapacity
      | LineStringCapacity
      | PolygonCapacity
      | MultiPointCapacity
      | MultiLineStringCapacity
      | MultiPolygonCapacity;

Then switch (capacity.type) narrows automatically. Mirrors how geoarrow-rs uses Rust enums.
Alternative: function overloads on emptyCapacity(type: WkbType.Point): PointCapacity, etc.
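
Both options can be sketched side by side. The example below trims the union to three of the six types to stay short; coordSlots is a hypothetical consumer added only to show the narrowing, and the overload bodies are an assumption about what emptyCapacity() would return:

```ts
// Illustrative only: a three-member subset of the proposed union.
enum WkbType {
  Point = 1,
  LineString = 2,
  Polygon = 3,
}

type PointCapacity = { type: WkbType.Point; geomCapacity: number };
type LineStringCapacity = { type: WkbType.LineString; coordCapacity: number; geomCapacity: number };
type PolygonCapacity = {
  type: WkbType.Polygon;
  coordCapacity: number;
  ringCapacity: number;
  geomCapacity: number;
};
type Capacity = PointCapacity | LineStringCapacity | PolygonCapacity;

// Option 1: switch on the discriminant; each branch sees the narrowed type,
// so no `as` cast is needed to reach coordCapacity.
function coordSlots(capacity: Capacity): number {
  switch (capacity.type) {
    case WkbType.Point:
      return capacity.geomCapacity; // one coordinate slot per point
    case WkbType.LineString:
    case WkbType.Polygon:
      return capacity.coordCapacity;
    default: {
      // Compile-time exhaustiveness check: fails to type-check if a
      // new member is added to Capacity without a matching case.
      const unreachable: never = capacity;
      throw new Error(`unhandled capacity: ${JSON.stringify(unreachable)}`);
    }
  }
}

// Option 2: overloads so callers get the specific capacity type back.
function emptyCapacity(type: WkbType.Point): PointCapacity;
function emptyCapacity(type: WkbType.LineString): LineStringCapacity;
function emptyCapacity(type: WkbType.Polygon): PolygonCapacity;
function emptyCapacity(type: WkbType): Capacity {
  switch (type) {
    case WkbType.Point:
      return { type: WkbType.Point, geomCapacity: 0 };
    case WkbType.LineString:
      return { type: WkbType.LineString, coordCapacity: 0, geomCapacity: 0 };
    case WkbType.Polygon:
      return { type: WkbType.Polygon, coordCapacity: 0, ringCapacity: 0, geomCapacity: 0 };
    default: {
      const unreachable: never = type;
      throw new Error(`unhandled type: ${String(unreachable)}`);
    }
  }
}
```

The two are not mutually exclusive: the overloads decide what the caller sees at construction time, while the union handles code that receives a Capacity of unknown type.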
Proposed PR sequence
PR 1: types + header
wkb/types.ts (enums, coordSize) + wkb/header.ts (ISO WKB + EWKB parsing) + tests.
PR 2: capacity
wkb/capacity.ts (discriminated union, emptyCapacity(), addGeometryCapacity()) + tests against known WKB hex.
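
A test against known WKB hex could look roughly like the following. This is a sketch assuming ISO XY polygons only; addPolygonCapacity is a hypothetical per-type helper standing in for whatever addGeometryCapacity() dispatches to, and the hex encodes POLYGON((0 0, 1 0, 1 1, 0 0)) in little-endian:

```ts
// Hypothetical capacity pre-scan for one geometry type (XY Polygon, ISO WKB).
interface PolygonCounts {
  coordCapacity: number;
  ringCapacity: number;
  geomCapacity: number;
}

const ZERO = "0000000000000000"; // 0.0 as a little-endian float64
const ONE = "000000000000F03F"; // 1.0 as a little-endian float64

// POLYGON((0 0, 1 0, 1 1, 0 0)): byte order, type 3, 1 ring, 4 points.
const TRIANGLE_HEX =
  "01" + "03000000" + "01000000" + "04000000" +
  ZERO + ZERO +
  ONE + ZERO +
  ONE + ONE +
  ZERO + ZERO;

function hexToBytes(hex: string): Uint8Array {
  const bytes = new Uint8Array(hex.length / 2);
  for (let i = 0; i < bytes.length; i++) {
    bytes[i] = parseInt(hex.slice(i * 2, i * 2 + 2), 16);
  }
  return bytes;
}

// Adds one polygon's counts to `capacity`, reading only the ring and point
// counts and skipping over the coordinate bytes.
function addPolygonCapacity(capacity: PolygonCounts, wkb: Uint8Array): void {
  const view = new DataView(wkb.buffer, wkb.byteOffset, wkb.byteLength);
  const littleEndian = view.getUint8(0) === 1;
  if (view.getUint32(1, littleEndian) !== 3) {
    throw new Error("expected an XY Polygon");
  }
  const ringCount = view.getUint32(5, littleEndian);
  let offset = 9;
  for (let r = 0; r < ringCount; r++) {
    const pointCount = view.getUint32(offset, littleEndian);
    capacity.coordCapacity += pointCount;
    offset += 4 + pointCount * 16; // 4-byte count + 2 float64 per point
  }
  capacity.ringCapacity += ringCount;
  capacity.geomCapacity += 1;
}
```

Feeding TRIANGLE_HEX through addPolygonCapacity once should leave the counts at 4 coordinates, 1 ring and 1 geometry, which is the kind of assertion the PR 2 tests would make.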
PR 3: reader + cleanup
wkb/reader.ts (two-pass parseWkb) + per-type tests + geoarrow-data corpus tests + drop @loaders.gl + wire up exports.
Test corpus
A standalone WKB/GeoArrow test corpus (like geotiff-test-data) covering edge cases that geoarrow-data doesn't: empty geometries, large coordinate counts, mixed endianness, EWKB with SRID.
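
Some of these edge cases are cheap to generate rather than collect. The sketch below shows what fixture helpers for mixed endianness, EWKB with SRID, and an empty LineString might look like; encodePoint, encodeEmptyLineString and toHex are hypothetical names, and the EWKB layout assumed is the PostGIS one (SRID flag 0x20000000 in the type word, followed by a 4-byte SRID):

```ts
// Hypothetical fixture helpers for a test corpus; not an existing package.
interface PointOptions {
  littleEndian?: boolean; // defaults to little-endian
  srid?: number; // when set, emit EWKB with the SRID flag
}

const EWKB_SRID_FLAG = 0x20000000;

function encodePoint(x: number, y: number, options: PointOptions = {}): Uint8Array {
  const littleEndian = options.littleEndian ?? true;
  const hasSrid = options.srid !== undefined;
  const bytes = new Uint8Array(hasSrid ? 25 : 21);
  const view = new DataView(bytes.buffer);

  let offset = 0;
  view.setUint8(offset, littleEndian ? 1 : 0);
  offset += 1;
  view.setUint32(offset, hasSrid ? 1 | EWKB_SRID_FLAG : 1, littleEndian);
  offset += 4;
  if (options.srid !== undefined) {
    view.setInt32(offset, options.srid, littleEndian);
    offset += 4;
  }
  view.setFloat64(offset, x, littleEndian);
  view.setFloat64(offset + 8, y, littleEndian);
  return bytes;
}

// An empty LineString is a header followed by a point count of zero.
function encodeEmptyLineString(littleEndian = true): Uint8Array {
  const bytes = new Uint8Array(9);
  const view = new DataView(bytes.buffer);
  view.setUint8(0, littleEndian ? 1 : 0);
  view.setUint32(1, 2, littleEndian);
  view.setUint32(5, 0, littleEndian);
  return bytes;
}

// Uppercase hex, convenient for comparing against hex dumps from other tools.
function toHex(bytes: Uint8Array): string {
  return Array.from(bytes, (b) => b.toString(16).padStart(2, "0"))
    .join("")
    .toUpperCase();
}
```

Generated fixtures like these would complement, not replace, files exported from real tools, since the point of a corpus is partly to catch disagreements between writers.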
Status
#45 converted to draft. This plan is parked until there's bandwidth to review.
AI (Claude) supported my development of this plan.