WKB module: proposed plan #46

@wietzesuijker

Following up on the discussion in #44 and #45.

Context

The parser on kyle/wkb (src/io/wkb.ts) covers Point, LineString, and Polygon, using @loaders.gl/wkt for WKB decoding. The caller passes geometry type and dimension explicitly. This plan proposes replacing that with direct DataView parsing.

| | kyle/wkb | Proposed |
| --- | --- | --- |
| Dependencies | @loaders.gl/wkt, @loaders.gl/schema | None |
| Geometry types | Point, LineString, Polygon | All 6 OGC types |
| Type/dim detection | Caller-supplied | Auto-detected from WKB header |
| EWKB (PostGIS) | No | SRID, Z/M flags |
| Null handling | No | Arrow validity bitmap |
| LargeWKB | No | Int64 offsets (new scope, not in #45 prototype) |
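
To make the "auto-detected from WKB header" row concrete, here is a minimal sketch of header parsing with a DataView. It assumes the usual conventions: ISO WKB encodes dimension as a thousands offset on the type code, and EWKB uses high-bit flags for Z, M, and SRID. The names `WkbHeader` and `parseHeader` are hypothetical and not the actual wkb/header.ts API.

```typescript
// Sketch only: WkbHeader and parseHeader are hypothetical names.
type Dimension = "XY" | "XYZ" | "XYM" | "XYZM";

interface WkbHeader {
  littleEndian: boolean;
  geometryType: number; // base OGC code, e.g. 1 = Point
  dimension: Dimension;
  srid: number | null; // only present in EWKB
  headerLength: number; // bytes consumed: 5, or 9 when an SRID follows
}

// EWKB (PostGIS) stores flags in the high bits of the type word.
const EWKB_Z = 0x80000000;
const EWKB_M = 0x40000000;
const EWKB_SRID = 0x20000000;

function parseHeader(view: DataView, offset = 0): WkbHeader {
  // Byte 0 is the byte-order marker: 1 = little-endian, 0 = big-endian.
  const littleEndian = view.getUint8(offset) === 1;
  const raw = view.getUint32(offset + 1, littleEndian);

  // Strip the EWKB flag bits, then split the ISO code into
  // dimension (thousands digit) and base geometry type.
  const code = raw & 0x1fffffff;
  const isoDim = Math.floor(code / 1000); // 1 = Z, 2 = M, 3 = ZM
  const hasZ = (raw & EWKB_Z) !== 0 || isoDim === 1 || isoDim === 3;
  const hasM = (raw & EWKB_M) !== 0 || isoDim === 2 || isoDim === 3;
  const hasSrid = (raw & EWKB_SRID) !== 0;

  const dimension: Dimension = hasZ ? (hasM ? "XYZM" : "XYZ") : hasM ? "XYM" : "XY";
  const srid = hasSrid ? view.getUint32(offset + 5, littleEndian) : null;

  return {
    littleEndian,
    geometryType: code % 1000,
    dimension,
    srid,
    headerLength: hasSrid ? 9 : 5,
  };
}

// Usage: the first 9 bytes of a little-endian EWKB Point with SRID 4326.
const ewkbPointHeader = new Uint8Array([0x01, 0x01, 0x00, 0x00, 0x20, 0xe6, 0x10, 0x00, 0x00]);
const parsedHeader = parseHeader(new DataView(ewkbPointHeader.buffer));
// parsedHeader.geometryType === 1, parsedHeader.srid === 4326
```

Returning `headerLength` lets the caller skip straight to the coordinate payload regardless of whether an SRID was present.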

Rust reference

The actual WKB byte parsing in the Rust ecosystem lives in georust/wkb. geoarrow-rs calls into it from cast.rs. The geoarrow-rs modules relevant to the TypeScript side:

| geoarrow-rs | geoarrow-js | What |
| --- | --- | --- |
| datatypes/dimension.rs | wkb/types.ts | WkbType enum (1-6), Dimension enum, coordSize() |
| capacity/{point,...}.rs | wkb/capacity.ts | Pre-scan buffer sizes without parsing coords |
| cast.rs | wkb/reader.ts | Two-pass scan+fill, parseWkb() entry point |
| array/{point,...}.rs | wkb/reader.ts (fill fns) | Build Arrow arrays from scanned WKB |
| builder/ | TBD | Incremental array construction. Arrow.js makeData requires full pre-allocated arrays, so builders need a TypeScript equivalent |
| (no equivalent) | wkb/header.ts | ISO WKB + EWKB header parsing |

Not porting: mixed.rs, geometrycollection.rs (would need a Mixed array type that doesn't exist in geoarrow-js yet), geozero/ (no intermediate repr needed), scalar/, rect.rs, wkb_view.rs, wkt.rs.

Type design: discriminated Capacity union

The as casts flagged on #45 (capacity as PointCapacity, etc.) can be eliminated with a discriminated union keyed on WkbType:

type PointCapacity = { type: WkbType.Point; geomCapacity: number };
type LineStringCapacity = { type: WkbType.LineString; coordCapacity: number; geomCapacity: number };
type PolygonCapacity = { type: WkbType.Polygon; coordCapacity: number; ringCapacity: number; geomCapacity: number };
type MultiPointCapacity = { type: WkbType.MultiPoint; coordCapacity: number; partCapacity: number; geomCapacity: number };
type MultiLineStringCapacity = { type: WkbType.MultiLineString; coordCapacity: number; ringCapacity: number; partCapacity: number; geomCapacity: number };
type MultiPolygonCapacity = { type: WkbType.MultiPolygon; coordCapacity: number; ringCapacity: number; polygonCapacity: number; geomCapacity: number };

type Capacity = PointCapacity | LineStringCapacity | PolygonCapacity
  | MultiPointCapacity | MultiLineStringCapacity | MultiPolygonCapacity;

A switch (capacity.type) then narrows to the matching variant automatically, with no as casts. This mirrors how geoarrow-rs uses Rust enums.
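
As an illustration of that narrowing, here is a small sketch over a subset of the union. The helper `offsetBufferLengths` is hypothetical (it is not part of the #45 prototype), and it assumes `WkbType` is a numeric TypeScript enum and that each Arrow offsets buffer holds one more entry than the level it indexes.

```typescript
// Assumption: WkbType is a numeric enum matching the OGC codes.
enum WkbType {
  Point = 1,
  LineString = 2,
  Polygon = 3,
}

type PointCapacity = { type: WkbType.Point; geomCapacity: number };
type LineStringCapacity = { type: WkbType.LineString; coordCapacity: number; geomCapacity: number };
type PolygonCapacity = { type: WkbType.Polygon; coordCapacity: number; ringCapacity: number; geomCapacity: number };

type Capacity = PointCapacity | LineStringCapacity | PolygonCapacity;

// Hypothetical helper: lengths of the offset buffers to allocate,
// outermost level first.
function offsetBufferLengths(capacity: Capacity): number[] {
  switch (capacity.type) {
    case WkbType.Point:
      // Narrowed to PointCapacity: points need no offsets.
      return [];
    case WkbType.LineString:
      // Narrowed to LineStringCapacity: one geometry-level offsets buffer.
      return [capacity.geomCapacity + 1];
    case WkbType.Polygon:
      // Narrowed to PolygonCapacity: ringCapacity is only visible here.
      return [capacity.geomCapacity + 1, capacity.ringCapacity + 1];
    default: {
      // Exhaustiveness check: fails to compile if a variant is added
      // to Capacity without a matching case above.
      const unreachable: never = capacity;
      throw new Error("Unhandled capacity: " + JSON.stringify(unreachable));
    }
  }
}
```

The `never` assignment in the default branch is optional, but it turns a forgotten geometry type into a compile error rather than a runtime surprise.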

Alternative: function overloads on emptyCapacity(type: WkbType.Point): PointCapacity, etc.
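
A rough sketch of what the overload alternative could look like, limited to two geometry types for brevity. The `emptyCapacity` signatures come from the line above; the zero-initialised bodies and the numeric `WkbType` enum are assumptions.

```typescript
// Assumption: WkbType is a numeric enum matching the OGC codes.
enum WkbType {
  Point = 1,
  LineString = 2,
}

type PointCapacity = { type: WkbType.Point; geomCapacity: number };
type LineStringCapacity = { type: WkbType.LineString; coordCapacity: number; geomCapacity: number };

// One overload per geometry type, so call sites get the precise return type.
function emptyCapacity(type: WkbType.Point): PointCapacity;
function emptyCapacity(type: WkbType.LineString): LineStringCapacity;
function emptyCapacity(type: WkbType): PointCapacity | LineStringCapacity {
  switch (type) {
    case WkbType.Point:
      return { type: WkbType.Point, geomCapacity: 0 };
    case WkbType.LineString:
      return { type: WkbType.LineString, coordCapacity: 0, geomCapacity: 0 };
    default:
      throw new Error("Unsupported WKB type: " + String(type));
  }
}

// Usage: no cast needed to reach coordCapacity.
const lineCapacity = emptyCapacity(WkbType.LineString);
lineCapacity.coordCapacity += 2;
```

Overloads only help where the type is known statically at the call site. Code that receives an arbitrary capacity value still needs the discriminated union to narrow, so the two approaches can coexist.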

Proposed PR sequence

PR 1: types + header
wkb/types.ts (enums, coordSize) + wkb/header.ts (ISO WKB + EWKB parsing) + tests.

PR 2: capacity
wkb/capacity.ts (discriminated union, emptyCapacity(), addGeometryCapacity()) + tests against known WKB hex.

PR 3: reader + cleanup
wkb/reader.ts (two-pass parseWkb) + per-type tests + geoarrow-data corpus tests + drop @loaders.gl + wire up exports.
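
To illustrate how the capacity scan (PR 2) and the fill (PR 3) fit together, here is a minimal two-pass sketch for LineStrings. It makes several simplifying assumptions: ISO WKB only, XY coordinates, no EWKB SRID, Int32 offsets, and plain typed arrays as output rather than Arrow.js Data. The names `parseLineStrings`, `LineStringArrays`, and `toView` are hypothetical.

```typescript
interface LineStringArrays {
  coords: Float64Array; // interleaved x, y
  geomOffsets: Int32Array; // rows + 1 entries, in units of coordinates
  validity: Uint8Array; // Arrow-style bitmap: bit i set means row i is non-null
  nullCount: number;
}

function toView(bytes: Uint8Array): DataView {
  // Respect byteOffset so sliced buffers are read correctly.
  return new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
}

function parseLineStrings(rows: (Uint8Array | null)[]): LineStringArrays {
  // Pass 1 (scan): read only the point counts to size the buffers.
  let coordCapacity = 0;
  for (const wkb of rows) {
    if (wkb === null) continue;
    const view = toView(wkb);
    const littleEndian = view.getUint8(0) === 1;
    const type = view.getUint32(1, littleEndian);
    if (type !== 2) throw new Error("Expected LineString (2), got " + type);
    coordCapacity += view.getUint32(5, littleEndian);
  }

  // Allocate everything up front, exactly once.
  const coords = new Float64Array(coordCapacity * 2);
  const geomOffsets = new Int32Array(rows.length + 1);
  const validity = new Uint8Array(Math.ceil(rows.length / 8));
  let nullCount = 0;
  let valueIndex = 0; // next free slot in coords

  // Pass 2 (fill): copy coordinates and record offsets and validity.
  rows.forEach((wkb, i) => {
    if (wkb === null) {
      nullCount += 1; // validity bit stays 0, offset repeats
    } else {
      const view = toView(wkb);
      const littleEndian = view.getUint8(0) === 1;
      const numPoints = view.getUint32(5, littleEndian);
      // Coordinates start after the 5-byte header and 4-byte count.
      for (let k = 0; k < numPoints * 2; k++) {
        coords[valueIndex++] = view.getFloat64(9 + k * 8, littleEndian);
      }
      validity[i >> 3] |= 1 << (i & 7);
    }
    geomOffsets[i + 1] = valueIndex / 2;
  });

  return { coords, geomOffsets, validity, nullCount };
}
```

Because each row carries its own byte-order marker, rows with different endianness can be mixed in one input without special handling.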

Test corpus

A standalone WKB/GeoArrow test corpus (like geotiff-test-data) covering edge cases that geoarrow-data doesn't: empty geometries, large coordinate counts, mixed endianness, EWKB with SRID.
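
As one possible way to seed such a corpus, the sketch below generates Point fixtures in either byte order, optionally as EWKB with an SRID. The function names and the options shape are hypothetical; the EWKB SRID flag value (0x20000000) follows the PostGIS convention.

```typescript
interface PointFixtureOptions {
  littleEndian?: boolean; // defaults to true
  srid?: number; // when set, emit EWKB with the SRID flag
}

// Hypothetical fixture generator for XY points.
function encodePointFixture(x: number, y: number, options: PointFixtureOptions = {}): Uint8Array {
  const littleEndian = options.littleEndian ?? true;
  const srid = options.srid;
  const bytes = new Uint8Array(srid !== undefined ? 25 : 21);
  const view = new DataView(bytes.buffer);

  let offset = 0;
  view.setUint8(offset, littleEndian ? 1 : 0); // byte-order marker
  offset += 1;
  view.setUint32(offset, srid !== undefined ? 0x20000001 : 1, littleEndian); // Point, plus SRID flag
  offset += 4;
  if (srid !== undefined) {
    view.setUint32(offset, srid, littleEndian);
    offset += 4;
  }
  view.setFloat64(offset, x, littleEndian);
  view.setFloat64(offset + 8, y, littleEndian);
  return bytes;
}

function toHex(bytes: Uint8Array): string {
  return Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join("");
}

// Usage: the same point in three encodings.
const fixtureHex = [
  toHex(encodePointFixture(1, 2)),
  toHex(encodePointFixture(1, 2, { littleEndian: false })),
  toHex(encodePointFixture(1, 2, { srid: 4326 })),
];
```

Generated fixtures are easiest to trust when a few of them are also checked against hex produced by an independent encoder such as PostGIS.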

Status

#45 converted to draft. This plan is parked until there's bandwidth to review.

AI (Claude) supported my development of this plan.
