Skip to content

@turf/clean-coords - behavior of degenerate polygons, addition of Collection support, and other tweaks #2998

@bratter

Description

@bratter

There are several items that could be changed to improve the consistency and performance of @turf/clean-coords.

1. Degenerate Polygons

Valid (but polygons that are lines) input polygons (and multipolygons) such as the below cause clean-coords to throw due to the fewer than 4 points check here.

{
  "type": "Feature",
  "properties": {},
  "geometry": {
    "type": "Polygon",
    "coordinates": [
      [[-75.994645,37.95325],[-76.016553,38.95325],[-76.043938,37.95325],[-75.994645,37.95325]]
    ]
  }
}

While not really a bug, it isn't ideal behavior. In general, valid input should produce valid output if there is a path to do so (per other recent discussions). Also LineStrings with equivalent degenerate conditions (i.e., doubled up points), work fine, so the behavior is inconsistent between polygons and lines.

I propose making this case not throw and returning a valid polygon with a minimum viable set of points. Other solutions are possible (dropping the shape or reducing its dimension, but I think having clean coords with a contract of "same output shape as input shape" is fair.

2. Support FeatureCollection and GeometryCollection types

Clean-coords should support both collection types by simply operating on each contained geometry independently. Current behavior is to throw.

3. Multipoint treatment

This is question about consistency rather than a proposal. MultiLineStrings removes overlapping points. The other Multi- geometries don't look for equivalent conditions (identically overlapping. This is at least a minor inconsistency. Is it worth/beneficial to either (a) remove that behavior from points or (b) add it to multi lines and polys?

4. Improved types

The type signature of the function can be improved by adding some overloads and other refinements.

5. Improved consistency of structural sharing

The implementation currently uses structural sharing differently depending on the type of shape. For instance points are returned as-is, but all other types have a fresh outer wrapper when not mutating. I believe that it would be better to be consistent across all shape types. Also, the level of structural sharing can be increased to improve performance and memory use, especially in "real world" cases with larger shapes.

I propose writing in code then documenting that:

  1. When not mutating, the outer object wrapper(s) will always be new objects such that geojsonIn !== geojsonOut. This is usually the case now, but let's make it always the case.
  2. When mutating, the outer object wrapper(s) will always be the same object that was provided geojsonIn === geojsonOut.
  3. properties, bbox, id will always be structurally shared (no change)
  4. coordinates will be maximally structurally shared in both cases but cannot be relied upon. When not mutating, this means lines and rings MAY be shared.

This materially slows down single point calcs, but I think the consistency is worth it, and materially speeds up any case where the geometry is larger than just a couple of points (where setup overhead dominates).

6. Epsilon option

Pass an epsilon down to booleanPointOnLine - its a useful add.


I'm part way through a PR, but let me know any thoughts.

Cheers!

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions