{: .no_toc }
- TOC {:toc}
Solution
Use ST_ClusterDBSCAN
, which provides very good performance.
SELECT *,
ST_ClusterDBSCAN(geom, 0, 1) OVER() AS clst_id
FROM poly_tbl;
This problem has a similar recommended solution: https://gis.stackexchange.com/questions/265137/postgis-union-geometries-that-intersect-but-keep-their-original-geometries-info
A worked example: https://gis.stackexchange.com/questions/366374/how-to-use-dissolve-a-subset-of-a-postgis-table-based-on-a-value-in-a-column
An alernate solution using recursive CTE and ST_DWithin?: https://stackoverflow.com/questions/27081061/how-to-merge-adjactent-polygons-to-1-polygon-and-keep-min-max-data
Similar problem in R https://gis.stackexchange.com/questions/254519/group-and-union-polygons-that-share-a-border-in-r
Issues
- DBSCAN uses distance. This will also cluster polygons which touch only at a point, not just along an edge. Is there a way to improve this? Allow a different distance metric perhaps - say length of overlap?
https://gis.stackexchange.com/questions/473030/union-of-multiple-polygons-by-value-that-st-touch
Solutions
The query data
is example data.
Using ST_ClusterDBSCAN
with eps => 0
:
WITH data(fid, class, geom) AS (
SELECT ROW_NUMBER() OVER (),
CASE x WHEN 5 THEN 1 ELSE x END AS class,
ST_Buffer(ST_Point(x, 1.5 * y), 0.6, 2) AS geom
FROM generate_series(1, 10) AS s1(y)
CROSS JOIN generate_series(1, 10) AS s2(x)
),
clust AS (
SELECT ST_ClusterDBSCAN(geom, 0, 2) OVER () AS clustid, fid, class, geom
FROM data WHERE class IN (1, 2, 3)
)
SELECT * FROM clust WHERE clustid IS NOT NULL;
Using ST_ClusterIntersectingWin
:
WITH data(fid, class, geom) AS (
SELECT ROW_NUMBER() OVER (),
CASE x WHEN 5 THEN 1 ELSE x END AS class,
ST_Buffer(ST_Point(x, 1.5 * y), 0.6, 2) AS geom
FROM generate_series(1, 10) AS s1(y)
CROSS JOIN generate_series(1, 10) AS s2(x)
),
datasel AS (
SELECT * FROM data WHERE class IN (1, 2, 3)
),
clust AS (
SELECT ST_ClusterIntersectingWin(geom) OVER () AS clustid, fid, class, geom FROM datasel
),
clustercnt AS (
SELECT clustid, COUNT(*) AS cnt FROM clust GROUP BY clustid
)
SELECT fid, class, geom FROM clust c
JOIN clustercnt cs ON c.clustid = cs.clustid
WHERE cnt > 1;
https://gis.stackexchange.com/questions/94203/grouping-connected-linestrings-in-postgis
Presents a recursive CTE approach, but ultimately recommends using ST_ClusterDBCSAN
or ST_ClusterIntersecting
.
Use ST_ClusterIntersecting
:
SELECT attr, unnest(ST_ClusterIntersecting(geom))
FROM lines
GROUP by attr;
https://gis.stackexchange.com/questions/350339/how-to-create-polygons-of-a-specific-size
More generally: how to group adjacent polygons into sets with similar sum of a given attribute.
Solution Build adjacency graph and aggregate based on total, and perhaps some distance criteria? Note: posts do not provide a PostGIS solution for this. Not known if such a solution exists. Would need a recursive query to do this. How to keep clusters compact?
https://stackoverflow.com/questions/67972764/alternative-first-function-for-geometry-type
Using FIRST_VALUE
window function:
SELECT DISTINCT col,
FIRSET_VALUE(geom) OVER (PARTITION BY col ORDER BT dt)
FROM t;
Using ARRAY_AGG
approach:
SELECT col,
(ARRAY_AGG(geo ORDER BY dt))[1]
FROM t GROUP BY col
https://gis.stackexchange.com/questions/269407/centroid-of-point-cluster-points
Use ST_Collect
with ST_Centroid
or ST_GeometricMedian
:
SELECT village, ST_Centroid(ST_Collect(geom)) AS geom FROM your_table GROUP BY village;
See https://gis.stackexchange.com/questions/11567/spatial-clustering-with-postgis for a variety of approaches that predate a lot of the PostGIS clustering functions.
Solution 1
Use ST_SnapToGrid
to compute a cell id for each point, then group the points based on that. Can use aggregate function to count points in grid cell, or use DISTINCT ON as a way to pick one representative point. Need to use representative point rather than average, for better visual results (perhaps?)
SELECT id, pt FROM (
SELECT DISTINCT ON (snap)
id, pt,
ST_SnapToGrid(pt, 1) AS snap
FROM points
) AS t;
Solution 2
Generate grid of cells covering desired area, then JOIN LATERAL
to points to aggregate. Not sure how to select a representative point doing this though - perhaps MIN or MAX? Requires a grid-generating function, which is coming in PostGIS 3.1
Cluster a selection of points using DBSCAN, and return centroid of cluster and count of points.
https://gis.stackexchange.com/questions/388848/clustering-and-combining-points-in-postgis
WITH pts AS (
SELECT (ST_DumpPoints(ST_GeneratePoints('POLYGON ((10 90, 90 90, 90 10, 10 10, 10 90))', 100, 2))).geom AS geom
)
SELECT x.cid, ST_Centroid(ST_Collect(x.geom)) geom, COUNT(cid) num_points FROM
(
SELECT ST_ClusterDBSCAN(geom, eps := 8, minpoints := 2) over () AS cid, geom
FROM pts
GROUP BY(geom)
) as x
WHERE cid IS NOT NULL
GROUP BY x.cid ORDER BY cid;
https://gis.stackexchange.com/questions/187256/finding-density-centroids-within-polygons-in-postgis
https://gist.github.com/AbelVM/dc86f01fbda7ba24b5091a7f9b48d2ee
Not sure if this is worthwhile or not. Possibly superseded by more recent standard PostGIS clustering functions
https://gis.stackexchange.com/questions/348484/clustering-points-in-postgis
Explains how DBSCAN is a superset of ST_ClusterWithin
, and provides simpler, more powerful SQL.
https://gis.stackexchange.com/questions/356663/postgis-finding-duplicate-label-within-a-radius
https://gis.stackexchange.com/questions/312167/calculating-shortest-distance-between-polygons
Solution
Use ST_GeometricMedian
https://gis.stackexchange.com/questions/376563/cluster-points-in-each-polygon-into-n-parts
Use the ST_ClusterKMeans
window function with PARTITION BY
.
ST_ClusterKMeans
computes cluster ids 0..n for each (hierarchy of) partition key(s) used in the PARTITION BY expression.
To compute clusters for each set of points in each polygon (assuming each polygon has id poly_id
):
WITH polys(poly_id, geom) AS (
VALUES (1, 'POLYGON((0 0, 0 5, 5 5, 5 0, 0 0))'::GEOMETRY),
(2, 'POLYGON((10 10, 10 15, 15 15, 15 10, 10 10))'::GEOMETRY)
)
SELECT polys.poly_id,
ST_ClusterKMeans(pts.geom, 4) OVER(PARTITION BY polys.poly_id) AS cluster_id,
pts.geom
FROM polys,
LATERAL ST_Dump(ST_GeneratePoints(polys.geom, 1000, 1)) AS pts
ORDER BY 1, 2;