This topic describes the ST_ClusterDBSCAN function. ST_ClusterDBSCAN is a window function that returns a cluster ID for each input geometry object. It uses the 2D Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm.
Syntax
- Syntax 1
integer ST_ClusterDBSCAN(geometry winset geom, float8 eps, integer minpoints);
- Syntax 2
integer ST_ClusterDBSCANSpheroid(geometry winset geom, float8 eps, integer minpoints);
Parameters
Parameter | Description |
---|---|
geom | The geometry objects that you want to cluster. |
eps | The neighborhood radius: the maximum distance at which two geometry objects are considered neighbors. |
minpoints | The minimum number of geometry objects, including the object itself, that must be located within eps of a geometry object for that object to be a core geometry object of a cluster. |
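To make the roles of eps and minpoints concrete, here is a minimal Python sketch of the core-object test for 2D points. It is not the database's implementation: the helper name is_core is hypothetical, distances are Euclidean, and the sketch assumes that a geometry object counts itself toward minpoints and that a distance exactly equal to eps counts as "within eps". The first assumption is consistent with the example in this topic, where minpoints = 1 places the isolated point (-3 -3) in a cluster of its own.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def is_core(points: Sequence[Point], i: int, eps: float, minpoints: int) -> bool:
    """Hypothetical helper: is points[i] a core object for (eps, minpoints)?"""
    px, py = points[i]
    # Count every point within eps of points[i], including points[i] itself
    # (its distance to itself is 0). Assumption: distance == eps is inclusive.
    count = sum(1 for qx, qy in points if math.hypot(px - qx, py - qy) <= eps)
    return count >= minpoints


pts = [(0, 0), (1, 1), (-1, -1), (-3, -3)]
print([is_core(pts, i, eps=2, minpoints=2) for i in range(len(pts))])
# prints [True, True, True, False]
```

With eps = 2, the point (0 0) has three objects in its neighborhood (itself, (1 1), and (-1 -1)), while (-3 -3) has only itself, so raising minpoints from 1 to 2 turns (-3 -3) from a core object into a non-core one.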
Description
- Unlike the ST_ClusterKMeans function, the ST_ClusterDBSCAN function does not require you to specify the number of clusters. Instead, it builds clusters from the distance threshold (eps) and the density threshold (minpoints) that you specify.
- A geometry object is added to a cluster if it meets either of the following conditions:
  - It is a core geometry object: at least minpoints geometry objects, including the object itself, are located within eps of it.
  - It is a boundary geometry object: it is located within eps of a core geometry object of the cluster.
  A boundary geometry object can be located within eps of core geometry objects in more than one cluster. In that case, it is assigned to only one of those clusters, chosen arbitrarily. As a result, the function may return a cluster that contains fewer than minpoints geometry objects.
- If a geometry object meets neither condition for any cluster, it is treated as noise and the function returns NULL as its cluster ID.
- The ST_ClusterDBSCAN function is a window function, so it must be called with an OVER clause, as in the example below.
- In Syntax 1, the ST_ClusterDBSCAN function uses the planar Euclidean distance between coordinates, so eps is expressed in the units of the geometry's coordinate system.
- In Syntax 2, the ST_ClusterDBSCANSpheroid function measures distances on the spheroid (ellipsoid). For example, if a geometry has a spatial reference identifier (SRID) whose coordinates are longitude and latitude, distances, and therefore eps, are measured in meters rather than in degrees.
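The clustering rules above can be sketched in Python as a minimal DBSCAN over 2D points. This is an illustrative model under stated assumptions, not the database's implementation: the names dbscan, euclidean, and haversine_m are hypothetical; a point counts itself toward minpoints; a distance equal to eps counts as within eps; and cluster IDs are numbered from 0 in input order. A boundary object reachable from two clusters simply keeps the first cluster that reaches it, whereas the function makes no such guarantee. haversine_m is a spherical great-circle approximation swapped in only to show why eps becomes a distance in meters for longitude/latitude data; it is not the ellipsoidal computation used by ST_ClusterDBSCANSpheroid.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

Point = Tuple[float, float]
Distance = Callable[[Point, Point], float]


def euclidean(p: Point, q: Point) -> float:
    # Planar distance in coordinate units (the Syntax 1 case).
    return math.hypot(p[0] - q[0], p[1] - q[1])


def haversine_m(p: Point, q: Point, radius_m: float = 6371008.8) -> float:
    # Great-circle distance in meters between (lon, lat) points given in degrees.
    # Spherical approximation only: the spheroid distance of Syntax 2 differs slightly.
    lon1, lat1, lon2, lat2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_m * math.asin(math.sqrt(a))


def dbscan(points: Sequence[Point], eps: float, minpoints: int,
           dist: Distance = euclidean) -> List[Optional[int]]:
    """Return a cluster ID per point, or None for noise (NULL in SQL)."""
    n = len(points)
    # Neighborhood of each point within eps, including the point itself.
    neighbours = [[j for j in range(n) if dist(points[i], points[j]) <= eps]
                  for i in range(n)]
    core = [len(nb) >= minpoints for nb in neighbours]
    labels: List[Optional[int]] = [None] * n
    next_id = 0
    for i in range(n):
        if not core[i] or labels[i] is not None:
            continue
        # Start a new cluster at an unlabeled core point and expand it.
        labels[i] = next_id
        stack = [i]
        while stack:
            c = stack.pop()  # only core points are ever pushed
            for j in neighbours[c]:
                if labels[j] is None:
                    labels[j] = next_id  # j is a core or boundary object
                    if core[j]:
                        stack.append(j)  # boundary objects do not extend the cluster
        next_id += 1
    return labels


pts = [(0, 0), (1, 1), (-1, -1), (-3, -3)]
print(dbscan(pts, eps=2, minpoints=1))  # [0, 0, 0, 1], matching the example below
print(dbscan(pts, eps=2, minpoints=2))  # [0, 0, 0, None]: (-3 -3) is noise
```

Passing dist=haversine_m with eps in meters mimics the Syntax 2 case for longitude/latitude points. For example, dbscan([(0, 0), (0.001, 0), (1, 0)], eps=200, minpoints=2, dist=haversine_m) puts the first two points, about 111 m apart, into cluster 0 and marks the third point as noise.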
Examples
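-- eps = 2, minpoints = 1: every point is a core point, so the isolated point (-3 -3) forms its own cluster.
SELECT ST_ClusterDBSCAN(geom, 2, 1) OVER (), ST_AsText(geom)
FROM (SELECT unnest(ARRAY['POINT(0 0)'::geometry,
                          'POINT(1 1)'::geometry,
                          'POINT(-1 -1)'::geometry,
                          'POINT(-3 -3)'::geometry]) AS geom) AS test;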
st_clusterdbscan | st_astext
------------------+--------------
0 | POINT(0 0)
0 | POINT(1 1)
0 | POINT(-1 -1)
1 | POINT(-3 -3)
(4 rows)