Place data on CPU memory for `nn_descent` option in HDBSCAN

The current HDBSCAN implementation places the input data in GPU memory at the Python layer:

https://github.com/rapidsai/cuml/blob/e05c9a5b1750ff1c145746441eff563ee0fa90ed/python/cuml/cuml/cluster/hdbscan/hdbscan.pyx#L916-L928


NN-Descent always copies the input data to GPU memory, even if the user originally provides it on the GPU. This is required because the algorithm internally converts the data to FP16. As a result, providing GPU-resident data leads to duplicate GPU allocations and unnecessary memory usage.
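To put a rough number on the duplication described above, here is a back-of-the-envelope sketch. It assumes the input is FP32, that the internal FP16 copy has the same shape as the input, and it ignores every other buffer NN-Descent allocates (graph, workspace). The helper name `input_bytes_on_gpu` is hypothetical and not part of cuML.

```python
# Bytes per element for the dtypes relevant here.
ITEMSIZE = {"float16": 2, "float32": 4, "float64": 8}


def input_bytes_on_gpu(n_rows, n_cols, input_dtype="float32", input_on_device=True):
    """Estimate GPU bytes used by the input data alone (hypothetical helper).

    Counts the FP16 copy that NN-Descent makes, plus the user's original
    array if it already lives on the GPU.
    """
    n_elements = n_rows * n_cols
    fp16_copy = n_elements * ITEMSIZE["float16"]
    original = n_elements * ITEMSIZE[input_dtype] if input_on_device else 0
    return original + fp16_copy


# Example: 10 million rows with 128 features each.
device_total = input_bytes_on_gpu(10_000_000, 128, input_on_device=True)
host_total = input_bytes_on_gpu(10_000_000, 128, input_on_device=False)

print(f"input on GPU: {device_total / 1e9:.2f} GB")  # 5.12 GB original + 2.56 GB FP16 copy
print(f"input on CPU: {host_total / 1e9:.2f} GB")    # FP16 copy only
```

Under these assumptions, keeping the input on the host cuts the GPU footprint of the input data from 7.68 GB to 2.56 GB for this example.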

Therefore, when using the `nn_descent` option, the input data should be placed in CPU memory to avoid this extra GPU-side copy.

	def fit(self, X, y=None, *, convert_dtype=True) -> "HDBSCAN":
	    """
	    Fit HDBSCAN model from features.
	    """

	    kwds = self.build_kwds or {}
	    if kwds.get("knn_n_clusters", 1) > 1:
	        logger.warn("Using data on host memory because knn_n_clusters > 1.")
	        convert_to_mem_type = MemoryType.host
	    else:
	        logger.warn("Using data on device memory because knn_n_clusters = 1.")
	        convert_to_mem_type = MemoryType.device
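
One possible way to extend the selection logic above to the `nn_descent` case is sketched below. This is only an illustration of the idea, not the actual cuML code: the `MemoryType` enum is a local stand-in for cuML's own type, `choose_mem_type` is a hypothetical helper, and the `build_algo` parameter name is an assumption about how the option is exposed.

```python
from enum import Enum


class MemoryType(Enum):
    """Local stand-in for cuML's MemoryType, for illustration only."""
    host = "host"
    device = "device"


def choose_mem_type(build_algo, build_kwds=None):
    """Pick where the input data should live before the kNN graph is built.

    Hypothetical helper: NN-Descent makes its own FP16 copy on the GPU, so
    the input stays on the host to avoid a second GPU-side allocation.
    """
    kwds = build_kwds or {}
    if build_algo == "nn_descent" or kwds.get("knn_n_clusters", 1) > 1:
        return MemoryType.host
    return MemoryType.device


print(choose_mem_type("nn_descent"))                          # MemoryType.host
print(choose_mem_type("brute_force"))                         # MemoryType.device
print(choose_mem_type("brute_force", {"knn_n_clusters": 4}))  # MemoryType.host
```

Keeping the decision in a single small function would let `fit` pass the result straight through as `convert_to_mem_type`, whichever condition triggered the host placement.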

Place data on CPU memory for `nn_descent` option in HDBSCAN #7506
