Clojure wrapper around Clust4j algorithms providing a small, idiomatic API for
use inside scicloj.metamorph pipelines and tablecloth datasets.
The library is lightweight and exposes two main helpers:
- clust/cluster – builds clustering steps (k‑means, DBSCAN, …) that work with the metamorph :fit/:transform API.
- clust/pre-process – wraps a Clust4j pre‑processor (e.g. PCA, standard‑scaler, imputer) for use in the same pipelines.
This project is built atop the Clojure CLI tools. Your own deps.edn should
include at least the following entries:
{:deps {org.scicloj/noj {:mvn/version "2-beta21"}}}

This section demonstrates some typical usage patterns. All examples assume you already have a small
tablecloth dataset. For concreteness we use the famous Iris set from scicloj.metamorph.ml.rdatasets.
(ns user
  (:require
   [scicloj.ml.clust :as clust]
   [scicloj.metamorph.core :as mm]
   [scicloj.metamorph.ml.preprocessing :as prep]
   [scicloj.metamorph.ml.rdatasets :as datasets]
   [tablecloth.api :as tc]))

(def iris-ds
  (-> (datasets/datasets-iris)
      (tc/drop-columns [:rownames :species])))

The library is designed to be used inside a Metamorph pipeline. Each step is a function returning a
ctx -> ctx transformer that responds to the :metamorph/mode key. When the pipeline is :fit the
wrapped Clust4j object is trained; when the pipeline is :transform the previously saved model is applied.
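
To make the mode handling concrete, here is a minimal plain-Clojure sketch of a step that follows this contract. It is not the library's implementation: `mean-center-step` is a hypothetical name, and the data is a plain vector of numbers rather than a tablecloth dataset. It only illustrates how a ctx -> ctx function can store state under its :metamorph/id during :fit and reuse it during :transform.

```clojure
;; Hypothetical step for illustration only; not part of scicloj.ml.clust.
(defn mean-center-step
  "Returns a ctx -> ctx function. In :fit mode it computes the mean of the
  data, stores it under the step's :metamorph/id and centers the data.
  In :transform mode it reuses the stored mean."
  []
  (fn [{:metamorph/keys [data mode id] :as ctx}]
    (case mode
      :fit
      (let [m (/ (reduce + data) (double (count data)))]
        (-> ctx
            (assoc id {:mean m})
            (assoc :metamorph/data (mapv #(- % m) data))))

      :transform
      (let [m (get-in ctx [id :mean])]
        (assoc ctx :metamorph/data (mapv #(- % m) data))))))

(def step (mean-center-step))

;; Fit on training data: the mean (2.0) is learned and stored under :center.
(def fitted
  (step {:metamorph/data [1.0 2.0 3.0]
         :metamorph/mode :fit
         :metamorph/id   :center}))

(:metamorph/data fitted)
;; => [-1.0 0.0 1.0]

;; Transform new data with the fitted context: the stored mean is reused.
(def applied
  (step (assoc fitted
               :metamorph/data [4.0]
               :metamorph/mode :transform)))

(:metamorph/data applied)
;; => [2.0]
```

As described above, the steps in this library follow the same shape, with a trained Clust4j object taking the place of the stored mean.
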
(def pipe-kmeans
  (mm/pipeline
   {:metamorph/id :cluster}
   (clust/cluster
    (.. (com.clust4j.algo.KMeansParameters. 5)
        (setSeed (java.util.Random. 1234))))))

;; run it
(-> (mm/fit-transform iris-ds pipe-kmeans)
    :metamorph/data
    :clustering
    frequencies)
;; => {0 28, 1 22, 2 41, 3 27, 4 32}

You don’t need to construct the Java parameter class yourself; you can pass a keyword and a map
instead. The keyword is converted to the appropriate …Parameters class and the properties are
applied with clojure.java.data.
(def pipe-ha
  (mm/pipeline
   (prep/std-scale :all {}) ; pre‑scale features first
   {:metamorph/id :cluster}
   (clust/cluster :hierarchical-agglomerative
                  {:seed (java.util.Random. 123)})))

(-> (mm/fit-transform iris-ds pipe-ha)
    :cluster
    :model
    .silhouetteScore)
;; => 0.577034601947599

Supported algorithm keywords include (but are not limited to):
- :k-means
- :k-medoids
- :affinity-propagation
- :hierarchical-agglomerative
- :DBSCAN
- :HDBSCAN
- :mean-shift
Each one maps to a corresponding Clust4j …Parameters class – see the test suite for the full list.
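
The exact keyword-to-class lookup is not shown here. As a rough illustration of one naming convention that would be consistent with the keywords listed above, the sketch below capitalises each hyphen-separated part and appends "Parameters". The function name `keyword->params-class-name` is hypothetical, the `com.clust4j.algo` package prefix is taken from the KMeansParameters example earlier, and the library's real lookup may work differently.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper for illustration; the library's actual mapping may differ.
(defn keyword->params-class-name
  "Turns an algorithm keyword into a candidate Clust4j parameter class name
  by capitalising the first letter of each hyphen-separated part."
  [k]
  (let [capitalise-first (fn [s] (str (str/upper-case (subs s 0 1)) (subs s 1)))
        parts            (str/split (name k) #"-")]
    (str "com.clust4j.algo."
         (apply str (map capitalise-first parts))
         "Parameters")))

(keyword->params-class-name :k-means)
;; => "com.clust4j.algo.KMeansParameters"

;; Parts that are already upper-case are left unchanged.
(keyword->params-class-name :DBSCAN)
;; => "com.clust4j.algo.DBSCANParameters"
```

Only the first result is confirmed by the examples in this document. The second shows what the sketch produces, not a verified class name.
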
You can wrap any Clust4j Preprocessor object with clust/pre-process to use it in the same
pipeline abstraction.
;; mean-centering
(-> (mm/fit-transform iris-ds
                      (mm/pipeline (clust/pre-process (com.clust4j.algo.preprocess.MeanCenterer.))))
    :metamorph/data
    tc/head)
| :sepal-length | :sepal-width | :petal-length | :petal-width |
|--------------:|-------------:|--------------:|-------------:|
| -0.74333333 | 0.44266667 | -2.358 | -0.99933333 |
| -0.94333333 | -0.05733333 | -2.358 | -0.99933333 |
| -1.14333333 | 0.14266667 | -2.458 | -0.99933333 |
| -1.24333333 | 0.04266667 | -2.258 | -0.99933333 |
| -0.84333333 | 0.54266667 | -2.358 | -0.99933333 |
;; PCA down to two principal components
(-> (mm/fit-transform iris-ds
                      (mm/pipeline (clust/pre-process (com.clust4j.algo.preprocess.PCA. 2))))
    :metamorph/data
    tc/head)
| 0 | 1 |
|------------:|------------:|
| -2.68412563 | 0.31939725 |
| -2.71414169 | -0.17700123 |
| -2.88899057 | -0.14494943 |
| -2.74534286 | -0.31829898 |
| -2.72871654 | 0.32675451 |
;; imputation example
(def ds (tc/dataset {:a [0 Double/NaN 2]}))
ds
| :a |
|---:|
| 0 |
| |
| 2 |
(-> (mm/fit-transform ds
                      (mm/pipeline (clust/pre-process (com.clust4j.algo.preprocess.impute.MeanImputation.))))
    :metamorph/data
    :a
    vec)
;; => [0.0 1.0 2.0]

The pre‑processor remains available in the context under the given :metamorph/id if you need to
inspect it later.
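
For readers wondering where the 1.0 in the imputation result comes from, the following plain-Clojure sketch reproduces the arithmetic of mean imputation on a single column. It does not call Clust4j, and `mean-impute` is a hypothetical helper name used only for illustration.

```clojure
;; Hypothetical helper for illustration; not part of scicloj.ml.clust or Clust4j.
(defn mean-impute
  "Replaces every NaN in xs with the mean of the non-NaN values."
  [xs]
  (let [present (remove #(Double/isNaN (double %)) xs)
        m       (/ (reduce + present) (double (count present)))]
    (mapv #(if (Double/isNaN (double %)) m %) xs)))

;; The mean of the observed values 0.0 and 2.0 is 1.0, which fills the gap.
(mean-impute [0.0 Double/NaN 2.0])
;; => [0.0 1.0 2.0]
```
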
The repository includes a small test suite (test/scicloj/ml/clust_test.clj) that exercises
most of the exposed behaviour. You can run it with the Clojure CLI:

clojure -M:test

- Clust4j API documentation – the underlying Java library.
- scicloj.metamorph – the pipeline abstraction used here.

Distributed under the MIT License. See the LICENSE file for details.