scicloj.ml.clust

A Clojure wrapper around Clust4j algorithms, providing a small, idiomatic API for use in scicloj.metamorph pipelines with tablecloth datasets.

The library is lightweight and exposes two main helpers:

  • clust/cluster – build clustering steps (k‑means, DBSCAN, …) that work with the metamorph :fit/:transform API.
  • clust/pre-process – wrap a Clust4j pre‑processor (e.g. PCA, standard‑scaler, imputer) for use in the same pipelines.

📦 Requirements & Dependencies

This project is built atop the Clojure CLI tools. Your own deps.edn should include at least the following entries:

{:deps {org.scicloj/noj {:mvn/version "2-beta21"}}}

🚀 Quick Start

This section demonstrates some typical usage patterns. All examples assume you already have a small tablecloth dataset. For concreteness we use the famous Iris set from scicloj.metamorph.ml.rdatasets.

(ns user
  (:require
    [scicloj.ml.clust :as clust]
    [scicloj.metamorph.core :as mm]
    [scicloj.metamorph.ml.preprocessing :as prep]
    [scicloj.metamorph.ml.rdatasets :as datasets]
    [tablecloth.api :as tc]))

(def iris-ds
  (-> (datasets/datasets-iris)
      (tc/drop-columns [:rownames :species])))

🔁 Standard pipeline workflow

The library is designed to be used inside a Metamorph pipeline. Each step is a function that returns a ctx -> ctx transformer, which dispatches on the :metamorph/mode key. In :fit mode the wrapped Clust4j object is trained; in :transform mode the previously fitted model is applied.
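To make the :fit/:transform contract concrete, here is a minimal sketch of a ctx -> ctx step written in plain Clojure, with no Metamorph or Clust4j dependency. The name mean-center-step and the {:mean ...} model shape are hypothetical and are not part of this library. The sketch also sets :metamorph/id by hand, whereas a real pipeline would supply it.

```clojure
;; Hypothetical step, for illustration only: mean-centers a vector of numbers.
;; It is not how scicloj.ml.clust is implemented.
(defn mean-center-step
  []
  (fn [{:metamorph/keys [id mode data] :as ctx}]
    (case mode
      ;; :fit -> learn the mean, store it under the step id, transform the data
      :fit
      (let [mean (/ (reduce + data) (double (count data)))]
        (-> ctx
            (assoc id {:mean mean})
            (assoc :metamorph/data (mapv #(- % mean) data))))
      ;; :transform -> reuse the mean stored during :fit
      :transform
      (let [mean (get-in ctx [id :mean])]
        (assoc ctx :metamorph/data (mapv #(- % mean) data))))))

(def step (mean-center-step))

;; Fit on training data; the context map is built by hand here.
(def fitted
  (step {:metamorph/id   :center
         :metamorph/mode :fit
         :metamorph/data [1.0 2.0 3.0]}))

(:metamorph/data fitted)
;; => [-1.0 0.0 1.0]

;; Apply the fitted mean (2.0) to new data.
(def transformed
  (step (assoc fitted
               :metamorph/mode :transform
               :metamorph/data [4.0 5.0])))

(:metamorph/data transformed)
;; => [2.0 3.0]
```

The fitted state lives in the context under the step's id, which is why a later :transform run can find it again.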

🧮 Clustering with a concrete parameter object

(def pipe-kmeans
  (mm/pipeline
    {:metamorph/id :cluster}
    (clust/cluster
      (.. (com.clust4j.algo.KMeansParameters. 5)
          (setSeed (java.util.Random. 1234))))))

;; run it
(-> (mm/fit-transform iris-ds pipe-kmeans)
    :metamorph/data
    :clustering
    frequencies)
;; => {0 28, 1 22, 2 41, 3 27, 4 32}

🪄 Keyword‑based constructor with property map

You don’t need to construct the Java parameter class yourself; you can pass a keyword and a map instead. The keyword is converted to the appropriate …Parameters class and the properties are applied with clojure.java.data.

(def pipe-ha
  (mm/pipeline
    (prep/std-scale :all {})               ; pre‑scale features first
    {:metamorph/id :cluster}
    (clust/cluster :hierarchical-agglomerative
                   {:seed (java.util.Random. 123)})))

(-> (mm/fit-transform iris-ds pipe-ha)
    :cluster
    :model
    .silhouetteScore)
;; => 0.577034601947599

Supported algorithm keywords include (but are not limited to):

  • :k-means
  • :k-medoids
  • :affinity-propagation
  • :hierarchical-agglomerative
  • :DBSCAN
  • :HDBSCAN
  • :mean-shift

Each one maps to a corresponding Clust4j …Parameters class – see the test suite for the full list.
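As a rough illustration of how such a keyword could be turned into a class name, the sketch below converts the kebab-case keyword to PascalCase and appends Parameters. The helper keyword->params-class-name is hypothetical, and the com.clust4j.algo package prefix is an assumption taken from the KMeansParameters example above. The library's actual lookup may work differently.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper, for illustration only.
(defn keyword->params-class-name
  [algo-kw]
  (let [parts (str/split (name algo-kw) #"-")
        ;; Upper-case only the first character, so that all-caps
        ;; names such as DBSCAN are left unchanged.
        upper-first (fn [s] (str (str/upper-case (subs s 0 1)) (subs s 1)))]
    (str "com.clust4j.algo."
         (apply str (map upper-first parts))
         "Parameters")))

(keyword->params-class-name :k-means)
;; => "com.clust4j.algo.KMeansParameters"

(keyword->params-class-name :hierarchical-agglomerative)
;; => "com.clust4j.algo.HierarchicalAgglomerativeParameters"

(keyword->params-class-name :DBSCAN)
;; => "com.clust4j.algo.DBSCANParameters"
```

Note that clojure.string/capitalize would lower-case the rest of each part and turn DBSCAN into Dbscan, which is why the sketch upper-cases only the first character.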

⚙️ Pre‑processing steps

You can wrap any Clust4j Preprocessor object with clust/pre-process to use it in the same pipeline abstraction.

;; mean-centering
(-> (mm/fit-transform iris-ds
                   (mm/pipeline (clust/pre-process (com.clust4j.algo.preprocess.MeanCenterer.))))
    :metamorph/data
    tc/head)

| :sepal-length | :sepal-width | :petal-length | :petal-width |
|--------------:|-------------:|--------------:|-------------:|
|   -0.74333333 |   0.44266667 |        -2.358 |  -0.99933333 |
|   -0.94333333 |  -0.05733333 |        -2.358 |  -0.99933333 |
|   -1.14333333 |   0.14266667 |        -2.458 |  -0.99933333 |
|   -1.24333333 |   0.04266667 |        -2.258 |  -0.99933333 |
|   -0.84333333 |   0.54266667 |        -2.358 |  -0.99933333 |    

;; PCA down to two principal components
(-> (mm/fit-transform iris-ds
                   (mm/pipeline (clust/pre-process (com.clust4j.algo.preprocess.PCA. 2))))
    :metamorph/data
    tc/head)

|           0 |           1 |
|------------:|------------:|
| -2.68412563 |  0.31939725 |
| -2.71414169 | -0.17700123 |
| -2.88899057 | -0.14494943 |
| -2.74534286 | -0.31829898 |
| -2.72871654 |  0.32675451 |    


;; imputation example
(def ds (tc/dataset {:a [0 Double/NaN 2]}))
ds
| :a |
|---:|
|  0 |
|    |
|  2 |

(-> (mm/fit-transform ds
                   (mm/pipeline (clust/pre-process (com.clust4j.algo.preprocess.impute.MeanImputation.))))
    :metamorph/data
    :a
    vec)
;; => [0.0 1.0 2.0]

The pre‑processor remains available in the context under the given :metamorph/id if you need to inspect it later.

🧪 Running the tests

The repository includes a small test suite (test/scicloj/ml/clust_test.clj) that exercises most of the exposed behaviour. You can run it with the Clojure CLI:

clojure -M:test

🔍 Further reading

📝 License

Distributed under the MIT License. See the LICENSE file for details.
