Adding HannoyTransformer by amalia-k510 · Pull Request #141 · scikit-learn-contrib/sklearn-ann

amalia-k510 · 2026-04-30T08:59:41Z

This PR adds HannoyTransformer as proposed in #123, wrapping hannoy (LMDB-backed storage) into the sklearn-ann transformer interface. The approach follows the same pattern as AnnoyTransformer and others.

A few things to note:

- hannoy's Python bindings don't expose by_item yet (it exists in the Rust code in reader.rs but PyReader only has by_vec). A PR upstream was created to add it. Until that is approved, fit_transform stores the training data and re-queries it using by_vec instead of the faster by_item path. I plan to switch to by_item once it's available.
- There's a known issue where multiple Database instances in the same process silently share the first LMDB environment.
- I also opened an issue on hannoy requesting batch insert/query APIs to avoid the per-vector Python to Rust loop overhead.
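
For illustration, the by_vec fallback described in the first point could look roughly like the sketch below. It uses a brute-force stand-in rather than hannoy itself: `BruteForceReader`, its `by_vec(vector, n)` signature, and `requery_training_data` are hypothetical names chosen for this example, not hannoy's actual bindings.

```python
import math


class BruteForceReader:
    """Hypothetical stand-in for a reader that only exposes by_vec."""

    def __init__(self, vectors):
        self._vectors = [tuple(v) for v in vectors]

    def by_vec(self, vector, n):
        # Return (item_id, distance) pairs for the n nearest stored vectors.
        scored = sorted(
            (math.dist(vector, stored), item_id)
            for item_id, stored in enumerate(self._vectors)
        )
        return [(item_id, dist) for dist, item_id in scored[:n]]


def requery_training_data(reader, training_vectors, n_neighbors):
    """Query each stored training vector again, one Python call per vector."""
    indices, distances = [], []
    for vector in training_vectors:  # the per-vector loop a batch API would avoid
        hits = reader.by_vec(vector, n_neighbors)
        indices.append([item_id for item_id, _ in hits])
        distances.append([dist for _, dist in hits])
    return indices, distances


X_train = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
indices, distances = requery_training_data(BruteForceReader(X_train), X_train, 2)
```

Because the training vectors have to be kept around just to be passed back in, this costs extra memory compared with a by_item lookup, which would only need the item ids.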

Two questions: which metrics do we want to support? Right now it's only euclidean, but hannoy also has hamming, sqeuclidean, cosine, and manhattan. hannoy also offers binary quantized variants of cosine, euclidean, and manhattan. Would we want to use those as well?


flying-sheep · 2026-04-30T12:00:26Z

> There's a known issue where multiple Database instances in the same process silently share the first LMDB environment.

Do you mean a hannoy issue (if so, link plz) or in your code?

> which metrics do we want to support?

All of them of course! Your code should not know which ones exist (except for the default): instead of accepting a string, just use the Metric enum as a type and pass that through to the upstream API.
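
A minimal sketch of that pass-through idea, under the assumption that upstream exposes a metric enum. The `Metric` class, `FakeDatabase`, and `PassThroughTransformer` below are hypothetical stand-ins written for this example; they are not the hannoy bindings or the code in this PR.

```python
from enum import Enum


class Metric(Enum):
    """Hypothetical stand-in for the upstream metric enum."""

    EUCLIDEAN = "euclidean"
    COSINE = "cosine"
    MANHATTAN = "manhattan"


class FakeDatabase:
    """Hypothetical stand-in for the upstream index; it validates the metric itself."""

    def __init__(self, metric):
        if not isinstance(metric, Metric):
            raise TypeError(f"expected a Metric, got {type(metric).__name__}")
        self.metric = metric


class PassThroughTransformer:
    """Knows only the default metric; everything else is forwarded untouched."""

    def __init__(self, metric=Metric.EUCLIDEAN):
        self.metric = metric

    def fit(self, X=None):
        # No string matching or per-metric branching in the wrapper.
        self.index_ = FakeDatabase(self.metric)
        return self


transformer = PassThroughTransformer(metric=Metric.COSINE).fit()
```

With this shape, a metric added upstream becomes usable without any change to the wrapper, and an invalid value is rejected by the upstream type rather than by a hand-maintained list.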

flying-sheep · 2026-04-30T12:05:49Z

+        # distance correction
+        if self.metric == "euclidean":
+            np.sqrt(distances, out=distances)


why is that?
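
One possible reading of this correction, which the thread does not confirm, is that the backend reports squared Euclidean distances, so a square root is needed to get what scikit-learn calls euclidean. A small NumPy sketch of that situation, using made-up vectors rather than hannoy output:

```python
import numpy as np

query = np.array([0.0, 0.0])
neighbors = np.array([[3.0, 4.0], [6.0, 8.0]])

# Squared Euclidean distances, as an index might return them (assumption).
distances = ((neighbors - query) ** 2).sum(axis=1)

# In-place square root, as in the diff: the existing array is overwritten.
np.sqrt(distances, out=distances)

# Reference values: the true Euclidean norms of the differences.
expected = np.linalg.norm(neighbors - query, axis=1)
```

If the backend already returned true Euclidean distances, the same line would silently shrink them, so the correction is only right when the squared-distance assumption holds.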

amalia-k510 and others added 2 commits April 30, 2026 10:36

hannoy implementation

3ba8f4a

[pre-commit.ci] auto fixes from pre-commit.com hooks

3110b95

for more information, see https://pre-commit.ci

flying-sheep reviewed Apr 30, 2026

amalia-k510 wants to merge 2 commits into scikit-learn-contrib:main from amalia-k510:hannoy-implementation
