feat(sq): algorithm optimization #1

ostapbodnar · 2025-06-07T13:09:52Z

Changes:

Multiple implementations added (faiss, numpy, numpy+memmap)
Nearest centroid search optimized
Refactored StochasticQuantization class
Added parallel execution support
Removed required history log (used only when turned on)
Updated verbosity to support levels
Integrated tqdm for progress tracking
Model export/saving and loading added

TODO:

Resolve conflicts by updating (rebasing) to latest master
Test this implementation to ensure it runs smoothly and produces correct results

kaydotdev

Thank you for the contribution! Here are some preliminary fixes, including dependency management and polyfills for built-in functions. Also, please make sure the unit tests stay up to date.

kaydotdev · 2025-07-04T14:55:06Z

code/pyproject.toml

@@ -47,3 +48,7 @@ classifiers = [
 Homepage = "https://github.com/kaydotdev/stochastic-quantization"
 Issues = "https://github.com/kaydotdev/stochastic-quantization/issues"
 Repository = "https://github.com/kaydotdev/stochastic-quantization.git"
+
+[project.optional-dependencies]


It would be better to group optional dependencies, something like this:

[project.optional-dependencies]
faiss-cpu = ["faiss-cpu>=1.10.0,<2"]
faiss-gpu = ["faiss-gpu>=1.10.0,<2"]
progress = ["tqdm>=4.66.0,<5"]
all = ["sqg[faiss-cpu,faiss-gpu,progress]"]

kaydotdev · 2025-07-04T14:56:03Z

code/pyproject.toml

 dependencies = [
    "numpy>=1.26.4,<2",
    "scikit-learn>=1.5.1,<2",
+    "tqdm>=4.66.0,<5",


Including tqdm as a required dependency would violate the references in the original paper; it's better to make it an optional dependency.

kaydotdev · 2025-07-04T14:57:39Z

code/setup.py

@@ -45,5 +44,10 @@
        install_requires=[
            "numpy>=1.26.4,<2",
            "scikit-learn>=1.5.1,<2",
+            "tqdm>=4.66.0,<5",


Same here, we need to move it to the optional dependencies

kaydotdev · 2025-07-04T14:59:20Z

code/sqg/progress_tracking/tqdm_joblib.py

+import contextlib
+
+import joblib
+from tqdm.autonotebook import tqdm


If we move tqdm to optional dependencies, we need to guard this import and handle ImportError when the package is not installed.
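One way such a guard could look is sketched below. The names `HAS_TQDM` and `track` are hypothetical and are not taken from the PR; the sketch only assumes that the module keeps importing `tqdm` from `tqdm.autonotebook`, as in the diff above, and falls back to plain iteration when the package is missing.

```python
# Hedged sketch: HAS_TQDM and track() are illustrative names, not part of sqg.
try:
    from tqdm.autonotebook import tqdm  # optional dependency
    HAS_TQDM = True
except ImportError:
    tqdm = None
    HAS_TQDM = False


def track(iterable, *, enabled=True, **tqdm_kwargs):
    """Wrap `iterable` in a progress bar when tqdm is installed and enabled.

    Without tqdm (or with enabled=False) the iterable is returned unchanged,
    so callers do not need to branch on the optional dependency themselves.
    """
    if enabled and HAS_TQDM:
        return tqdm(iterable, **tqdm_kwargs)
    return iterable


# Works the same with or without tqdm installed.
squares = [x * x for x in track(range(5), enabled=False)]
```

Keeping the check in one place means the rest of the package can call the wrapper unconditionally, and users who did not install the optional extra simply get no progress bar.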

kaydotdev · 2025-07-04T19:05:04Z

code/sqg/utils.py

+from itertools import islice
+
+
+def batched_iterable(iterable, batch_size):


Better to replace this implementation with a polyfill, something like this:

from sys import version_info
import itertools

if version_info >= (3, 12) and hasattr(itertools, "batched"):
    # Built-in since 3.12 (returns tuples)
    batched = itertools.batched  # type: ignore[attr-defined]
else:
    def batched(iterable, n, *, strict=False):
        """Back-port of itertools.batched for Py < 3.12 (returns tuples)."""
        if n < 1:
            raise ValueError("n must be >= 1")
        it = iter(iterable)
        while (chunk := tuple(itertools.islice(it, n))):
            if strict and len(chunk) != n:
                raise ValueError("last batch smaller than n")
            yield chunk
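Since the review also asks for the unit tests to stay up to date, a small behavioural check along the following lines might be useful. This is only a sketch: it inlines a reduced fallback (without the `strict` flag) so that it runs on its own, and it uses a try/except import rather than the version check suggested above. In the package itself, `batched` would be imported from wherever the polyfill ends up, which this thread does not specify.

```python
import itertools

try:
    from itertools import batched  # built-in on Python 3.12+
except ImportError:
    def batched(iterable, n):
        """Reduced fallback for illustration; yields tuples of up to n items."""
        if n < 1:
            raise ValueError("n must be at least one")
        it = iter(iterable)
        while chunk := tuple(itertools.islice(it, n)):
            yield chunk


# Both branches yield tuples, and the last batch may be shorter than n.
batches = list(batched(range(7), 3))
# batches == [(0, 1, 2), (3, 4, 5), (6,)]
```

Code that previously relied on a list-returning helper would need to account for the tuples returned here.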

ostapbodnar marked this pull request as draft June 7, 2025 13:10

Ostap Bodnar added 2 commits June 7, 2025 16:52

fix: update imports after rebasing

5c0db3d

ostapbodnar force-pushed the feat/code-optimization branch from 0984426 to 5c0db3d June 7, 2025 14:14

kaydotdev reviewed Jul 4, 2025

Ostap Bodnar added 5 commits July 13, 2025 19:50

fix: make tqdm optional

97aff9e

fix: use built-in-batch for python 3.12+

3234e34

fix: file cleanup issue

17e65ae

fix: tests

0be6790

fix: tqdm issues

49e8645

ostapbodnar requested a review from kaydotdev July 13, 2025 17:47
