feat(sq): algorithm optimization #1

Draft
wants to merge 7 commits into
base: master
from

ostapbodnar
@ostapbodnar ostapbodnar commented Jun 7, 2025

Changes:

  • Multiple implementations added (faiss, numpy, numpy+memmap)
  • Nearest centroid search optimized
  • Refactored StochasticQuantization class
  • Added parallel execution support
  • Removed required history log (used only when turned on)
  • Updated verbosity to support levels
  • Integrated tqdm for progress tracking
  • Model export/saving and loading added

TODO:

  • Resolve conflicts by updating (rebasing) to latest master
  • Test this implementation to ensure it runs smoothly and produces correct results

@ostapbodnar ostapbodnar marked this pull request as draft June 7, 2025 13:10
Ostap Bodnar added 2 commits June 7, 2025 16:52
Multiple implementations added (faiss, numpy, numpy+memmap)
Nearest centroid search optimized
Refactored StochasticQuantization class
Added parallel execution support
Removed required history log (used only when turned on)
Updated verbosity to support levels
Integrated tqdm for progress tracking
Model export/saving and loading added
@ostapbodnar ostapbodnar force-pushed the feat/code-optimization branch from 0984426 to 5c0db3d Compare June 7, 2025 14:14
Owner

@kaydotdev kaydotdev left a comment

Thank you for the contribution! Here are some preliminary fixes, including dependency management and polyfills for built-in functions. Also, make sure the unit tests stay up to date.

@@ -47,3 +48,7 @@ classifiers = [
Homepage = "https://github.com/kaydotdev/stochastic-quantization"
Issues = "https://github.com/kaydotdev/stochastic-quantization/issues"
Repository = "https://github.com/kaydotdev/stochastic-quantization.git"

[project.optional-dependencies]
Owner

It would be better to group optional dependencies, something like this:

[project.optional-dependencies]
faiss-cpu = ["faiss-cpu>=1.10.0,<2"]
faiss-gpu = ["faiss-gpu>=1.10.0,<2"]
progress = ["tqdm>=4.66.0,<5"]
all = ["sqg[faiss-cpu,faiss-gpu,progress]"]

dependencies = [
"numpy>=1.26.4,<2",
"scikit-learn>=1.5.1,<2",
"tqdm>=4.66.0,<5",
Owner

Including tqdm as a required dependency would violate the references in the original paper; it would be better to make it an optional dependency.

code/setup.py Outdated
@@ -45,5 +44,10 @@
install_requires=[
"numpy>=1.26.4,<2",
"scikit-learn>=1.5.1,<2",
"tqdm>=4.66.0,<5",
Owner

Same here: tqdm needs to move to the optional dependencies.
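
As a minimal sketch of how the same grouping might look in setup.py: the `EXTRAS` name and the derived `all` group are illustrative assumptions, and the version pins are copied from the pyproject.toml suggestion earlier in this review rather than from the merged code.

```python
# Hypothetical extras table for setup.py; group names and pins mirror the
# pyproject.toml suggestion in this review, not any merged code.
EXTRAS = {
    "faiss-cpu": ["faiss-cpu>=1.10.0,<2"],
    "faiss-gpu": ["faiss-gpu>=1.10.0,<2"],
    "progress": ["tqdm>=4.66.0,<5"],
}

# "all" is the de-duplicated, sorted union of every group above.
EXTRAS["all"] = sorted({req for reqs in EXTRAS.values() for req in reqs})

# In setup.py this would be passed to setuptools as:
#     setup(..., extras_require=EXTRAS)
```

With this layout, tqdm would drop out of `install_requires` and be installed only when the `progress` (or `all`) extra is requested.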

import contextlib

import joblib
from tqdm.autonotebook import tqdm
Owner

If we move tqdm to optional dependencies, we need to guard the import and handle ImportError when the package is not installed.
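
One possible shape for that guard, as a sketch only: the `progress` helper and its `enabled` flag are hypothetical names introduced here, not part of the PR. It falls back to the plain iterable when tqdm is missing.

```python
# Optional dependency: fall back gracefully when tqdm is not installed.
try:
    from tqdm.autonotebook import tqdm
except ImportError:
    tqdm = None


def progress(iterable, *, enabled=True, **kwargs):
    """Hypothetical helper: wrap `iterable` in a tqdm bar when possible.

    Returns the iterable unchanged if tqdm is unavailable or `enabled`
    is False, so callers never need to branch on the import themselves.
    """
    if enabled and tqdm is not None:
        return tqdm(iterable, **kwargs)
    return iterable


# Works the same with or without tqdm installed.
total = sum(x for x in progress(range(5), enabled=False))
```

Keeping the check in one helper means the rest of the module can call `progress(...)` unconditionally.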

Author

fixed

from itertools import islice


def batched_iterable(iterable, batch_size):
Owner

It would be better to replace this implementation with a polyfill, something like this:

from sys import version_info

import itertools


if version_info >= (3, 12) and hasattr(itertools, "batched"):
    # Built-in since 3.12 (returns tuples)
    batched = itertools.batched  # type: ignore[attr-defined]
else:
    def batched(iterable, n, *, strict=False):
        """Back-port of itertools.batched for Py < 3.12 (returns tuples)."""

        if n < 1:
            raise ValueError("n must be >= 1")

        it = iter(iterable)
        while (chunk := tuple(itertools.islice(it, n))):
            if strict and len(chunk) != n:
                raise ValueError("last batch smaller than n")

            yield chunk
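
For reference, a short usage sketch of the behaviour such a polyfill is meant to reproduce. The `_fallback_batched` stand-in is a simplified assumption (no `strict` flag) used only so the snippet runs on its own on any Python 3.8+.

```python
import itertools


def _fallback_batched(iterable, n):
    """Simplified stand-in for itertools.batched (yields tuples)."""
    if n < 1:
        raise ValueError("n must be >= 1")
    it = iter(iterable)
    while chunk := tuple(itertools.islice(it, n)):
        yield chunk


# Use the built-in when the interpreter provides it, else the stand-in.
batched = getattr(itertools, "batched", _fallback_batched)

batches = list(batched(range(7), 3))
print(batches)  # [(0, 1, 2), (3, 4, 5), (6,)]
```

The final batch is shorter when the input length is not a multiple of `n`, which is the case callers iterating over data in fixed-size chunks need to handle.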

Author

applied

@ostapbodnar ostapbodnar requested a review from kaydotdev July 13, 2025 17:47