Built-in local neural embeddings (all-MiniLM-L6-v2) with auto re-index #3
Closed
KirtiJha wants to merge 2 commits into
…index

Replaces the deterministic hashing embedding behind the zero-config `local` provider with a real neural sentence-transformer (all-MiniLM-L6-v2, 384-dim) run locally via Transformers.js, substantially improving semantic-search recall.

- neuralEmbedding.ts: lazy, cached loader for the Transformers.js feature-extraction pipeline; weights (~23 MB) download once to the extension storage dir and are cached for offline use. Loader is injectable for tests.
- embedding.ts: new BuiltInEmbeddingProvider prefers the neural backend and transparently falls back to the hashing embedding if the ONNX runtime can't load (same 384 dims). Per-call neural errors throw rather than silently mixing embedding spaces. Adds getBackendSignature() and embedAllPending().
- Auto re-index: on activation (and on settings change) the embedding backend signature is compared to the value stored in the metadata DB; if it changed (hashing -> neural, or a provider switch) the vector index is cleared and all embedding ids are reset so the backlog is rebuilt cleanly. New metadata helpers: getSetting/setSetting/clearEmbeddingIds.
- Activation is never blocked on the model download: embedding init + re-index + backfill all run in the background; captures/embeddings that arrive before the model is ready are simply picked up by the backfill pass.
- @xenova/transformers added as an OPTIONAL dependency and marked external in esbuild (loaded at runtime like @lancedb/lancedb), so the extension still works on the hashing fallback if it isn't installed.

Tests: injected-loader unit tests for the neural path and hashing fallback, an opt-in real-model test (RUN_NEURAL=1), and metadata settings/clear tests. 53 tests pass. Docs updated.
Now that the neural embedding deps are stacked on the packaging fix, complete the size story for the heavier runtime:

- onnxruntime-node ships every platform's native binary in one ~93 MB package (unlike @lancedb/lancedb's per-platform npm packages). scripts/package-target.mjs now temporarily excludes the non-target platforms' binaries via .vscodeignore while packaging (restoring the file afterward), so each .vsix carries only its own platform's binary. The Release workflow uses the script for every target.
- onnxruntime-web (the browser/WASM backend) and the duplicate .wasm binaries in @xenova/transformers/dist (~74 MB total) are never loaded in the Node extension host, so they're excluded statically in .vscodeignore (keeping the JS so any require still resolves).

Net: a linux-x64 .vsix drops from ~70 MB to ~49 MB, still containing the lancedb, onnxruntime-node and sharp native runtimes for that platform (the model weights are downloaded at runtime, not bundled). Verified by packaging and inspecting the archive.
Summary
Swaps the deterministic hashing embedding behind the zero-config `local` provider for a real neural sentence-transformer, all-MiniLM-L6-v2 (384-dim), run locally via Transformers.js. Still zero-config (no API key, no server), but a substantial jump in semantic-search recall. The hashing embedding remains as an automatic fallback.

Verified end-to-end against the real model in this environment: loads in <1s (after a one-time ~23 MB cache), produces 384-dim normalized vectors, and ranks semantically related text above unrelated text.
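For reviewers who want the shape of the fallback without opening the diff, here is a minimal sketch of a lazily initialized provider that prefers an injected neural loader and drops to a hashing embedding when the loader fails. It is an illustration under assumptions, not the code in this PR: `SketchEmbeddingProvider`, `hashingEmbed`, the FNV-1a bucketing, and the `backend:dims` signature format are all hypothetical; only the 384 dimensions, the injectable loader, and the throw-on-per-call-error behaviour come from the description.

```typescript
// Illustrative sketch only; every name here is hypothetical, not the PR's code.
type EmbedFn = (text: string) => Promise<number[]>;
type Loader = () => Promise<EmbedFn>;

const DIMS = 384;

// Deterministic hashing embedding: bucket each token with FNV-1a, then L2-normalize.
function hashingEmbed(text: string): number[] {
  const vec = new Array<number>(DIMS).fill(0);
  for (const token of text.toLowerCase().split(/\W+/).filter(Boolean)) {
    let h = 2166136261; // FNV-1a offset basis
    for (let i = 0; i < token.length; i++) {
      h ^= token.charCodeAt(i);
      h = Math.imul(h, 16777619) >>> 0; // keep h an unsigned 32-bit value
    }
    vec[h % DIMS] += 1;
  }
  const norm = Math.sqrt(vec.reduce((s, v) => s + v * v, 0)) || 1;
  return vec.map((v) => v / norm);
}

class SketchEmbeddingProvider {
  private backend: "neural" | "hashing" | undefined;
  private neural: EmbedFn | undefined;
  private initPromise: Promise<void> | undefined;

  // The loader is injected so tests can substitute a fake or a failing one.
  constructor(private readonly loader: Loader) {}

  // Lazy and cached: the loader runs at most once, even under concurrent calls.
  init(): Promise<void> {
    if (!this.initPromise) {
      this.initPromise = this.loader().then(
        (fn) => { this.neural = fn; this.backend = "neural"; },
        () => { this.backend = "hashing"; } // load failure: fall back for good
      );
    }
    return this.initPromise;
  }

  async embed(text: string): Promise<number[]> {
    await this.init();
    if (this.backend === "neural") {
      // A per-call failure propagates to the caller; it never falls back
      // silently, so the two embedding spaces are not mixed in one index.
      return this.neural!(text);
    }
    return hashingEmbed(text);
  }

  // Assumed format; the PR's real signature string may differ.
  getBackendSignature(): string {
    return `${this.backend ?? "uninitialized"}:${DIMS}`;
  }
}
```

The fallback decision is made once, at load time, which is what lets the backend signature stay stable for the lifetime of the provider.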
What changed
- Neural backend (`src/services/neuralEmbedding.ts`): lazy, cached loader for the Transformers.js pipeline; weights download once to extension storage and are cached offline. Loader is injectable for tests.
- Provider (`src/services/embedding.ts`): `BuiltInEmbeddingProvider` prefers neural, falls back to hashing if the ONNX runtime can't load (both 384-dim). Per-call neural failures throw rather than silently mixing two embedding spaces. Adds `getBackendSignature()` and `embedAllPending()`.
- Automatic re-index (correctness): the embedding backend signature is tracked in the metadata DB; when it changes (hashing → neural, or a provider switch) the vector index is cleared and all `embedding_id`s are reset so the backlog rebuilds cleanly. New helpers: `getSetting`/`setSetting`/`clearEmbeddingIds`.
- Non-blocking activation: embedding init + re-index + backfill run in the background; changes captured before the model is ready are caught by the backfill pass.
- Packaging (stacked on #4, now completed for the neural runtime):
  - `@xenova/transformers` is an optional dependency, external in esbuild (dynamic `import`), so the extension degrades to hashing if it's absent.
  - `onnxruntime-node` ships all platforms' binaries in one ~93 MB package; `scripts/package-target.mjs` now trims it to the target platform per `.vsix`.
  - The `onnxruntime-web` browser/WASM backend (~74 MB of `.wasm`) is excluded statically.
  - Net: linux-x64 `.vsix` ≈ 49 MB, verified by inspecting the archive.

Tests

- `getSetting`/`setSetting`/`clearEmbeddingIds` tests; opt-in real-model test (`RUN_NEURAL=1`).

https://claude.ai/code/session_014bNJaULcYHnDkqUP6HcemQ
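The automatic re-index described under "What changed" comes down to comparing a stored signature with the current one. The sketch below shows one way that check could look. It is a guess at the shape, not the PR's implementation: the `MetadataStore` and `VectorIndex` interfaces, the `embedding.backendSignature` key, the in-memory store, and the example signature strings are all assumptions; only the helper names `getSetting`, `setSetting`, and `clearEmbeddingIds` come from the description.

```typescript
// Illustrative sketch only; interfaces and key name are assumptions.
interface MetadataStore {
  getSetting(key: string): string | undefined;
  setSetting(key: string, value: string): void;
  clearEmbeddingIds(): void;
}

interface VectorIndex {
  clear(): void;
}

const SIGNATURE_KEY = "embedding.backendSignature"; // hypothetical key name

// Returns true when the index was invalidated and a backfill pass is needed.
// Assumption: a missing stored signature (first run) is treated as a change.
function reindexIfBackendChanged(
  meta: MetadataStore,
  index: VectorIndex,
  currentSignature: string
): boolean {
  if (meta.getSetting(SIGNATURE_KEY) === currentSignature) return false;
  index.clear();            // drop vectors produced by the old backend
  meta.clearEmbeddingIds(); // mark every row as pending re-embedding
  // Written last, so an interrupted run repeats the clear on the next start.
  meta.setSetting(SIGNATURE_KEY, currentSignature);
  return true;
}

// Minimal in-memory stand-in for the metadata DB, for experimentation.
class InMemoryMetadata implements MetadataStore {
  private settings = new Map<string, string>();
  embeddingIds = new Map<string, string | null>(); // row id -> embedding id

  getSetting(key: string): string | undefined {
    return this.settings.get(key);
  }
  setSetting(key: string, value: string): void {
    this.settings.set(key, value);
  }
  clearEmbeddingIds(): void {
    this.embeddingIds.forEach((_, rowId) => this.embeddingIds.set(rowId, null));
  }
}
```

Calling `reindexIfBackendChanged(meta, index, provider.getBackendSignature())` on activation and on settings change would match the flow in the description; a `true` result is the cue to schedule the background backfill.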