KirtiJha · KirtiJha · Jun 4, 2026 · Jun 4, 2026
diff --git a/.github/workflows/release.yml b/.github/workflows/release.yml
@@ -53,7 +53,10 @@ jobs:
         shell: bash
 
       - name: Package
-        run: npx vsce package --target ${{ matrix.target }} -o code-historian-${{ matrix.target }}-${{ github.ref_name }}.vsix
+        # Uses the helper so onnxruntime-node is trimmed to this target's binary
+        # (it ships all platforms in one package; vsce names the output
+        # code-historian-<target>-<version>.vsix automatically).
+        run: node scripts/package-target.mjs ${{ matrix.target }}
         shell: bash
 
       - name: Upload artifact

diff --git a/.vscodeignore b/.vscodeignore
@@ -31,8 +31,15 @@ coverage/**
 CONTRIBUTING.md
 
 # Trim cruft from shipped production dependencies (never required at runtime).
-# These deliberately do NOT match *.js, *.json, *.node or *.wasm.
+# These deliberately do NOT match *.js, *.json or *.node.
 node_modules/**/*.ts
 node_modules/**/*.map
 node_modules/**/*.md
 node_modules/**/{test,tests,docs,example,examples,.github}/**
+
+# The extension host runs on Node, where @xenova/transformers uses the
+# onnxruntime-node native backend. The onnxruntime-web (browser/WASM) backend and
+# its ~74 MB of bundled .wasm files are never loaded here — keep the JS so any
+# `require` resolves, but drop the binaries.
+node_modules/onnxruntime-web/dist/*.wasm
+node_modules/@xenova/transformers/dist/*.wasm
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -6,6 +6,22 @@ format is based on [Keep a Changelog](https://keepachangelog.com/).
 ## [Unreleased]
 
 ### Added
+- **Built-in neural embeddings.** The zero-config `local` provider now runs a
+  small sentence-transformer (all-MiniLM-L6-v2, 384-dim) locally via
+  Transformers.js — no API key, no server, weights cached after a one-time
+  ~23 MB download. This substantially improves semantic-search recall over the
+  previous hashing embedding, which is kept as an automatic fallback when the
+  ONNX runtime can't load (e.g. unsupported platform). The model loads in the
+  background so activation is never blocked.
+  - When the embedding backend changes (hashing → neural, or a provider switch),
+    the vector index is automatically rebuilt so old and new vectors are never
+    mixed. A backend signature is tracked in the metadata DB to detect this.
+  - Added `embedAllPending()` to drain the full backlog when re-indexing.
+  - `@xenova/transformers` is an **optional** dependency: if it (or its native
+    ONNX runtime) isn't installed, the extension still works on the hashing
+    fallback. It ships via the platform-specific packaging (see Fixed below);
+    `onnxruntime-node`'s other-platform binaries and the unused `onnxruntime-web`
+    browser/WASM backend (~74 MB) are trimmed from each `.vsix`.
 - **Symbol extraction.** Captured changes now record the functions/classes they
   touched (via VS Code's document symbol provider), so symbol history, the
   clickable symbol tags, and symbol-aware search actually work. Also records the

diff --git a/README.md b/README.md
@@ -34,8 +34,8 @@
 - **Hybrid search** combining vector similarity and keyword matching (BM25)
 - Context-aware results with relevant code snippets
 - Temporal filtering: *"Changes from last week"*
-- **Works with zero configuration** (built-in local embeddings) — or plug in
-  Ollama, HuggingFace, or OpenAI for higher quality
+- **Works with zero configuration** — a neural embedding model (all-MiniLM-L6-v2)
+  runs locally out of the box; or plug in Ollama, HuggingFace, or OpenAI
 - **Symbol time-travel**: trace the history of a single function/class/variable
 
 ### 💬 **Chat Integration**
@@ -101,14 +101,16 @@ Code Historian supports multiple embedding providers for semantic search:
 
 | Provider | Model | Local | Cost | Dimensions | Setup |
 |----------|-------|:-----:|------|:----------:|-------|
-| **Built-in** (default) | hashing embedding | ✅ | Free | 384 | None — works offline |
+| **Built-in** (default) | all-MiniLM-L6-v2 (neural) | ✅ | Free | 384 | None — runs locally |
 | **Ollama** | nomic-embed-text | ✅ | Free | 768 | Install Ollama |
 | **HuggingFace** | BAAI/bge-large-en-v1.5 | ❌ | Free tier | 1024 | API token |
 | **OpenAI** | text-embedding-3-small | ❌ | Paid | 1536 | API key |
 
-The **built-in** provider needs no configuration and works out of the box (lower
-quality than a neural model). Switch to Ollama / HuggingFace / OpenAI for stronger
-semantic search:
+The **built-in** provider needs no configuration and runs a small neural
+sentence-transformer (all-MiniLM-L6-v2) locally via Transformers.js — the model
+(~23 MB) is downloaded once and cached for offline use. If the local ONNX runtime
+can't load, it automatically falls back to a lightweight hashing embedding so
+search always works. Switch to Ollama / HuggingFace / OpenAI for other models:
 
 ```json
 {