
Commit d4f5f28

andylizf and yichuan-w authored
Faster Update (#148)
* stash
* stash
* add std err in add and trace progress
* fix.
* docs
* style: format
* docs
* better figs
* better figs
* update results
* fotmat

Co-authored-by: yichuan-w <[email protected]>
1 parent 366984e commit d4f5f28

File tree

9 files changed: +2328 −4 lines


benchmarks/update/README.md

Lines changed: 143 additions & 0 deletions
@@ -0,0 +1,143 @@
# Update Benchmarks

This directory hosts two benchmark suites that exercise LEANN’s HNSW “update + search” pipeline under different assumptions:

1. **RNG recompute latency** – measures how relative-neighbourhood-graph (RNG) pruning and cache settings influence incremental `add()` latency when embeddings are fetched over the ZMQ embedding server.
2. **Update strategy comparison** – compares a fully sequential update pipeline against an offline approach that keeps the graph static and fuses results.

Both suites build a non-compact, `is_recompute=True` index so that new embeddings are pulled from the embedding server. Benchmark outputs are written under `.leann/bench/` by default and appended to CSV files for later plotting.

## Benchmarks

### 1. HNSW RNG Recompute Benchmark

`bench_hnsw_rng_recompute.py` evaluates incremental update latency under four relative-neighbourhood-graph (RNG) configurations. Each scenario uses the same dataset but changes the forward/reverse RNG pruning flags and whether the embedding cache is enabled:

| Scenario name                     | Forward RNG  | Reverse RNG  | ZMQ embedding cache |
| --------------------------------- | ------------ | ------------ | ------------------- |
| `baseline`                        | Enabled      | Enabled      | Enabled             |
| `no_cache_baseline`               | Enabled      | Enabled      | **Disabled**        |
| `disable_forward_rng`             | **Disabled** | Enabled      | Enabled             |
| `disable_forward_and_reverse_rng` | **Disabled** | **Disabled** | Enabled             |

For each scenario the script:

1. (Re)builds an `is_recompute=True` index and writes it to `.leann/bench/`.
2. Starts `leann_backend_hnsw.hnsw_embedding_server` for remote embeddings.
3. Appends the requested updates using the scenario’s RNG flags.
4. Records total time, latency per passage, ZMQ fetch counts, and stage-level timings before appending a row to the CSV output.
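The four configurations amount to a small scenario matrix that the script sweeps over. A hypothetical sketch of that matrix as plain data (the flag names here are illustrative, not the script’s actual options):

```python
# Illustrative scenario matrix for the RNG recompute benchmark.
# Flag names are hypothetical; the real script wires these settings
# into the HNSW add() path and the ZMQ embedding server.
SCENARIOS = [
    {"name": "baseline",                        "forward_rng": True,  "reverse_rng": True,  "zmq_cache": True},
    {"name": "no_cache_baseline",               "forward_rng": True,  "reverse_rng": True,  "zmq_cache": False},
    {"name": "disable_forward_rng",             "forward_rng": False, "reverse_rng": True,  "zmq_cache": True},
    {"name": "disable_forward_and_reverse_rng", "forward_rng": False, "reverse_rng": False, "zmq_cache": True},
]


def run_all(run_one):
    """Apply a runner callback to every scenario, collecting one result row each."""
    return [run_one(cfg) for cfg in SCENARIOS]


# Example: which scenarios keep the ZMQ embedding cache enabled?
cached = [cfg["name"] for cfg in SCENARIOS if cfg["zmq_cache"]]
```

Keeping the scenarios as data makes it cheap to add a fifth configuration later without touching the benchmark loop.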

**Run:**

```bash
LEANN_HNSW_LOG_PATH=.leann/bench/hnsw_server.log \
LEANN_LOG_LEVEL=INFO \
uv run -m benchmarks.update.bench_hnsw_rng_recompute \
  --runs 1 \
  --index-path .leann/bench/test.leann \
  --initial-files data/PrideandPrejudice.txt \
  --update-files data/huawei_pangu.md \
  --max-initial 300 \
  --max-updates 1 \
  --add-timeout 120
```

**Output:**

- `benchmarks/update/bench_results.csv` – per-scenario timing statistics (including ms/passage) for each run.
- `.leann/bench/hnsw_server.log` – detailed ZMQ/server logs (path controlled by `LEANN_HNSW_LOG_PATH`).

_The reference CSVs checked into this branch were generated on a workstation with an NVIDIA RTX 4090 GPU; throughput numbers will differ on other hardware._
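Since every run appends rows to the same CSV, the logging pattern is append-with-header-on-first-write. A minimal stdlib sketch of that pattern (the column names are assumptions, not the script’s exact schema):

```python
import csv
from pathlib import Path


def append_row(csv_path: str, row: dict) -> None:
    """Append one result row, writing the header only when the file is new."""
    path = Path(csv_path)
    is_new = not path.exists()
    with path.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(row))
        if is_new:
            writer.writeheader()
        writer.writerow(row)


# Demo with illustrative columns; start from a clean file so reruns match.
Path("bench_results_demo.csv").unlink(missing_ok=True)
append_row("bench_results_demo.csv", {
    "scenario": "baseline",
    "total_s": 12.3,
    "ms_per_passage": 41.0,
    "zmq_fetches": 812,
})
```

Append-style logging lets repeated runs accumulate in one file, which is what makes the later per-scenario plotting possible.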

### 2. Sequential vs. Offline Update Benchmark

`bench_update_vs_offline_search.py` compares two end-to-end strategies on the same dataset:

- **Scenario A – Sequential Update**
  - Start an embedding server.
  - Sequentially call `index.add()`; each call fetches embeddings via ZMQ and mutates the HNSW graph.
  - After all inserts, run a search on the updated graph.
  - Metrics recorded: update time (`add_total_s`), post-update search time (`search_time_s`), combined total (`total_time_s`), and per-passage latency.

- **Scenario B – Offline Embedding + Concurrent Search**
  - Stop Scenario A’s server and start a fresh embedding server.
  - Spawn two threads: one generates embeddings for the new passages offline (graph unchanged); the other computes the query embedding and searches the existing graph.
  - Merge offline similarities with the graph search results to emulate late fusion, then report the merged top-k preview.
  - Metrics recorded: embedding time (`emb_time_s`), search time (`search_time_s`), concurrent makespan (`makespan_s`), and scenario total.
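Scenario B’s concurrency boils down to two threads and a single wall-clock: the makespan is the elapsed time from launch to the slower thread finishing, not the sum of the two stages. A sketch of that shape, with stand-in bodies rather than LEANN calls:

```python
import threading
import time


def offline_embed(passages, out):
    """Stand-in for embedding new passages offline (graph untouched)."""
    out["emb"] = {p: float(len(p)) for p in passages}  # fake similarity scores


def graph_search(query, out):
    """Stand-in for embedding the query and searching the existing graph."""
    out["hits"] = [("old_doc_1", 0.9), ("old_doc_2", 0.7)]


results = {}
t0 = time.perf_counter()
threads = [
    threading.Thread(target=offline_embed, args=(["new passage"], results)),
    threading.Thread(target=graph_search, args=("my query", results)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
makespan_s = time.perf_counter() - t0  # concurrent wall-clock, not emb + search

# Late fusion: merge offline scores with graph hits and keep the top-k.
fused = sorted(
    list(results["hits"]) + list(results["emb"].items()),
    key=lambda kv: kv[1],
    reverse=True,
)[:3]
```

Because the graph is never mutated, the new passages only appear in results through the fusion step; that is the trade-off this scenario measures against sequential `add()`.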
**Run (both scenarios):**

```bash
uv run -m benchmarks.update.bench_update_vs_offline_search \
  --index-path .leann/bench/offline_vs_update.leann \
  --max-initial 300 \
  --num-updates 1
```

You can pass `--only A` or `--only B` to run a single scenario. The script prints timing summaries to stdout and appends the results to CSV.

**Output:**

- `benchmarks/update/offline_vs_update.csv` – per-scenario timing statistics for Scenarios A and B.
- Console output includes Scenario B’s merged top-k preview for quick sanity checks.

_The sample results committed here come from runs on an RTX 4090-equipped machine; expect variation if you benchmark on different GPUs._

### 3. Visualisation

`plot_bench_results.py` combines the RNG benchmark and the update strategy benchmark into a single two-panel plot.

**Run:**

```bash
uv run -m benchmarks.update.plot_bench_results \
  --csv benchmarks/update/bench_results.csv \
  --csv-right benchmarks/update/offline_vs_update.csv \
  --out benchmarks/update/bench_latency_from_csv.png
```

**Options:**

- `--broken-y` – Enable a broken Y-axis (default: true when appropriate).
- `--csv` – RNG benchmark results CSV (left panel).
- `--csv-right` – Update strategy results CSV (right panel).
- `--out` – Output image path (PNG/PDF supported).

**Output:**

- `benchmarks/update/bench_latency_from_csv.png` – visual comparison of the two suites.
- `benchmarks/update/bench_latency_from_csv.pdf` – PDF version, suitable for slides/papers.
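Before anything is drawn, each CSV is typically reduced to one value per scenario (e.g. mean ms/passage across runs) to feed its panel. A stdlib-only sketch of that grouping step, where the column names are assumptions about the CSV schema:

```python
import csv
from collections import defaultdict


def mean_by_scenario(csv_path, value_col="ms_per_passage", key_col="scenario"):
    """Group rows by scenario and average the chosen latency column."""
    sums, counts = defaultdict(float), defaultdict(int)
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            sums[row[key_col]] += float(row[value_col])
            counts[row[key_col]] += 1
    return {k: sums[k] / counts[k] for k in sums}


# Tiny demo file standing in for bench_results.csv (two baseline runs, one other).
with open("demo_results.csv", "w", newline="") as f:
    f.write("scenario,ms_per_passage\nbaseline,40\nbaseline,60\nno_cache_baseline,90\n")

panel_left = mean_by_scenario("demo_results.csv")
```

The resulting dict maps scenario names to mean latencies, which is exactly the shape a bar-per-scenario panel needs.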

## Parameters & Environment

### Common CLI Flags

- `--max-initial` – Number of initial passages used to seed the index.
- `--max-updates` / `--num-updates` – Number of passages to treat as updates.
- `--index-path` – Base path (without extension) where the LEANN index is stored.
- `--runs` – Number of repetitions (RNG benchmark only).

### Environment Variables

- `LEANN_HNSW_LOG_PATH` – File to receive embedding-server logs (optional).
- `LEANN_LOG_LEVEL` – Logging verbosity (DEBUG/INFO/WARNING/ERROR).
- `CUDA_VISIBLE_DEVICES` – Set to an empty string to force CPU execution of the embedding model.

With these scripts you can replicate LEANN’s update benchmarks, compare multiple RNG strategies, and evaluate whether sequential updates or offline fusion better match your latency/accuracy trade-offs.

benchmarks/update/__init__.py

Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
```python
"""Benchmarks for LEANN update workflows."""

# Expose a helper to locate the repository root for other modules that need it.
from pathlib import Path


def find_repo_root() -> Path:
    """Return the project root containing pyproject.toml."""
    current = Path(__file__).resolve()
    for parent in current.parents:
        if (parent / "pyproject.toml").exists():
            return parent
    return current.parents[1]


__all__ = ["find_repo_root"]
```
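The helper walks upward from the module’s own location until it finds a directory containing `pyproject.toml`, falling back to a fixed ancestor. A standalone check of the same walk, parameterised on the start path (`find_repo_root_from` is a hypothetical name used so the logic can be exercised outside the repo):

```python
import tempfile
from pathlib import Path


def find_repo_root_from(start: Path) -> Path:
    """Mirror find_repo_root's walk, but from an explicit start path."""
    current = start.resolve()
    for parent in current.parents:
        if (parent / "pyproject.toml").exists():
            return parent
    return current.parents[1]


# Build a throwaway tree root/pkg/sub/module.py with pyproject.toml at root,
# then confirm the walk recovers `root` from the nested file.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "root"
    nested = root / "pkg" / "sub"
    nested.mkdir(parents=True)
    (root / "pyproject.toml").write_text("[project]\nname = 'demo'\n")
    module = nested / "module.py"
    module.write_text("")
    found = find_repo_root_from(module)
```

Anchoring on `pyproject.toml` keeps the benchmarks portable no matter how deep inside the checkout they are invoked from.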
