# Update Benchmarks

This directory hosts two benchmark suites that exercise LEANN’s HNSW “update +
search” pipeline under different assumptions:

1. **RNG recompute latency** – measure how random-neighbour pruning and cache
   settings influence incremental `add()` latency when embeddings are fetched
   over the ZMQ embedding server.
2. **Update strategy comparison** – compare a fully sequential update pipeline
   against an offline approach that keeps the graph static and fuses results.

Both suites build a non-compact, `is_recompute=True` index so that new
embeddings are pulled from the embedding server. Benchmark outputs are written
under `.leann/bench/` by default and appended to CSV files for later plotting.
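
Both suites construct that index the same way. The following is a minimal
sketch assuming LEANN's `LeannBuilder` API; the `is_compact` / `is_recompute`
keyword names are assumptions, so check your LEANN version for the exact flags:

```python
# Minimal build sketch; the is_compact / is_recompute flag names are
# assumptions, not verified against the current LEANN release.
from leann import LeannBuilder

builder = LeannBuilder(
    backend_name="hnsw",
    is_compact=False,    # non-compact: keep the graph mutable for incremental add()
    is_recompute=True,   # embeddings are re-fetched from the embedding server
)
builder.add_text("It is a truth universally acknowledged ...")  # seed passage
builder.build_index(".leann/bench/test.leann")
```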

## Benchmarks

### 1. HNSW RNG Recompute Benchmark

`bench_hnsw_rng_recompute.py` evaluates incremental update latency under four
random-neighbour (RNG) configurations. Each scenario uses the same dataset but
changes the forward / reverse RNG pruning flags and whether the embedding cache
is enabled:

| Scenario name                     | Forward RNG  | Reverse RNG  | ZMQ embedding cache |
| --------------------------------- | ------------ | ------------ | ------------------- |
| `baseline`                        | Enabled      | Enabled      | Enabled             |
| `no_cache_baseline`               | Enabled      | Enabled      | **Disabled**        |
| `disable_forward_rng`             | **Disabled** | Enabled      | Enabled             |
| `disable_forward_and_reverse_rng` | **Disabled** | **Disabled** | Enabled             |

For each scenario the script:

1. (Re)builds an `is_recompute=True` index and writes it to `.leann/bench/`.
2. Starts `leann_backend_hnsw.hnsw_embedding_server` for remote embeddings.
3. Appends the requested updates using the scenario’s RNG flags.
4. Records total time, latency per passage, ZMQ fetch counts, and stage-level
   timings before appending a row to the CSV output (sketched below).
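
Conceptually, the measurement step is a small harness that loops over the
scenario matrix and appends one CSV row per run. The sketch below is
illustrative only: `run_scenario()` stands in for the script's internals, and
the column names are assumptions, not the script's actual schema:

```python
import csv
import time
from pathlib import Path

# Scenario matrix from the table above: (forward RNG, reverse RNG, ZMQ cache).
SCENARIOS = {
    "baseline": (True, True, True),
    "no_cache_baseline": (True, True, False),
    "disable_forward_rng": (False, True, True),
    "disable_forward_and_reverse_rng": (False, False, True),
}

def run_scenario(name: str, forward_rng: bool, reverse_rng: bool, cache: bool) -> dict:
    """Hypothetical stand-in: rebuild the index, apply updates, time them."""
    start = time.perf_counter()
    n_passages = 1  # stands in for the real update batch size
    total_s = time.perf_counter() - start
    return {
        "scenario": name,
        "total_s": total_s,
        "ms_per_passage": 1000.0 * total_s / n_passages,
    }

out = Path("benchmarks/update/bench_results.csv")
new_file = not out.exists()
with out.open("a", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["scenario", "total_s", "ms_per_passage"])
    if new_file:
        writer.writeheader()  # header only on first creation; later runs append
    for name, flags in SCENARIOS.items():
        writer.writerow(run_scenario(name, *flags))
```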

**Run:**
```bash
LEANN_HNSW_LOG_PATH=.leann/bench/hnsw_server.log \
LEANN_LOG_LEVEL=INFO \
uv run -m benchmarks.update.bench_hnsw_rng_recompute \
  --runs 1 \
  --index-path .leann/bench/test.leann \
  --initial-files data/PrideandPrejudice.txt \
  --update-files data/huawei_pangu.md \
  --max-initial 300 \
  --max-updates 1 \
  --add-timeout 120
```

**Output:**
- `benchmarks/update/bench_results.csv` – per-scenario timing statistics
  (including ms/passage) for each run.
- `.leann/bench/hnsw_server.log` – detailed ZMQ/server logs (path controlled by
  `LEANN_HNSW_LOG_PATH`).

_The reference CSVs checked into this branch were generated on a workstation
with an NVIDIA RTX 4090 GPU; throughput numbers will differ on other hardware._
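
For a quick look at the appended results without the plotting script, something
like the following works; the `scenario` and `ms_per_passage` column names are
assumptions about the CSV schema:

```python
# Average ms/passage per scenario across all appended runs.
import csv
from collections import defaultdict

per_scenario = defaultdict(list)
with open("benchmarks/update/bench_results.csv", newline="") as f:
    for row in csv.DictReader(f):
        per_scenario[row["scenario"]].append(float(row["ms_per_passage"]))

for name, vals in per_scenario.items():
    print(f"{name}: {sum(vals) / len(vals):.1f} ms/passage over {len(vals)} run(s)")
```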

### 2. Sequential vs. Offline Update Benchmark

`bench_update_vs_offline_search.py` compares two end-to-end strategies on the
same dataset:

- **Scenario A – Sequential Update**
  - Start an embedding server.
  - Sequentially call `index.add()`; each call fetches embeddings via ZMQ and
    mutates the HNSW graph.
  - After all inserts, run a search on the updated graph.
  - Metrics recorded: update time (`add_total_s`), post-update search time
    (`search_time_s`), combined total (`total_time_s`), and per-passage
    latency.

- **Scenario B – Offline Embedding + Concurrent Search**
  - Stop Scenario A’s server and start a fresh embedding server.
  - Spawn two threads: one generates embeddings for the new passages offline
    (graph unchanged); the other computes the query embedding and searches the
    existing graph (see the sketch after this list).
  - Merge offline similarities with the graph search results to emulate late
    fusion, then report the merged top‑k preview.
  - Metrics recorded: embedding time (`emb_time_s`), search time
    (`search_time_s`), concurrent makespan (`makespan_s`), and scenario total.
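
Scenario B boils down to two concurrent tasks plus a late-fusion merge. A
minimal sketch, with `embed_offline()` and `search_graph()` as illustrative
placeholders rather than the script's actual helpers:

```python
from concurrent.futures import ThreadPoolExecutor

def embed_offline(passages):
    """Placeholder: embed new passages without touching the HNSW graph."""
    return [(f"new-{i}", 0.5) for i, _ in enumerate(passages)]  # (id, similarity)

def search_graph(query, k):
    """Placeholder: search the existing, unmodified graph."""
    return [(f"old-{i}", 1.0 - 0.1 * i) for i in range(k)]

with ThreadPoolExecutor(max_workers=2) as pool:
    emb_future = pool.submit(embed_offline, ["a brand-new passage"])
    search_future = pool.submit(search_graph, "my query", 3)
    offline_hits = emb_future.result()   # makespan covers the slower of the two
    graph_hits = search_future.result()

# Late fusion: pool both candidate lists and keep the global top-k.
merged = sorted(offline_hits + graph_hits, key=lambda h: h[1], reverse=True)[:3]
print(merged)
```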

**Run (both scenarios):**
```bash
uv run -m benchmarks.update.bench_update_vs_offline_search \
  --index-path .leann/bench/offline_vs_update.leann \
  --max-initial 300 \
  --num-updates 1
```

You can pass `--only A` or `--only B` to run a single scenario. The script
prints timing summaries to stdout and appends the results to CSV.

**Output:**
- `benchmarks/update/offline_vs_update.csv` – per-scenario timing statistics for
  Scenarios A and B.
- Console output includes Scenario B’s merged top‑k preview for quick sanity
  checks.

_The sample results committed here come from runs on an RTX 4090-equipped
machine; expect variation if you benchmark on different GPUs._

### 3. Visualisation

`plot_bench_results.py` combines the RNG benchmark and the update strategy
benchmark into a single two-panel plot.

**Run:**
```bash
uv run -m benchmarks.update.plot_bench_results \
  --csv benchmarks/update/bench_results.csv \
  --csv-right benchmarks/update/offline_vs_update.csv \
  --out benchmarks/update/bench_latency_from_csv.png
```

**Options:**
- `--csv` – RNG benchmark results CSV (left panel).
- `--csv-right` – Update strategy results CSV (right panel).
- `--out` – Output image path (PNG/PDF supported).
- `--broken-y` – Enable a broken Y-axis (default: true when appropriate); a
  minimal sketch of the technique follows this list.
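
The broken Y-axis itself is a standard matplotlib pattern: two stacked panes
with different Y ranges and the adjoining spines hidden. A minimal sketch with
placeholder values, not real benchmark numbers:

```python
import matplotlib.pyplot as plt

labels = ["baseline", "no_cache_baseline"]  # scenario names from the left-panel CSV
values = [12.0, 480.0]                      # placeholder ms/passage latencies

fig, (top, bottom) = plt.subplots(2, 1, sharex=True)
for ax in (top, bottom):
    ax.bar(labels, values)
top.set_ylim(400, 520)      # upper pane: only the outlier bar is in range
bottom.set_ylim(0, 40)      # lower pane: the small bars
top.spines["bottom"].set_visible(False)   # hide the seam between the panes
bottom.spines["top"].set_visible(False)
top.tick_params(bottom=False)             # no ticks at the break
fig.savefig("broken_y_sketch.png")
```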

**Output:**
- `benchmarks/update/bench_latency_from_csv.png` – visual comparison of the two
  suites.
- `benchmarks/update/bench_latency_from_csv.pdf` – PDF version, suitable for
  slides/papers.

## Parameters & Environment

### Common CLI Flags
- `--max-initial` – Number of initial passages used to seed the index.
- `--max-updates` / `--num-updates` – Number of passages to treat as updates.
- `--index-path` – Base path (without extension) where the LEANN index is stored.
- `--runs` – Number of repetitions (RNG benchmark only).

### Environment Variables
- `LEANN_HNSW_LOG_PATH` – File to receive embedding-server logs (optional).
- `LEANN_LOG_LEVEL` – Logging verbosity (DEBUG/INFO/WARNING/ERROR).
- `CUDA_VISIBLE_DEVICES` – Set to an empty string to force CPU execution of the
  embedding model (see the sketch below).
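
The shell form is `CUDA_VISIBLE_DEVICES="" uv run -m ...`; from Python the
variable must be exported before any CUDA-using library initialises, e.g.:

```python
import os

# Must run before torch (or any other CUDA user) is imported, otherwise the
# GPU context may already be initialised.
os.environ["CUDA_VISIBLE_DEVICES"] = ""

import torch  # noqa: E402  (deliberately imported after the env var is set)
assert not torch.cuda.is_available()  # the embedding model now runs on CPU
```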

With these scripts you can easily replicate LEANN’s update benchmarks, compare
multiple RNG strategies, and evaluate whether sequential updates or offline
fusion better match your latency/accuracy trade-offs.