This directory contains structured records of all performance experiments conducted on sqlite-diskann. Each experiment documents the hypothesis, methodology, results, and conclusions to build institutional knowledge.
Each experiment is documented in a separate markdown file with the following structure:
```markdown
# Experiment: [Short Title]

**Date:** YYYY-MM-DD
**Engineer:** [Name]
**Status:** [Planned | Running | Complete | Abandoned]

## Hypothesis

What we believe will happen and why.

## Motivation

Why we're running this experiment. What problem are we trying to solve?

## Test Setup

- Parameters tested
- Dataset size and characteristics
- Hardware/environment
- Comparison baseline

## Expected Results

Quantitative predictions with reasoning.

## Actual Results

Raw data, tables, graphs. Link to benchmark output files.

## Analysis

What the results mean. Surprises? Confirmations?

## Conclusions

- What we learned
- Impact on defaults/recommendations
- Follow-up experiments needed

## Artifacts

- Benchmark profiles: `benchmarks/profiles/experiment-001-*.json`
- Results: `results/experiment-001-*.json`
- Graphs: `experiments/graphs/experiment-001-*.png`
```

## Experiment Index

| ID | Date | Title | Status | Key Finding |
|---|---|---|---|---|
| 001 | 2026-02-11 | Cache + Hash Set Optimization | Complete | 37% build speedup from BLOB caching |
| 002 | 2026-02-11 | insert_list_size Reduction (200→100) | Complete | Only 2% improvement due to cache masking |
| 003 | 2026-02-14 | max_neighbors Impact on Recall | Complete | searchListSize bottleneck; keep default=32 |
| 004 | 2026-02-12 | Scaling Test (10k→200k) | Planned | Find crossover vs brute-force |
| 005 | 2026-02-12 | Block Size Fix at 100k | Complete | 98% recall (maxDeg=64), 64% (maxDeg=32) |
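
The recall figures in the index are assumed here to mean recall@k in the usual sense. That is the fraction of the exact top-k neighbors (for example, from a brute-force scan) that the index returns in its own top k, averaged over all queries. This README does not show the benchmark harness's actual definition, so treat the following as a minimal sketch. `recall_at_k` and `mean_recall_at_k` are hypothetical helper names.

```python
def recall_at_k(retrieved: list[int], ground_truth: list[int], k: int) -> float:
    """Fraction of the exact top-k neighbor ids that appear in the first k results."""
    if k <= 0:
        raise ValueError("k must be positive")
    truth = set(ground_truth[:k])
    if len(truth) < k:
        raise ValueError("ground truth must contain at least k distinct ids")
    # Set intersection so a duplicated id in the results is not counted twice.
    hits = len(set(retrieved[:k]) & truth)
    return hits / k


def mean_recall_at_k(results: list[list[int]], truths: list[list[int]], k: int) -> float:
    """Average recall@k over a query set (one result list and one truth list per query)."""
    if not results or len(results) != len(truths):
        raise ValueError("need one ground-truth list per query")
    return sum(recall_at_k(r, t, k) for r, t in zip(results, truths)) / len(results)
```

For example, a query whose top 4 results contain 3 of its 4 true neighbors scores 0.75.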
## Creating a New Experiment

```bash
# Create from template
cp experiments/template.md experiments/experiment-XXX-short-name.md

# Edit with your hypothesis and setup
vim experiments/experiment-XXX-short-name.md

# Run benchmark
cd benchmarks
npm run bench -- --profile=profiles/experiment-XXX.json > ../experiments/experiment-XXX-output.txt

# Update experiment file with results
```

## Searching Experiments

```bash
# Find experiments testing specific parameters
grep -r "max_neighbors" experiments/

# Find experiments with high recall (95-99%), case-insensitive
grep -ri "Recall.*9[5-9]%" experiments/

# List all completed experiments (the template writes the field as **Status:** Complete)
grep -l '\*\*Status:\*\* Complete' experiments/*.md
```

## Checklist

- Check for similar past experiments - Don't repeat work
- Document hypothesis clearly - Make predictions falsifiable
- Plan for automation - Use benchmark profiles, not manual tests
- Estimate time/cost - Large benchmarks can take hours
- Capture raw output - Redirect to `experiment-XXX-output.txt`
- Save result JSON - Link to timestamped result files
- Note anomalies - Document anything unexpected immediately
- Take screenshots - For interactive visualizations
- Update experiment status - Mark as Complete
- Add to index - Update table above with key finding
- Update docs - If defaults change, update `PARAMETERS.md`
- Link from issues/PRs - Reference experiment IDs in commits
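
The last two checklist items are updating the status and adding a row to the index. Both can be partly scripted. The sketch below assumes that experiment files keep the template's `# Experiment:`, `**Date:**` and `**Status:**` lines verbatim. `index_row` is a hypothetical helper, not an existing script. The key finding is still written by hand.

```python
import re

# Assumes files follow template.md: "# Experiment: <title>", "**Date:** ...", "**Status:** ..."
TITLE_RE = re.compile(r"^# Experiment:\s*(.+?)\s*$", re.MULTILINE)
FIELD_RE = re.compile(r"^\*\*(Date|Status):\*\*\s*(.+?)\s*$", re.MULTILINE)


def index_row(exp_id: str, markdown: str, key_finding: str) -> str:
    """Build one row of the experiment index table from an experiment file's text."""
    title = TITLE_RE.search(markdown)
    if title is None:
        raise ValueError("missing '# Experiment:' title line")
    fields = dict(FIELD_RE.findall(markdown))
    missing = {"Date", "Status"} - set(fields)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    # The key finding is a human judgment, so it is passed in rather than parsed.
    return f"| {exp_id} | {fields['Date']} | {title.group(1)} | {fields['Status']} | {key_finding} |"
```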
## Analyzing Results

```bash
# Compare build times across experiments
jq '.[] | {experiment: .name, build_time: .build_time}' \
  results/experiment-*.json

# Plot recall vs build time
python3 experiments/scripts/plot-pareto.py results/experiment-*.json

# Run t-test between two experiments
python3 experiments/scripts/ttest.py \
  results/experiment-001.json \
  results/experiment-002.json
```

## Writing Good Hypotheses

❌ Bad: "Let's try max_neighbors=48 and see what happens"

✅ Good: "Hypothesis: Increasing max_neighbors from 32→48 will improve recall@10 from 95%→97% but increase index size by 50% and build time by 10%"
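
One way to keep a hypothesis falsifiable is to record each prediction before the run, as a number with an explicit tolerance. After the run, each prediction can be checked mechanically. The metric names below (`recall@10`, `index_size_ratio`, `build_time_ratio`) and the tolerances are illustrative assumptions, not fields the benchmark harness is known to emit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    metric: str       # name used in the results, e.g. "recall@10"
    expected: float   # value predicted before running the experiment
    tolerance: float  # absolute error still counted as "confirmed"


def evaluate(predictions: list[Prediction], actual: dict[str, float]) -> dict[str, bool]:
    """Mark each prediction confirmed (True) or falsified (False) against measured values."""
    verdicts = {}
    for p in predictions:
        if p.metric not in actual:
            raise KeyError(f"no measurement recorded for {p.metric!r}")
        verdicts[p.metric] = abs(actual[p.metric] - p.expected) <= p.tolerance
    return verdicts


# The "good" hypothesis above, written down before the run.
# Metric names and tolerances are illustrative assumptions.
hypothesis = [
    Prediction("recall@10", expected=0.97, tolerance=0.01),
    Prediction("index_size_ratio", expected=1.50, tolerance=0.05),
    Prediction("build_time_ratio", expected=1.10, tolerance=0.05),
]
```

A missing measurement raises an error instead of passing silently, so an untested prediction cannot be mistaken for a confirmed one.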

## Best Practices

- Use benchmark profiles (JSON configs)
- Document exact versions (git commit hash)
- Note hardware specs
- Seed random number generators
- Change one variable at a time (when possible)
- Build on previous experiments
- Reference prior work
- Write for your future self (six months from now)
- Assume the reader has no context
- Include enough detail to reproduce exactly

## Common Pitfalls

- Not documenting the baseline - Always measure before and after
- Cherry-picking results - Document failures too
- Ignoring variance - Run multiple trials, report stddev
- Confounding variables - Did something else change? (OS update, etc.)
- Premature conclusions - Correlation ≠ causation
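
The variance and baseline pitfalls come down to a few statistics, sketched minimally below:

- the mean and sample standard deviation over repeated trials;
- the signed change relative to a baseline;
- Welch's t statistic, which does not assume the two experiments have equal variance.

What `experiments/scripts/ttest.py` actually computes is not documented here. This is an illustration, not a description of that script. It uses only the standard library and stops short of a p-value. If SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` runs the same Welch test and also returns one.

```python
import math
import statistics


def summarize(trials: list[float]) -> tuple[float, float]:
    """Mean and sample standard deviation of repeated trials (needs at least two)."""
    if len(trials) < 2:
        raise ValueError("need at least two trials to estimate variance")
    return statistics.mean(trials), statistics.stdev(trials)


def relative_change(baseline: float, candidate: float) -> float:
    """Signed change vs. the baseline; -0.37 means the candidate is 37% lower."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (candidate - baseline) / baseline


def welch_t(a: list[float], b: list[float]) -> tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    mean_a, sd_a = summarize(a)
    mean_b, sd_b = summarize(b)
    # Squared standard error of each mean.
    var_a, var_b = sd_a ** 2 / len(a), sd_b ** 2 / len(b)
    if var_a + var_b == 0:
        raise ValueError("both samples have zero variance")
    t = (mean_a - mean_b) / math.sqrt(var_a + var_b)
    df = (var_a + var_b) ** 2 / (var_a ** 2 / (len(a) - 1) + var_b ** 2 / (len(b) - 1))
    return t, df
```

Pass per-trial measurements (for example build times) from two experiments as `a` and `b`. With only a few trials per configuration the degrees of freedom stay small, so the same t value carries less evidence.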

## Templates

- `template.md` - Blank experiment template
- `template-param-sweep.md` - For parameter sweeps
- `template-scaling.md` - For dataset size scaling tests
- `template-regression.md` - For performance regression investigations
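
The `cp` workflow leaves it to the author to choose the `XXX` ID. The sketch below picks the next free ID and copies one of these templates. It assumes zero-padded three-digit IDs and the `experiment-NNN-short-name.md` naming used elsewhere in this README. `next_experiment_id` and `create_experiment` are hypothetical helpers, not scripts that exist in the repository.

```python
import re
import shutil
from pathlib import Path

# Assumed naming convention: experiment-NNN-short-name.md with a zero-padded 3-digit ID.
ID_RE = re.compile(r"^experiment-(\d{3})-.+\.md$")
SHORT_NAME_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def next_experiment_id(exp_dir: Path) -> str:
    """Return the next unused ID, one past the highest existing experiment file."""
    ids = []
    for path in exp_dir.glob("experiment-*.md"):
        match = ID_RE.match(path.name)
        if match:
            ids.append(int(match.group(1)))
    return f"{max(ids, default=0) + 1:03d}"


def create_experiment(exp_dir: Path, short_name: str, template: str = "template.md") -> Path:
    """Copy a template to experiment-NNN-<short_name>.md and return the new path."""
    if not SHORT_NAME_RE.fullmatch(short_name):
        raise ValueError("short name should be lowercase words joined by hyphens")
    target = exp_dir / f"experiment-{next_experiment_id(exp_dir)}-{short_name}.md"
    if target.exists():
        raise FileExistsError(target)
    shutil.copyfile(exp_dir / template, target)
    return target
```

For example, `create_experiment(Path("experiments"), "scaling-test", template="template-scaling.md")` would create the next numbered file from the scaling template.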