A benchmark comparing brute-force, HNSWlib, and FAISS (HNSW index) vector search on a 1M-document MS MARCO subset, demonstrating the speed vs accuracy trade-off.
This experiment evaluates different vector search methods (Brute-force, HNSW, FAISS) on the MS MARCO dataset (1M documents) to understand:
- How approximate algorithms trade accuracy for speed
- Real-world performance characteristics at scale
- When to use exact vs approximate search
| Method | Implementation | Latency | Throughput | Recall@10 | Speedup |
|---|---|---|---|---|---|
| Brute Force | sklearn (unoptimized)* | 699.43 ms | 1.4 QPS | 0.8100 | 1x |
| HNSWlib | C++ with Python bindings | 0.46 ms | 2,166 QPS | 0.7433 (91.8%) | 1,515x |
| FAISS HNSW | IndexHNSWFlat | 0.08 ms | 11,805 QPS | 0.7683 (94.9%) | 8,257x |
*Note: All methods tested with M=32, ef_construction=100, ef_search=50 (where applicable). Brute force uses unoptimized sklearn cosine_similarity. Percentages in the Recall@10 column are relative to the brute-force baseline.
FAISS achieves sub-millisecond queries while maintaining 95% of brute-force recall.
FAISS processes 11,805 queries/second on this dataset.
HNSW methods give up 4-7 points of Recall@10 in exchange for 1,500-8,000x speedups.
All methods maintain Recall@10 of 0.74 or higher, with exact brute force as the upper bound (0.81).
Index construction for 1M documents is a one-time cost: ~8.5 minutes for HNSWlib and ~6.3 minutes for FAISS.
Tuning ef_search allows fine-grained control over the speed-accuracy trade-off, as the sketch below illustrates.
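As an illustration, here is a minimal sketch of an ef_search sweep (the `index` and `query_embeddings` variables are assumed to come from an HNSWlib setup like the one shown later; this is not code from the notebooks):

```python
import time

# Sweep ef_search on an already-built hnswlib index. A larger ef widens the
# candidate queue during search, improving recall at the cost of latency.
for ef in (10, 25, 50, 100, 200):
    index.set_ef(ef)  # hnswlib's runtime search parameter
    start = time.perf_counter()
    labels, _ = index.knn_query(query_embeddings, k=10)
    ms_per_query = (time.perf_counter() - start) * 1000 / len(query_embeddings)
    print(f"ef_search={ef}: {ms_per_query:.2f} ms/query")
```

Recall@10 for each setting can then be computed from `labels` against the qrels to trace out the trade-off curve.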
HNSW assigns nodes to hierarchical layers using the formula floor(-ln(U) * mL) where mL = 1/ln(M). Higher M values create flatter hierarchies with more nodes in the base layer, while lower M values create taller structures. This probabilistic assignment creates the skip-list-like structure that enables efficient logarithmic search.
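A standalone simulation of that formula (not HNSWlib's internal code) makes the effect of M visible:

```python
import math
import random
from collections import Counter

def assign_layer(M: int) -> int:
    """Sample a node's top layer: floor(-ln(U) * mL) with mL = 1/ln(M)."""
    mL = 1.0 / math.log(M)
    u = 1.0 - random.random()  # uniform in (0, 1], avoids log(0)
    return math.floor(-math.log(u) * mL)

for M in (8, 32, 64):
    layers = Counter(assign_layer(M) for _ in range(1_000_000))
    base = layers[0] / 1_000_000
    print(f"M={M}: max layer = {max(layers)}, base-layer share = {base:.1%}")
# Expected base-layer share is 1 - 1/M, so larger M means a flatter graph.
```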
Dataset
- Source: MS MARCO passage ranking (dev split)
- Corpus: 1,000,000 documents (intelligent subset from 8.8M full corpus)
- Strategy: Preserves ALL 7,437 relevant documents from qrels + ~992K randomly sampled documents
- Why subset? Faster embedding generation for quick testing while preserving all relevant documents
- Benchmark integrity: 100% of relevant documents preserved, metrics remain valid
- Queries: 100 dev queries (sampled from 6,980 total)
- Embeddings: 384-dimensional vectors (sentence-transformers/all-MiniLM-L6-v2)
- Ground Truth: Qrels with graded relevance (0=not relevant, 1=relevant, 2=highly relevant)
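The subsetting step can be sketched roughly as follows (a hypothetical reconstruction; the actual sampling code and seed live in the notebooks and may differ):

```python
import random

# Keep every positively judged document from the qrels, then pad with
# random documents until the corpus reaches 1M entries.
# `corpus` and `qrels` are BEIR-style dicts: qrels[qid][doc_id] = grade.
relevant_ids = {doc_id
                for judged in qrels.values()
                for doc_id, grade in judged.items() if grade > 0}
other_ids = [doc_id for doc_id in corpus if doc_id not in relevant_ids]

random.seed(42)  # hypothetical seed, for reproducibility
padding = random.sample(other_ids, 1_000_000 - len(relevant_ids))
subset_corpus = {doc_id: corpus[doc_id] for doc_id in relevant_ids | set(padding)}
```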
Brute-Force (Exact Search)
- Implementation: sklearn cosine_similarity (unoptimized NumPy)
- Computes cosine similarity with all documents
- O(n) complexity, 100% accurate
- Baseline: 0.81 Recall@10, 699.43ms latency
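A minimal sketch of this baseline (array names are illustrative):

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# doc_embeddings: (1_000_000, 384) float32, query_embeddings: (100, 384)
scores = cosine_similarity(query_embeddings, doc_embeddings)  # (100, 1_000_000)
top10 = np.argsort(-scores, axis=1)[:, :10]  # exact top-10 doc indices per query
```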
HNSWlib
- Parameters: M=32, ef_construction=100, ef_search=50
- Hierarchical navigable small world graphs
- Results: 0.7433 Recall@10 (91.8% of brute-force), 0.46ms latency
- Index build: 512.8s (~8.5 minutes)
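A sketch of this configuration using hnswlib's public API (the notebook's exact code may differ):

```python
import numpy as np
import hnswlib

# Build the graph with the benchmark parameters, then query it.
index = hnswlib.Index(space="cosine", dim=384)
index.init_index(max_elements=doc_embeddings.shape[0], M=32, ef_construction=100)
index.add_items(doc_embeddings, np.arange(doc_embeddings.shape[0]))

index.set_ef(50)  # ef_search
labels, distances = index.knn_query(query_embeddings, k=10)
```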
FAISS HNSW
- Implementation: faiss.IndexHNSWFlat (optimized C++ with SIMD)
- Parameters: M=32, ef_construction=100, ef_search=50
- Facebook's optimized HNSW implementation
- Results: 0.7683 Recall@10 (94.9% of brute-force), 0.08ms latency
- Index build: 380.0s (~6.3 minutes)
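A sketch using the FAISS API. IndexHNSWFlat measures L2 distance by default, so one common approach (assumed here; 004-faiss_demo.py may handle the metric differently) is to L2-normalize the embeddings, which makes L2 ranking equivalent to cosine ranking:

```python
import faiss

index = faiss.IndexHNSWFlat(384, 32)  # dim=384, M=32; L2 metric by default
index.hnsw.efConstruction = 100
index.hnsw.efSearch = 50

faiss.normalize_L2(doc_embeddings)    # in-place; arrays must be float32
index.add(doc_embeddings)

faiss.normalize_L2(query_embeddings)
distances, ids = index.search(query_embeddings, 10)
```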
Evaluation Metrics
- Recall@K: Coverage of relevant documents in top-K results (K=1,5,10,20,50,100)
- Precision@K: Accuracy of top-K results (K=1,5,10)
- MRR (Mean Reciprocal Rank): Mean over queries of 1/rank of the first relevant result
- Latency: Average query time (milliseconds)
- QPS: Queries per second (throughput)
- Build Time: Index construction time (one-time cost)
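Sketches of the two headline metrics (hypothetical helpers; the shared implementations live in notebooks/utils.py):

```python
def recall_at_k(retrieved_ids, relevant_ids, k=10):
    """Fraction of the relevant set that appears in the top-k results."""
    relevant = set(relevant_ids)
    return len(set(retrieved_ids[:k]) & relevant) / len(relevant) if relevant else 0.0

def reciprocal_rank(retrieved_ids, relevant_ids):
    """1/rank of the first relevant hit; MRR averages this over all queries."""
    relevant = set(relevant_ids)
    for rank, doc_id in enumerate(retrieved_ids, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0
```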
Notebooks

000-get_data.py
- Downloads BEIR benchmark datasets (MS MARCO, SciFact)
- Loads corpus, queries, and ground truth qrels
- Explores dataset statistics
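Loading MS MARCO through BEIR typically looks like this (a sketch using BEIR's standard loader; the script's details may differ):

```python
from beir import util
from beir.datasets.data_loader import GenericDataLoader

# Download and unpack the dataset, then load corpus, queries, and qrels.
url = "https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/msmarco.zip"
data_path = util.download_and_unzip(url, "data")
corpus, queries, qrels = GenericDataLoader(data_folder=data_path).load(split="dev")
```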
001-get_embeddings.py
- Generates 384-dim embeddings using sentence-transformers
- Saves compressed embeddings (.npz format)
- Reduces storage by 50-70% with compression
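The embedding step, sketched with sentence-transformers (file path and batch size are assumptions):

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
texts = [doc["text"] for doc in subset_corpus.values()]
embeddings = model.encode(texts, batch_size=256, show_progress_bar=True,
                          convert_to_numpy=True).astype(np.float32)
np.savez_compressed("data/doc_embeddings.npz", embeddings=embeddings)  # compressed .npz
```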
002-brute_force_similarity.py
- Implements baseline cosine similarity search
- Computes exact Recall@K metrics
- Establishes ground truth for comparison
003-hnswlib_demo.py
- Builds HNSW index with configurable parameters
- Benchmarks search performance
- Analyzes speed vs accuracy trade-offs
004-faiss_demo.py
- Implements FAISS HNSW index
- Compares with HNSWlib implementation
- Measures index build time and memory
005-compare_benchmarks.py
- Aggregates results from all methods
- Generates comparison plots
- Creates summary report
HNSW Layer Analysis
- Analyzes HNSW layer distribution across different M values
- Visualizes hierarchical structure
- Demonstrates impact of M parameter on graph topology
Prerequisites
- Python 3.12+
- uv package manager
- ~5GB disk space for datasets and embeddings
# Clone repository
git clone https://github.com/yourusername/inside-vectordb.git
cd inside-vectordb
# Create virtual environment and install dependencies
uv venv && source .venv/bin/activate # On Windows: .venv\Scripts\activate
uv sync

# Verify installation
python -c "import sentence_transformers, hnswlib, faiss; print('All dependencies installed successfully!')"

Dependencies:
- beir - BEIR benchmark datasets
- sentence-transformers - Text embedding models
- numpy & pandas - Data manipulation
- scikit-learn - Cosine similarity and metrics
- hnswlib - Fast HNSW implementation
- faiss-cpu - Facebook AI Similarity Search
- matplotlib & seaborn - Visualization
- ipywidgets - Progress bars in notebooks
cd notebooks
# 1. Download dataset
python 000-get_data.py
# 2. Generate embeddings
python 001-get_embeddings.py
# 3. Run brute-force baseline
python 002-brute_force_similarity.py
# 4. Run HNSW benchmarks
python 003-hnswlib_demo.py
python 004-faiss_demo.py
# 5. Generate comparison report
python 005-compare_benchmarks.py

Results are saved to the reports/ directory.
When to use each method:

Brute-Force:
- Small datasets (<10K documents)
- Need 100% recall guarantee
- Offline/batch processing
HNSWlib:
- Production search systems
- Million-scale datasets
- Balance of speed and accuracy
FAISS:
- Large-scale deployments
- Need maximum throughput
- GPU acceleration available
HNSW methods achieve 1,500x+ speedups over brute-force search by sacrificing 4-7 points of Recall@10. This is the fundamental trade-off in approximate nearest neighbor search: dramatic speed improvements at the cost of a slight accuracy reduction.
inside-vectordb-hnsw/
├── notebooks/ # Experiment notebooks
│ ├── 000-get_data.py
│ ├── 001-get_embeddings.py
│ ├── 002-brute_force_similarity.py
│ ├── 003-hnswlib_demo.py
│ ├── 004-faiss_demo.py
│ ├── 005-compare_benchmarks.py
│ └── utils.py # Shared metrics utilities
├── data/ # Dataset and embeddings
├── reports/ # Benchmark results (JSON)
│ └── summary/ # Plots and summary report
└── README.md
- BEIR Benchmark - Heterogeneous IR benchmark
- MS MARCO - Large-scale passage ranking dataset
- HNSWlib - Fast approximate nearest neighbor search
- FAISS - Facebook AI similarity search
- Sentence Transformers - State-of-the-art text embeddings
Author: Sagar Sarkale
Blog link: Inside Vector DB






