perf: add FalkorDB HNSW vector indices and fix O(n) fulltext re-match by verveguy · Pull Request #1287 · getzep/graphiti

verveguy · 2026-03-01T19:35:05Z

Summary

Fixes two major performance bottlenecks in FalkorDB similarity and fulltext search that cause episode ingestion to take 2-3 minutes per episode instead of 15-25 seconds.

Problem

Vector similarity searches use brute-force O(n) scans: edge_similarity_search, node_similarity_search, and community_similarity_search compute vec.cosineDistance() against every node/edge in the graph, then sort and limit. With a growing graph, each search takes 12-65 seconds.
Edge fulltext search re-matches by uuid: After db.idx.fulltext.queryRelationships returns results, the query does MATCH (n:Entity)-[e:RELATES_TO {uuid: rel.uuid}]->(m:Entity) which scans all RELATES_TO edges to find each match by property. This takes 20-45 seconds per search.

Solution

HNSW vector indices (commit 1):

Add get_vector_indices() function that creates HNSW indices on Entity.name_embedding, Community.name_embedding, and RELATES_TO.fact_embedding
Dimension read from EMBEDDING_DIM env var (default 1024)
Wire index creation into FalkorDriver.build_indices_and_constraints() and deletion into delete_all_indexes()
Add FalkorDB-specific search branches using db.idx.vector.queryNodes / db.idx.vector.queryRelationships for O(log n) approximate nearest neighbor search
Over-fetch by 10x to compensate for post-filtering on group_id, edge_uuids, etc.
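
As a rough illustration of the index-creation step above, the sketch below builds the three statements as plain strings. The helper name `vector_index_queries` and the cosine similarity option are assumptions made for illustration; the real `get_vector_indices()` in this PR may differ in signature and in the exact `OPTIONS` it passes.

```python
import os

# Assumption: dimension is read from the EMBEDDING_DIM env var (default 1024),
# as the PR description states. The helper name below is hypothetical.
EMBEDDING_DIM = int(os.getenv('EMBEDDING_DIM', '1024'))


def vector_index_queries(dimension: int = EMBEDDING_DIM) -> list[str]:
    """Build CREATE VECTOR INDEX statements for the three embedding properties."""
    # Assumption: cosine similarity, to match the vec.cosineDistance() scans it replaces.
    options = f"OPTIONS {{dimension: {dimension}, similarityFunction: 'cosine'}}"
    return [
        f'CREATE VECTOR INDEX FOR (n:Entity) ON (n.name_embedding) {options}',
        f'CREATE VECTOR INDEX FOR (n:Community) ON (n.name_embedding) {options}',
        f'CREATE VECTOR INDEX FOR ()-[e:RELATES_TO]->() ON (e.fact_embedding) {options}',
    ]


queries = vector_index_queries(1024)
for q in queries:
    print(q)
```

The dimension has to match the embedder output, which is why a later commit in this PR imports the value from one canonical place instead of re-reading the env var.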

Fulltext re-match fix (commit 2):

FalkorDB's queryRelationships already returns the actual relationship object
Use startNode(e) / endNode(e) directly instead of re-matching by uuid
Eliminates the graph-wide scan entirely
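
The difference between the two query shapes can be sketched as two Cypher strings. The `RETURN` columns and parameter names here are placeholders chosen for illustration, not the projection graphiti actually uses.

```python
# Before (sketch): the relationship yielded by the index is discarded and every
# result is looked up again by its uuid property.
REMATCH_QUERY = """
CALL db.idx.fulltext.queryRelationships('RELATES_TO', $query)
YIELD relationship AS rel, score
MATCH (n:Entity)-[e:RELATES_TO {uuid: rel.uuid}]->(m:Entity)
RETURN e.uuid AS uuid, n.uuid AS source_uuid, m.uuid AS target_uuid, score
ORDER BY score DESC
LIMIT $limit
"""

# After (sketch): the yielded relationship is used directly, and its endpoints
# come from startNode()/endNode(), so no second pattern match is needed.
DIRECT_QUERY = """
CALL db.idx.fulltext.queryRelationships('RELATES_TO', $query)
YIELD relationship AS e, score
WITH e, score, startNode(e) AS n, endNode(e) AS m
RETURN e.uuid AS uuid, n.uuid AS source_uuid, m.uuid AS target_uuid, score
ORDER BY score DESC
LIMIT $limit
"""

print(DIRECT_QUERY)
```

Both strings return the same columns, so a caller consuming the records would not need to change.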

Results (measured on real workload)

| Search function | Before | After |
| --- | --- | --- |
| `edge_similarity_search` | 12-65s | 2-9 ms |
| `edge_fulltext_search` | 20-45s | 5-260 ms |
| `node_similarity_search` | seconds | 3-50 ms |
| Total episode time | ~170s | 15-25s |

Files changed

graphiti_core/graph_queries.py — add get_vector_indices()
graphiti_core/driver/falkordb_driver.py — wire index creation + deletion
graphiti_core/search/search_utils.py — add indexed search branches for all 3 similarity functions + fix fulltext re-match

Known limitations

FalkorDB issue #525: queryRelationships may return score=0. HNSW ordering is still preserved; only min_score filtering is affected for edges.
Index creation is asynchronous. On first run, searches may briefly use brute-force until the index status changes from UNDER CONSTRUCTION to OPERATIONAL.
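
One way a caller could avoid the brute-force window is to wait until every index reports OPERATIONAL before running searches. The sketch below is not part of this PR: `wait_for_indexes` and the `fetch_statuses` callable are hypothetical, and how the status strings are obtained from FalkorDB is left to the caller.

```python
import time
from typing import Callable, Iterable


def wait_for_indexes(
    fetch_statuses: Callable[[], Iterable[str]],
    timeout_s: float = 30.0,
    interval_s: float = 0.5,
) -> bool:
    """Poll until all index statuses are 'OPERATIONAL' or the timeout expires.

    Returns True when every status is OPERATIONAL (including the case where
    there are no indexes at all), False if the deadline passes first.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if all(status == 'OPERATIONAL' for status in fetch_statuses()):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


# Fake status source standing in for a real index listing call.
_snapshots = iter([
    ['UNDER CONSTRUCTION', 'OPERATIONAL'],
    ['OPERATIONAL', 'OPERATIONAL'],
])
ready = wait_for_indexes(lambda: next(_snapshots), timeout_s=1.0, interval_s=0.01)
print(ready)
```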

Test plan

make lint passes (ruff + pyright)
make test passes (147 unit tests)
Tested with real FalkorDBLite workload — search times confirmed at millisecond range
Test with full FalkorDB server deployment

🤖 Generated with Claude Code

Replace brute-force O(n) cosine distance scans with HNSW indexed lookups for edge, node, and community similarity searches. This addresses the primary performance bottleneck where vector searches were taking 12-33 seconds each during episode ingestion.

Changes:
- Add get_vector_indices() to create HNSW indices on Entity.name_embedding, Community.name_embedding, and RELATES_TO.fact_embedding
- Wire vector index creation/deletion in FalkorDriver
- Add FalkorDB-specific indexed search branches using db.idx.vector.queryNodes/queryRelationships in search_utils.py

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

FalkorDB's queryRelationships returns the actual relationship object, so use startNode/endNode directly instead of re-matching all RELATES_TO edges by uuid property. This was causing 20-45 second fulltext searches due to graph-wide scans on every result.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

danielchalef · 2026-03-01T19:35:16Z

Thank you for your submission, we really appreciate it. Like many open-source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution. You can sign the CLA by just posting a Pull Request Comment same as the below format.

I have read the CLA Document and I hereby sign the CLA

You can retrigger this bot by commenting recheck in this Pull Request. Posted by the CLA Assistant Lite bot.

verveguy · 2026-03-01T19:36:52Z

I have read the CLA Document and I hereby sign the CLA

Copilot

Pull request overview

This PR improves FalkorDB ingestion/search performance by adding vector (HNSW) indexing support for similarity search and removing an O(n) edge re-match pattern in fulltext search, bringing FalkorDB query performance closer to the indexed behavior of other providers.

Changes:

Add FalkorDB HNSW vector index creation queries (Entity/Community name embeddings, RELATES_TO fact embedding) and wire them into FalkorDB driver index lifecycle.
Add FalkorDB-specific similarity search paths using db.idx.vector.queryNodes / db.idx.vector.queryRelationships with over-fetch + post-filtering.
Fix FalkorDB edge fulltext search to use the returned relationship directly (startNode/endNode) instead of re-matching by uuid.
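
The over-fetch plus post-filtering idea can be illustrated without a database. In this sketch a pre-sorted list stands in for the vector index, and the `Hit` class, the function name, and the sample data are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    uuid: str
    group_id: str
    score: float


def search_with_over_fetch(ranked_hits, group_ids, limit, min_score=0.0, factor=10):
    """ranked_hits must be sorted by score descending (stand-in for the index)."""
    # The index only sees the vector, so ask it for more than `limit` results...
    candidates = ranked_hits[: limit * factor]
    # ...then apply the filters the index cannot apply, and trim to `limit`.
    kept = [h for h in candidates if h.group_id in group_ids and h.score > min_score]
    return kept[:limit]


# Every fourth hit belongs to group 'a'; scores decrease with rank.
hits = [Hit(f'e{i}', 'a' if i % 4 == 0 else 'b', 1.0 - i / 100) for i in range(100)]

top = search_with_over_fetch(hits, {'a'}, limit=2)
no_over_fetch = search_with_over_fetch(hits, {'a'}, limit=2, factor=1)
print([h.uuid for h in top], [h.uuid for h in no_over_fetch])
```

With `factor=1` only one matching hit survives the filter, which is the shortfall the 10x factor is meant to cover. A very selective filter can still return fewer than `limit` rows even with over-fetching.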

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| `graphiti_core/search/search_utils.py` | Adds FalkorDB vector-indexed similarity search branches and fixes FalkorDB edge fulltext re-match performance. |
| `graphiti_core/graph_queries.py` | Introduces `get_vector_indices()` to create FalkorDB vector indices using `EMBEDDING_DIM`. |
| `graphiti_core/driver/falkordb_driver.py` | Wires vector index creation into `build_indices_and_constraints()` and vector index deletion into `delete_all_indexes()`. |


Copilot · 2026-03-01T19:39:26Z

graphiti_core/search/search_utils.py

+            MATCH (n:Entity)-[e]->(m:Entity)
+            WITH DISTINCT e, n, m, score


In the FalkorDB vector branch, you already have the relationship bound as e from db.idx.vector.queryRelationships. Re-matching with MATCH (n:Entity)-[e]->(m:Entity) is redundant and can be problematic (variable rebinding / extra work). Prefer WITH e, score, startNode(e) AS n, endNode(e) AS m (optionally assert labels) and drop the WITH DISTINCT if it’s no longer needed.

Suggested change

-            MATCH (n:Entity)-[e]->(m:Entity)
-            WITH DISTINCT e, n, m, score
+            WITH e, score, startNode(e) AS n, endNode(e) AS m

Edge similarity search re-match — Replaced MATCH (n:Entity)-[e]->(m:Entity) / WITH DISTINCT e, n, m, score with WITH e, score, startNode(e) AS n, endNode(e) AS m, consistent with the fix already applied in edge_fulltext_search.

Copilot · 2026-03-01T19:39:27Z

graphiti_core/search/search_utils.py

+    elif driver.provider == GraphProvider.FALKORDB:
+        # Use HNSW vector index for O(log n) search instead of brute-force scan.
+        # Over-fetch to compensate for post-filtering on group_id, edge_uuids, etc.
+        over_fetch_limit = limit * 10
+
+        post_filter_parts = list(filter_queries)
+        post_filter_parts.append('score > $min_score')
+        post_filter = ' WHERE ' + ' AND '.join(post_filter_parts)
+
+        query = (
+            'CALL db.idx.vector.queryRelationships('
+            "'RELATES_TO', 'fact_embedding', $over_fetch_limit, vecf32($search_vector))"
+            """
+            YIELD relationship AS e, score
+            MATCH (n:Entity)-[e]->(m:Entity)
+            WITH DISTINCT e, n, m, score
+            """
+            + post_filter
+            + """
+            RETURN
+            """
+            + get_entity_edge_return_query(driver.provider)
+            + """
+            ORDER BY score DESC
+            LIMIT $limit
+            """
+        )
+
+        records, _, _ = await driver.execute_query(
+            query,
+            search_vector=search_vector,
+            over_fetch_limit=over_fetch_limit,
+            limit=limit,
+            min_score=min_score,
+            routing_='r',
+            **filter_params,
+        )


Test coverage gap: the new FalkorDB HNSW branches (vector queries + post-filtering/min_score behavior) aren’t exercised by the existing search tests (they currently skip FalkorDB). Consider adding a FalkorDBLite-backed integration test or a unit test that asserts the generated Cypher + parameters for the FalkorDB provider.

Test coverage — Added 11 unit tests for the FalkorDB HNSW branches in edge_similarity_search, node_similarity_search, and community_similarity_search. Tests verify correct HNSW index queries, startNode/endNode usage, over-fetch limits, group_id filtering, and min_score filtering.
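
A unit test of the kind suggested in this thread could assert on the generated Cypher and parameters without a running database. The sketch below is an assumption-laden stand-in: `build_edge_similarity_query` is a hypothetical helper that mirrors the query shape shown in the diff, and the `RETURN` clause is a placeholder for `get_entity_edge_return_query()`.

```python
def build_edge_similarity_query(filter_queries: list[str], limit: int) -> tuple[str, dict]:
    """Hypothetical builder mirroring the FalkorDB branch of edge_similarity_search."""
    over_fetch_limit = limit * 10  # over-fetch to survive post-filtering
    lines = [
        "CALL db.idx.vector.queryRelationships('RELATES_TO', 'fact_embedding', "
        '$over_fetch_limit, vecf32($search_vector))',
        'YIELD relationship AS e, score',
        'WITH e, score, startNode(e) AS n, endNode(e) AS m',
        'WHERE ' + ' AND '.join([*filter_queries, 'score > $min_score']),
        'RETURN e.uuid AS uuid, score',  # placeholder projection
        'ORDER BY score DESC',
        'LIMIT $limit',
    ]
    return '\n'.join(lines), {'over_fetch_limit': over_fetch_limit, 'limit': limit}


def test_falkordb_edge_similarity_query_shape():
    query, params = build_edge_similarity_query(['e.group_id IN $group_ids'], limit=5)
    assert 'db.idx.vector.queryRelationships' in query
    assert 'startNode(e) AS n' in query and 'MATCH' not in query
    assert 'e.group_id IN $group_ids AND score > $min_score' in query
    assert params == {'over_fetch_limit': 50, 'limit': 5}


test_falkordb_edge_similarity_query_shape()
```

Asserting on the string keeps the test fast, at the cost of not checking that FalkorDB accepts the query; an integration test against FalkorDBLite would cover that side.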

Copilot · 2026-03-01T19:39:27Z

graphiti_core/driver/falkordb_driver.py

+                elif 'VECTOR' in index_type:
+                    if entity_type == 'NODE':
+                        drop_tasks.append(
+                            self.execute_query(
+                                f'DROP VECTOR INDEX FOR (n:{label}) ON (n.{field_name})'
+                            )
+                        )
+                    elif entity_type == 'RELATIONSHIP':
+                        drop_tasks.append(
+                            self.execute_query(
+                                f'DROP VECTOR INDEX FOR ()-[e:{label}]-() ON (e.{field_name})'
+                            )
+                        )


delete_all_indexes() now drops VECTOR indexes here, but the FalkorDB GraphMaintenanceOperations implementation (graphiti_core/driver/falkordb/operations/graph_ops.py) still only drops RANGE/FULLTEXT. This duplication can lead to different behavior depending on which API a caller uses; consider updating the operations implementation too or consolidating index management in one place.

Suggested change

                elif 'VECTOR' in index_type:
                    if entity_type == 'NODE':
                        drop_tasks.append(
                            self.execute_query(
                                f'DROP VECTOR INDEX FOR (n:{label}) ON (n.{field_name})'
                            )
                        )
                    elif entity_type == 'RELATIONSHIP':
                        drop_tasks.append(
                            self.execute_query(
                                f'DROP VECTOR INDEX FOR ()-[e:{label}]-() ON (e.{field_name})'
                            )
                        )

delete_all_indexes duplication — Added elif 'VECTOR' in index_type: branch to FalkorGraphMaintenanceOperations.delete_all_indexes() in graph_ops.py, matching the pattern already in FalkorDriver.delete_all_indexes().
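
The consolidation option raised in the review could look roughly like the sketch below, where both `delete_all_indexes()` implementations call one shared function. This is only an illustration: `vector_index_drop_query` is a hypothetical name, and the PR itself keeps the two branches in sync by duplicating them. The statement text is taken from the diff above.

```python
from typing import Optional


def vector_index_drop_query(
    index_type: str, entity_type: str, label: str, field_name: str
) -> Optional[str]:
    """Return the DROP statement for a vector index, or None if not applicable."""
    if 'VECTOR' not in index_type:
        return None  # RANGE / FULLTEXT indexes are handled elsewhere
    if entity_type == 'NODE':
        return f'DROP VECTOR INDEX FOR (n:{label}) ON (n.{field_name})'
    if entity_type == 'RELATIONSHIP':
        return f'DROP VECTOR INDEX FOR ()-[e:{label}]-() ON (e.{field_name})'
    return None  # unknown entity type: drop nothing


print(vector_index_drop_query('VECTOR', 'RELATIONSHIP', 'RELATES_TO', 'fact_embedding'))
```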

Copilot · 2026-03-01T19:39:27Z

graphiti_core/driver/falkordb_driver.py

+        index_queries = (
+            get_range_indices(self.provider)
+            + get_fulltext_indices(self.provider)
+            + get_vector_indices(self.provider)
+        )


build_indices_and_constraints() now includes vector index creation, but the FalkorDB GraphMaintenanceOperations implementation builds only range/fulltext indices. To avoid inconsistent index state depending on entrypoint, update the operations implementation to include get_vector_indices() (or remove the duplication).

build_indices_and_constraints duplication — Added get_vector_indices() to FalkorGraphMaintenanceOperations.build_indices_and_constraints(), keeping it in sync with FalkorDriver.build_indices_and_constraints().

- Use startNode(e)/endNode(e) in edge_similarity_search instead of MATCH re-scan, consistent with edge_fulltext_search fix
- Add VECTOR index support to FalkorGraphMaintenanceOperations (build_indices_and_constraints and delete_all_indexes)
- Add unit tests for FalkorDB HNSW branches in edge, node, and community similarity search

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

verveguy · 2026-03-01T19:55:13Z

Thanks for the review! I've addressed all four feedback items in commit 2193599:

Edge similarity search re-match — Replaced MATCH (n:Entity)-[e]->(m:Entity) / WITH DISTINCT e, n, m, score with WITH e, score, startNode(e) AS n, endNode(e) AS m, consistent with the fix already applied in edge_fulltext_search.
Test coverage — Added 11 unit tests for the FalkorDB HNSW branches in edge_similarity_search, node_similarity_search, and community_similarity_search. Tests verify correct HNSW index queries, startNode/endNode usage, over-fetch limits, group_id filtering, and min_score filtering.
delete_all_indexes duplication — Added elif 'VECTOR' in index_type: branch to FalkorGraphMaintenanceOperations.delete_all_indexes() in graph_ops.py, matching the pattern already in FalkorDriver.delete_all_indexes().
build_indices_and_constraints duplication — Added get_vector_indices() to FalkorGraphMaintenanceOperations.build_indices_and_constraints(), keeping it in sync with FalkorDriver.build_indices_and_constraints().

verveguy · 2026-03-01T20:06:58Z

I have read the CLA Document and I hereby sign the CLA

recheck


verveguy · 2026-03-01T20:34:37Z

Pushed f39e686: imports EMBEDDING_DIM from graphiti_core.embedder.client instead of re-reading os.getenv('EMBEDDING_DIM', 1024) in get_vector_indices(), keeping the default in sync with the canonical source.

verveguy · 2026-03-01T20:43:01Z

I have read the CLA Document and I hereby sign the CLA

verveguy · 2026-03-01T20:43:08Z

recheck

claude and others added 2 commits March 1, 2026 13:52

Copilot AI review requested due to automatic review settings March 1, 2026 19:35

Copilot started reviewing on behalf of verveguy March 1, 2026 19:35

Copilot AI reviewed Mar 1, 2026


verveguy mentioned this pull request Mar 1, 2026

FalkorDB HNSW vector indices + fulltext search fix verveguy/graphiti#1

Open

3 tasks

fix: import EMBEDDING_DIM from embedder/client instead of re-reading …

f39e686

…env var Avoids duplicating the EMBEDDING_DIM default value in get_vector_indices(), keeping it in sync with the canonical source in embedder/client.py. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

verveguy mentioned this pull request Mar 1, 2026

feat: add model size routing to AnthropicClient #1248

Open

4 tasks

