perf: add FalkorDB HNSW vector indices and fix O(n) fulltext re-match #1287
verveguy wants to merge 4 commits into getzep:main
Replace brute-force O(n) cosine distance scans with HNSW indexed lookups for edge, node, and community similarity searches. This addresses the primary performance bottleneck, where vector searches were taking 12-33 seconds each during episode ingestion.

Changes:
- Add `get_vector_indices()` to create HNSW indices on `Entity.name_embedding`, `Community.name_embedding`, and `RELATES_TO.fact_embedding`
- Wire vector index creation/deletion in `FalkorDriver`
- Add FalkorDB-specific indexed search branches using `db.idx.vector.queryNodes`/`queryRelationships` in `search_utils.py`

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
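To make the cost of the old pattern concrete, here is a pure-Python stand-in for a brute-force similarity scan. It is not FalkorDB or graphiti code; `brute_force_top_k` and the dict-of-embeddings input are illustrative assumptions. The point is that every stored embedding is scored before sorting and limiting, so the work grows linearly with the number of nodes/edges, which is exactly what an HNSW index lookup avoids.

```python
import heapq
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def brute_force_top_k(
    query: Sequence[float],
    embeddings: dict[str, Sequence[float]],
    k: int,
    min_score: float = 0.0,
) -> list[tuple[str, float]]:
    """Score EVERY stored embedding (O(n * d)), filter, then keep the best k.

    Hypothetical illustration of the scan shape: compute a similarity for each
    node/edge, drop anything at or below min_score, sort, limit. Nothing is
    skipped, so cost grows with graph size.
    """
    scored = ((uuid, cosine_similarity(query, emb)) for uuid, emb in embeddings.items())
    kept = (pair for pair in scored if pair[1] > min_score)
    return heapq.nlargest(k, kept, key=lambda pair: pair[1])
```

For example, `brute_force_top_k([1.0, 0.0], {'a': [1.0, 0.0], 'b': [0.0, 1.0], 'c': [0.7, 0.7]}, k=2)` ranks `a` first and `c` second, but only after scoring all three entries.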
FalkorDB's `queryRelationships` returns the actual relationship object, so use `startNode`/`endNode` directly instead of re-matching all `RELATES_TO` edges by the `uuid` property. The re-match caused 20-45 second fulltext searches because every result triggered a graph-wide scan.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
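A minimal sketch of the query shape before and after this commit, written as Python string constants in the style of `search_utils.py`. The constant names are hypothetical; the real query also applies group/edge filters and a provider-specific `RETURN` clause that are omitted here. The `MATCH` line is the one quoted in the PR description, and the `YIELD relationship ..., score` columns are assumed to match the procedure output used elsewhere in this PR.

```python
# Before (hypothetical reconstruction): every fulltext hit triggers a property
# match against all RELATES_TO edges to find the edge whose uuid equals rel.uuid.
FULLTEXT_BEFORE = """
CALL db.idx.fulltext.queryRelationships('RELATES_TO', $query)
YIELD relationship AS rel, score
MATCH (n:Entity)-[e:RELATES_TO {uuid: rel.uuid}]->(m:Entity)
RETURN e, n, m, score
ORDER BY score DESC
LIMIT $limit
"""

# After: the yielded relationship already IS the edge, so its endpoints are
# read directly with startNode()/endNode(); no second lookup is needed.
FULLTEXT_AFTER = """
CALL db.idx.fulltext.queryRelationships('RELATES_TO', $query)
YIELD relationship AS e, score
WITH e, score, startNode(e) AS n, endNode(e) AS m
RETURN e, n, m, score
ORDER BY score DESC
LIMIT $limit
"""
```

Unlike the `MATCH`, `startNode`/`endNode` does not re-check that both endpoints carry the `Entity` label; the review of the vector branch notes that labels can optionally be asserted if that matters.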
I have read the CLA Document and I hereby sign the CLA

You can retrigger this bot by commenting recheck in this Pull Request. Posted by the CLA Assistant Lite bot.
I have read the CLA Document and I hereby sign the CLA
Pull request overview
This PR improves FalkorDB ingestion/search performance by adding vector (HNSW) indexing support for similarity search and removing an O(n) edge re-match pattern in fulltext search, bringing FalkorDB query performance closer to the indexed behavior of other providers.
Changes:
- Add FalkorDB HNSW vector index creation queries (Entity/Community name embeddings, RELATES_TO fact embedding) and wire them into the FalkorDB driver's index lifecycle.
- Add FalkorDB-specific similarity search paths using `db.idx.vector.queryNodes`/`db.idx.vector.queryRelationships` with over-fetch + post-filtering.
- Fix FalkorDB edge fulltext search to use the returned relationship directly (`startNode`/`endNode`) instead of re-matching by `uuid`.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| graphiti_core/search/search_utils.py | Adds FalkorDB vector-indexed similarity search branches and fixes FalkorDB edge fulltext re-match performance. |
| graphiti_core/graph_queries.py | Introduces `get_vector_indices()` to create FalkorDB vector indices using `EMBEDDING_DIM`. |
| graphiti_core/driver/falkordb_driver.py | Wires vector index creation into `build_indices_and_constraints()` and vector index deletion into `delete_all_indexes()`. |
graphiti_core/search/search_utils.py

```
MATCH (n:Entity)-[e]->(m:Entity)
WITH DISTINCT e, n, m, score
```
In the FalkorDB vector branch, you already have the relationship bound as `e` from `db.idx.vector.queryRelationships`. Re-matching with `MATCH (n:Entity)-[e]->(m:Entity)` is redundant and can be problematic (variable rebinding / extra work). Prefer `WITH e, score, startNode(e) AS n, endNode(e) AS m` (optionally assert labels) and drop the `WITH DISTINCT` if it's no longer needed.
```diff
- MATCH (n:Entity)-[e]->(m:Entity)
- WITH DISTINCT e, n, m, score
+ WITH e, score, startNode(e) AS n, endNode(e) AS m
```
Edge similarity search re-match: replaced `MATCH (n:Entity)-[e]->(m:Entity)` / `WITH DISTINCT e, n, m, score` with `WITH e, score, startNode(e) AS n, endNode(e) AS m`, consistent with the fix already applied in `edge_fulltext_search`.
```python
elif driver.provider == GraphProvider.FALKORDB:
    # Use HNSW vector index for O(log n) search instead of brute-force scan.
    # Over-fetch to compensate for post-filtering on group_id, edge_uuids, etc.
    over_fetch_limit = limit * 10

    post_filter_parts = list(filter_queries)
    post_filter_parts.append('score > $min_score')
    post_filter = ' WHERE ' + ' AND '.join(post_filter_parts)

    query = (
        'CALL db.idx.vector.queryRelationships('
        "'RELATES_TO', 'fact_embedding', $over_fetch_limit, vecf32($search_vector))"
        """
        YIELD relationship AS e, score
        MATCH (n:Entity)-[e]->(m:Entity)
        WITH DISTINCT e, n, m, score
        """
        + post_filter
        + """
        RETURN
        """
        + get_entity_edge_return_query(driver.provider)
        + """
        ORDER BY score DESC
        LIMIT $limit
        """
    )

    records, _, _ = await driver.execute_query(
        query,
        search_vector=search_vector,
        over_fetch_limit=over_fetch_limit,
        limit=limit,
        min_score=min_score,
        routing_='r',
        **filter_params,
    )
```
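The over-fetch + post-filter idea in this branch can be simulated in plain Python. This is a sketch, not the driver code: `Hit` and `over_fetch_then_filter` are hypothetical, the input list stands in for what the HNSW index returns (best-first), and the factor of 10 is taken from `over_fetch_limit = limit * 10` above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One nearest-neighbour result as the vector index might return it."""

    uuid: str
    group_id: str
    score: float


def over_fetch_then_filter(
    ann_hits: list[Hit],
    limit: int,
    group_ids: set[str],
    min_score: float,
    over_fetch_factor: int = 10,
) -> list[Hit]:
    """Ask the index for limit * factor hits, post-filter, keep the best `limit`."""
    over_fetch_limit = limit * over_fetch_factor
    candidates = ann_hits[:over_fetch_limit]  # the index only hands back this many
    # Post-filter, like the WHERE clause built from filter_queries + min_score.
    kept = [h for h in candidates if h.group_id in group_ids and h.score > min_score]
    kept.sort(key=lambda h: h.score, reverse=True)  # ORDER BY score DESC
    return kept[:limit]  # LIMIT $limit
```

This also shows the trade-off of the approach: if fewer than `limit` matching items sit within the first `limit * 10` index results, the query returns fewer rows even though more matches exist further down the ranking.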
Test coverage gap: the new FalkorDB HNSW branches (vector queries + post-filtering/min_score behavior) aren’t exercised by the existing search tests (they currently skip FalkorDB). Consider adding a FalkorDBLite-backed integration test or a unit test that asserts the generated Cypher + parameters for the FalkorDB provider.
Test coverage: added 11 unit tests for the FalkorDB HNSW branches in `edge_similarity_search`, `node_similarity_search`, and `community_similarity_search`. The tests verify correct HNSW index queries, `startNode`/`endNode` usage, over-fetch limits, `group_id` filtering, and `min_score` filtering.
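One way to write this kind of unit test without a running FalkorDB is to hand the search function a mocked driver and assert on the Cypher text and parameters it receives. The sketch below uses a hypothetical, trimmed-down `falkor_edge_similarity_search` rather than graphiti's real `edge_similarity_search` (whose full signature and filters are not shown in this thread); only `asyncio` and `unittest.mock` from the standard library are used.

```python
import asyncio
from unittest.mock import AsyncMock, MagicMock


async def falkor_edge_similarity_search(driver, search_vector, limit, min_score):
    """Hypothetical stand-in for the FalkorDB branch: build the query, run it."""
    over_fetch_limit = limit * 10  # same over-fetch factor as the PR
    query = (
        'CALL db.idx.vector.queryRelationships('
        "'RELATES_TO', 'fact_embedding', $over_fetch_limit, vecf32($search_vector)) "
        'YIELD relationship AS e, score '
        'WITH e, score, startNode(e) AS n, endNode(e) AS m '
        'WHERE score > $min_score '
        'RETURN e.uuid AS uuid, score ORDER BY score DESC LIMIT $limit'
    )
    records, _, _ = await driver.execute_query(
        query,
        search_vector=search_vector,
        over_fetch_limit=over_fetch_limit,
        limit=limit,
        min_score=min_score,
        routing_='r',
    )
    return records


def test_falkor_edge_search_uses_hnsw_index() -> None:
    # The mocked driver records the awaited call instead of talking to a database.
    driver = MagicMock()
    driver.execute_query = AsyncMock(return_value=([], None, None))

    asyncio.run(
        falkor_edge_similarity_search(driver, [0.1, 0.2], limit=5, min_score=0.6)
    )

    driver.execute_query.assert_awaited_once()
    query = driver.execute_query.await_args.args[0]
    params = driver.execute_query.await_args.kwargs
    assert 'db.idx.vector.queryRelationships' in query  # index lookup, not a scan
    assert 'startNode(e)' in query and 'MATCH' not in query  # no re-match
    assert params['over_fetch_limit'] == 50
    assert params['min_score'] == 0.6


if __name__ == '__main__':
    test_falkor_edge_search_uses_hnsw_index()
```

Asserting on the generated query text is brittle against harmless rewording, so a real suite might check only the structural markers shown here (procedure name, absence of `MATCH`, parameter values).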
```python
elif 'VECTOR' in index_type:
    if entity_type == 'NODE':
        drop_tasks.append(
            self.execute_query(
                f'DROP VECTOR INDEX FOR (n:{label}) ON (n.{field_name})'
            )
        )
    elif entity_type == 'RELATIONSHIP':
        drop_tasks.append(
            self.execute_query(
                f'DROP VECTOR INDEX FOR ()-[e:{label}]-() ON (e.{field_name})'
            )
        )
```
delete_all_indexes() now drops VECTOR indexes here, but the FalkorDB GraphMaintenanceOperations implementation (graphiti_core/driver/falkordb/operations/graph_ops.py) still only drops RANGE/FULLTEXT. This duplication can lead to different behavior depending on which API a caller uses; consider updating the operations implementation too or consolidating index management in one place.
```python
elif 'VECTOR' in index_type:
    if entity_type == 'NODE':
        drop_tasks.append(
            self.execute_query(
                f'DROP VECTOR INDEX FOR (n:{label}) ON (n.{field_name})'
            )
        )
    elif entity_type == 'RELATIONSHIP':
        drop_tasks.append(
            self.execute_query(
                f'DROP VECTOR INDEX FOR ()-[e:{label}]-() ON (e.{field_name})'
            )
        )
```
delete_all_indexes duplication: added an `elif 'VECTOR' in index_type:` branch to `FalkorGraphMaintenanceOperations.delete_all_indexes()` in `graph_ops.py`, matching the pattern already in `FalkorDriver.delete_all_indexes()`.
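The review's broader suggestion was to consolidate index management so the two `delete_all_indexes()` implementations cannot drift apart. A minimal sketch of that idea: one hypothetical helper module that both `FalkorDriver` and `FalkorGraphMaintenanceOperations` could call. The `(index_type, entity_type, label, field_name)` row shape is an assumption; the DROP syntax is copied from the branch quoted in this thread.

```python
from __future__ import annotations

from collections.abc import Iterable


def vector_index_drop_query(entity_type: str, label: str, field_name: str) -> str | None:
    """DROP statement for one FalkorDB vector index, or None for unknown entity types."""
    if entity_type == 'NODE':
        return f'DROP VECTOR INDEX FOR (n:{label}) ON (n.{field_name})'
    if entity_type == 'RELATIONSHIP':
        return f'DROP VECTOR INDEX FOR ()-[e:{label}]-() ON (e.{field_name})'
    return None


def vector_index_drop_queries(
    index_rows: Iterable[tuple[str, str, str, str]],
) -> list[str]:
    """Collect DROP statements for every VECTOR index in (type, entity, label, field) rows."""
    queries: list[str] = []
    for index_type, entity_type, label, field_name in index_rows:
        if 'VECTOR' not in index_type:
            continue  # RANGE/FULLTEXT stay with their existing branches
        query = vector_index_drop_query(entity_type, label, field_name)
        if query is not None:
            queries.append(query)
    return queries
```

Each caller would then only schedule `execute_query` for the returned strings, so a future change to the DROP syntax lands in one place.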
```python
index_queries = (
    get_range_indices(self.provider)
    + get_fulltext_indices(self.provider)
    + get_vector_indices(self.provider)
)
```
build_indices_and_constraints() now includes vector index creation, but the FalkorDB GraphMaintenanceOperations implementation builds only range/fulltext indices. To avoid inconsistent index state depending on entrypoint, update the operations implementation to include get_vector_indices() (or remove the duplication).
build_indices_and_constraints duplication: added `get_vector_indices()` to `FalkorGraphMaintenanceOperations.build_indices_and_constraints()`, keeping it in sync with `FalkorDriver.build_indices_and_constraints()`.
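For reference, a sketch of the kind of statements `get_vector_indices()` is meant to produce, under stated assumptions: the helper name `vector_index_queries` is hypothetical; the `CREATE VECTOR INDEX ... OPTIONS {dimension, similarityFunction}` form follows FalkorDB's documented vector-index syntax; `'cosine'` assumes a FalkorDB version that supports that similarity function; and reading `EMBEDDING_DIM` straight from the environment (default 1024) mirrors the PR description rather than the later commit that imports the value from `embedder/client.py`.

```python
from __future__ import annotations

import os

DEFAULT_EMBEDDING_DIM = 1024  # default stated in the PR description


def vector_index_queries(dim: int | None = None, similarity: str = 'cosine') -> list[str]:
    """Build CREATE VECTOR INDEX statements for the three embedded properties."""
    if dim is None:
        # Assumption: dimension comes from the EMBEDDING_DIM env var.
        dim = int(os.environ.get('EMBEDDING_DIM', DEFAULT_EMBEDDING_DIM))
    # Doubled braces emit a literal {...} options map from the f-string.
    options = f"OPTIONS {{dimension:{dim}, similarityFunction:'{similarity}'}}"
    return [
        f'CREATE VECTOR INDEX FOR (n:Entity) ON (n.name_embedding) {options}',
        f'CREATE VECTOR INDEX FOR (n:Community) ON (n.name_embedding) {options}',
        f'CREATE VECTOR INDEX FOR ()-[e:RELATES_TO]->() ON (e.fact_embedding) {options}',
    ]
```

Passing `dim` explicitly, for example `vector_index_queries(dim=1536)`, keeps tests independent of the process environment; the dimension must match what the embedder actually produces, or inserts into the index will not line up.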
- Use `startNode(e)`/`endNode(e)` in `edge_similarity_search` instead of a `MATCH` re-scan, consistent with the `edge_fulltext_search` fix
- Add VECTOR index support to `FalkorGraphMaintenanceOperations` (`build_indices_and_constraints` and `delete_all_indexes`)
- Add unit tests for FalkorDB HNSW branches in edge, node, and community similarity search

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Thanks for the review! I've addressed all four feedback items in commit 2193599:
I have read the CLA Document and I hereby sign the CLA

recheck
…env var

Avoids duplicating the `EMBEDDING_DIM` default value in `get_vector_indices()`, keeping it in sync with the canonical source in `embedder/client.py`.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Pushed f39e686: imports
Summary
Fixes two major performance bottlenecks in FalkorDB similarity and fulltext search that cause episode ingestion to take 2-3 minutes per episode instead of 15-25 seconds.
Problem
- Vector similarity searches use brute-force O(n) scans: `edge_similarity_search`, `node_similarity_search`, and `community_similarity_search` compute `vec.cosineDistance()` against every node/edge in the graph, then sort and limit. As the graph grows, each search takes 12-65 seconds.
- Edge fulltext search re-matches by uuid: after `db.idx.fulltext.queryRelationships` returns results, the query runs `MATCH (n:Entity)-[e:RELATES_TO {uuid: rel.uuid}]->(m:Entity)`, which scans all `RELATES_TO` edges to find each match by property. This takes 20-45 seconds per search.

Solution
HNSW vector indices (commit 1):
- Add a `get_vector_indices()` function that creates HNSW indices on `Entity.name_embedding`, `Community.name_embedding`, and `RELATES_TO.fact_embedding`
- Index dimension taken from the `EMBEDDING_DIM` env var (default 1024)
- Wire index creation into `FalkorDriver.build_indices_and_constraints()` and deletion into `delete_all_indexes()`
- Use `db.idx.vector.queryNodes`/`db.idx.vector.queryRelationships` for O(log n) approximate nearest neighbor search

Fulltext re-match fix (commit 2):
- `queryRelationships` already returns the actual relationship object
- Use `startNode(e)`/`endNode(e)` directly instead of re-matching by uuid

Results (measured on real workload)
- `edge_similarity_search`
- `edge_fulltext_search`
- `node_similarity_search`

Files changed
- `graphiti_core/graph_queries.py`: add `get_vector_indices()`
- `graphiti_core/driver/falkordb_driver.py`: wire index creation + deletion
- `graphiti_core/search/search_utils.py`: add indexed search branches for all 3 similarity functions + fix fulltext re-match

Known limitations
- `queryRelationships` may return `score=0`. HNSW ordering is still preserved; only `min_score` filtering is affected for edges.
- `UNDER CONSTRUCTION` to `OPERATIONAL`.

Test plan
- `make lint` passes (ruff + pyright)
- `make test` passes (147 unit tests)

🤖 Generated with Claude Code