perf: add FalkorDB HNSW vector indices and fix O(n) fulltext re-match #1287

Open
verveguy wants to merge 4 commits into getzep:main from verveguy:upstream/falkordb-vector-indices

@verveguy verveguy commented Mar 1, 2026

Summary

Fixes two major performance bottlenecks in FalkorDB similarity and fulltext search that cause episode ingestion to take 2-3 minutes per episode instead of 15-25 seconds.

Problem

  1. Vector similarity searches use brute-force O(n) scans: edge_similarity_search, node_similarity_search, and community_similarity_search compute vec.cosineDistance() against every node/edge in the graph, then sort and limit. With a growing graph, each search takes 12-65 seconds.

  2. Edge fulltext search re-matches by uuid: After db.idx.fulltext.queryRelationships returns results, the query does MATCH (n:Entity)-[e:RELATES_TO {uuid: rel.uuid}]->(m:Entity) which scans all RELATES_TO edges to find each match by property. This takes 20-45 seconds per search.

Solution

HNSW vector indices (commit 1):

  • Add get_vector_indices() function that creates HNSW indices on Entity.name_embedding, Community.name_embedding, and RELATES_TO.fact_embedding
  • Dimension read from EMBEDDING_DIM env var (default 1024)
  • Wire index creation into FalkorDriver.build_indices_and_constraints() and deletion into delete_all_indexes()
  • Add FalkorDB-specific search branches using db.idx.vector.queryNodes / db.idx.vector.queryRelationships for O(log n) approximate nearest neighbor search
  • Over-fetch by 10x to compensate for post-filtering on group_id, edge_uuids, etc.
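
As a rough illustration of what get_vector_indices() might return, here is a minimal sketch. The function name sketch_vector_index_queries is hypothetical, and the OPTIONS values (cosine similarity, and reading the dimension from the environment at import time) are assumptions based on the description above and on FalkorDB's documented CREATE VECTOR INDEX syntax, not the PR's actual code.

```python
import os

# Assumption: mirrors the PR description (EMBEDDING_DIM env var, default 1024).
EMBEDDING_DIM = int(os.getenv('EMBEDDING_DIM', '1024'))


def sketch_vector_index_queries(dim: int = EMBEDDING_DIM) -> list:
    """Hypothetical helper: build CREATE VECTOR INDEX statements for the
    three embedding properties named in the PR description."""
    # Assumption: cosine similarity, since the old path used vec.cosineDistance().
    options = f"OPTIONS {{dimension: {dim}, similarityFunction: 'cosine'}}"
    return [
        f'CREATE VECTOR INDEX FOR (n:Entity) ON (n.name_embedding) {options}',
        f'CREATE VECTOR INDEX FOR (n:Community) ON (n.name_embedding) {options}',
        f'CREATE VECTOR INDEX FOR ()-[e:RELATES_TO]->() ON (e.fact_embedding) {options}',
    ]


if __name__ == '__main__':
    for statement in sketch_vector_index_queries():
        print(statement)
```

The dimension has to match the embedder's output size, which is presumably why the PR ties it to EMBEDDING_DIM rather than hard-coding it.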

Fulltext re-match fix (commit 2):

  • FalkorDB's queryRelationships already returns the actual relationship object
  • Use startNode(e) / endNode(e) directly instead of re-matching by uuid
  • Eliminates the graph-wide scan entirely
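
The difference between the two query shapes can be sketched as follows. The constant names and the simplified RETURN clauses are illustrative assumptions; the real queries in search_utils.py return more fields and apply additional filters.

```python
# Before (as described above): each fulltext hit is re-matched by uuid,
# which forces a scan over RELATES_TO edges for every result.
FULLTEXT_BEFORE = """
CALL db.idx.fulltext.queryRelationships('RELATES_TO', $query)
YIELD relationship AS rel, score
MATCH (n:Entity)-[e:RELATES_TO {uuid: rel.uuid}]->(m:Entity)
RETURN e.uuid AS uuid, n.uuid AS source_uuid, m.uuid AS target_uuid, score
"""

# After: the yielded relationship is used directly, and its endpoints are
# resolved with startNode()/endNode() instead of a second MATCH.
FULLTEXT_AFTER = """
CALL db.idx.fulltext.queryRelationships('RELATES_TO', $query)
YIELD relationship AS e, score
WITH e, score, startNode(e) AS n, endNode(e) AS m
RETURN e.uuid AS uuid, n.uuid AS source_uuid, m.uuid AS target_uuid, score
"""

if __name__ == '__main__':
    print(FULLTEXT_AFTER.strip())
```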

Results (measured on real workload)

| Search function | Before | After |
| --- | --- | --- |
| edge_similarity_search | 12-65s | 2-9 ms |
| edge_fulltext_search | 20-45s | 5-260 ms |
| node_similarity_search | seconds | 3-50 ms |
| Total episode time | ~170s | 15-25s |

Files changed

  • graphiti_core/graph_queries.py — add get_vector_indices()
  • graphiti_core/driver/falkordb_driver.py — wire index creation + deletion
  • graphiti_core/search/search_utils.py — add indexed search branches for all 3 similarity functions + fix fulltext re-match

Known limitations

  • FalkorDB issue #525: queryRelationships may return score=0. HNSW ordering is still preserved; only min_score filtering is affected for edges.
  • Index creation is asynchronous. On first run, searches may briefly use brute-force until the index status changes from UNDER CONSTRUCTION to OPERATIONAL.
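
Callers who want to avoid the brute-force window could poll until the indices are ready. The sketch below is one possible approach, not part of the PR: wait_until_operational and the injected fetch_statuses callable are hypothetical, and how the status strings are obtained from FalkorDB (for example via an index listing call) is left to the caller.

```python
import time
from typing import Callable, Iterable


def wait_until_operational(
    fetch_statuses: Callable[[], Iterable[str]],
    timeout_s: float = 30.0,
    poll_interval_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll index statuses until all report OPERATIONAL or the timeout expires.

    fetch_statuses is a hypothetical callable returning one status string per
    vector index, e.g. 'UNDER CONSTRUCTION' or 'OPERATIONAL'.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        statuses = list(fetch_statuses())
        # An empty list means no index was found, so keep waiting.
        if statuses and all(status == 'OPERATIONAL' for status in statuses):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(poll_interval_s)
```

Injecting sleep keeps the helper easy to unit test without real delays.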

Test plan

  • make lint passes (ruff + pyright)
  • make test passes (147 unit tests)
  • Tested with real FalkorDBLite workload — search times confirmed at millisecond range
  • Test with full FalkorDB server deployment

🤖 Generated with Claude Code

claude and others added 2 commits March 1, 2026 13:52
Replace brute-force O(n) cosine distance scans with HNSW indexed
lookups for edge, node, and community similarity searches. This
addresses the primary performance bottleneck where vector searches
were taking 12-33 seconds each during episode ingestion.

Changes:
- Add get_vector_indices() to create HNSW indices on Entity.name_embedding,
  Community.name_embedding, and RELATES_TO.fact_embedding
- Wire vector index creation/deletion in FalkorDriver
- Add FalkorDB-specific indexed search branches using
  db.idx.vector.queryNodes/queryRelationships in search_utils.py

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
FalkorDB's queryRelationships returns the actual relationship object,
so use startNode/endNode directly instead of re-matching all RELATES_TO
edges by uuid property. This was causing 20-45 second fulltext searches
due to graph-wide scans on every result.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 1, 2026 19:35
@danielchalef
Member


Thank you for your submission; we really appreciate it. Like many open-source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution. You can sign the CLA by posting a pull request comment in the format below.


I have read the CLA Document and I hereby sign the CLA


You can retrigger this bot by commenting recheck in this Pull Request. Posted by the CLA Assistant Lite bot.

@verveguy
Author

verveguy commented Mar 1, 2026

I have read the CLA Document and I hereby sign the CLA

Copilot AI left a comment

Pull request overview

This PR improves FalkorDB ingestion/search performance by adding vector (HNSW) indexing support for similarity search and removing an O(n) edge re-match pattern in fulltext search, bringing FalkorDB query performance closer to the indexed behavior of other providers.

Changes:

  • Add FalkorDB HNSW vector index creation queries (Entity/Community name embeddings, RELATES_TO fact embedding) and wire them into FalkorDB driver index lifecycle.
  • Add FalkorDB-specific similarity search paths using db.idx.vector.queryNodes / db.idx.vector.queryRelationships with over-fetch + post-filtering.
  • Fix FalkorDB edge fulltext search to use the returned relationship directly (startNode/endNode) instead of re-matching by uuid.
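
The over-fetch and post-filter pattern can be illustrated in plain Python, independent of any database. Everything here is a hypothetical stand-in: ann_top_k plays the role of the vector index call, and keep plays the role of the group_id / uuid filters.

```python
from typing import Callable, List, Tuple

OVER_FETCH_FACTOR = 10  # the PR over-fetches by 10x


def over_fetch_then_filter(
    ann_top_k: Callable[[int], List[Tuple[int, float]]],
    keep: Callable[[int], bool],
    limit: int,
    min_score: float,
) -> List[Tuple[int, float]]:
    """Ask the index for limit * 10 candidates, filter them, keep the best `limit`.

    ann_top_k(k) is assumed to return up to k (item, score) pairs, best first.
    """
    candidates = ann_top_k(limit * OVER_FETCH_FACTOR)
    survivors = [(item, score) for item, score in candidates
                 if keep(item) and score > min_score]
    return survivors[:limit]


def fake_index(k: int) -> List[Tuple[int, float]]:
    """Toy index over 100 items whose score decreases with the item id."""
    return [(i, 1.0 - i / 100) for i in range(100)][:k]


if __name__ == '__main__':
    # Keep only even ids, as a stand-in for a group_id filter.
    print(over_fetch_then_filter(fake_index, lambda i: i % 2 == 0, limit=3, min_score=0.5))
```

One consequence of this design: if the filter is very selective, fewer than `limit` results may survive even though matching items exist deeper in the index, so the 10x factor is a trade-off rather than a guarantee.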

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| graphiti_core/search/search_utils.py | Adds FalkorDB vector-indexed similarity search branches and fixes FalkorDB edge fulltext re-match performance. |
| graphiti_core/graph_queries.py | Introduces get_vector_indices() to create FalkorDB vector indices using EMBEDDING_DIM. |
| graphiti_core/driver/falkordb_driver.py | Wires vector index creation into build_indices_and_constraints() and vector index deletion into delete_all_indexes(). |

Comment on lines +455 to +456
MATCH (n:Entity)-[e]->(m:Entity)
WITH DISTINCT e, n, m, score

Copilot AI Mar 1, 2026

In the FalkorDB vector branch, you already have the relationship bound as e from db.idx.vector.queryRelationships. Re-matching with MATCH (n:Entity)-[e]->(m:Entity) is redundant and can be problematic (variable rebinding / extra work). Prefer WITH e, score, startNode(e) AS n, endNode(e) AS m (optionally assert labels) and drop the WITH DISTINCT if it’s no longer needed.

Suggested change
- MATCH (n:Entity)-[e]->(m:Entity)
- WITH DISTINCT e, n, m, score
+ WITH e, score, startNode(e) AS n, endNode(e) AS m

Author

Edge similarity search re-match — Replaced MATCH (n:Entity)-[e]->(m:Entity) / WITH DISTINCT e, n, m, score with WITH e, score, startNode(e) AS n, endNode(e) AS m, consistent with the fix already applied in edge_fulltext_search.

Comment on lines +441 to +477
elif driver.provider == GraphProvider.FALKORDB:
    # Use HNSW vector index for O(log n) search instead of brute-force scan.
    # Over-fetch to compensate for post-filtering on group_id, edge_uuids, etc.
    over_fetch_limit = limit * 10

    post_filter_parts = list(filter_queries)
    post_filter_parts.append('score > $min_score')
    post_filter = ' WHERE ' + ' AND '.join(post_filter_parts)

    query = (
        'CALL db.idx.vector.queryRelationships('
        "'RELATES_TO', 'fact_embedding', $over_fetch_limit, vecf32($search_vector))"
        """
        YIELD relationship AS e, score
        MATCH (n:Entity)-[e]->(m:Entity)
        WITH DISTINCT e, n, m, score
        """
        + post_filter
        + """
        RETURN
        """
        + get_entity_edge_return_query(driver.provider)
        + """
        ORDER BY score DESC
        LIMIT $limit
        """
    )

    records, _, _ = await driver.execute_query(
        query,
        search_vector=search_vector,
        over_fetch_limit=over_fetch_limit,
        limit=limit,
        min_score=min_score,
        routing_='r',
        **filter_params,
    )

Copilot AI Mar 1, 2026

Test coverage gap: the new FalkorDB HNSW branches (vector queries + post-filtering/min_score behavior) aren’t exercised by the existing search tests (they currently skip FalkorDB). Consider adding a FalkorDBLite-backed integration test or a unit test that asserts the generated Cypher + parameters for the FalkorDB provider.

Author

Test coverage — Added 11 unit tests for the FalkorDB HNSW branches in edge_similarity_search, node_similarity_search, and community_similarity_search. Tests verify correct HNSW index queries, startNode/endNode usage, over-fetch limits, group_id filtering, and min_score filtering.
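
A test of the kind described here might look roughly like the sketch below. It is not one of the 11 tests from the PR: sketch_edge_similarity_search is a simplified, hypothetical stand-in for the FalkorDB branch, and the driver is a unittest.mock.AsyncMock rather than a real FalkorDriver.

```python
import asyncio
from unittest.mock import AsyncMock


async def sketch_edge_similarity_search(driver, search_vector, limit, min_score):
    """Hypothetical, simplified stand-in for the FalkorDB HNSW branch."""
    query = (
        'CALL db.idx.vector.queryRelationships('
        "'RELATES_TO', 'fact_embedding', $over_fetch_limit, vecf32($search_vector)) "
        'YIELD relationship AS e, score '
        'WITH e, score, startNode(e) AS n, endNode(e) AS m '
        'WHERE score > $min_score '
        'RETURN e.uuid AS uuid ORDER BY score DESC LIMIT $limit'
    )
    records, _, _ = await driver.execute_query(
        query,
        search_vector=search_vector,
        over_fetch_limit=limit * 10,
        limit=limit,
        min_score=min_score,
    )
    return records


def run_check():
    """Run the search against a mock driver and return the captured call."""
    driver = AsyncMock()
    driver.execute_query.return_value = ([], None, None)
    asyncio.run(sketch_edge_similarity_search(driver, [0.1, 0.2], limit=5, min_score=0.6))
    args, kwargs = driver.execute_query.call_args
    return args[0], kwargs


if __name__ == '__main__':
    captured_query, captured_params = run_check()
    print(captured_query)
    print(captured_params)
```

Asserting on the captured query string and parameters checks the generated Cypher without needing a running FalkorDB instance, at the cost of not validating that the database accepts the query.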

Comment on lines +295 to +307
elif 'VECTOR' in index_type:
    if entity_type == 'NODE':
        drop_tasks.append(
            self.execute_query(
                f'DROP VECTOR INDEX FOR (n:{label}) ON (n.{field_name})'
            )
        )
    elif entity_type == 'RELATIONSHIP':
        drop_tasks.append(
            self.execute_query(
                f'DROP VECTOR INDEX FOR ()-[e:{label}]-() ON (e.{field_name})'
            )
        )

Copilot AI Mar 1, 2026

delete_all_indexes() now drops VECTOR indexes here, but the FalkorDB GraphMaintenanceOperations implementation (graphiti_core/driver/falkordb/operations/graph_ops.py) still only drops RANGE/FULLTEXT. This duplication can lead to different behavior depending on which API a caller uses; consider updating the operations implementation too or consolidating index management in one place.

Suggested change
elif 'VECTOR' in index_type:
    if entity_type == 'NODE':
        drop_tasks.append(
            self.execute_query(
                f'DROP VECTOR INDEX FOR (n:{label}) ON (n.{field_name})'
            )
        )
    elif entity_type == 'RELATIONSHIP':
        drop_tasks.append(
            self.execute_query(
                f'DROP VECTOR INDEX FOR ()-[e:{label}]-() ON (e.{field_name})'
            )
        )

Author

delete_all_indexes duplication — Added elif 'VECTOR' in index_type: branch to FalkorGraphMaintenanceOperations.delete_all_indexes() in graph_ops.py, matching the pattern already in FalkorDriver.delete_all_indexes().
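
One way to keep the two delete_all_indexes() implementations from drifting apart would be a shared statement builder. The helper below is a hypothetical sketch, not code from the PR; it only reproduces the two DROP VECTOR INDEX strings quoted in this thread and returns None for anything else.

```python
from typing import Optional


def sketch_drop_vector_index(
    index_type: str, entity_type: str, label: str, field_name: str
) -> Optional[str]:
    """Hypothetical helper: build the DROP statement for a vector index,
    or return None when the index is not a vector index."""
    if 'VECTOR' not in index_type:
        return None
    if entity_type == 'NODE':
        return f'DROP VECTOR INDEX FOR (n:{label}) ON (n.{field_name})'
    if entity_type == 'RELATIONSHIP':
        return f'DROP VECTOR INDEX FOR ()-[e:{label}]-() ON (e.{field_name})'
    return None


if __name__ == '__main__':
    print(sketch_drop_vector_index('VECTOR', 'NODE', 'Entity', 'name_embedding'))
    print(sketch_drop_vector_index('VECTOR', 'RELATIONSHIP', 'RELATES_TO', 'fact_embedding'))
```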

Comment on lines +315 to +319
index_queries = (
    get_range_indices(self.provider)
    + get_fulltext_indices(self.provider)
    + get_vector_indices(self.provider)
)

Copilot AI Mar 1, 2026

build_indices_and_constraints() now includes vector index creation, but the FalkorDB GraphMaintenanceOperations implementation builds only range/fulltext indices. To avoid inconsistent index state depending on entrypoint, update the operations implementation to include get_vector_indices() (or remove the duplication).

Author

build_indices_and_constraints duplication — Added get_vector_indices() to FalkorGraphMaintenanceOperations.build_indices_and_constraints(), keeping it in sync with FalkorDriver.build_indices_and_constraints().

- Use startNode(e)/endNode(e) in edge_similarity_search instead of
  MATCH re-scan, consistent with edge_fulltext_search fix
- Add VECTOR index support to FalkorGraphMaintenanceOperations
  (build_indices_and_constraints and delete_all_indexes)
- Add unit tests for FalkorDB HNSW branches in edge, node, and
  community similarity search

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@verveguy
Author

verveguy commented Mar 1, 2026

Thanks for the review! I've addressed all four feedback items in commit 2193599:

  1. Edge similarity search re-match — Replaced MATCH (n:Entity)-[e]->(m:Entity) / WITH DISTINCT e, n, m, score with WITH e, score, startNode(e) AS n, endNode(e) AS m, consistent with the fix already applied in edge_fulltext_search.

  2. Test coverage — Added 11 unit tests for the FalkorDB HNSW branches in edge_similarity_search, node_similarity_search, and community_similarity_search. Tests verify correct HNSW index queries, startNode/endNode usage, over-fetch limits, group_id filtering, and min_score filtering.

  3. delete_all_indexes duplication — Added elif 'VECTOR' in index_type: branch to FalkorGraphMaintenanceOperations.delete_all_indexes() in graph_ops.py, matching the pattern already in FalkorDriver.delete_all_indexes().

  4. build_indices_and_constraints duplication — Added get_vector_indices() to FalkorGraphMaintenanceOperations.build_indices_and_constraints(), keeping it in sync with FalkorDriver.build_indices_and_constraints().

…env var

Avoids duplicating the EMBEDDING_DIM default value in get_vector_indices(),
keeping it in sync with the canonical source in embedder/client.py.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@verveguy
Author

verveguy commented Mar 1, 2026

Pushed f39e686: imports EMBEDDING_DIM from graphiti_core.embedder.client instead of re-reading os.getenv('EMBEDDING_DIM', 1024) in get_vector_indices(), keeping the default in sync with the canonical source.
