
feat(milvus): add Milvus/Zilliz Cloud vector store backend #1273

Open
supmo668 wants to merge 7 commits into getzep:main from supmo668:feat/milvus-vector-store


@supmo668
Contributor

Summary

Milvus/Zilliz Cloud implementation for the pluggable VectorStoreClient interface introduced in #1264. Provides hybrid vector + BM25 search as an acceleration layer alongside any graph database (Neo4j or FalkorDB).

Depends on: #1264 (Part 1 — provider-agnostic plugin interface)
RFC: #1263

Type of Change

  • New feature

Objective

Enable Graphiti users to leverage Milvus/Zilliz Cloud for high-performance hybrid retrieval (dense vector + BM25 sparse) while keeping their existing graph database for entity resolution, episode storage, and graph traversal. This is an overlay, not a replacement — the graph DB remains the source of truth.

Architecture

flowchart LR
    subgraph "Write Path"
        A[Graphiti Core] -->|primary| B[Graph DB<br/>Neo4j / FalkorDB]
        A -->|dual-write| C[Milvus / Zilliz]
    end
    subgraph "Read Path"
        D[Search Query] --> E{SearchInterface}
        E -->|hybrid search| C
        E -->|graph traversal| B
    end

Key Features

  • 4 collections: entity_nodes, entity_edges, episodic_nodes, community_nodes
  • Hybrid search: HNSW dense vectors (COSINE) + BM25 sparse vectors
  • Multi-tenancy: group_id as partition key
  • Auto-attach: MilvusSearchInterface auto-wired when MilvusVectorStoreClient is passed
  • Backfill utility: Sync existing graph DB data to Milvus for brownfield deployments
  • MCP server: Full Docker Compose stacks for Neo4j+Milvus and FalkorDB+Milvus
  • MILVUS_URI auto-detect: Set the env var and the MCP server auto-configures
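
As a rough illustration of the MILVUS_URI auto-detect item above, the following sketch shows one way environment-based detection could work. The function name `detect_milvus_config`, the returned dict shape, and the `MILVUS_TOKEN` variable are assumptions made for illustration; only `MILVUS_URI` is named in this PR, and the actual MCP server logic may differ.

```python
import os
from typing import Mapping, Optional


def detect_milvus_config(env: Mapping[str, str] = os.environ) -> Optional[dict]:
    """Return a Milvus config dict when MILVUS_URI is set, else None.

    Hypothetical helper: the name and the dict shape are illustrative only.
    """
    uri = env.get("MILVUS_URI", "").strip()
    if not uri:
        # No URI configured: leave the vector store disabled.
        return None

    config = {"provider": "milvus", "uri": uri}

    # MILVUS_TOKEN is an assumed variable name, not taken from the PR.
    token = env.get("MILVUS_TOKEN")
    if token:
        config["token"] = token
    return config


if __name__ == "__main__":
    print(detect_milvus_config({"MILVUS_URI": "http://localhost:19530"}))
```

Passing the environment as a mapping keeps the helper easy to test without touching the real process environment.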

New Files

| File | Purpose |
| --- | --- |
| vector_store/milvus_client.py | MilvusVectorStoreClient: 10 domain-aware methods |
| vector_store/milvus_utils.py | Collection schemas, serialization, constants |
| vector_store/milvus_search_interface.py | SearchInterface impl for hybrid retrieval |
| vector_store/milvus_graph_operations.py | GraphOperationsInterface impl for CRUD |
| utils/vector_store_sync.py | Backfill utility for graph DB → Milvus sync |

Testing

  • Unit tests added/updated (153 pass)
  • Integration tests added/updated (10 pass against Zilliz Cloud)
  • All existing tests pass

| Test File | Count | Description |
| --- | --- | --- |
| test_milvus_vector_store_client.py | 46 | Client save/delete/ensure_ready/errors |
| test_milvus_utils.py | ~30 | Schema and serialization helpers |
| test_milvus_search_interface.py | ~30 | SearchInterface contract |
| test_milvus_graph_operations.py | ~30 | GraphOperationsInterface contract |
| test_milvus_search_int.py | 10 | Integration tests (Zilliz Cloud) |
| test_vector_store_sync.py | ~20 | Backfill utility |
| test_graphiti_vector_store.py | +3 | Milvus auto-attach tests |

Breaking Changes

  • This PR contains breaking changes (not checked)

No breaking changes. Milvus is an optional extra (pip install "graphiti-core[milvus]"). All existing behavior is unchanged.

Checklist

  • Code follows project style guidelines (make lint passes — 0 errors)
  • Self-review completed
  • No secrets or sensitive information committed
  • pyright passes (0 errors, 0 warnings)

Related Issues

Part 2 of RFC #1263
Depends on #1264 (Part 1)

supmo668 and others added 7 commits February 25, 2026 15:36
Add a provider-agnostic base class for vector store backends with 10
domain-aware methods (save/delete for entity nodes, entity edges,
episodic nodes, and community nodes plus bulk clear operations).

- graphiti_core/vector_store/client.py: VectorStoreClient base class
  with non-abstract methods that raise NotImplementedError for backward
  compatibility
- graphiti_core/vector_store/__init__.py: export only base classes (no
  optional provider imports per CONTRIBUTING.md)
- graphiti_core/driver/driver.py: add vector_store attribute (Any type
  to avoid circular imports)
- pyproject.toml: add milvus optional extra and pytest-asyncio dev dep

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
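
The base-class design described in the commit above (concrete methods that raise NotImplementedError rather than abstract methods) can be sketched roughly as follows. The class and method names here (`VectorStoreClientSketch`, `save_entity_nodes`, `delete_entity_nodes`, `InMemoryStore`) are illustrative assumptions, with `clear_all` borrowed from the hook list in a later commit; the real `VectorStoreClient` signatures are not shown in this PR page.

```python
import asyncio


class VectorStoreClientSketch:
    """Illustrative base class: methods are concrete but raise by default."""

    async def save_entity_nodes(self, nodes: list) -> None:
        raise NotImplementedError

    async def delete_entity_nodes(self, uuids: list) -> None:
        raise NotImplementedError

    async def clear_all(self) -> None:
        raise NotImplementedError


class InMemoryStore(VectorStoreClientSketch):
    """Toy backend that overrides only the one method it supports."""

    def __init__(self) -> None:
        self.nodes: dict = {}

    async def save_entity_nodes(self, nodes: list) -> None:
        # Index each node by its uuid; later saves overwrite earlier ones.
        for node in nodes:
            self.nodes[node["uuid"]] = node


if __name__ == "__main__":
    store = InMemoryStore()
    asyncio.run(store.save_entity_nodes([{"uuid": "n1", "name": "Alice"}]))
    print(sorted(store.nodes))
```

Because the methods are not declared with `abc.abstractmethod`, a subclass that implements only part of the interface can still be instantiated, and adding a new method to the base class later does not break existing subclasses at construction time.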
Insert best-effort vector store sync points after graph DB writes so
that a plugged-in VectorStoreClient stays in sync without requiring
changes to the graph database layer.

Hooks added in:
- nodes.py: save/delete for entity, episodic, and community nodes
- edges.py: save/delete for entity edges
- bulk_utils.py: bulk entity node/edge save after batch writes
- community_operations.py: delete community nodes on removal
- graph_data_operations.py: clear_all on full data wipe
- edge_operations.py / node_operations.py: formatting cleanup

All hooks are guarded by `if driver.vector_store is not None` and
wrapped in try/except to keep vector store failures non-fatal.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
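
A minimal sketch of the guarded, non-fatal hook pattern described in the commit above. Only the `driver.vector_store is not None` guard and the try/except wrapping come from the commit message; the helper name `sync_entity_node`, the `save_entity_nodes` method, and the two stand-in store classes are assumptions for illustration.

```python
import asyncio
import logging
from types import SimpleNamespace

logger = logging.getLogger(__name__)


async def sync_entity_node(driver, node: dict) -> bool:
    """Best-effort sync after a graph DB write.

    Returns True only if the vector store accepted the write.
    """
    if driver.vector_store is None:
        # No vector store plugged in: nothing to do.
        return False
    try:
        await driver.vector_store.save_entity_nodes([node])
        return True
    except Exception:
        # Vector store failures are logged and swallowed; the graph DB
        # write that preceded this call is unaffected.
        logger.warning("vector store sync failed for %s", node.get("uuid"), exc_info=True)
        return False


class RecordingStore:
    """Stand-in store that remembers what it was asked to save."""

    def __init__(self) -> None:
        self.saved: list = []

    async def save_entity_nodes(self, nodes: list) -> None:
        self.saved.extend(nodes)


class FailingStore:
    """Stand-in store that always fails, to exercise the except branch."""

    async def save_entity_nodes(self, nodes: list) -> None:
        raise ConnectionError("vector store unreachable")


if __name__ == "__main__":
    driver = SimpleNamespace(vector_store=FailingStore())
    print(asyncio.run(sync_entity_node(driver, {"uuid": "n1"})))
```

Swallowing the exception keeps the graph DB as the source of truth, at the cost that the two stores can drift apart until a backfill or a later write brings them back in line.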
Complete the provider-agnostic integration layer:

- graphiti_core/graphiti.py: accept vector_store param, attach to
  driver, call ensure_ready() on build_indices, close on shutdown
- mcp_server: VectorStoreAppConfig schema, VectorStoreFactory skeleton
  (no provider implementations yet), server wiring for vector store
  lifecycle
- tests/test_dual_write.py: 23 unit tests covering all dual-write
  hooks (save, delete, bulk, error resilience)
- tests/test_graphiti_vector_store.py: 7 unit tests for constructor
  integration, lifecycle (close, build_indices), and generic vector
  store behavior

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implement MilvusVectorStoreClient with 4 collections (entity_nodes,
entity_edges, episodic_nodes, community_nodes), hybrid search via
HNSW dense vectors + BM25 sparse vectors, and group_id partition keys
for multi-tenancy.

- milvus_client.py: MilvusVectorStoreClient extending VectorStoreClient
  with all 10 domain-aware methods, lazy pymilvus import, COSINE HNSW
- milvus_utils.py: collection schemas, serialization helpers, constants
  (INT64 epoch ms for datetimes, 0 = null sentinel)
- milvus_search_interface.py: SearchInterface impl for hybrid retrieval
- milvus_graph_operations.py: GraphOperationsInterface impl for CRUD

Requires pymilvus>=2.5.3 (installed via graphiti-core[milvus] extra).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
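
The datetime convention noted in the commit above (INT64 epoch milliseconds, with 0 as the null sentinel) could be implemented along these lines. The function names and the choice to treat naive datetimes as UTC are assumptions; the actual helpers in milvus_utils.py may behave differently.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

NULL_SENTINEL = 0  # 0 stands in for "no value" in the INT64 field
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def datetime_to_epoch_ms(value: Optional[datetime]) -> int:
    """Serialize a datetime to integer epoch milliseconds (None -> 0)."""
    if value is None:
        return NULL_SENTINEL
    if value.tzinfo is None:
        # Assumption: naive datetimes are interpreted as UTC.
        value = value.replace(tzinfo=timezone.utc)
    # Integer timedelta division avoids float rounding on the timestamp.
    return (value - EPOCH) // timedelta(milliseconds=1)


def epoch_ms_to_datetime(ms: int) -> Optional[datetime]:
    """Deserialize epoch milliseconds back to an aware UTC datetime (0 -> None)."""
    if ms == NULL_SENTINEL:
        return None
    return EPOCH + timedelta(milliseconds=ms)


if __name__ == "__main__":
    created_at = datetime(2026, 2, 25, 15, 36, tzinfo=timezone.utc)
    ms = datetime_to_epoch_ms(created_at)
    print(ms, epoch_ms_to_datetime(ms))
```

One consequence of a 0 sentinel is that the Unix epoch instant itself cannot be distinguished from a missing value, which is usually acceptable for creation and validity timestamps.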
- graphiti.py: auto-attach MilvusSearchInterface when a
  MilvusVectorStoreClient is passed (lazy import, try/except guard)
- vector_store_sync.py: backfill utility to sync existing graph DB
  data into Milvus collections for brownfield deployments
- search.py: add query text to debug log for observability

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- schema.py: MilvusProviderConfig, VectorStoreProvidersConfig
- factories.py: VectorStoreFactory milvus case with env var overrides
- graphiti_mcp_server.py: MILVUS_URI auto-detect, MilvusSearchInterface
  auto-attach after Graphiti client creation
- Dockerfile.standalone: INSTALL_MILVUS build arg for optional pymilvus
- Docker Compose stacks for Neo4j+Milvus and FalkorDB+Milvus
- YAML configs for both deployment variants

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- test_milvus_vector_store_client.py: 46 tests for MilvusVectorStoreClient
  (save, delete, ensure_ready, error handling, serialization)
- test_milvus_utils.py: schema and utility function tests
- test_milvus_search_interface.py: SearchInterface contract tests
- test_milvus_graph_operations.py: GraphOperationsInterface tests
- test_milvus_search_int.py: 10 integration tests against Zilliz Cloud
  (hybrid search, BM25, date filtering, edge cases)
- test_vector_store_sync.py: backfill utility tests
- test_graphiti_vector_store.py: Milvus auto-attach and override tests

153 unit tests pass, 10 integration tests pass against Zilliz Cloud.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@supmo668
Contributor Author

All CI checks pass (ruff, pyright, unit-tests, database-integration-tests, CodeQL). Rebased on latest main.

This is Part 2 of the vector store integration (RFC #1263), building on the plugin interface in #1264. The Milvus implementation is fully self-contained — 153 unit tests + 10 integration tests against Zilliz Cloud all pass.

Happy to walk through any of the implementation choices or address feedback. @danielchalef @prasmussen15
