Support shared indexes across machines via git-remote-based collection naming #271

@RyanColton

Problem

Collection names are currently derived from the MD5 hash of the absolute local file path:

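```javascript
// context.js (excerpt): the collection name is derived from the local path
const crypto = require('crypto');
const path = require('path');

const normalizedPath = path.resolve(codebasePath);
const hash = crypto.createHash('md5').update(normalizedPath).digest('hex');
return `${prefix}_${hash.substring(0, 8)}`;
```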

This means two users indexing the same codebase on different machines (e.g. /Users/alice/monorepo vs /Users/bob/projects/monorepo) produce different collection names and cannot share indexes. On a team, this results in:

  • Redundant collections consuming Zilliz Cloud quota (especially painful on the free tier's 5-collection limit)
  • Redundant embedding API calls (and cost) for the same codebase
  • No way for a team to maintain a single shared index
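For illustration, the current derivation can be lifted into a standalone function and run on the two example paths. `collectionNameFromPath` and the `code_chunks` prefix are hypothetical names; only the hashing logic mirrors the context.js excerpt.

```javascript
const crypto = require('crypto');
const path = require('path');

// Hypothetical standalone copy of the current path-based naming in context.js.
function collectionNameFromPath(prefix, codebasePath) {
  const normalizedPath = path.resolve(codebasePath);
  const hash = crypto.createHash('md5').update(normalizedPath).digest('hex');
  return `${prefix}_${hash.substring(0, 8)}`;
}

// The same repository, cloned to two different locations on two machines.
const alice = collectionNameFromPath('code_chunks', '/Users/alice/monorepo');
const bob = collectionNameFromPath('code_chunks', '/Users/bob/projects/monorepo');

console.log(alice, bob, alice === bob);
```

Because the absolute path is the only hash input, any difference in username or checkout directory yields a different collection, even though the indexed content is identical.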

Proposed Solution

Use a stable, machine-independent identifier for collection naming when available. The most natural candidate for code repositories is the git remote URL:

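```javascript
// Example: derive the hash from the git remote "origin" URL instead of the local path.
// execFileSync bypasses the shell, so codebasePath needs no quoting or escaping.
const { execFileSync } = require('child_process');
const crypto = require('crypto');
const remoteUrl = execFileSync('git', ['-C', codebasePath, 'remote', 'get-url', 'origin']).toString().trim();
const hash = crypto.createHash('md5').update(remoteUrl).digest('hex');
```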

This could be implemented as a fallback chain:

  1. If the user provides an explicit collectionName parameter, use that name
  2. If the indexed path is inside a git repo, hash the remote origin URL (plus an optional subpath for monorepos)
  3. Otherwise, fall back to the current behavior (hash of the absolute path)

For monorepos, the hash input could be ${remoteUrl}:${relativeSubpath} so that indexing different subdirectories still produces distinct collections.
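A minimal sketch of that fallback chain follows, assuming Node.js with git on the PATH. `resolveCollectionName`, `readGitInfo`, `shortHash` and the default `code_chunks` prefix are hypothetical names, and the origin URL is hashed exactly as git reports it. The subpath comes from `git rev-parse --show-prefix`, which prints the directory relative to the repo root with forward slashes, so the key does not depend on local absolute paths or the OS path separator.

```javascript
const { execFileSync } = require('child_process');
const crypto = require('crypto');
const path = require('path');

// md5 of the key, truncated to 8 hex characters like the current naming.
function shortHash(key) {
  return crypto.createHash('md5').update(key).digest('hex').substring(0, 8);
}

// Hypothetical helper: returns { remoteUrl, subpath } for a directory inside a git
// repository with an "origin" remote, or null otherwise (including when git is missing).
function readGitInfo(dir) {
  const git = (...args) =>
    execFileSync('git', ['-C', dir, ...args], { stdio: ['ignore', 'pipe', 'ignore'] })
      .toString()
      .trim();
  try {
    const remoteUrl = git('remote', 'get-url', 'origin');
    // --show-prefix prints "" at the repo root and e.g. "packages/api/" in a subdirectory.
    const subpath = git('rev-parse', '--show-prefix').replace(/\/$/, '');
    return { remoteUrl, subpath };
  } catch {
    return null;
  }
}

// Hypothetical resolver for the proposed fallback chain. `gitInfo` is injectable
// so the logic can be exercised without a real repository.
function resolveCollectionName({ codebasePath, collectionName, prefix = 'code_chunks' }, gitInfo = readGitInfo) {
  // 1. An explicit collectionName always wins.
  if (collectionName) return collectionName;

  const absPath = path.resolve(codebasePath);

  // 2. Inside a git repo: hash the origin URL, plus the subpath for monorepo subdirectories.
  const info = gitInfo(absPath);
  if (info) {
    const key = info.subpath ? `${info.remoteUrl}:${info.subpath}` : info.remoteUrl;
    return `${prefix}_${shortHash(key)}`;
  }

  // 3. Fall back to the current behavior: hash of the absolute path.
  return `${prefix}_${shortHash(absPath)}`;
}
```

With a stubbed `gitInfo` returning `{ remoteUrl: 'git@github.com:acme/monorepo.git', subpath: 'packages/api' }` (a made-up repository), `/Users/alice/monorepo/packages/api` and `/Users/bob/projects/monorepo/packages/api` resolve to the same name, while `packages/web` gets a different one. Indexing the repo root hashes the bare URL, so existing single-repo setups keep the simplest key.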

Additional Benefits

  • Teams sharing a single Zilliz Cloud token would automatically reuse each other's indexes
  • Re-cloning or moving a repo doesn't orphan the old collection
  • Git worktrees of the same repo would share indexes instead of consuming extra slots
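The first benefit holds only if every clone reports the same origin URL byte for byte. A teammate who cloned over SSH (`git@github.com:acme/monorepo.git`) and one who cloned over HTTPS (`https://github.com/acme/monorepo.git`) would otherwise get different hashes. One possible mitigation, an assumption beyond the proposal above, is to canonicalize the URL before hashing. `normalizeRemoteUrl` is a hypothetical name, and the rules below cover only common scp-style, `ssh://`, `git://` and `https://` remotes.

```javascript
// Hypothetical canonicalization so SSH and HTTPS clones of the same repository
// produce the same hash key. Only common remote URL shapes are handled.
function normalizeRemoteUrl(url) {
  const trimmed = url.trim();
  let hostAndPath = trimmed;

  // scp-like syntax: [user@]host:path, where the path does not start with "/".
  const scp = /^(?:[^@\/]+@)?([^:\/]+):(?!\/)(.+)$/.exec(trimmed);
  if (scp) {
    hostAndPath = `${scp[1].toLowerCase()}/${scp[2]}`;
  } else {
    try {
      // ssh://, git:// and https:// URLs; userinfo (including tokens) and ports are dropped.
      const parsed = new URL(trimmed);
      hostAndPath = `${parsed.hostname.toLowerCase()}${parsed.pathname}`;
    } catch {
      // Not a URL (for example a local path): keep it unchanged.
    }
  }

  // Trailing slashes and a ".git" suffix do not identify a different repository.
  return hostAndPath.replace(/\/+$/, '').replace(/\.git$/, '');
}
```

Hashing `normalizeRemoteUrl(remoteUrl)` instead of the raw URL would give SSH and HTTPS clones the same collection name, and it also keeps credentials embedded in HTTPS remotes out of the hash input. Path case is deliberately preserved, since some git hosts treat it as significant.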

Related Bug: [object Object] error when collection limit is reached

While debugging this, we discovered that when the Zilliz free tier collection limit (5) is hit, the error surfaces as:

```
Error validating collection creation: [object Object]
```

This is because checkCollectionLimit() in milvus-vectordb.js checks error.message for the "exceeded the limit" pattern, but the Milvus SDK returns a plain object (not an Error instance) with the info in .reason and .detail instead of .message:

```json
{
  "error_code": "UnexpectedError",
  "reason": "exceeded the limit number of collections[dbName=...][limit=5]",
  "code": 102,
  "retriable": false,
  "detail": "exceeded the limit number of collections[dbName=...][limit=5]"
}
```
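The failure can be reproduced without a Milvus connection. The sketch below uses a plain object shaped like the payload above (with `dbName` elided as in the original) and a hypothetical check, `looksLikeLimitErrorMessageOnly`, that, like the behavior described for `checkCollectionLimit()`, only inspects `error.message`.

```javascript
// Plain object shaped like the Milvus SDK payload above (not an Error instance).
const milvusError = {
  error_code: 'UnexpectedError',
  reason: 'exceeded the limit number of collections[dbName=...][limit=5]',
  code: 102,
  retriable: false,
  detail: 'exceeded the limit number of collections[dbName=...][limit=5]',
};

// Hypothetical message-only check, mirroring the behavior described for checkCollectionLimit().
function looksLikeLimitErrorMessageOnly(error) {
  return typeof error.message === 'string' && error.message.includes('exceeded the limit');
}

console.log(milvusError instanceof Error);                // false
console.log(looksLikeLimitErrorMessageOnly(milvusError)); // false: there is no .message
console.log(`Error validating collection creation: ${milvusError}`);
// Error validating collection creation: [object Object]
```

Interpolating a plain object into a template literal calls `Object.prototype.toString`, which is where the literal `[object Object]` text comes from.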

The fix in milvus-vectordb.js line ~627 would be:

```javascript
const errorMessage = error.message || error.reason || error.detail || JSON.stringify(error);
```

And similarly in handlers.js line ~222:

```javascript
text: `Error validating collection creation: ${validationError.message || validationError.reason || validationError.detail || JSON.stringify(validationError)}`
```
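Since both call sites repeat the same fallback chain, one option is to factor it into a single helper and build the limit check on top of it. This is a sketch in plain Node.js; `getErrorMessage` and `isCollectionLimitError` are hypothetical names, and "exceeded the limit" is the substring the existing check looks for.

```javascript
// Hypothetical helper: extract a readable message from an Error instance,
// a plain object returned by the Milvus SDK, a string, or anything else.
function getErrorMessage(error) {
  if (typeof error === 'string') return error;
  if (error == null) return String(error);
  const text = error.message || error.reason || error.detail;
  if (text) return String(text);
  try {
    // JSON.stringify returns undefined for functions, hence the fallback.
    return JSON.stringify(error) || String(error);
  } catch {
    // Circular structures cannot be serialized.
    return String(error);
  }
}

// Hypothetical limit check built on the helper above.
function isCollectionLimitError(error) {
  return getErrorMessage(error).includes('exceeded the limit');
}

const sdkError = {
  error_code: 'UnexpectedError',
  reason: 'exceeded the limit number of collections[dbName=...][limit=5]',
  code: 102,
};
console.log(isCollectionLimitError(sdkError));                        // true
console.log(isCollectionLimitError(new Error('connection refused'))); // false
```

Keeping the chain in one place means a future change to the SDK's error shape only needs to be handled once, rather than separately in milvus-vectordb.js and handlers.js.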
