Description
Problem
Collection names are currently derived from the MD5 hash of the absolute local file path:
// context.js
const normalizedPath = path.resolve(codebasePath);
const hash = crypto.createHash('md5').update(normalizedPath).digest('hex');
return `${prefix}_${hash.substring(0, 8)}`;
This means two users indexing the same codebase on different machines (e.g. /Users/alice/monorepo vs /Users/bob/projects/monorepo) produce different collection names and cannot share indexes. On a team, this results in:
- Redundant collections consuming Zilliz Cloud quota (especially painful on the free tier's 5-collection limit)
- Redundant embedding API calls (and cost) for the same codebase
- No way for a team to maintain a single shared index
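To make the mismatch concrete, here is a minimal sketch that reproduces the path-based naming from the snippet above as a standalone function. The function name `pathBasedCollectionName` and the `code_chunks` prefix are placeholders chosen for illustration, not names taken from the project.

```javascript
const crypto = require('node:crypto');
const path = require('node:path');

// Hypothetical wrapper around the naming logic quoted from context.js.
// The default prefix is an assumed placeholder value.
function pathBasedCollectionName(codebasePath, prefix = 'code_chunks') {
  const normalizedPath = path.resolve(codebasePath);
  const hash = crypto.createHash('md5').update(normalizedPath).digest('hex');
  return `${prefix}_${hash.substring(0, 8)}`;
}

// Same repository, checked out at two different local paths
const alice = pathBasedCollectionName('/Users/alice/monorepo');
const bob = pathBasedCollectionName('/Users/bob/projects/monorepo');

// The names differ because only the local path feeds the hash
console.log(alice, bob, alice === bob);
```

Since the hash input is the absolute path alone, nothing about the repository's identity reaches the collection name.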
Proposed Solution
Use a stable, machine-independent identifier for collection naming when available. The most natural candidate for code repositories is the git remote URL:
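// Example: derive hash from git remote origin URL instead of local path
const { execFileSync } = require('child_process');
// Passing arguments as an array avoids shell quoting problems with the path
const remoteUrl = execFileSync('git', ['-C', codebasePath, 'remote', 'get-url', 'origin']).toString().trim();
const hash = crypto.createHash('md5').update(remoteUrl).digest('hex');
This could be implemented as a fallback chain: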
- If the user provides an explicit collectionName parameter, use that
- If the indexed path is a git repo, hash the remote origin URL (+ optional subpath for monorepos)
- Fall back to the current behavior (hash of absolute path)
For monorepos, the hash input could be ${remoteUrl}:${relativeSubpath} so that indexing different subdirectories still produces distinct collections.
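The fallback chain could be sketched roughly as follows. This is an illustration under assumptions, not the project's implementation: `resolveCollectionName`, `git`, `shortHash`, and the `code_chunks` prefix are hypothetical names, and the monorepo subpath is taken from `git rev-parse --show-prefix`, which prints the current directory relative to the repository root (empty at the root). The git runner is passed in as a parameter so the logic can be exercised without a real repository.

```javascript
const crypto = require('node:crypto');
const path = require('node:path');
const { execFileSync } = require('node:child_process');

// Hypothetical helper: run a git command inside `cwd`.
// Returns trimmed stdout, or null if git fails or is not installed.
function git(cwd, args) {
  try {
    return execFileSync('git', ['-C', cwd, ...args], {
      stdio: ['ignore', 'pipe', 'ignore'],
    }).toString().trim();
  } catch {
    return null;
  }
}

// Same 8-character MD5 prefix as the current naming scheme
function shortHash(input) {
  return crypto.createHash('md5').update(input).digest('hex').substring(0, 8);
}

// Hypothetical resolver; the default prefix is an assumed placeholder.
function resolveCollectionName({ codebasePath, collectionName, prefix = 'code_chunks' }, runGit = git) {
  // 1. An explicit name always wins
  if (collectionName) return collectionName;

  const absolutePath = path.resolve(codebasePath);

  // 2. Inside a git repo with an origin remote: hash the URL (+ subpath)
  const remoteUrl = runGit(absolutePath, ['remote', 'get-url', 'origin']);
  if (remoteUrl) {
    const subpath = runGit(absolutePath, ['rev-parse', '--show-prefix']);
    const key = subpath ? `${remoteUrl}:${subpath}` : remoteUrl;
    return `${prefix}_${shortHash(key)}`;
  }

  // 3. Otherwise keep the current behavior: hash the absolute path
  return `${prefix}_${shortHash(absolutePath)}`;
}

console.log(resolveCollectionName({ codebasePath: process.cwd() }));
```

One caveat this sketch does not handle: the same repository cloned over SSH and over HTTPS reports different remote URLs, so those clones would still hash to different names unless the URL is normalized first.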
Additional Benefits
- Teams sharing a single Zilliz Cloud token automatically reuse each other's indexes
- Re-cloning or moving a repo doesn't orphan the old collection
- Git worktrees of the same repo would share indexes instead of consuming extra slots
Related Bug: [object Object] error when collection limit is reached
While debugging this, we discovered that when the Zilliz free tier collection limit (5) is hit, the error surfaces as:
Error validating collection creation: [object Object]
This is because checkCollectionLimit() in milvus-vectordb.js checks error.message for the "exceeded the limit" pattern, but the Milvus SDK returns a plain object (not an Error instance) with the info in .reason and .detail instead of .message:
{
"error_code": "UnexpectedError",
"reason": "exceeded the limit number of collections[dbName=...][limit=5]",
"code": 102,
"retriable": false,
"detail": "exceeded the limit number of collections[dbName=...][limit=5]"
}
The fix in milvus-vectordb.js line ~627 would be:
const errorMessage = error.message || error.reason || error.detail || JSON.stringify(error);
And similarly in handlers.js line ~222:
text: `Error validating collection creation: ${validationError.message || validationError.reason || JSON.stringify(validationError)}`
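Both call sites could share one small helper so the fallback order stays consistent. The following is a sketch only: `extractErrorMessage` and `isCollectionLimitError` are hypothetical names, and the sample object is the payload quoted above, not output captured from a live SDK call.

```javascript
// Hypothetical helper: get a readable message from an Error instance,
// a plain SDK error object, or a bare string.
function extractErrorMessage(error) {
  if (typeof error === 'string') return error;
  if (error == null) return String(error);
  return error.message || error.reason || error.detail || JSON.stringify(error);
}

// Hypothetical check mirroring the "exceeded the limit" pattern match
function isCollectionLimitError(error) {
  return /exceeded the limit/i.test(extractErrorMessage(error));
}

// Plain object shaped like the SDK payload shown in this issue
const sdkError = {
  error_code: 'UnexpectedError',
  reason: 'exceeded the limit number of collections[dbName=...][limit=5]',
  code: 102,
  retriable: false,
  detail: 'exceeded the limit number of collections[dbName=...][limit=5]',
};

console.log(`Error validating collection creation: ${extractErrorMessage(sdkError)}`);
console.log(isCollectionLimitError(sdkError)); // true
```

With this in place, interpolating the error into a template string no longer falls through to the default object-to-string conversion that produces [object Object].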