[Bug]: Elasticsearch Terms queries on text fields silently match 0 documents — delete, filter, and update operations broken #873
Description
Related components
Backend service (elasticsearch/v7/repository.go, elasticsearch/v8/repository.go)
Bug description
Commit 3daaabf (refactor(elasticsearch): remove ".keyword" suffix from ID fields in Elasticsearch queries) removed the .keyword suffix from all Terms queries in both v7 and v8 Elasticsearch repositories (22 locations total), claiming alignment with "new index mapping requirements."
However, createIndexIfNotExists() creates indexes without any explicit mapping: no dynamic_templates, no field-level mapping. Elasticsearch's default dynamic mapping maps string fields as text with a .keyword sub-field:
"knowledge_id": {"type": "text", "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}}
Terms queries require keyword type for exact matching. When executed against a text field, they silently match 0 documents and return success; no error is raised.
Impact (all 22 locations in v7 + v8):
| Category | Functions | Effect |
|---|---|---|
| Delete (6) | DeleteByChunkIDList, DeleteBySourceIDList, DeleteByKnowledgeIDList | Delete reports success but 0 documents are actually deleted; orphaned embeddings persist in the index |
| Search filter (10) | getBaseConds: KB, knowledge, tag, exclude knowledge, exclude chunk | Filters silently ignored; search returns results across all KBs instead of scoped results |
| Update (6) | BatchUpdateEnabled (enable/disable), BatchUpdateRecommended/BatchUpdateChunkTagID | Update targets 0 documents; chunk enable/disable/tag operations silently fail |
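The Delete row describes a success report with nothing deleted. As a minimal sketch of how that could be surfaced (not how the repositories currently behave), the following assumes the caller can read the body of the `_delete_by_query` response, which carries `total` and `deleted` counters. `checkDeleted`, `deleteByQueryResult`, and `errNothingDeleted` are hypothetical names, and the response bodies in `main` are made up for illustration.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// deleteByQueryResult holds the subset of the _delete_by_query response used here.
type deleteByQueryResult struct {
	Total   int `json:"total"`
	Deleted int `json:"deleted"`
}

// errNothingDeleted signals that IDs were requested but no document was removed.
var errNothingDeleted = errors.New("delete_by_query deleted 0 documents")

// checkDeleted parses a _delete_by_query response body and returns an error
// when at least one ID was requested but the deleted counter is zero.
func checkDeleted(body []byte, requestedIDs int) (int, error) {
	var res deleteByQueryResult
	if err := json.Unmarshal(body, &res); err != nil {
		return 0, fmt.Errorf("decode delete_by_query response: %w", err)
	}
	if requestedIDs > 0 && res.Deleted == 0 {
		return 0, errNothingDeleted
	}
	return res.Deleted, nil
}

func main() {
	// Hypothetical response for a query that matched nothing.
	silent := []byte(`{"took":3,"total":0,"deleted":0,"failures":[]}`)
	if _, err := checkDeleted(silent, 1); err != nil {
		fmt.Println("warning:", err)
	}

	// Hypothetical response for a query that matched 25 documents.
	matched := []byte(`{"took":12,"total":25,"deleted":25,"failures":[]}`)
	n, _ := checkDeleted(matched, 1)
	fmt.Println("deleted:", n)
}
```

Whether a zero count should be an error or only a logged warning is a design choice: deleting an ID that was already removed legitimately deletes nothing.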
Evidence (from our deployment using OpenSearch as both keyword search and vector DB):
# App log — falsely reports success
19:57:28.736 [Elasticsearch] Deleting indices by knowledge IDs, count: 1
19:57:29.107 [Elasticsearch] Successfully deleted documents by knowledge IDs
# Direct OpenSearch query — 25 documents with embeddings still present
GET /index/_search {"query":{"match":{"knowledge_id":"f817466e-..."}}}
→ hits.total.value: 25
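The match query above still finds the documents because match analyzes its input, while terms compares the raw value against the indexed tokens. The sketch below illustrates that difference with a rough stand-in for the standard analyzer (lowercase, then split on anything that is not a letter or digit). It is an approximation for illustration only, not Elasticsearch's actual tokenizer, and the UUID and function names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// approxStandardTokens roughly imitates what the standard analyzer does to an
// ID-like string: lowercase it and split on non-alphanumeric characters.
func approxStandardTokens(s string) []string {
	return strings.FieldsFunc(strings.ToLower(s), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
}

// termMatches reports whether an unanalyzed term equals any indexed token,
// which is how a terms query compares values.
func termMatches(term string, tokens []string) bool {
	for _, t := range tokens {
		if t == term {
			return true
		}
	}
	return false
}

func main() {
	id := "f817466e-0000-4000-8000-000000000000" // hypothetical UUID

	// text field: the value is indexed as several separate tokens.
	textTokens := approxStandardTokens(id)
	fmt.Println(textTokens)
	fmt.Println(termMatches(id, textTokens)) // false: no single token equals the full ID

	// keyword sub-field: the whole value is stored as one term.
	keywordTokens := []string{id}
	fmt.Println(termMatches(id, keywordTokens)) // true
}
```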
The structs.go comment added in the same commit documents an "expected" mapping with dynamic_templates and explicit keyword fields, but this mapping is never applied in code. It is aspirational documentation that was mistaken for implemented behavior.
Expected behavior
- All Terms queries use the .keyword suffix (e.g., knowledge_id.keyword) to match against the keyword sub-field
- Delete operations actually remove documents from the index
- Search filters correctly scope results to the specified KB/knowledge/tag
- Batch update operations correctly target the specified chunks
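The first expectation can be sketched as a plain JSON query body. This is only an illustration of the shape of the request: the real repositories build their queries through a client library that is not shown in this issue, `termsQuery` and `mustJSON` are hypothetical helpers, and the ID values are made up. Only the field name knowledge_id comes from the report.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// termsQuery builds the body of a terms query against the given field.
func termsQuery(field string, values []string) map[string]any {
	return map[string]any{
		"query": map[string]any{
			"terms": map[string]any{field: values},
		},
	}
}

// mustJSON marshals v and panics on failure (acceptable for a sketch).
func mustJSON(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	ids := []string{"kb-1", "kb-2"} // hypothetical IDs

	// Targets the analyzed text field: the form introduced by 3daaabf.
	fmt.Println(mustJSON(termsQuery("knowledge_id", ids)))

	// Targets the keyword sub-field created by default dynamic mapping.
	fmt.Println(mustJSON(termsQuery("knowledge_id.keyword", ids)))
}
```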
Suggested fix
git revert 3daaabf — this cleanly restores .keyword in all 22 query locations (v7 + v8) and removes the misleading mapping documentation from structs.go.
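The .keyword suffix works for existing indexes, with one caveat for future ones:
- Existing indexes: default dynamic mapping creates a text field with a .keyword sub-field
- Any future indexes with an explicit keyword mapping: .keyword resolves only if that mapping also declares a keyword sub-field under fields. A plain keyword field has no .keyword sub-field, so querying it would again match 0 documents; such a mapping would need to add the sub-field, or the queries would need to target the bare field.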
No conflicts: zero commits have touched these 3 files since 3daaabf.
Operating system
All (server-side logic bug, platform-independent)
Confirmations
- I have searched the existing issues and confirmed that this is a new issue