[Bug]: Elasticsearch Terms queries on text fields silently match 0 documents — delete, filter, and update operations broken #873

@ochanism

Affected components

Backend service (elasticsearch/v7/repository.go, elasticsearch/v8/repository.go)

Bug description

Commit 3daaabf (refactor(elasticsearch): remove ".keyword" suffix from ID fields in Elasticsearch queries) removed the .keyword suffix from all Terms queries in both v7 and v8 Elasticsearch repositories (22 locations total), claiming alignment with "new index mapping requirements."

However, createIndexIfNotExists() creates indexes without any explicit mapping — no dynamic_templates, no field-level mapping. Elasticsearch's default dynamic mapping maps string fields as text with a .keyword sub-field:

"knowledge_id": {"type": "text", "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}}

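Terms queries do not analyze their input, so they match only when a value equals an indexed token exactly, which in practice requires a keyword field. A text field is indexed through the analyzer (by default the standard analyzer, which lowercases and splits IDs such as UUIDs at the hyphens), so the full ID never equals any indexed token. The query silently matches 0 documents and returns success; no error is raised.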
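To make the mismatch concrete, here is a minimal Go sketch. It assumes a crude stand-in for the standard analyzer (lowercase, split on anything that is not a letter or digit); the real analyzer follows Unicode segmentation rules, and the function names are hypothetical, not part of the repository.

```go
package main

import (
	"strings"
	"unicode"
)

// approxStandardTokens is a rough approximation (an assumption, not the real
// analyzer): lowercase the input, then split on non-letter, non-digit runes.
func approxStandardTokens(s string) []string {
	return strings.FieldsFunc(strings.ToLower(s), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
}

// termMatches mimics a term-level lookup: the search term is not analyzed
// and must equal one indexed token exactly.
func termMatches(tokens []string, term string) bool {
	for _, t := range tokens {
		if t == term {
			return true
		}
	}
	return false
}
```

With a made-up ID such as "ab12-cd34-ef56", the text field would hold the tokens "ab12", "cd34" and "ef56", so a lookup for the full ID finds nothing. A keyword field stores the whole ID as a single token, so the same lookup succeeds.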

Impact (all 22 locations in v7 + v8):

| Category | Functions | Effect |
| --- | --- | --- |
| Delete (6) | DeleteByChunkIDList, DeleteBySourceIDList, DeleteByKnowledgeIDList | Delete reports success but 0 documents are actually deleted; orphaned embeddings persist in the index |
| Search filter (10) | getBaseConds (KB, knowledge, tag, exclude knowledge, exclude chunk) | Filters are silently ignored; search returns results across all KBs instead of scoped results |
| Update (6) | BatchUpdateEnabled (enable/disable), BatchUpdateRecommended/BatchUpdateChunkTagID | Update targets 0 documents; chunk enable/disable/tag operations silently fail |
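One possible guard against this kind of silent no-op is to inspect the counters in the by-query response instead of trusting the HTTP status alone. The sketch below assumes the standard `total`, `deleted` and `updated` fields of the `_delete_by_query` / `_update_by_query` response body. `checkByQuery` and `byQueryResult` are hypothetical names, not existing repository code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// byQueryResult holds the counters returned by by-query requests.
type byQueryResult struct {
	Total   int64 `json:"total"`
	Deleted int64 `json:"deleted"`
	Updated int64 `json:"updated"`
}

// checkByQuery decodes a by-query response body and, when the caller expects
// at least one match, turns "matched 0 documents" into an explicit error.
func checkByQuery(body []byte, expectMatches bool) (byQueryResult, error) {
	var res byQueryResult
	if err := json.Unmarshal(body, &res); err != nil {
		return res, fmt.Errorf("decode by-query response: %w", err)
	}
	if expectMatches && res.Total == 0 {
		return res, fmt.Errorf("by-query request matched 0 documents")
	}
	return res, nil
}
```

Whether zero matches should be an error or only a warning depends on the caller, since deleting an already-empty knowledge entry can be legitimate.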

Evidence (from our deployment using OpenSearch as both keyword search and vector DB):

# App log — falsely reports success
19:57:28.736  [Elasticsearch] Deleting indices by knowledge IDs, count: 1
19:57:29.107  [Elasticsearch] Successfully deleted documents by knowledge IDs

# Direct OpenSearch query — 25 documents with embeddings still present
GET /index/_search {"query":{"match":{"knowledge_id":"f817466e-..."}}}
→ hits.total.value: 25

The structs.go comment added in the same commit documents an "expected" mapping with dynamic_templates and explicit keyword fields, but this mapping is never applied in code. It is aspirational documentation that was mistaken for implemented behavior.
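One way to confirm which mapping an index really has is to read the `GET /<index>/_mapping` response and derive the exact-match field name from it. This is a minimal sketch assuming the typeless (7.x and later) response shape. `exactMatchField` and the type names are hypothetical helpers, not code from the repository.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// fieldMapping covers only the parts of a field mapping needed here.
type fieldMapping struct {
	Type   string                  `json:"type"`
	Fields map[string]fieldMapping `json:"fields"`
}

// indexMapping mirrors {"mappings": {"properties": {...}}} for one index.
type indexMapping struct {
	Mappings struct {
		Properties map[string]fieldMapping `json:"properties"`
	} `json:"mappings"`
}

// exactMatchField returns the field name a Terms query should target:
// the field itself if it is a keyword, or "<field>.keyword" if it is a
// text field that carries a keyword sub-field.
func exactMatchField(mappingResp []byte, index, field string) (string, error) {
	var resp map[string]indexMapping
	if err := json.Unmarshal(mappingResp, &resp); err != nil {
		return "", fmt.Errorf("decode mapping response: %w", err)
	}
	idx, ok := resp[index]
	if !ok {
		return "", fmt.Errorf("index %q not found in mapping response", index)
	}
	fm, ok := idx.Mappings.Properties[field]
	if !ok {
		return "", fmt.Errorf("field %q is not mapped", field)
	}
	if fm.Type == "keyword" {
		return field, nil
	}
	if sub, ok := fm.Fields["keyword"]; ok && sub.Type == "keyword" {
		return field + ".keyword", nil
	}
	return "", fmt.Errorf("field %q has no keyword form", field)
}
```

Against the default dynamic mapping quoted earlier, this returns "knowledge_id.keyword". It would return "knowledge_id" only once an explicit keyword mapping is actually applied.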

Expected behavior

  • All Terms queries use .keyword suffix (e.g., knowledge_id.keyword) to match against the keyword sub-field
  • Delete operations actually remove documents from the index
  • Search filters correctly scope results to the specified KB/knowledge/tag
  • Batch update operations correctly target the specified chunks
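As an illustration of the first point, a Terms query body that targets the keyword sub-field can be sketched with the standard library alone. `keywordTermsQuery` is a hypothetical helper. The repository builds its queries through its Elasticsearch client types, which are not reproduced here.

```go
package main

import "encoding/json"

// keywordTermsQuery builds {"query":{"terms":{"<field>.keyword":[...]}}},
// so the match runs against the exact-value keyword sub-field.
func keywordTermsQuery(field string, values []string) ([]byte, error) {
	body := map[string]interface{}{
		"query": map[string]interface{}{
			"terms": map[string]interface{}{
				field + ".keyword": values,
			},
		},
	}
	return json.Marshal(body)
}
```

For example, `keywordTermsQuery("knowledge_id", ids)` yields a body whose terms clause is keyed by `knowledge_id.keyword`, and the same body shape serves delete-by-query, update-by-query and search filters.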

Proposed fix

git revert 3daaabf — this cleanly restores .keyword in all 22 query locations (v7 + v8) and removes the misleading mapping documentation from structs.go.

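The .keyword suffix is correct for the indexes this code creates today:

  • Existing indexes: default dynamic mapping creates text + .keyword sub-field
  • Any future indexes with explicit keyword mapping: .keyword resolves only if that mapping also defines a keyword sub-field. On a plain keyword field with no such sub-field, knowledge_id.keyword is unmapped and Terms queries on it match nothing, so the suffix should be dropped only in the same change that actually applies the explicit mapping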

No conflicts: zero commits have touched these 3 files since 3daaabf.

Operating system

All (server-side logic bug, platform-independent)

Checklist

  • I have searched the existing issues and confirmed that this is a new issue
