
fix: increase ANN search K from 50 to 200 for 72K+ person pool #7

Merged
davidhoo merged 1 commit into main from fix/ann-search-k-200 on May 19, 2026

@davidhoo (Owner)

Summary

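  • annSearchK 50 → 200: at the scale of 72K+ strangers, K=50 is too small. High-similarity candidates (e.g. 77%) get pushed out of the top 50, so merge suggestions are missed.
  • annHNSWEfSearch 100 → 200: HNSW requires efSearch >= K, so it must be raised together with K; otherwise recall gets worse instead of better.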
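A minimal Go sketch of the two settings and the invariant between them. Go is an assumption (this PR page does not show the source), and `validateANNParams` is a hypothetical helper, not a function from this repository; it only encodes the efSearch >= K rule stated above.

```go
package main

import "fmt"

// Hypothetical constants mirroring the names in this PR; the real ones may
// live in another package or have different types.
const (
	annSearchK      = 200 // top-K face nodes returned per prototype query (was 50)
	annHNSWEfSearch = 200 // HNSW candidate-list size at search time (was 100)
)

// validateANNParams (hypothetical) checks that efSearch >= K. In the HNSW
// search algorithm the K results are taken from a dynamic candidate list of
// size efSearch, so a list smaller than K cannot hold K results.
func validateANNParams(k, efSearch int) error {
	if k <= 0 {
		return fmt.Errorf("annSearchK must be positive, got %d", k)
	}
	if efSearch < k {
		return fmt.Errorf("annHNSWEfSearch (%d) must be >= annSearchK (%d)", efSearch, k)
	}
	return nil
}

func main() {
	// The new configuration satisfies the invariant.
	fmt.Println(validateANNParams(annSearchK, annHNSWEfSearch))
	// Raising K alone, leaving efSearch at 100, would violate it.
	fmt.Println(validateANNParams(200, 100))
}
```

The first call prints `<nil>`; the second reports that 100 is smaller than 200, which is why both values change in this PR.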

Background

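In practice, two person pairs whose similarity is well above the 55% threshold (66.8% and 77.0%) did not appear in the merge suggestions. Querying the NAS database confirmed:

  • The system has 72,851 strangers in total, with about 360K face nodes in the ANN index
  • Each patrol runs 42 targets × 5 prototypes = 210 queries; with K=50 this retrieves at most 10,500 face nodes (fewer distinct ones when results overlap)
  • For core persons such as 胡波 (3,505 faces), more than 50 stranger face nodes have a similarity above 66.8% in the embedding space, so person 347126 is pushed out of the ranking entirely
  • With K=200, each patrol retrieves up to 42,000 face nodes, with minimal performance impact (about 1 extra second per hour)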
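The coverage figures above follow from simple multiplication. A hypothetical helper (`patrolCoverage`, not from the codebase) that reproduces them, assuming every query returns a full K results; the totals are upper bounds on distinct face nodes, since results from different queries can overlap.

```go
package main

import "fmt"

// patrolCoverage (hypothetical) returns the number of ANN queries one patrol
// issues and the number of face-node results it retrieves, assuming each
// query returns exactly k results. Duplicates across queries are not removed.
func patrolCoverage(targets, prototypesPerTarget, k int) (queries, results int) {
	queries = targets * prototypesPerTarget
	return queries, queries * k
}

func main() {
	const targets, prototypes = 42, 5 // figures from the Background section
	for _, k := range []int{50, 200} {
		q, r := patrolCoverage(targets, prototypes, k)
		fmt.Printf("K=%d: %d queries, up to %d face nodes\n", k, q, r)
	}
}
```

This prints `K=50: 210 queries, up to 10500 face nodes` and `K=200: 210 queries, up to 42000 face nodes`.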

Test plan

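  • Deploy to the NAS, trigger a Rebuild, and confirm that 266568 (惠新宸) recovers 264982 (77%)
  • Confirm that 264862 (胡波) recovers 347126 (66.8%)
  • Confirm that patrol CPU usage shows no noticeable change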

🤖 Generated with Claude Code

With 72K+ stranger persons (~364K face nodes in the HNSW index),
K=50 per prototype query was too small: high-similarity candidates
(e.g. 77%) were being missed because 50+ other stranger face nodes
ranked above them for prominent persons like 胡波 (3505 faces).

Increase annSearchK 50→200 and annHNSWEfSearch 100→200 (must be
>= K per HNSW guidelines) so each of the ~210 per-patrol queries
covers a wider neighborhood, ensuring borderline pairs above the
merge-suggestion threshold (55%) are not silently dropped.
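A toy illustration of the failure mode described above, assuming merge suggestions are derived by grouping each query's top-K face hits by person and applying the 55% threshold. Exact sorting stands in for the HNSW index, `faceHit`, `topK` and `suggestMerges` are hypothetical, and the similarity values are synthetic, not data from the NAS database: 60 other stranger faces at 80% outrank a true 77% match.

```go
package main

import (
	"fmt"
	"sort"
)

// faceHit (hypothetical) is one search result: the stranger person a face
// node belongs to and its similarity to the query prototype.
type faceHit struct {
	PersonID   int
	Similarity float64
}

// topK returns the k most similar hits. Exact sorting replaces the HNSW
// index here; a real ANN search returns an approximation of this list.
func topK(hits []faceHit, k int) []faceHit {
	sorted := append([]faceHit(nil), hits...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Similarity > sorted[j].Similarity })
	if k < len(sorted) {
		sorted = sorted[:k]
	}
	return sorted
}

// suggestMerges groups the top-k hits by person and keeps each person whose
// best face similarity is at or above the threshold.
func suggestMerges(hits []faceHit, k int, threshold float64) map[int]float64 {
	best := make(map[int]float64)
	for _, h := range topK(hits, k) {
		if h.Similarity >= threshold && h.Similarity > best[h.PersonID] {
			best[h.PersonID] = h.Similarity
		}
	}
	return best
}

func main() {
	var hits []faceHit
	// Synthetic: 60 face nodes from other strangers that all outrank the match.
	for i := 0; i < 60; i++ {
		hits = append(hits, faceHit{PersonID: 1000 + i, Similarity: 0.80})
	}
	// Synthetic: the face that should yield a merge suggestion (77%).
	hits = append(hits, faceHit{PersonID: 264982, Similarity: 0.77})

	for _, k := range []int{50, 200} {
		_, found := suggestMerges(hits, k, 0.55)[264982]
		fmt.Printf("K=%d: candidate 264982 suggested = %v\n", k, found)
	}
}
```

Running it prints `false` for K=50 and `true` for K=200: the 77% match passes the threshold but never reaches the threshold check because it falls outside the top 50.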

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
davidhoo merged commit 556a777 into main on May 19, 2026
2 checks passed
davidhoo deleted the fix/ann-search-k-200 branch on May 19, 2026 07:12