
fix: detect concurrent MarkDirty during patrol slice to prevent stale ANN#8

Merged
davidhoo merged 1 commit into main from fix/dirty-generation-guard on May 19, 2026

@davidhoo
Owner

Summary

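  • Add a DirtyGeneration uint64 field to personMergeSuggestionState, incremented on every MarkDirty call.
  • RunBackgroundSlice captures the generation when a slice starts. When the slice finishes, it advances CursorTargetID only if the generation is unchanged. If the generation has changed, a concurrent MarkDirty ran during the slice, so it keeps cursor=0. The next run then re-scans from the beginning and rebuilds the ANN.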
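
A minimal sketch of the generation guard, assuming the service is written in Go as the uint64 field type suggests. Only personMergeSuggestionState, DirtyGeneration, CursorTargetID and MarkDirty come from this PR. The mutex, the Dirty field, and the beginSlice/finishSlice helpers are hypothetical stand-ins for whatever RunBackgroundSlice actually does.

```go
package main

import (
	"fmt"
	"sync"
)

// personMergeSuggestionState reuses field names from this PR; the mutex,
// the Dirty flag and the helper methods below are illustrative assumptions.
type personMergeSuggestionState struct {
	mu              sync.Mutex
	Dirty           bool
	CursorTargetID  uint64
	DirtyGeneration uint64
}

// MarkDirty requests a re-scan from the beginning and bumps the generation
// so that a slice already in flight can detect the concurrent reset.
func (s *personMergeSuggestionState) MarkDirty() {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.Dirty = true
	s.CursorTargetID = 0
	s.DirtyGeneration++
}

// beginSlice captures the generation at slice start (hypothetical helper).
func (s *personMergeSuggestionState) beginSlice() uint64 {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.DirtyGeneration
}

// finishSlice advances the cursor only if no MarkDirty happened since
// beginSlice; otherwise it leaves cursor=0. It reports whether it advanced.
func (s *personMergeSuggestionState) finishSlice(startGen, lastTargetID uint64) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.DirtyGeneration != startGen {
		return false // concurrent MarkDirty: keep cursor=0 for a fresh re-scan
	}
	s.CursorTargetID = lastTargetID
	return true
}

func main() {
	s := &personMergeSuggestionState{Dirty: true}
	gen := s.beginSlice()
	s.MarkDirty() // a merge lands while the slice is still running
	advanced := s.finishSlice(gen, 500)
	fmt.Println(advanced, s.CursorTargetID, s.Dirty) // false 0 true
}
```

A counter is compared instead of the cursor value itself for a specific reason. A MarkDirty that lands while the cursor is already 0 leaves the cursor unchanged, so a cursor comparison would miss it. The same call still bumps the generation.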

Root cause

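If a user performs a merge while a patrol slice is running, the merge triggers MarkDirty, which resets the cursor to 0. When the slice completes, it unconditionally writes CursorTargetID = lastTarget.ID and overwrites the cursor=0 set by MarkDirty. The next slice then sees a cursor greater than every target ID. It concludes that the current patrol round is complete and sets dirty=false. As a result, the "re-scan from the beginning" signal from MarkDirty is lost.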
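
The lost update can be reproduced deterministically with a single-goroutine sketch of the pre-fix behavior. The types and helpers are hypothetical. The sketch assumes target IDs are processed in ascending order and that one slice covers all remaining targets.

```go
package main

import "fmt"

// buggyState models the pre-fix behavior; a hypothetical simplification.
type buggyState struct {
	Dirty          bool
	CursorTargetID uint64
}

func (s *buggyState) MarkDirty() {
	s.Dirty = true
	s.CursorTargetID = 0
}

// pendingTargets returns the IDs greater than the cursor, i.e. the work
// remaining in the current patrol round (ids assumed ascending).
func pendingTargets(ids []uint64, cursor uint64) []uint64 {
	var out []uint64
	for _, id := range ids {
		if id > cursor {
			out = append(out, id)
		}
	}
	return out
}

// runRound interleaves one slice with a concurrent MarkDirty, then runs the
// follow-up "empty-targets" slice. It reports whether dirty survived.
func runRound(ids []uint64) bool {
	s := &buggyState{Dirty: true}
	targets := pendingTargets(ids, s.CursorTargetID) // slice starts and loads its batch
	s.MarkDirty()                                    // a merge lands mid-slice
	if len(targets) > 0 {
		// Pre-fix behavior: the unconditional write clobbers MarkDirty's cursor=0.
		s.CursorTargetID = targets[len(targets)-1]
	}
	// Next slice: nothing beyond the cursor, so the round is declared done.
	if len(pendingTargets(ids, s.CursorTargetID)) == 0 {
		s.Dirty = false
	}
	return s.Dirty
}

func main() {
	fmt.Println(runRound([]uint64{10, 20, 30})) // false: the re-scan request is lost
}
```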

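Observed symptom: the merge into 271495 (牛牛) completed at 15:33 CST. The ANN had started building at 15:30, so it used the stale pre-merge prototype. MarkDirty reset cursor=0 at 15:33, but the slice that completed at 15:41 overwrote that reset. As a result, 264884 (84.5% similarity) never appeared in the merge suggestions.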

Test plan

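  • After deploying, trigger a merge while patrol is running. Verify that a re-scan starts automatically within 1 minute, without waiting the 1-hour interval.
  • Confirm that 264884 appears in 271495's merge suggestions after the next patrol run.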

🤖 Generated with Claude Code

… ANN

If a merge/split fires MarkDirty while a patrol slice is already running,
the slice would finish and overwrite CursorTargetID with lastTarget.ID,
silently discarding the cursor reset (to 0) that MarkDirty had set.
The next "empty-targets" slice would then see cursor > all IDs and mark
dirty=false, so the re-scan from scratch (with a fresh ANN built after
the concurrent write) never happened.

Add DirtyGeneration to personMergeSuggestionState, incremented on every
MarkDirty call. At the end of each slice, only advance CursorTargetID if
the generation is unchanged; otherwise keep cursor=0 so the next run
starts a fresh re-scan with an up-to-date ANN index.

Observed symptom: person 264884 (84.5% similarity with 271495/牛牛) was
missing from merge suggestions because 271495 was merged at 15:33 CST
while the patrol ANN was being built, and the resulting MarkDirty was
overwritten by the slice completion at 15:41 CST.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
davidhoo merged commit 57d5bde into main on May 19, 2026
2 checks passed
davidhoo deleted the fix/dirty-generation-guard branch on May 19, 2026, 08:04
davidhoo added a commit that referenced this pull request on May 19, 2026
…#9)

The hourly stale re-run adds little value: all person-modifying operations
(merge, split, move_faces, category change, detection, recluster) already
call MarkDirty immediately, so re-running on unchanged data produces
identical suggestions. The DirtyGeneration fix (PR #8) also closed the
race condition where a concurrent MarkDirty could be silently lost.

24h is a more appropriate fallback interval — pure safety net for edge
cases like a missed MarkDirty trigger, not a primary discovery mechanism.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>