test(ci): de-flake wall-clock-sensitive tests (fuzzy-match perf + AgentModelsTab toast) #213
Merged
Two tests assert wall-clock timing and intermittently fail on shared GitHub runners (both bit us on 2026-06-01):

- test_fuzzy_match_10k_under_500ms: hard 500ms bound, failed at 636ms on a push run to main with code that runs in ~200ms locally. Now takes the best of 3 runs (damps scheduler/CPU-frequency noise) with a 3x budget when CI=true. Renamed to test_fuzzy_match_10k_perf_budget. Still catches what it exists for: order-of-magnitude algorithmic regressions.
- AgentModelsTab "preset card" toast: the inner waitFor 5000ms timeout failed at 5082ms on a PR run. Bumped to 10s inner / 20s outer.

Constraint: perf test must still catch O(n^2) regressions; bound relaxed, not removed
Rejected: skipping perf test on CI entirely | loses regression coverage where it matters most
Confidence: high
Scope-risk: narrow
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
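A minimal sketch of the best-of-3, CI-scaled budget described above. It assumes the runner exports `CI=true` (GitHub Actions sets this by default) and uses a hypothetical `fake_fuzzy_match` stand-in because the real matcher is not shown in this PR; `best_of` and `perf_budget_seconds` are illustrative names, not taken from the repo.

```python
import os
import time
from typing import Callable

# Illustrative constants mirroring the PR description:
# 500 ms locally, 3x that when the CI environment variable is "true".
LOCAL_BUDGET_S = 0.5
CI_MULTIPLIER = 3


def perf_budget_seconds() -> float:
    """Return the wall-clock budget, relaxed on shared CI runners."""
    on_ci = os.environ.get("CI", "").lower() == "true"
    return LOCAL_BUDGET_S * (CI_MULTIPLIER if on_ci else 1)


def best_of(fn: Callable[[], object], runs: int = 3) -> float:
    """Time fn `runs` times and keep the fastest run, in seconds."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def fake_fuzzy_match(query: str, candidates: list[str]) -> list[str]:
    # Hypothetical stand-in for the real matcher: a linear substring scan.
    return [c for c in candidates if query in c]


def test_fuzzy_match_10k_perf_budget() -> None:
    # Sketch only: the real test's fixture data and matcher are not in this PR.
    candidates = [f"node_{i}" for i in range(10_000)]
    elapsed = best_of(lambda: fake_fuzzy_match("node_9", candidates))
    budget = perf_budget_seconds()
    assert elapsed < budget, (
        f"{elapsed * 1000:.0f} ms exceeds {budget * 1000:.0f} ms budget"
    )
```

Keeping the minimum rather than the mean is the usual choice for this kind of timing: scheduler preemption and CPU-frequency changes can only make a run slower, so the fastest run is the closest estimate of the code's own cost.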
Why
`CI / Backend (Python 3.12)` failed on the push to main after #211 merged. The failure was not caused by #211: `test_fuzzy_match_10k_under_500ms` asserts a hard 500ms wall-clock bound that shared GitHub runners can't reliably meet (636ms on that run; ~200ms locally). The same class of failure hit `AgentModelsTab.test.tsx` on #209's CI earlier the same day (the toast `waitFor` timed out at 5082ms against its 5000ms budget).
What
| Test | Before | After |
|---|---|---|
| `test_fuzzy_match_10k_under_500ms` | <500ms | <500ms locally / <1500ms when `CI=true`; renamed to `test_fuzzy_match_10k_perf_budget` |
| `AgentModelsTab` preset toast `waitFor` | 5s, outer 15s | 10s inner / 20s outer |

The perf bound is relaxed, not removed: it still catches the O(n²)-style algorithmic regressions it exists for.
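A quick worked check, using only the numbers quoted in this PR, of why the relaxed bound tolerates runner noise but still trips on large regressions. It treats the 636ms single run (measured under the old single-shot test) as a pessimistic CI baseline; `headroom` and `caught` are hypothetical helpers for illustration.

```python
# Numbers quoted in this PR (single observations, not best-of-3 results).
LOCAL_TYPICAL_MS = 200              # "~200ms locally"
CI_OBSERVED_MS = 636                # the failing push run on main
LOCAL_BUDGET_MS = 500
CI_BUDGET_MS = 3 * LOCAL_BUDGET_MS  # 1500 ms when CI=true


def headroom(observed_ms: float, budget_ms: float) -> float:
    """How many times slower the code can get before the assertion fires."""
    return budget_ms / observed_ms


def caught(observed_ms: float, budget_ms: float, slowdown: float) -> bool:
    """Would a regression multiplying runtime by `slowdown` fail `elapsed < budget`?"""
    return observed_ms * slowdown >= budget_ms


if __name__ == "__main__":
    for label, observed, budget in [
        ("local", LOCAL_TYPICAL_MS, LOCAL_BUDGET_MS),
        ("CI", CI_OBSERVED_MS, CI_BUDGET_MS),
    ]:
        print(
            f"{label}: headroom {headroom(observed, budget):.2f}x, "
            f"2x slowdown caught={caught(observed, budget, 2)}, "
            f"10x slowdown caught={caught(observed, budget, 10)}"
        )
```

With these numbers the test tolerates roughly a 2.4x to 2.5x slowdown in either environment, so a 2x blip passes on both while a 10x (order-of-magnitude) regression fails on both.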
Verification
- `pytest tests/test_graph_protocol.py -k fuzzy` → 9 passed; `ruff format` / `ruff check` clean
- `vitest run AgentModelsTab.test.tsx` → 5 passed

🤖 Generated with Claude Code