test(ci): de-flake wall-clock-sensitive tests (fuzzy-match perf + AgentModelsTab toast)#213

Merged
alan5543 merged 1 commit into main from fix/deflake-ci-timing-tests
Jun 1, 2026

@alan5543 alan5543 commented Jun 1, 2026

Why

CI / Backend (Python 3.12) failed on the push to main after #211 merged — not because of #211, but because test_fuzzy_match_10k_under_500ms asserts a hard 500ms wall-clock bound that shared GitHub runners can't reliably hit (636ms on that run; ~200ms locally). The same class of failure hit AgentModelsTab.test.tsx on #209's CI earlier the same day (toast waitFor timed out at 5082ms vs its 5000ms budget).

What

| Test | Before | After |
| --- | --- | --- |
| test_fuzzy_match_10k_under_500ms | single run, hard <500ms | best-of-3 runs, <500ms locally / <1500ms when CI=true; renamed to test_fuzzy_match_10k_perf_budget |
| AgentModelsTab preset toast | inner waitFor 5s, outer 15s | inner 10s, outer 20s |

The perf bound is relaxed, not removed — it still catches the O(n²)-style algorithmic regressions it exists for.
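
For readers who want the shape of the relaxed check, here is a minimal sketch of a best-of-3 timing test with a CI-dependent budget. It is not the code from this PR: `best_of`, `perf_budget_seconds`, and `fuzzy_match_stub` are hypothetical names, and the stub only stands in for the real fuzzy matcher, whose signature is not shown here. The 0.5s and 1.5s budgets are the ones listed in the table above.

```python
import os
import time
from typing import Callable, List


def best_of(fn: Callable[[], object], runs: int = 3) -> float:
    """Return the fastest wall-clock duration (in seconds) over `runs` calls.

    Taking the minimum damps one-off scheduler or CPU-frequency noise,
    while a genuinely slower algorithm is slow on every run.
    """
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings)


def perf_budget_seconds(local: float = 0.5, ci: float = 1.5) -> float:
    """Pick the time budget: the relaxed one when CI=true, else the local one."""
    return ci if os.environ.get("CI", "").lower() == "true" else local


def fuzzy_match_stub(query: str, candidates: List[str]) -> List[str]:
    """Hypothetical stand-in for the real matcher (plain substring match)."""
    return [c for c in candidates if query in c]


def test_fuzzy_match_10k_perf_budget() -> None:
    # 10k candidates, mirroring the size named in the test.
    candidates = [f"node_{i}" for i in range(10_000)]
    elapsed = best_of(lambda: fuzzy_match_stub("node_9999", candidates))
    budget = perf_budget_seconds()
    assert elapsed < budget, f"best-of-3 took {elapsed:.3f}s, budget {budget:.1f}s"
```

The budget keys off the `CI` environment variable, which GitHub Actions sets to `true` on its runners, so local runs keep the original 500ms bound.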

Verification

  • pytest tests/test_graph_protocol.py -k fuzzy → 9 passed; ruff format/check clean
  • vitest run AgentModelsTab.test.tsx → 5 passed

🤖 Generated with Claude Code

Two tests assert wall-clock timing and intermittently fail on shared
GitHub runners (both bit us on 2026-06-01):

- test_fuzzy_match_10k_under_500ms: hard 500ms bound, failed at 636ms on
  a push run to main with code that runs ~200ms locally. Now best-of-3
  runs (damps scheduler/CPU-frequency noise) with a 3x budget when
  CI=true. Renamed to test_fuzzy_match_10k_perf_budget. Still catches
  what it exists for: order-of-magnitude algorithmic regressions.

- AgentModelsTab "preset card" toast: inner waitFor 5000ms timeout failed
  at 5082ms on a PR run. Bumped to 10s inner / 20s outer.

Constraint: perf test must still catch O(n^2) regressions — bound relaxed, not removed
Rejected: skipping perf test on CI entirely | loses regression coverage where it matters most
Confidence: high
Scope-risk: narrow

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@alan5543 alan5543 merged commit 30ed8d1 into main Jun 1, 2026
9 checks passed
@alan5543 alan5543 deleted the fix/deflake-ci-timing-tests branch June 1, 2026 04:48