test(opencode): stabilize runner cancel tests #876

Open
Astro-Han wants to merge 6 commits into dev from codex/fix-runner-cancel-test-race

@Astro-Han (Owner) commented May 23, 2026

Summary

  • Add bounded runner-state and Deferred synchronization helpers for runner tests.
  • Replace fixed sleep / yieldNow synchronization in cancel, shell, and queued-caller tests with explicit state and waiter handshakes.
  • In cancellation tests, wait for the blocking work fiber to start before calling cancel; this closes the CI-only timeouts exposed by follow-up runs.
  • Narrow the Deferred runtime waiter-count check to only the shared-run caller tests that have no public Effect hook for "second caller is waiting on the existing run".
  • No related issue; this is a CI flake follow-up from recent post-merge failures.
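The "bounded" helpers in the first bullet can be illustrated with a plain-Promise sketch. This is not the PR's code: withDeadline and DeadlineError are hypothetical names, and the sketch uses Promises rather than Effect. It shows the idea that every synchronization wait carries its own deadline, so a missed handshake fails quickly with a labelled error instead of running into the test runner's global timeout.

```typescript
// Hypothetical sketch: give every test synchronization wait its own deadline.
// Plain Promises are used for illustration; the PR's helpers are Effect-based.
class DeadlineError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "DeadlineError";
  }
}

function withDeadline<T>(label: string, wait: Promise<T>, timeoutMs: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new DeadlineError(`${label}: not reached within ${timeoutMs}ms`)),
      timeoutMs,
    );
  });
  // Whichever settles first wins; the timer is always cleared so a finished
  // wait does not keep the process alive.
  return Promise.race([wait, deadline]).finally(() => clearTimeout(timer));
}
```

Naming each wait (for example "runner Running" or "second caller attached") means a flake reports which handshake was missed rather than a bare timeout.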

Why

Recent post-merge CI runs timed out in "Runner > cancel with onInterrupt resolves callers gracefully". The original test forked ensureRunning(...), slept for 10ms, then cancelled. On a slow runner, cancel can execute before the runner has entered Running; cancel is then a no-op, and the Effect.never waiter is left to time out.
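The race can be reproduced with a toy model. The sketch below is a hypothetical stand-in, not opencode's Runner: ToyRunner, its states, and waitForState are invented for illustration, and Promises replace Effect fibers. Its cancel() only acts once the state is "Running", so cancelling right after starting is a no-op and the waiter never settles; waiting for the state first removes the dependence on timing.

```typescript
// Hypothetical toy model of the cancel-before-Running race (not opencode's Runner).
type RunnerState = "Idle" | "Running" | "Cancelled";

class ToyRunner {
  state: RunnerState = "Idle";
  private onCancel: (() => void) | undefined;

  // Like a forked ensureRunning(...): the switch to "Running" happens on a later
  // tick, so a caller that does not wait may still observe "Idle".
  ensureRunning(): Promise<"cancelled"> {
    return new Promise<"cancelled">((resolve) => {
      setTimeout(() => {
        this.state = "Running";
        this.onCancel = () => resolve("cancelled");
      }, 5);
    });
  }

  // Cancelling before "Running" is a no-op: there is nothing to interrupt yet.
  cancel(): void {
    if (this.state !== "Running") return;
    this.state = "Cancelled";
    this.onCancel?.();
  }
}

// Bounded wait on observable state instead of a fixed sleep.
async function waitForState(runner: ToyRunner, target: RunnerState, maxTicks = 1000): Promise<void> {
  for (let tick = 0; runner.state !== target; tick++) {
    if (tick >= maxTicks) throw new Error(`runner never reached ${target}`);
    await new Promise((resolve) => setTimeout(resolve, 1));
  }
}
```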

A review follow-up also pointed out that two queued-caller tests still used Effect.yieldNow as a scheduling hint. Those tests now wait until the shared run Deferred has both waiters attached before advancing.
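The waiter handshake can be sketched with a Deferred-like object that counts its current waiters. This is a hypothetical plain-Promise analogue: Effect's Deferred has no public waiter count (which is why the PR reads one privately in its test file), and CountingDeferred, wait(), and waitForWaiters() are invented names. The test advances only once both callers are attached, rather than hoping a single yield was enough.

```typescript
// Hypothetical Deferred analogue that tracks how many callers are awaiting it.
class CountingDeferred<T> {
  private readonly promise: Promise<T>;
  private resolveFn!: (value: T) => void;
  private waiters = 0;
  private readonly listeners = new Set<() => void>();

  constructor() {
    this.promise = new Promise<T>((resolve) => {
      this.resolveFn = resolve;
    });
  }

  get waiterCount(): number {
    return this.waiters;
  }

  // Registers the caller as a waiter, then suspends on the shared promise.
  async wait(): Promise<T> {
    this.waiters += 1;
    this.listeners.forEach((notify) => notify());
    try {
      return await this.promise;
    } finally {
      this.waiters -= 1;
    }
  }

  resolve(value: T): void {
    this.resolveFn(value);
  }

  // Settles once at least `n` callers are attached: an explicit handshake
  // instead of a yield or sleep used as a scheduling hint.
  waitForWaiters(n: number): Promise<void> {
    return new Promise<void>((resolve) => {
      const check = () => {
        if (this.waiters >= n) {
          this.listeners.delete(check);
          resolve();
        }
      };
      this.listeners.add(check);
      check();
    });
  }
}
```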

Later PR CI runs exposed the same class of race one layer deeper. runner.state === "Running" proves the runner state was installed, but it does not prove the work fiber has actually started. runner.state === "Shell" has the same boundary for shell masking assertions: it proves shell state was installed, but not that the shell effect and interrupt finalizers have started. The affected tests now use a local blocked-work helper with a public Deferred start handshake before cancelling.
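The start handshake can be sketched the same way. In this hypothetical plain-Promise version, deferred and blockedWork are invented names (not the PR's helper) and release() stands in for cancel. The work body resolves started only once it is actually executing, so a test that awaits started knows the body and its finalizer are in place before it releases or cancels. Observing a state field alone does not give that guarantee.

```typescript
// Hypothetical plain-Promise sketch of a blocked-work helper with a start handshake.
function deferred<T = void>() {
  let resolve!: (value: T) => void;
  const promise = new Promise<T>((r) => {
    resolve = r;
  });
  return { promise, resolve };
}

function blockedWork() {
  const started = deferred();
  const released = deferred();
  let finalized = false;

  // The body signals `started` from inside itself, then blocks until released.
  const run = async (): Promise<void> => {
    started.resolve();
    try {
      await released.promise;
    } finally {
      finalized = true; // stands in for an interrupt finalizer
    }
  };

  return {
    started: started.promise,
    release: () => released.resolve(),
    run,
    isFinalized: () => finalized,
  };
}
```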

The remaining Deferred runtime waiter-count check is intentionally private to this test file and only used for the two shared-run caller assertions. Effect does not expose a public signal for "this second caller is now awaiting the same run Deferred", and replacing that proof with sleep/yield would weaken the tests again.

Related Issue

None.

Human Review Status

Pending

Review Focus

Please check that the test helpers stay local to runner test synchronization and that the queued-caller, cancel, and shell masking assertions now wait for the relevant caller/work fiber to attach before cancel/release.

Risk Notes

No product behavior risk; this is test-only. Skipped conditional checklist items: visible UI or copy check because no visible UI or copy changed; platform/packaging impact because no runtime platform surface changed; docs/release/dependencies/permissions/generated-content checks because none of those surfaces were touched.

How To Verify

RED proof: bun --cwd packages/opencode -e '...' confirmed cancel-before-running leaves the waiter timing out.
Focused runner tests: cd packages/opencode && bun test test/effect/runner.test.ts --timeout 30000 -> 30 pass, 0 fail.
Focused runner repeat with CI Bun: for i in {1..100}; do /tmp/pawwork-bun-1.3.13/bun-darwin-aarch64/bun test test/effect/runner.test.ts --timeout 30000; done -> 100/100 runs passed.
Typecheck: cd packages/opencode && bun run typecheck -> passed.
Opencode CI subset with CI Bun: PATH=/tmp/pawwork-bun-1.3.13/bun-darwin-aarch64:$PATH bun turbo test:ci --filter=opencode -> 3071 pass, 9 skip, 1 todo, 0 fail.
Diff check: git diff --check -> no whitespace errors.

Screenshots or Recordings

Not applicable; no visible UI changes.

Checklist

How to use this checklist:

  • Tick a box by replacing [ ] with [x]. Do not edit, add, or remove items.
  • The bot-applied label items can only be honestly ticked AFTER the PR is opened and the labeler / priority-triage bots have run — return to the PR description and tick them then.
  • Most items are required. The few that are conditional are explicitly marked (conditional); for those, leave unticked if they truly do not apply and explain why in Risk Notes. All other items must be ticked before requesting human review.
  • Type label — this PR carries exactly one of bug, enhancement, task, documentation. Type labels are author-added; the labeler bot does NOT assign them. Add the label in the GitHub UI, then tick this.
  • Routing labels — this PR carries at least one of app, ui, platform, harness, ci. The labeler bot assigns these on PR open based on changed paths. Confirm the bot's choice (or override if wrong), then tick this.
  • Priority label — this PR carries exactly one of P0, P1, P2, P3. The priority-triage bot suggests one on PR open. Confirm or override, then tick this.
  • Human Review Status above is set to Pending, Approved by @<reviewer>, or Not required: <reason> (default is Pending; "not required" is restricted to bot-authored low-risk PRs).
  • I linked the related issue, or stated in Summary why there is no issue.
  • I described the review focus and any meaningful risks.
  • I replaced the example block in How To Verify with the real verification steps and the key result for each.
  • I did not introduce unrelated refactors, dependencies, generated files, or file changes beyond the stated scope.
  • (conditional) I manually checked visible UI or copy changes when needed, with screenshots or recordings. Leave unticked only if no visible UI or copy changed.
  • (conditional) I considered macOS and Windows impact for platform, packaging, updater, signing, paths, shell, or permissions changes. Leave unticked only if no platform/packaging surface was touched.
  • (conditional) I called out docs, release notes, dependencies, permissions, credentials, deletion behavior, generated content, or local file changes when relevant. Leave unticked only if none of those surfaces was touched.
  • I reviewed the final diff for unrelated changes and suspicious dependency changes.
  • I am targeting dev, and my PR title and commit messages use Conventional Commits in English.

@Astro-Han added the labels bug (Something isn't working), flaky-test (Non-deterministic test failure), P3 (Low priority), and harness (Model harness, prompts, tool descriptions, and session mechanics) on May 23, 2026
@coderabbitai (Contributor) commented May 23, 2026

Warning

Review limit reached

@Astro-Han, we couldn't start this review because you've used your available PR reviews for now.

Your plan currently allows 1 review/hour. Refill in 25 minutes and 6 seconds.

Your organization has run out of usage credits. Purchase more in the billing tab.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 9318ced1-c24e-4b80-8c11-071284b4cbcd

📥 Commits

Reviewing files that changed from the base of the PR and between d9d052e and c9fd387.

📒 Files selected for processing (1)
  • packages/opencode/test/effect/runner.test.ts

@github-actions (Bot) left a comment

Suggested priority: P3 (only low-risk paths changed (packages/opencode/test/effect/runner.test.ts)).

P1/P0 are reserved for maintainer confirmation. Please relabel manually if this is a release blocker, security issue, data-loss risk, or updater/runtime failure.

@gemini-code-assist (Bot) left a comment

Code Review

This pull request improves the reliability of the Runner tests by replacing fixed sleep durations with a polling helper function, waitForRunnerState, which waits for the runner to reach a specific state. Feedback suggests refactoring this helper to use idiomatic Effect services, specifically replacing Date.now() with Effect.currentTimeMillis and using Effect.dieMessage for timeout errors to ensure consistency and better testability.
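The suggestion can be illustrated outside Effect. In this hypothetical plain-TypeScript sketch, waitForRunnerState reads time from an injected clock rather than calling Date.now() directly, and fakeClock advances virtual time on each sleep, so a timeout test finishes without real waiting. Throwing a plain Error is only a loose analogue of Effect.dieMessage; TestClock and fakeClock are invented for illustration and are not part of the PR or of Effect's API.

```typescript
// Hypothetical sketch: a bounded state poll that reads time from an injected clock.
interface TestClock {
  now(): number;
  sleep(ms: number): Promise<void>;
}

async function waitForRunnerState<S>(
  read: () => S,
  target: S,
  clock: TestClock,
  { timeoutMs = 1000, intervalMs = 1 } = {},
): Promise<void> {
  const deadline = clock.now() + timeoutMs;
  while (read() !== target) {
    if (clock.now() >= deadline) {
      // Analogue of dying with a message: an unrecoverable test failure, not a typed error.
      throw new Error(`runner did not reach state ${String(target)} within ${timeoutMs}ms`);
    }
    await clock.sleep(intervalMs);
  }
}

// A fake clock whose sleep advances virtual time instantly.
function fakeClock(): TestClock {
  let t = 0;
  return {
    now: () => t,
    sleep: async (ms: number) => {
      t += ms;
    },
  };
}
```

With a real clock the same helper would poll in wall-clock time; the injected clock only makes the timeout path cheap and deterministic to test.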

Comment thread on packages/opencode/test/effect/runner.test.ts (outdated)
