fix(server): clear _task_control on stop_task to prevent terminate #733

dsapandora wants to merge 6 commits into develop from
Conversation
📝 Walkthrough

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant TaskServer
    participant InnerTask
    participant TaskRegistry
    participant MonitorService
    participant Dashboard
    Note over Client,TaskServer: Client requests task stop
    Client->>TaskServer: stop_task(token)
    TaskServer->>InnerTask: await control.task.stop_task()
    InnerTask-->>TaskServer: stopped
    TaskServer->>TaskRegistry: remove_task(token)
    TaskRegistry-->>TaskServer: removal confirmed
    TaskServer->>MonitorService: unsubscribe(token)
    MonitorService-->>TaskServer: unsubscribed
    TaskServer->>Dashboard: emit event: task_removed(token)
    Dashboard-->>TaskServer: ack
    TaskServer-->>Client: stop acknowledged
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ Passed checks (5 passed)
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/ai/src/ai/modules/task/task_server.py`:
- Around line 1318-1320: Instead of directly popping the registry entry with
self._task_control.pop(token, None) in the stop path, call the existing cleanup
routine remove_task(token) (or extract/pop logic into the shared helper used by
remove_task) so that monitor subscription pruning and the task_removed dashboard
event still execute; locate the stop code that currently uses
self._task_control.pop and replace it with a call to remove_task(token) (or the
new shared helper) to ensure consistent connection-monitor state and lifecycle
signaling.
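The routing the prompt describes can be sketched as follows. This is a minimal, hypothetical shape, not the project's actual implementation: only the names `stop_task`, `remove_task`, `_task_control`, and the `task_removed` event come from this PR; the monitor-subscription set, the event list, and the control object's `stop()` method are illustrative stand-ins.

```python
# Hypothetical, simplified shape of the TaskServer stop/remove paths.
import asyncio


class TaskServer:
    def __init__(self):
        self._task_control = {}      # token -> control object
        self._monitor_subs = set()   # tokens with active monitor subscriptions
        self.events = []             # stand-in for dashboard broadcasts

    async def remove_task(self, token):
        # Shared cleanup: registry pop, monitor pruning, lifecycle event.
        self._task_control.pop(token, None)
        self._monitor_subs.discard(token)
        self.events.append(("task_removed", token))

    async def stop_task(self, token):
        control = self._task_control.get(token)
        if control is not None:
            await control.stop()
        # Route through the shared helper instead of popping the registry
        # directly, so monitor state and dashboard events stay consistent.
        await self.remove_task(token)
```

With this shape, both the stop path and any explicit removal path converge on the same cleanup, which is the consistency the review comment is asking for.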
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: eadbc6bd-4c68-4665-8adf-30a7e06faafc
📒 Files selected for processing (2)
packages/ai/src/ai/modules/task/task_server.py
packages/client-typescript/tests/RocketRideClient.test.ts
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
packages/client-typescript/tests/RocketRideClient.test.ts (1)
153-177: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Ensure pipeline termination runs even if status polling/assertions fail.

`await client.terminate(result.token)` is currently after the polling + assertions. If the retry loop throws (server never returns state) or an assertion fails, cleanup won’t run, leaving the task/token around and potentially making subsequent integration tests flaky.

Suggested change

```diff
- it('should get pipeline status', async () => {
-   const result = await client.use({
-     pipeline: getEchoPipeline(),
-     token: PIPELINE_TOKEN,
-   });
-
-   // Retry a few times in case server is busy (tests may run in parallel)
-   const maxAttempts = 5;
-   const delayMs = 2000;
-   let status: Awaited<ReturnType<typeof client.getTaskStatus>> | null = null;
-   for (let attempt = 1; attempt <= maxAttempts; attempt++) {
-     try {
-       status = await client.getTaskStatus(result.token);
-       break;
-     } catch (e) {
-       if (attempt === maxAttempts) throw e;
-       await new Promise((r) => setTimeout(r, delayMs));
-     }
-   }
-
-   expect(status).toHaveProperty('state');
-   expect(Object.values(TASK_STATE)).toContain(status!.state);
-
-   await client.terminate(result.token);
- }, TEST_CONFIG.timeout);
+ it(
+   'should get pipeline status',
+   async () => {
+     const result = await client.use({
+       pipeline: getEchoPipeline(),
+       token: PIPELINE_TOKEN,
+     });
+
+     try {
+       // Retry a few times in case server is busy (tests may run in parallel)
+       const maxAttempts = 5;
+       const delayMs = 2000;
+       let status: Awaited<ReturnType<typeof client.getTaskStatus>> | null = null;
+       for (let attempt = 1; attempt <= maxAttempts; attempt++) {
+         try {
+           status = await client.getTaskStatus(result.token);
+           break;
+         } catch (e) {
+           if (attempt === maxAttempts) throw e;
+           await new Promise((r) => setTimeout(r, delayMs));
+         }
+       }
+
+       expect(status).not.toBeNull();
+       expect(status).toHaveProperty('state');
+       expect(Object.values(TASK_STATE)).toContain(status!.state);
+     } finally {
+       // Best-effort cleanup so we don't leak tasks/tokens on failure
+       await client.terminate(result.token).catch(() => {});
+     }
+   },
+   TEST_CONFIG.timeout
+ );
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/client-typescript/tests/RocketRideClient.test.ts` around lines 153 - 177, Wrap the polling/assertion block in a try/finally so client.terminate(result.token) always runs: call client.use(...) to get result, then perform the retry loop and asserts inside try, and invoke client.terminate(result.token) in the finally block (ensuring result and result.token are in scope before the try). This guarantees termination even if client.getTaskStatus throws or assertions fail while still using the same result.token.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@packages/client-typescript/tests/RocketRideClient.test.ts`:
- Around line 153-177: Wrap the polling/assertion block in a try/finally so
client.terminate(result.token) always runs: call client.use(...) to get result,
then perform the retry loop and asserts inside try, and invoke
client.terminate(result.token) in the finally block (ensuring result and
result.token are in scope before the try). This guarantees termination even if
client.getTaskStatus throws or assertions fail while still using the same
result.token.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 0a6c76c3-8ff9-4c80-85c8-5e6fd2dcfb62
📒 Files selected for processing (1)
packages/client-typescript/tests/RocketRideClient.test.ts
♻️ Duplicate comments (1)
packages/ai/src/ai/modules/task/task_server.py (1)
1321-1323: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Route stop-time removal through shared cleanup (or an equivalent helper), not a raw registry pop.

Line 1323 directly pops `_task_control`, which skips the `remove_task()` side effects (monitor subscription pruning and the `task_removed` dashboard broadcast). That can leave stale monitor state and inconsistent lifecycle signaling after terminate.

Suggested fix

```diff
-        # Free the slot so a subsequent use() with the same token does
-        # not race against a phantom registry entry.
-        self._task_control.pop(token, None)
+        # Free the slot using shared cleanup to keep monitor state and
+        # dashboard lifecycle events consistent.
+        await self.remove_task(token)
```

If double-stop is a concern here, extract a shared “post-stop cleanup” helper from `remove_task()` and call that from both paths.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/ai/src/ai/modules/task/task_server.py` around lines 1321 - 1323, Instead of directly popping self._task_control in the terminate path, invoke the shared post-stop cleanup used by remove_task() (or extract such a helper from remove_task()) so that monitor subscription pruning and the task_removed dashboard broadcast still run; update the terminate/stop code that currently calls self._task_control.pop(token, None) to call the new helper (or remove_task(token) if appropriate) and ensure it is idempotent to handle double-stop calls safely while preserving monitor/state cleanup.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@packages/ai/src/ai/modules/task/task_server.py`:
- Around line 1321-1323: Instead of directly popping self._task_control in the
terminate path, invoke the shared post-stop cleanup used by remove_task() (or
extract such a helper from remove_task()) so that monitor subscription pruning
and the task_removed dashboard broadcast still run; update the terminate/stop
code that currently calls self._task_control.pop(token, None) to call the new
helper (or remove_task(token) if appropriate) and ensure it is idempotent to
handle double-stop calls safely while preserving monitor/state cleanup.
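The idempotency requirement in the prompt above can be sketched like this: a shared post-stop cleanup helper that is safe to call twice. Only the `_task_control` registry, the `remove_task()`-style cleanup duties, and the `task_removed` event come from the review; the helper name `_cleanup_after_stop` and the surrounding fields are hypothetical.

```python
# Sketch of an idempotent "post-stop cleanup" helper, as the review
# suggests extracting from remove_task(). Names other than _task_control
# and task_removed are illustrative.
class TaskServer:
    def __init__(self):
        self._task_control = {}
        self._monitor_subs = set()
        self.events = []

    def _cleanup_after_stop(self, token):
        # pop() returns None when the token is already gone, so calling
        # this twice (a double-stop) is safe and emits the lifecycle
        # event only once.
        control = self._task_control.pop(token, None)
        if control is None:
            return False
        self._monitor_subs.discard(token)
        self.events.append(("task_removed", token))
        return True
```

Guarding on the popped value is what makes the helper idempotent: the first caller wins, and every later caller becomes a no-op rather than re-broadcasting `task_removed`.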
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: dd53a7da-6820-41cd-8848-37b32bc7d58c
📒 Files selected for processing (1)
packages/ai/src/ai/modules/task/task_server.py
Rod-Christensen left a comment:

This is by design. Any task can restart, but it is removed 60 seconds after termination.
Summary
`TaskServer.stop_task()` did not remove the token from `_task_control` after stopping the task, leaving phantom registry entries. Any subsequent `use()` reusing the same token races against the orphaned entry, manifesting as either a hang/timeout or a "Task has already completed" error from the server.

To fix it, I route stop-time cleanup through the existing `remove_task()` helper so the registry pop, monitor subscription pruning, and `task_removed` dashboard broadcast all happen consistently, instead of calling `control.task.stop_task()` directly and leaking the slot.

This should resolve intermittent CI failures in TS/Python integration tests that reuse tokens across terminate → use sequences (most visible on the Windows/Linux runners under load). Also aligned the "should get pipeline status" test timeout (90s → TEST_CONFIG.timeout, 120s) to match the rest of the suite.
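The phantom-entry race described above can be reproduced in miniature. This is a stand-in model, not the real server: only the `_task_control` registry, the `use()`/`stop_task()` flow, and the "Task has already completed" symptom come from this PR.

```python
# Minimal model of the bug: stop without cleanup leaves an orphaned
# registry entry, and reusing the token then fails.
class Server:
    def __init__(self):
        self._task_control = {}

    def use(self, token):
        if token in self._task_control:
            # Orphaned entry from a previous stop that never cleaned up.
            raise RuntimeError("Task has already completed")
        self._task_control[token] = object()
        return token

    def stop_task_buggy(self, token):
        pass  # stops the task but leaves _task_control untouched

    def stop_task_fixed(self, token):
        self._task_control.pop(token, None)  # cleanup frees the slot
```

With the buggy stop, `use()` on the same token hits the phantom entry; with the fixed stop, the terminate → use sequence succeeds.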
Type
Testing

`./builder test` passes

Checklist
Linked Issue
Fixes #
Summary by CodeRabbit
Bug Fixes
Tests