
fix(server): clear _task_control on stop_task to prevent terminate#733

Open
dsapandora wants to merge 6 commits into develop from fix/stop-task-cleanup-race

Conversation

@dsapandora
Collaborator

@dsapandora dsapandora commented Apr 30, 2026

Summary

TaskServer.stop_task() did not remove the token from _task_control after stopping the task, leaving phantom registry entries. Any subsequent use() reusing the same token races against the orphaned entry — manifesting as either a hang/timeout or a "Task has already completed" error from the server.

To fix it, I route stop-time cleanup through the existing remove_task() helper so the registry pop, monitor subscription pruning, and task_removed dashboard broadcast all happen consistently — instead of calling control.task.stop_task() directly and leaking the slot.

This should resolve intermittent CI failures in TS/Python integration tests that reuse tokens across terminate → use sequences (most visible on Windows/Linux runners under load).

Also aligned the "should get pipeline status" test timeout (90s → TEST_CONFIG.timeout, 120s) to match the rest of the suite.

Type

Testing

  • Tests added or updated
  • Tested locally
  • ./builder test passes

Checklist

  • Commit messages follow conventional commits
  • No secrets or credentials included
  • Wiki updated (if applicable)
  • Breaking changes documented (if applicable)

Linked Issue

Fixes #

Summary by CodeRabbit

  • Bug Fixes

    • Task termination now fully removes stopped tasks from the server registry immediately, avoiding phantom entries and race conditions and ensuring monitors/cleanup run as expected.
  • Tests

    • Integration test timeout switched to the shared test configuration for more consistent, reliable timing across environments.

@coderabbitai
Contributor

coderabbitai Bot commented Apr 30, 2026

📝 Walkthrough

Walkthrough

stop_task() for LAUNCH/EXECUTE tasks now awaits control.task.stop_task() and then explicitly removes the task’s control record from the server registry via self.remove_task(token), ensuring the registry entry is cleared. A TypeScript test timeout was switched to use TEST_CONFIG.timeout instead of a hardcoded value.

Changes

Cohort / File(s) Summary
Task Server change
packages/ai/src/ai/modules/task/task_server.py
For LAUNCH/EXECUTE tasks, stop_task() now calls await control.task.stop_task() then self.remove_task(token), removing the control record from the server registry (unregisters task, unsubscribes monitors, emits task_removed).
Test timeout refactor
packages/client-typescript/tests/RocketRideClient.test.ts
Replaced hardcoded Jest timeout (90000) with shared TEST_CONFIG.timeout; test behavior unchanged.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant TaskServer
    participant InnerTask
    participant TaskRegistry
    participant MonitorService
    participant Dashboard

    Note over Client,TaskServer: Client requests task stop
    Client->>TaskServer: stop_task(token)
    TaskServer->>InnerTask: await control.task.stop_task()
    InnerTask-->>TaskServer: stopped
    TaskServer->>TaskRegistry: remove_task(token)
    TaskRegistry-->>TaskServer: removal confirmed
    TaskServer->>MonitorService: unsubscribe(token)
    MonitorService-->>TaskServer: unsubscribed
    TaskServer->>Dashboard: emit event: task_removed(token)
    Dashboard-->>TaskServer: ack
    TaskServer-->>Client: stop acknowledged

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels

module:client-ts

Suggested reviewers

  • jmaionchi
  • stepmikhaylov

Poem

🐇 I stopped the inner hare with care,

then nudged its record from the lair.
Monitors quiet, dashboard sings,
I hop off light on nimble springs. 🥕

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
  • Description Check (✅ Passed): Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check (✅ Passed): The title 'fix(server): clear _task_control on stop_task to prevent terminate' directly describes the main change: removing task control records during stop_task to prevent issues.
  • Docstring Coverage (✅ Passed): Docstring coverage is 100.00%, which is sufficient; the required threshold is 80.00%.
  • Linked Issues Check (✅ Passed): Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes Check (✅ Passed): Check skipped because no linked issues were found for this pull request.


@github-actions

No description provided.

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/ai/src/ai/modules/task/task_server.py`:
- Around line 1318-1320: Instead of directly popping the registry entry with
self._task_control.pop(token, None) in the stop path, call the existing cleanup
routine remove_task(token) (or extract/pop logic into the shared helper used by
remove_task) so that monitor subscription pruning and the task_removed dashboard
event still execute; locate the stop code that currently uses
self._task_control.pop and replace it with a call to remove_task(token) (or the
new shared helper) to ensure consistent connection-monitor state and lifecycle
signaling.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: eadbc6bd-4c68-4665-8adf-30a7e06faafc

📥 Commits

Reviewing files that changed from the base of the PR and between a665f46 and 68a86f5.

📒 Files selected for processing (2)
  • packages/ai/src/ai/modules/task/task_server.py
  • packages/client-typescript/tests/RocketRideClient.test.ts

Comment thread packages/ai/src/ai/modules/task/task_server.py Outdated
Contributor

@coderabbitai coderabbitai Bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/client-typescript/tests/RocketRideClient.test.ts (1)

153-177: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Ensure pipeline termination runs even if status polling/assertions fail.

await client.terminate(result.token) is currently after the polling + assertions. If the retry loop throws (server never returns state) or an assertion fails, cleanup won’t run, leaving the task/token around and potentially making subsequent integration tests flaky.

Suggested change
-		it('should get pipeline status', async () => {
-			const result = await client.use({
-				pipeline: getEchoPipeline(),
-				token: PIPELINE_TOKEN,
-			});
-
-			// Retry a few times in case server is busy (tests may run in parallel)
-			const maxAttempts = 5;
-			const delayMs = 2000;
-			let status: Awaited<ReturnType<typeof client.getTaskStatus>> | null = null;
-			for (let attempt = 1; attempt <= maxAttempts; attempt++) {
-				try {
-					status = await client.getTaskStatus(result.token);
-					break;
-				} catch (e) {
-					if (attempt === maxAttempts) throw e;
-					await new Promise((r) => setTimeout(r, delayMs));
-				}
-			}
-
-			expect(status).toHaveProperty('state');
-			expect(Object.values(TASK_STATE)).toContain(status!.state);
-
-			await client.terminate(result.token);
-		}, TEST_CONFIG.timeout);
+		it(
+			'should get pipeline status',
+			async () => {
+				const result = await client.use({
+					pipeline: getEchoPipeline(),
+					token: PIPELINE_TOKEN,
+				});
+
+				try {
+					// Retry a few times in case server is busy (tests may run in parallel)
+					const maxAttempts = 5;
+					const delayMs = 2000;
+					let status: Awaited<ReturnType<typeof client.getTaskStatus>> | null = null;
+					for (let attempt = 1; attempt <= maxAttempts; attempt++) {
+						try {
+							status = await client.getTaskStatus(result.token);
+							break;
+						} catch (e) {
+							if (attempt === maxAttempts) throw e;
+							await new Promise((r) => setTimeout(r, delayMs));
+						}
+					}
+
+					expect(status).not.toBeNull();
+					expect(status).toHaveProperty('state');
+					expect(Object.values(TASK_STATE)).toContain(status!.state);
+				} finally {
+					// Best-effort cleanup so we don't leak tasks/tokens on failure
+					await client.terminate(result.token).catch(() => {});
+				}
+			},
+			TEST_CONFIG.timeout
+		);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/client-typescript/tests/RocketRideClient.test.ts` around lines 153 -
177, Wrap the polling/assertion block in a try/finally so
client.terminate(result.token) always runs: call client.use(...) to get result,
then perform the retry loop and asserts inside try, and invoke
client.terminate(result.token) in the finally block (ensuring result and
result.token are in scope before the try). This guarantees termination even if
client.getTaskStatus throws or assertions fail while still using the same
result.token.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@packages/client-typescript/tests/RocketRideClient.test.ts`:
- Around line 153-177: Wrap the polling/assertion block in a try/finally so
client.terminate(result.token) always runs: call client.use(...) to get result,
then perform the retry loop and asserts inside try, and invoke
client.terminate(result.token) in the finally block (ensuring result and
result.token are in scope before the try). This guarantees termination even if
client.getTaskStatus throws or assertions fail while still using the same
result.token.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 0a6c76c3-8ff9-4c80-85c8-5e6fd2dcfb62

📥 Commits

Reviewing files that changed from the base of the PR and between 093decd and 0279d0f.

📒 Files selected for processing (1)
  • packages/client-typescript/tests/RocketRideClient.test.ts

Contributor

@coderabbitai coderabbitai Bot left a comment


♻️ Duplicate comments (1)
packages/ai/src/ai/modules/task/task_server.py (1)

1321-1323: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Route stop-time removal through shared cleanup (or equivalent helper), not raw registry pop.

Line 1323 directly pops _task_control, which skips remove_task() side effects (monitor subscription pruning and task_removed dashboard broadcast). That can leave stale monitor state and inconsistent lifecycle signaling after terminate.

Suggested fix
-                # Free the slot so a subsequent use() with the same token does
-                # not race against a phantom registry entry.
-                self._task_control.pop(token, None)
+                # Free the slot using shared cleanup to keep monitor state and
+                # dashboard lifecycle events consistent.
+                await self.remove_task(token)

If double-stop is a concern here, extract a shared “post-stop cleanup” helper from remove_task() and call that from both paths.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/ai/src/ai/modules/task/task_server.py` around lines 1321 - 1323,
Instead of directly popping self._task_control in the terminate path, invoke the
shared post-stop cleanup used by remove_task() (or extract such a helper from
remove_task()) so that monitor subscription pruning and the task_removed
dashboard broadcast still run; update the terminate/stop code that currently
calls self._task_control.pop(token, None) to call the new helper (or
remove_task(token) if appropriate) and ensure it is idempotent to handle
double-stop calls safely while preserving monitor/state cleanup.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@packages/ai/src/ai/modules/task/task_server.py`:
- Around line 1321-1323: Instead of directly popping self._task_control in the
terminate path, invoke the shared post-stop cleanup used by remove_task() (or
extract such a helper from remove_task()) so that monitor subscription pruning
and the task_removed dashboard broadcast still run; update the terminate/stop
code that currently calls self._task_control.pop(token, None) to call the new
helper (or remove_task(token) if appropriate) and ensure it is idempotent to
handle double-stop calls safely while preserving monitor/state cleanup.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: dd53a7da-6820-41cd-8848-37b32bc7d58c

📥 Commits

Reviewing files that changed from the base of the PR and between 0279d0f and 4c7798e.

📒 Files selected for processing (1)
  • packages/ai/src/ai/modules/task/task_server.py

Collaborator

@Rod-Christensen Rod-Christensen left a comment


This is by design. Any task can restart, but it is removed 60 seconds after termination.

