
fix: make concurrency key per-task to prevent stream collision between same-type background agents#3099

Open
titet11 wants to merge 1 commit into code-yeongyu:dev from titet11:fix/concurrency-key-collision

Conversation


titet11 commented Apr 3, 2026

Summary

When launching 2+ background agents of the same type simultaneously (e.g., two Oracle agents), the first agent's stream is always silently killed mid-word during the thinking/reasoning process. The second agent completes normally. This only happens with same-type agents — launching agents of different types (e.g., Oracle + Explore) works perfectly.

Tested against version v3.14.0 (compiled dist/index.js).

Root Cause

getConcurrencyKeyFromInput() in src/features/background-agent/manager.ts (line 629) generates the concurrency key based on model identity (providerID/modelID):

```typescript
// BEFORE (bug)
private getConcurrencyKeyFromInput(input: LaunchInput): string {
  if (input.model) {
    return `${input.model.providerID}/${input.model.modelID}`
  }
  return input.agent
}
```

Two agents of the same type resolve to the same model, producing identical concurrency keys. This causes them to:

  1. Share the same queue in queuesByKey
  2. Share the same concurrency slots in ConcurrencyManager
  3. Route through the same provider endpoint simultaneously

The shared concurrency key creates a collision at the provider/transport level — when the second agent's streaming request arrives at the same endpoint, the first agent's active stream is silently terminated (no error, no abort signal — the stream just stops mid-token).
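To make the collision concrete, here is a minimal, hypothetical model of the queueing (the `LaunchInput` shape and `queuesByKey` name come from this PR; the `enqueue` helper is illustrative, not the project's actual code):

```typescript
// Simplified model: two tasks whose inputs resolve to the same model
// produce the same concurrency key and land in one shared queue.
type LaunchInput = {
  agent: string
  taskId: string
  model?: { providerID: string; modelID: string }
}

// Mirrors the buggy keying logic quoted above.
function getConcurrencyKey(input: LaunchInput): string {
  if (input.model) {
    return `${input.model.providerID}/${input.model.modelID}`
  }
  return input.agent
}

const queuesByKey = new Map<string, string[]>()

function enqueue(input: LaunchInput): void {
  const key = getConcurrencyKey(input)
  const queue = queuesByKey.get(key) ?? []
  queue.push(input.taskId)
  queuesByKey.set(key, queue)
}

const model = { providerID: "provider", modelID: "model-X" }
enqueue({ agent: "oracle", taskId: "task-1", model })
enqueue({ agent: "oracle", taskId: "task-2", model })

// Both Oracle tasks collapse into a single queue under a single key.
console.log(queuesByKey.get("provider/model-X")) // ["task-1", "task-2"]
console.log(queuesByKey.size) // 1
```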

Why it only affects same-type agents

| Scenario | Agent 1 Key | Agent 2 Key | Collision? |
| --- | --- | --- | --- |
| 2× Oracle | provider/model-X | provider/model-X | YES — same key |
| Oracle + Explore | provider/model-X | provider/model-Y | NO — different keys |

Different agent types resolve to different models, producing independent concurrency keys. Same-type agents resolve to the same model, creating the collision.

Fix

Append the unique taskId to the concurrency key, ensuring every task gets its own independent concurrency lane:

```typescript
// AFTER (fix)
private getConcurrencyKeyFromInput(input: LaunchInput): string {
  if (input.model) {
    return `${input.model.providerID}/${input.model.modelID}/${input.taskId}`
  }
  return `${input.agent}/${input.taskId}`
}
```

This eliminates the shared-key collision. The intent was for the ConcurrencyManager semaphore architecture to keep enforcing per-model limits via the providerID/modelID prefix; as the review below points out, however, ConcurrencyManager matches limits against the full key string, so the per-task suffix also gives each task its own slot and bypasses modelConcurrency throttling.

Testing

  • Before fix: Launched 2 Oracle agents simultaneously → first agent always died mid-thinking (stream cut at arbitrary word, no error). Reproducible 100% of the time.
  • After fix: Launched 2 Oracle agents simultaneously → both agents completed successfully. Tested multiple times with no recurrence.

Additional Findings (not included in this PR)

During the investigation of this bug, several other timeout/kill mechanisms were identified in dist/index.js (v3.14.0) that can prematurely terminate background agents. These are documented here for awareness:

Timeout constants that are too aggressive for long-running agents

| Constant | File (source) | Default Value | Issue |
| --- | --- | --- | --- |
| TASK_TTL_MS | constants.ts | 30 min | Tasks running longer than 30 minutes are pruned as stale |
| TERMINAL_TASK_TTL_MS | constants.ts | 30 min | Completed/failed tasks cleaned up after 30 min |
| SESSION_TTL_MS | auto-retry.ts | 30 min | Session-level TTL for retry tracking |
| DEFAULT_POLL_TIMEOUT_MS | timing.ts | 30 min | Sync task polling timeout |
| DEFERRED_SESSION_TTL_MS | tmux-subagent/manager.ts | 5 min | Queued sessions that can't connect are dropped after 5 min |
| DEFAULT_STALE_TIMEOUT_MS | constants.ts | 45 min | Tasks with no activity for 45 min are killed |
| DEFAULT_MESSAGE_STALENESS_TIMEOUT_MS | constants.ts | 60 min | Tasks with no new messages for 60 min are killed |

These timeouts are reasonable for typical usage but can cause issues with agents that perform deep analysis (Oracle, Metis) or complex multi-step implementations (Hephaestus) that may run for extended periods. Users can override most of these via oh-my-opencode.json config under background_task.
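An override might look like the fragment below. The `background_task` key comes from this PR; the individual field names are guesses derived from the constant names and have not been checked against the actual config schema:

```json
{
  "background_task": {
    "staleTimeoutMs": 5400000,
    "messageStalenessTimeoutMs": 7200000
  }
}
```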

Circuit breaker maxToolCalls default

The DEFAULT_MAX_TOOL_CALLS constant (4000) is the upper bound, but users may inadvertently configure a much lower value (e.g., 200) in their oh-my-opencode.json under background_task.maxToolCalls. At ~2.8 seconds per tool call, a value of 200 kills agents after approximately 9 minutes. The default of 4000 is appropriate; the issue is that there's no validation or warning when users set this too low.
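A possible shape for such a warning, sketched under the assumption that the config value is resolved in one place (`resolveMaxToolCalls` and the 500-call threshold are illustrative, not existing project code):

```typescript
const DEFAULT_MAX_TOOL_CALLS = 4000
const SUSPICIOUSLY_LOW_THRESHOLD = 500 // assumed cutoff for warning

// Returns the effective limit, warning when a user-configured value
// would kill long-running agents early.
function resolveMaxToolCalls(configured?: number): number {
  if (configured === undefined) return DEFAULT_MAX_TOOL_CALLS
  if (configured < SUSPICIOUSLY_LOW_THRESHOLD) {
    // At ~2.8 s per tool call, 200 calls is roughly 9 minutes of runtime.
    const minutes = Math.round((configured * 2.8) / 60)
    console.warn(
      `background_task.maxToolCalls=${configured} stops agents after ` +
        `~${minutes} min; the default is ${DEFAULT_MAX_TOOL_CALLS}`,
    )
  }
  return configured
}

console.log(resolveMaxToolCalls(200)) // warns, returns 200
console.log(resolveMaxToolCalls()) // 4000
```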


Applies to: v3.14.0 (dist/index.js line 629 in compiled output, src/features/background-agent/manager.ts line 629 in source)


Summary by cubic

Make concurrency keys unique per task to prevent stream collisions when running multiple same-type background agents. This stops the first agent’s stream from being cut mid-output and lets concurrent runs finish.

  • Bug Fixes
    • Updated getConcurrencyKeyFromInput to append taskId for both model and agent keys (providerID/modelID/taskId and agent/taskId).
    • Eliminates shared queues/slots that killed streams for same-type agents, while keeping per-model limits via the prefix.

Written for commit a4c598b.


github-actions bot commented Apr 3, 2026

All contributors have signed the CLA. Thank you! ✅
Posted by the CLA Assistant Lite bot.


titet11 commented Apr 3, 2026

I have read the CLA Document and I hereby sign the CLA

github-actions bot added a commit that referenced this pull request Apr 3, 2026

cubic-dev-ai bot left a comment


1 issue found across 1 file

Confidence score: 3/5

  • There is a concrete regression risk in src/features/background-agent/manager.ts: appending taskId to the concurrency key prevents ConcurrencyManager from matching modelConcurrency entries, which can effectively bypass intended per-model throttling.
  • Given the medium severity (6/10) and solid confidence (7/10), this is more than a minor cleanup issue and could impact runtime behavior under load, so merge risk is moderate until corrected.
  • Pay close attention to src/features/background-agent/manager.ts - concurrency key construction appears to break per-model limit lookups and task counting.
Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="src/features/background-agent/manager.ts">

<violation number="1" location="src/features/background-agent/manager.ts:631">
P2: Appending taskId to the concurrency key breaks per-model concurrency limits because ConcurrencyManager uses the full key string for limit lookups and counts, so modelConcurrency entries no longer match and each task gets its own slot.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

```diff
  private getConcurrencyKeyFromInput(input: LaunchInput): string {
    if (input.model) {
-     return `${input.model.providerID}/${input.model.modelID}`
+     return `${input.model.providerID}/${input.model.modelID}/${input.taskId}`
```

cubic-dev-ai bot commented Apr 3, 2026


P2: Appending taskId to the concurrency key breaks per-model concurrency limits because ConcurrencyManager uses the full key string for limit lookups and counts, so modelConcurrency entries no longer match and each task gets its own slot.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/features/background-agent/manager.ts, line 631:

<comment>Appending taskId to the concurrency key breaks per-model concurrency limits because ConcurrencyManager uses the full key string for limit lookups and counts, so modelConcurrency entries no longer match and each task gets its own slot.</comment>

<file context>
@@ -628,9 +628,9 @@ export class BackgroundManager {
   private getConcurrencyKeyFromInput(input: LaunchInput): string {
     if (input.model) {
-      return `${input.model.providerID}/${input.model.modelID}`
+      return `${input.model.providerID}/${input.model.modelID}/${input.taskId}`
     }
-    return input.agent
</file context>

titet11 (Author) replied:

Good catch — you're right that appending taskId to the concurrency key effectively bypasses per-model concurrency limits set via modelConcurrency, since each task now gets its own unique key and therefore its own slot.

However, the current behavior (using only providerID/modelID as the key) causes a critical bug: when two or more background agents share the same model, they end up sharing the same concurrency lane. This leads to provider-level stream collisions where the first agent's inference stream is silently killed mid-response when the second agent starts streaming. The result is truncated/corrupted output with no error — the agent just stops producing tokens.

This fix prioritizes correctness (each agent gets an independent stream) over concurrency limiting. I'm aware this is a tradeoff and I don't currently have a solution that preserves both behaviors — independent streams per task AND per-model concurrency limits.

A proper fix would likely require separating the two concerns: one key for stream isolation (per-task) and a separate mechanism for concurrency counting (per-model). I'd welcome suggestions from the maintainers on how to best approach this.
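One way that separation could look, as an illustrative sketch only (`PerModelSemaphore` and `streamKey` are hypothetical names, not the project's API): limits are counted against the providerID/modelID key, while the stream identity carries the per-task suffix.

```typescript
// Concurrency counting keyed per model only.
class PerModelSemaphore {
  private counts = new Map<string, number>()
  constructor(private limits: Map<string, number>) {}

  tryAcquire(modelKey: string): boolean {
    const limit = this.limits.get(modelKey) ?? Infinity
    const used = this.counts.get(modelKey) ?? 0
    if (used >= limit) return false
    this.counts.set(modelKey, used + 1)
    return true
  }

  release(modelKey: string): void {
    const used = this.counts.get(modelKey) ?? 0
    this.counts.set(modelKey, Math.max(0, used - 1))
  }
}

// Stream identity stays unique per task, so two same-model tasks
// never share a transport lane even when they share a semaphore.
function streamKey(modelKey: string, taskId: string): string {
  return `${modelKey}/${taskId}`
}

const sem = new PerModelSemaphore(new Map([["provider/model-X", 2]]))
console.log(sem.tryAcquire("provider/model-X")) // true
console.log(sem.tryAcquire("provider/model-X")) // true
console.log(sem.tryAcquire("provider/model-X")) // false (limit of 2 reached)
console.log(streamKey("provider/model-X", "task-1")) // "provider/model-X/task-1"
```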
