
fix: increase default timeout values to prevent premature agent termination#3100

Open
titet11 wants to merge 2 commits into code-yeongyu:dev from titet11:fix/increase-default-timeouts


titet11 commented Apr 3, 2026

Problem

Background agents running complex, long-duration tasks (deep code analysis, multi-file refactoring, architecture reviews) are silently terminated by multiple timeout mechanisms before they can complete their work. This is especially problematic for orchestration-heavy setups where agents like Oracle or Hephaestus routinely need 30+ minutes to finish.

The current defaults are too aggressive for real-world multi-agent workflows:

Constant | Current default | Effect
--- | --- | ---
TASK_TTL_MS | 30 min | Task marked as error after 30 min
TERMINAL_TASK_TTL_MS | 30 min | Terminal tasks pruned after 30 min
SESSION_TTL_MS | 30 min | Session state cleaned up after 30 min of inactivity
DEFAULT_POLL_TIMEOUT_MS | 30 min | Sync polling gives up after 30 min
DEFAULT_STALE_TIMEOUT_MS | 45 min | Task considered stale after 45 min without progress
DEFAULT_MESSAGE_STALENESS_TIMEOUT_MS | 60 min | Task considered stale 60 min after start
DEFERRED_SESSION_TTL_MS | 5 min | Queued sessions waiting for concurrency slots dropped after 5 min
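To make the table concrete, here is a minimal sketch of how the three task-level deadlines could be evaluated against a task record. The `TaskRecord` shape and the `firedTimeouts` helper are hypothetical, not the plugin's actual code; only the constant names and their current defaults come from the table. It also assumes, as the "Effect" column suggests, that TASK_TTL_MS and message staleness are measured from task start while the stale timeout is measured from the last observed progress.

```typescript
// Current defaults from the table, in milliseconds.
const TASK_TTL_MS = 30 * 60 * 1000; // 30 min
const DEFAULT_STALE_TIMEOUT_MS = 2_700_000; // 45 min
const DEFAULT_MESSAGE_STALENESS_TIMEOUT_MS = 3_600_000; // 60 min

// Hypothetical task shape; the plugin's real fields may differ.
interface TaskRecord {
  startedAt: number; // epoch ms when the task started
  lastProgressAt: number; // epoch ms of the most recent observed progress
}

type TimeoutKind = "task-ttl" | "stale" | "message-staleness";

// Lists every deadline the task has passed at time `now` (empty array = keep running).
function firedTimeouts(task: TaskRecord, now: number): TimeoutKind[] {
  const fired: TimeoutKind[] = [];
  const sinceStart = now - task.startedAt;
  const sinceProgress = now - task.lastProgressAt;
  if (sinceStart >= TASK_TTL_MS) fired.push("task-ttl");
  if (sinceProgress >= DEFAULT_STALE_TIMEOUT_MS) fired.push("stale");
  if (sinceStart >= DEFAULT_MESSAGE_STALENESS_TIMEOUT_MS) fired.push("message-staleness");
  return fired;
}

const MIN = 60 * 1000;
// A task that reported progress one minute ago is still past TASK_TTL_MS at 31 min.
console.log(firedTimeouts({ startedAt: 0, lastProgressAt: 30 * MIN }, 31 * MIN)); // only "task-ttl"
```

Under these assumptions TASK_TTL_MS is the shortest limit, so it is the one an actively progressing 30+ minute task hits first.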

Root Cause

These timeouts were designed for short-lived tasks, but multi-agent orchestration introduces workloads that routinely exceed these limits. An Oracle agent performing deep diagnostic analysis can take 1-2 hours. A Hephaestus agent implementing complex changes across multiple files can take similar time. When these timeouts fire, the agent's work is silently lost — there is no error message, no recovery, just truncated output.

The DEFERRED_SESSION_TTL_MS at 5 minutes is particularly problematic: when all concurrency slots are full, new tasks are queued as "deferred sessions." If slots don't free up within 5 minutes, the queued task is silently dropped. In heavy orchestration scenarios with multiple agents, this happens frequently.
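The deferral behavior described above can be pictured as a queue whose entries expire. `DeferredQueue`, `enqueue`, `pruneExpired` and `takeNext` are hypothetical names chosen for illustration; the actual logic in src/features/tmux-subagent/manager.ts may be structured differently. Only the 5-minute default comes from this PR.

```typescript
// Current default: queued sessions are dropped after 5 minutes.
const DEFERRED_SESSION_TTL_MS = 5 * 60 * 1000;

// Hypothetical deferred-session entry; names are illustrative only.
interface DeferredSession {
  id: string;
  queuedAt: number; // epoch ms when the session was deferred
}

class DeferredQueue {
  private queue: DeferredSession[] = [];
  private readonly ttlMs: number;

  constructor(ttlMs: number = DEFERRED_SESSION_TTL_MS) {
    this.ttlMs = ttlMs;
  }

  enqueue(id: string, now: number): void {
    this.queue.push({ id, queuedAt: now });
  }

  // Drops entries that have waited longer than ttlMs; returns the dropped ids.
  pruneExpired(now: number): string[] {
    const dropped = this.queue.filter((s) => now - s.queuedAt > this.ttlMs).map((s) => s.id);
    this.queue = this.queue.filter((s) => now - s.queuedAt <= this.ttlMs);
    return dropped;
  }

  // Called when a concurrency slot frees up: hands out the oldest surviving entry.
  takeNext(now: number): string | undefined {
    this.pruneExpired(now); // expired ids are discarded here: the "silently dropped" case
    return this.queue.shift()?.id;
  }
}

const q = new DeferredQueue();
q.enqueue("oracle-task", 0);
q.enqueue("hephaestus-task", 4 * 60 * 1000);
// A slot frees up 6 minutes in: "oracle-task" has waited 6 min (> 5 min) and is gone.
console.log(q.takeNext(6 * 60 * 1000)); // hephaestus-task
```

Raising the TTL to 1 hour only changes how long an entry may wait; it does not create more slots.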

Solution

Increase all timeout defaults except DEFERRED_SESSION_TTL_MS to 100 hours (360,000,000 ms) to effectively disable premature termination of long-running agent workloads. DEFERRED_SESSION_TTL_MS is increased to 1 hour (3,600,000 ms) to give queued tasks reasonable time to acquire a concurrency slot.

All values remain user-configurable via oh-my-opencode.jsonc under the background_task section, so users who prefer shorter timeouts can still set them.

Changes

src/features/background-agent/constants.ts

  • TASK_TTL_MS: 30 * 60 * 1000 → 100 * 60 * 60 * 1000 (30 min → 100 hours)
  • TERMINAL_TASK_TTL_MS: 30 * 60 * 1000 → 100 * 60 * 60 * 1000 (30 min → 100 hours)
  • DEFAULT_STALE_TIMEOUT_MS: 2_700_000 → 360_000_000 (45 min → 100 hours)
  • DEFAULT_MESSAGE_STALENESS_TIMEOUT_MS: 3_600_000 → 360_000_000 (60 min → 100 hours)

src/hooks/runtime-fallback/auto-retry.ts

  • SESSION_TTL_MS: 30 * 60 * 1000 → 100 * 60 * 60 * 1000 (30 min → 100 hours)

src/tools/delegate-task/timing.ts

  • DEFAULT_POLL_TIMEOUT_MS: 30 * 60 * 1000 → 100 * 60 * 60 * 1000 (30 min → 100 hours)

src/features/tmux-subagent/manager.ts

  • DEFERRED_SESSION_TTL_MS: 5 * 60 * 1000 → 60 * 60 * 1000 (5 min → 1 hour)
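As a quick check of the arithmetic, the new values can be written out and compared with the millisecond figures quoted in the Solution section. The `MAX_TIMER_DELAY_MS` bound is not part of the PR: Node.js stores `setTimeout` delays as a signed 32-bit integer and resets anything above 2^31 - 1 ms (about 24.8 days) to 1 ms, so any constant that ends up as a timer delay must stay under it. This sketch only checks the numbers; whether each constant is actually passed to `setTimeout` is not established here.

```typescript
// New defaults, written exactly as in the Changes list.
const newDefaults: Record<string, number> = {
  TASK_TTL_MS: 100 * 60 * 60 * 1000,
  TERMINAL_TASK_TTL_MS: 100 * 60 * 60 * 1000,
  DEFAULT_STALE_TIMEOUT_MS: 360_000_000,
  DEFAULT_MESSAGE_STALENESS_TIMEOUT_MS: 360_000_000,
  SESSION_TTL_MS: 100 * 60 * 60 * 1000,
  DEFAULT_POLL_TIMEOUT_MS: 100 * 60 * 60 * 1000,
  DEFERRED_SESSION_TTL_MS: 60 * 60 * 1000,
};

// setTimeout delays above 2^31 - 1 ms overflow (Node.js resets them to 1 ms).
const MAX_TIMER_DELAY_MS = 2 ** 31 - 1; // 2,147,483,647 ms

for (const [name, ms] of Object.entries(newDefaults)) {
  const hours = ms / (60 * 60 * 1000);
  const fitsTimer = ms <= MAX_TIMER_DELAY_MS;
  // first line: "TASK_TTL_MS: 360,000,000 ms = 100 h, fits setTimeout: true"
  console.log(`${name}: ${ms.toLocaleString("en-US")} ms = ${hours} h, fits setTimeout: ${fitsTimer}`);
}
```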

Why these values?

  • 100 hours effectively removes the timeout as a constraint — agents will only terminate when they genuinely complete their work or are explicitly cancelled by the user. This aligns with the principle that the orchestrator (Sisyphus) should decide when an agent is done, not an arbitrary timer.
  • 1 hour for deferred sessions gives queued tasks ample time to acquire slots even under heavy concurrency pressure.
  • All values remain overridable via user config (background_task.taskTtlMs, background_task.staleTimeoutMs, etc.), so users who prefer shorter timeouts can still set them explicitly.
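The override behavior in the last bullet amounts to "use the user's value if present, otherwise the built-in default". This sketch assumes a `background_task` object parsed from oh-my-opencode.jsonc with the `taskTtlMs` and `staleTimeoutMs` keys named above; the `BackgroundTaskConfig` type and `resolveTimeouts` function are hypothetical, and the real schema in background-task.ts may name or validate fields differently.

```typescript
// Built-in defaults after this PR (100 hours).
const TASK_TTL_MS = 100 * 60 * 60 * 1000;
const DEFAULT_STALE_TIMEOUT_MS = 360_000_000;

// Hypothetical shape of the parsed `background_task` section.
interface BackgroundTaskConfig {
  taskTtlMs?: number;
  staleTimeoutMs?: number;
}

interface ResolvedTimeouts {
  taskTtlMs: number;
  staleTimeoutMs: number;
}

// `??` falls back only when the key is missing (undefined) or null,
// so any explicit user value wins over the default.
function resolveTimeouts(cfg: BackgroundTaskConfig = {}): ResolvedTimeouts {
  return {
    taskTtlMs: cfg.taskTtlMs ?? TASK_TTL_MS,
    staleTimeoutMs: cfg.staleTimeoutMs ?? DEFAULT_STALE_TIMEOUT_MS,
  };
}

// No override: the 100-hour defaults apply.
console.log(resolveTimeouts());
// A user who prefers the old 30-minute TTL sets it explicitly.
console.log(resolveTimeouts({ taskTtlMs: 30 * 60 * 1000 }));
```

Using `??` rather than `||` matters here: with `||`, a user-supplied `0` would be replaced by the default.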

These changes are based on v3.14.0 of the source code.

Testing

  • Verified that all modified constants are the defaults used when no user config override is present.
  • The config schema (background-task.ts) already supports user overrides for all these values — no schema changes needed.
  • Users who prefer shorter timeouts can set them in their oh-my-opencode.jsonc config file.
  • Production-tested: these exact values have been running in a patched dist/index.js for multiple days with zero premature agent terminations.


cubic-dev-ai bot left a comment


No issues found across 4 files

Confidence score: 5/5

  • Automated review surfaced no issues in the provided summaries.
  • No files require special attention.

Requires human review: There is a discrepancy between the PR description (claiming 100 hours) and the actual code changes (setting values to 6 hours), and increased TTLs may cause resource retention regressions.


cubic-dev-ai bot left a comment


3 issues found across 3 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="src/tools/delegate-task/timing.ts">

<violation number="1" location="src/tools/delegate-task/timing.ts:6">
P2: This sets the default poll timeout to 100 hours, which is inconsistent with the intended 6-hour defaults and will keep sync polling alive far longer than expected. If this is a typo, reduce it to 6 hours to match the rest of the change.</violation>
</file>

<file name="src/hooks/runtime-fallback/auto-retry.ts">

<violation number="1" location="src/hooks/runtime-fallback/auto-retry.ts:14">
P2: SESSION_TTL_MS is now 100 hours, which is far higher than the 6-hour defaults described elsewhere in this change. If the intent is to align with the 6-hour default, this will retain stale session state much longer than intended.</violation>
</file>

<file name="src/features/background-agent/constants.ts">

<violation number="1" location="src/features/background-agent/constants.ts:4">
P2: These timeout constants are set to 100 hours, which doesn’t match the intended 6‑hour defaults in the PR description. This will make stale/TTL cleanup 16× longer than expected.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

src/tools/delegate-task/timing.ts
 let WAIT_FOR_SESSION_INTERVAL_MS = 100
 let WAIT_FOR_SESSION_TIMEOUT_MS = 60000
-const DEFAULT_POLL_TIMEOUT_MS = 30 * 60 * 1000
+const DEFAULT_POLL_TIMEOUT_MS = 100 * 60 * 60 * 1000

cubic-dev-ai bot commented Apr 3, 2026


P2: This sets the default poll timeout to 100 hours, which is inconsistent with the intended 6-hour defaults and will keep sync polling alive far longer than expected. If this is a typo, reduce it to 6 hours to match the rest of the change.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/tools/delegate-task/timing.ts, line 6:

<comment>This sets the default poll timeout to 100 hours, which is inconsistent with the intended 6-hour defaults and will keep sync polling alive far longer than expected. If this is a typo, reduce it to 6 hours to match the rest of the change.</comment>

<file context>
@@ -3,7 +3,7 @@ let MIN_STABILITY_TIME_MS = 10000
 let WAIT_FOR_SESSION_INTERVAL_MS = 100
 let WAIT_FOR_SESSION_TIMEOUT_MS = 60000
-const DEFAULT_POLL_TIMEOUT_MS = 6 * 60 * 60 * 1000
+const DEFAULT_POLL_TIMEOUT_MS = 100 * 60 * 60 * 1000
 let MAX_POLL_TIME_MS = DEFAULT_POLL_TIMEOUT_MS
 let SESSION_CONTINUATION_STABILITY_MS = 5000
</file context>
Suggested change
-const DEFAULT_POLL_TIMEOUT_MS = 100 * 60 * 60 * 1000
+const DEFAULT_POLL_TIMEOUT_MS = 6 * 60 * 60 * 1000

titet11 (Author)

Intentional — same as the other constants. The first commit had incorrect 6-hour values, corrected in the second commit to the intended 100 hours. The PR description has been updated accordingly.

src/hooks/runtime-fallback/auto-retry.ts
 import { getAgentDisplayName } from "../../shared/agent-display-names"
 
-const SESSION_TTL_MS = 30 * 60 * 1000
+const SESSION_TTL_MS = 100 * 60 * 60 * 1000

cubic-dev-ai bot commented Apr 3, 2026


P2: SESSION_TTL_MS is now 100 hours, which is far higher than the 6-hour defaults described elsewhere in this change. If the intent is to align with the 6-hour default, this will retain stale session state much longer than intended.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/hooks/runtime-fallback/auto-retry.ts, line 14:

<comment>SESSION_TTL_MS is now 100 hours, which is far higher than the 6-hour defaults described elsewhere in this change. If the intent is to align with the 6-hour default, this will retain stale session state much longer than intended.</comment>

<file context>
@@ -11,7 +11,7 @@ import { getLastUserRetryParts } from "./last-user-retry-parts"
 import { getAgentDisplayName } from "../../shared/agent-display-names"
 
-const SESSION_TTL_MS = 6 * 60 * 60 * 1000
+const SESSION_TTL_MS = 100 * 60 * 60 * 1000
 
 declare function setTimeout(callback: () => void | Promise<void>, delay?: number): RuntimeFallbackTimeout
</file context>
Suggested change
-const SESSION_TTL_MS = 100 * 60 * 60 * 1000
+const SESSION_TTL_MS = 6 * 60 * 60 * 1000

titet11 (Author)

Intentional — same reasoning as the other constants. 100 hours ensures session state is never cleaned up while an agent is still working. The first commit had incorrect 6-hour values which were corrected in the follow-up commit. The PR description reflects the correct 100-hour target.


src/features/background-agent/constants.ts
-export const TASK_TTL_MS = 30 * 60 * 1000
-export const TERMINAL_TASK_TTL_MS = 30 * 60 * 1000
+export const TASK_TTL_MS = 100 * 60 * 60 * 1000

cubic-dev-ai bot commented Apr 3, 2026


P2: These timeout constants are set to 100 hours, which doesn’t match the intended 6‑hour defaults in the PR description. This will make stale/TTL cleanup 16× longer than expected.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/features/background-agent/constants.ts, line 4:

<comment>These timeout constants are set to 100 hours, which doesn’t match the intended 6‑hour defaults in the PR description. This will make stale/TTL cleanup 16× longer than expected.</comment>

<file context>
@@ -1,11 +1,11 @@
 
-export const TASK_TTL_MS = 6 * 60 * 60 * 1000
-export const TERMINAL_TASK_TTL_MS = 6 * 60 * 60 * 1000
+export const TASK_TTL_MS = 100 * 60 * 60 * 1000
+export const TERMINAL_TASK_TTL_MS = 100 * 60 * 60 * 1000
 export const MIN_STABILITY_TIME_MS = 10 * 1000
</file context>
Suggested change
-export const TASK_TTL_MS = 100 * 60 * 60 * 1000
+export const TASK_TTL_MS = 6 * 60 * 60 * 1000

titet11 (Author)

This is intentional, not a typo. The first commit incorrectly used 6-hour values, which was fixed in the second commit to the correct 100-hour values (360,000,000 ms). The PR description has been updated to reflect this.

100 hours is the intended value — the goal is to effectively disable these timeouts so that agents are never prematurely terminated by an arbitrary timer. The orchestrator (parent agent) is responsible for deciding when a task is done, not a timeout. All values remain user-configurable via oh-my-opencode.jsonc for users who prefer shorter timeouts.
