fix: increase default timeout values to prevent premature agent termination #3100
titet11 wants to merge 2 commits into code-yeongyu:dev
No issues found across 4 files
Confidence score: 5/5
- Automated review surfaced no issues in the provided summaries.
- No files require special attention.
Requires human review: There is a discrepancy between the PR description (claiming 100 hours) and the actual code changes (setting values to 6 hours), and increased TTLs may cause resource retention regressions.
3 issues found across 3 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="src/tools/delegate-task/timing.ts">
<violation number="1" location="src/tools/delegate-task/timing.ts:6">
P2: This sets the default poll timeout to 100 hours, which is inconsistent with the intended 6-hour defaults and will keep sync polling alive far longer than expected. If this is a typo, reduce it to 6 hours to match the rest of the change.</violation>
</file>
<file name="src/hooks/runtime-fallback/auto-retry.ts">
<violation number="1" location="src/hooks/runtime-fallback/auto-retry.ts:14">
P2: SESSION_TTL_MS is now 100 hours, which is far higher than the 6-hour defaults described elsewhere in this change. If the intent is to align with the 6-hour default, this will retain stale session state much longer than intended.</violation>
</file>
<file name="src/features/background-agent/constants.ts">
<violation number="1" location="src/features/background-agent/constants.ts:4">
P2: These timeout constants are set to 100 hours, which doesn’t match the intended 6‑hour defaults in the PR description. This will make stale/TTL cleanup 16× longer than expected.</violation>
</file>
let WAIT_FOR_SESSION_INTERVAL_MS = 100
let WAIT_FOR_SESSION_TIMEOUT_MS = 60000
-const DEFAULT_POLL_TIMEOUT_MS = 30 * 60 * 1000
+const DEFAULT_POLL_TIMEOUT_MS = 100 * 60 * 60 * 1000
P2: This sets the default poll timeout to 100 hours, which is inconsistent with the intended 6-hour defaults and will keep sync polling alive far longer than expected. If this is a typo, reduce it to 6 hours to match the rest of the change.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/tools/delegate-task/timing.ts, line 6:
<comment>This sets the default poll timeout to 100 hours, which is inconsistent with the intended 6-hour defaults and will keep sync polling alive far longer than expected. If this is a typo, reduce it to 6 hours to match the rest of the change.</comment>
<file context>
@@ -3,7 +3,7 @@ let MIN_STABILITY_TIME_MS = 10000
let WAIT_FOR_SESSION_INTERVAL_MS = 100
let WAIT_FOR_SESSION_TIMEOUT_MS = 60000
-const DEFAULT_POLL_TIMEOUT_MS = 6 * 60 * 60 * 1000
+const DEFAULT_POLL_TIMEOUT_MS = 100 * 60 * 60 * 1000
let MAX_POLL_TIME_MS = DEFAULT_POLL_TIMEOUT_MS
let SESSION_CONTINUATION_STABILITY_MS = 5000
</file context>
Suggested change:
-const DEFAULT_POLL_TIMEOUT_MS = 100 * 60 * 60 * 1000
+const DEFAULT_POLL_TIMEOUT_MS = 6 * 60 * 60 * 1000
Intentional — same as the other constants. The first commit had incorrect 6-hour values, corrected in the second commit to the intended 100 hours. The PR description has been updated accordingly.
import { getAgentDisplayName } from "../../shared/agent-display-names"
-const SESSION_TTL_MS = 30 * 60 * 1000
+const SESSION_TTL_MS = 100 * 60 * 60 * 1000
P2: SESSION_TTL_MS is now 100 hours, which is far higher than the 6-hour defaults described elsewhere in this change. If the intent is to align with the 6-hour default, this will retain stale session state much longer than intended.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/hooks/runtime-fallback/auto-retry.ts, line 14:
<comment>SESSION_TTL_MS is now 100 hours, which is far higher than the 6-hour defaults described elsewhere in this change. If the intent is to align with the 6-hour default, this will retain stale session state much longer than intended.</comment>
<file context>
@@ -11,7 +11,7 @@ import { getLastUserRetryParts } from "./last-user-retry-parts"
import { getAgentDisplayName } from "../../shared/agent-display-names"
-const SESSION_TTL_MS = 6 * 60 * 60 * 1000
+const SESSION_TTL_MS = 100 * 60 * 60 * 1000
declare function setTimeout(callback: () => void | Promise<void>, delay?: number): RuntimeFallbackTimeout
</file context>
Suggested change:
-const SESSION_TTL_MS = 100 * 60 * 60 * 1000
+const SESSION_TTL_MS = 6 * 60 * 60 * 1000
Intentional — same reasoning as the other constants. 100 hours ensures session state is never cleaned up while an agent is still working. The first commit had incorrect 6-hour values which were corrected in the follow-up commit. The PR description reflects the correct 100-hour target.
-export const TASK_TTL_MS = 30 * 60 * 1000
-export const TERMINAL_TASK_TTL_MS = 30 * 60 * 1000
+export const TASK_TTL_MS = 100 * 60 * 60 * 1000
P2: These timeout constants are set to 100 hours, which doesn’t match the intended 6‑hour defaults in the PR description. This will make stale/TTL cleanup 16× longer than expected.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/features/background-agent/constants.ts, line 4:
<comment>These timeout constants are set to 100 hours, which doesn’t match the intended 6‑hour defaults in the PR description. This will make stale/TTL cleanup 16× longer than expected.</comment>
<file context>
@@ -1,11 +1,11 @@
-export const TASK_TTL_MS = 6 * 60 * 60 * 1000
-export const TERMINAL_TASK_TTL_MS = 6 * 60 * 60 * 1000
+export const TASK_TTL_MS = 100 * 60 * 60 * 1000
+export const TERMINAL_TASK_TTL_MS = 100 * 60 * 60 * 1000
export const MIN_STABILITY_TIME_MS = 10 * 1000
</file context>
Suggested change:
-export const TASK_TTL_MS = 100 * 60 * 60 * 1000
+export const TASK_TTL_MS = 6 * 60 * 60 * 1000
This is intentional, not a typo. The first commit incorrectly used 6-hour values, which was fixed in the second commit to the correct 100-hour values (360,000,000 ms). The PR description has been updated to reflect this.
100 hours is the intended value — the goal is to effectively disable these timeouts so that agents are never prematurely terminated by an arbitrary timer. The orchestrator (parent agent) is responsible for deciding when a task is done, not a timeout. All values remain user-configurable via oh-my-opencode.jsonc for users who prefer shorter timeouts.
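To make the numbers in this thread easier to compare, here is a small sketch of the unit arithmetic. The helper names are hypothetical (the PR writes the products inline), and the 30-minute figure is the pre-PR default quoted for TASK_TTL_MS, SESSION_TTL_MS and DEFAULT_POLL_TIMEOUT_MS.

```typescript
// Hypothetical helpers for illustration only; they are not part of the PR.
const MS_PER_MINUTE = 60 * 1000
const MS_PER_HOUR = 60 * MS_PER_MINUTE

function minutesToMs(minutes: number): number {
  return minutes * MS_PER_MINUTE
}

function hoursToMs(hours: number): number {
  return hours * MS_PER_HOUR
}

// The three values discussed in the review thread.
const originalDefaultMs = minutesToMs(30) // 30 * 60 * 1000
const firstCommitMs = hoursToMs(6)        // 6 * 60 * 60 * 1000
const secondCommitMs = hoursToMs(100)     // 100 * 60 * 60 * 1000

// How much longer the 100-hour value is than each earlier value.
const ratioVsFirstCommit = secondCommitMs / firstCommitMs
const ratioVsOriginal = secondCommitMs / originalDefaultMs
```

Under these definitions the 100-hour value is 360,000,000 ms, which matches the figure in the reply above. It is 100/6 (roughly 16.7) times the 6-hour value and 200 times the 30-minute default.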
Problem
Background agents running complex, long-duration tasks (deep code analysis, multi-file refactoring, architecture reviews) are silently terminated by multiple timeout mechanisms before they can complete their work. This is especially problematic for orchestration-heavy setups where agents like Oracle or Hephaestus routinely need 30+ minutes to finish.
The current defaults are too aggressive for real-world multi-agent workflows:
- TASK_TTL_MS (30 min)
- TERMINAL_TASK_TTL_MS (30 min)
- SESSION_TTL_MS (30 min)
- DEFAULT_POLL_TIMEOUT_MS (30 min)
- DEFAULT_STALE_TIMEOUT_MS (45 min)
- DEFAULT_MESSAGE_STALENESS_TIMEOUT_MS (60 min)
- DEFERRED_SESSION_TTL_MS (5 min)
Root Cause
These timeouts were designed for short-lived tasks, but multi-agent orchestration introduces workloads that routinely exceed these limits. An Oracle agent performing deep diagnostic analysis can take 1-2 hours. A Hephaestus agent implementing complex changes across multiple files can take similar time. When these timeouts fire, the agent's work is silently lost — there is no error message, no recovery, just truncated output.
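The failure mode described above can be sketched with a minimal TTL sweep. This is an illustration under stated assumptions, not the project's actual cleanup code: the TrackedTask shape, the sweepExpiredTasks function and the task ids are hypothetical, and the TTL is assumed to be measured from task start. Only the two TTL values come from this PR.

```typescript
// Hypothetical task record; the real task shape is not shown in this PR.
interface TrackedTask {
  id: string
  startedAtMs: number
}

// Returns the tasks that survive a sweep at `nowMs`.
// Assumption: a task expires once its age exceeds the TTL.
function sweepExpiredTasks(tasks: TrackedTask[], ttlMs: number, nowMs: number): TrackedTask[] {
  return tasks.filter((task) => nowMs - task.startedAtMs <= ttlMs)
}

const OLD_TASK_TTL_MS = 30 * 60 * 1000       // previous default quoted in this PR
const NEW_TASK_TTL_MS = 100 * 60 * 60 * 1000 // new default proposed in this PR

// The sweep runs 90 minutes after the first task started.
const sweepTimeMs = 90 * 60 * 1000
const runningTasks: TrackedTask[] = [
  { id: "oracle-analysis", startedAtMs: 0 },           // 90 minutes old
  { id: "quick-lookup", startedAtMs: 80 * 60 * 1000 }, // 10 minutes old
]

const keptUnderOldTtl = sweepExpiredTasks(runningTasks, OLD_TASK_TTL_MS, sweepTimeMs).map((t) => t.id)
const keptUnderNewTtl = sweepExpiredTasks(runningTasks, NEW_TASK_TTL_MS, sweepTimeMs).map((t) => t.id)
```

With the 30-minute TTL only the short task survives the sweep, and the long-running one is dropped without any error. With the 100-hour TTL both tasks are kept.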
The DEFERRED_SESSION_TTL_MS at 5 minutes is particularly problematic: when all concurrency slots are full, new tasks are queued as "deferred sessions." If slots don't free up within 5 minutes, the queued task is silently dropped. In heavy orchestration scenarios with multiple agents, this happens frequently.
Solution
Increase the timeout defaults to 100 hours (360,000,000 ms) to effectively disable premature termination of long-running agent workloads. DEFERRED_SESSION_TTL_MS is the exception: it is increased to 1 hour (3,600,000 ms) to give queued tasks reasonable time to acquire a concurrency slot.
All values remain user-configurable via oh-my-opencode.jsonc under the background_task section, so users who prefer shorter timeouts can still set them.
Changes
src/features/background-agent/constants.ts
- TASK_TTL_MS: 30 * 60 * 1000 → 100 * 60 * 60 * 1000 (30 min → 100 hours)
- TERMINAL_TASK_TTL_MS: 30 * 60 * 1000 → 100 * 60 * 60 * 1000 (30 min → 100 hours)
- DEFAULT_STALE_TIMEOUT_MS: 2_700_000 → 360_000_000 (45 min → 100 hours)
- DEFAULT_MESSAGE_STALENESS_TIMEOUT_MS: 3_600_000 → 360_000_000 (60 min → 100 hours)
src/hooks/runtime-fallback/auto-retry.ts
- SESSION_TTL_MS: 30 * 60 * 1000 → 100 * 60 * 60 * 1000 (30 min → 100 hours)
src/tools/delegate-task/timing.ts
- DEFAULT_POLL_TIMEOUT_MS: 30 * 60 * 1000 → 100 * 60 * 60 * 1000 (30 min → 100 hours)
src/features/tmux-subagent/manager.ts
- DEFERRED_SESSION_TTL_MS: 5 * 60 * 1000 → 60 * 60 * 1000 (5 min → 1 hour)
Why these values?
… background_task.taskTtlMs, background_task.staleTimeoutMs, etc.), so users who prefer shorter timeouts can still set them explicitly.
These changes apply to version v3.14.0 of the source code.
Testing
- … background-task.ts) already supports user overrides for all these values; no schema changes needed.
- … oh-my-opencode.jsonc config file.
- … dist/index.js for multiple days with zero premature agent terminations.
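As a rough illustration of the override behavior the description relies on, the sketch below resolves user-supplied values against the new defaults. The field names taskTtlMs and staleTimeoutMs come from the PR description. The interfaces, the resolveTimeouts function and the mapping of each field to its default constant are assumptions made for this example, not the project's actual schema or loader.

```typescript
// Field names are taken from the PR description; everything else here is hypothetical.
interface BackgroundTaskOverrides {
  taskTtlMs?: number
  staleTimeoutMs?: number
}

interface ResolvedBackgroundTaskTimeouts {
  taskTtlMs: number
  staleTimeoutMs: number
}

// Assumed mapping: taskTtlMs -> TASK_TTL_MS, staleTimeoutMs -> DEFAULT_STALE_TIMEOUT_MS.
const TIMEOUT_DEFAULTS: ResolvedBackgroundTaskTimeouts = {
  taskTtlMs: 100 * 60 * 60 * 1000,
  staleTimeoutMs: 360_000_000,
}

// A user-supplied value wins; otherwise the default applies.
function resolveTimeouts(overrides: BackgroundTaskOverrides = {}): ResolvedBackgroundTaskTimeouts {
  return {
    taskTtlMs: overrides.taskTtlMs ?? TIMEOUT_DEFAULTS.taskTtlMs,
    staleTimeoutMs: overrides.staleTimeoutMs ?? TIMEOUT_DEFAULTS.staleTimeoutMs,
  }
}

const withoutOverrides = resolveTimeouts()
const withShorterTaskTtl = resolveTimeouts({ taskTtlMs: 30 * 60 * 1000 })
```

In this sketch a user who sets taskTtlMs back to 30 minutes gets that value, while staleTimeoutMs stays at the 100-hour default because it was not overridden.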