FIFO workflow completion, durable replay queue, and failPendingTasks #211

Open

sethconvex wants to merge 1 commit into executor-mode-core from executor-perf-impl

Conversation


sethconvex commented Feb 26, 2026

Summary

Core executor performance improvements for high-throughput workflow processing:

  • FIFO completion ordering: Task queue indexed by [shard, workflowCreatedAt] so earlier-created workflows complete first. CLAIM_LIMIT matches MAX_CONCURRENCY (both 50) so executors re-query frequently, picking up later steps for earlier workflows instead of always grabbing step-0 tasks for newer ones.
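The claim behavior described above can be modeled as a minimal in-memory sketch. The `Task` shape and `claimTasks` helper here are illustrative assumptions, not the actual Convex implementation, which uses a database index on `[shard, workflowCreatedAt]`:

```typescript
// Illustrative in-memory model of the claim query; the real code queries
// a Convex index on [shard, workflowCreatedAt] instead of sorting in memory.
interface Task {
  shard: number;
  workflowCreatedAt: number; // creation time of the owning workflow
  step: number;
}

const CLAIM_LIMIT = 50; // matches MAX_CONCURRENCY per the PR description

// Claim up to CLAIM_LIMIT tasks from one shard, earliest workflow first.
// Later steps of old workflows therefore beat step-0 tasks of new ones.
function claimTasks(queue: Task[], shard: number): Task[] {
  return queue
    .filter((t) => t.shard === shard)
    .sort((a, b) => a.workflowCreatedAt - b.workflowCreatedAt)
    .slice(0, CLAIM_LIMIT);
}
```

Because each claim batch is small relative to the queue, executors re-run this query often, which is what produces the FIFO completion ordering seen in the benchmark deciles below.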

  • Concurrency tuning: 50 concurrency × 100 shards = 5,000 concurrent slots. This works because real-world throughput is gated by the LLM API (Anthropic), not local compute — higher per-shard concurrency wastes V8 memory (64 MB limit per action) holding idle HTTP connections and response buffers.

  • Durable replay queue: Replaces fire-and-forget scheduler.runAfter safety net with a persistent replayQueue table. Entries are inserted atomically with result recording and only deleted after successful replay. Invariant: a workflow with runResult=null and no in-progress steps always has a row in replayQueue. Eliminates permanently stuck workflows.
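The stated invariant can be expressed as a small checkable predicate. Field names here (`runResult`, `inProgressSteps`) are assumptions for the sketch, not the actual schema:

```typescript
// Sketch of the replay-queue invariant with assumed field names.
interface WorkflowState {
  id: string;
  runResult: string | null;  // null => workflow has not finished
  inProgressSteps: number;   // count of steps currently executing
}

// Invariant: an unfinished workflow with no in-progress steps must have a
// replayQueue row, otherwise nothing will ever drive it to completion.
function isStuck(wf: WorkflowState, replayQueue: ReadonlySet<string>): boolean {
  return wf.runResult === null && wf.inProgressSteps === 0 && !replayQueue.has(wf.id);
}
```

Inserting the replay entry in the same transaction that records a step result is what keeps `isStuck` false at every point in time.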

  • failPendingTasks: Force-fails all queued tasks in a shard, marks steps as failed, inserts replay entries so workflows complete (as failures) rather than getting stuck. Useful for operational cleanup after crashes.
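A rough in-memory model of what this cleanup computes (shapes and the return type are assumptions, not the mutation's actual signature):

```typescript
// Illustrative model of failPendingTasks for one shard.
interface QueuedTask {
  shard: number;
  workflowId: string;
}

interface CleanupResult {
  failedWorkflowIds: string[]; // workflows whose steps get marked failed
  replayEntries: string[];     // replay rows inserted so workflows still complete
  remainingQueue: QueuedTask[];
}

// Drop every queued task in the shard, record the affected workflows as
// failed, and enqueue one replay entry per workflow so each completes
// (as a failure) instead of getting stuck.
function failPendingTasks(queue: QueuedTask[], shard: number): CleanupResult {
  const hit = queue.filter((t) => t.shard === shard);
  const workflowIds = Array.from(new Set(hit.map((t) => t.workflowId)));
  return {
    failedWorkflowIds: workflowIds,
    replayEntries: workflowIds,
    remainingQueue: queue.filter((t) => t.shard !== shard),
  };
}
```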

  • bumpEpoch: Stops running executors without starting new ones. Executors detect the stale epoch, drain in-flight tasks, and exit gracefully.
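The graceful-drain behavior can be sketched as an executor loop that re-reads the epoch between claim batches (names and the loop shape are assumed for illustration):

```typescript
// Sketch of an executor's drain loop. bumpEpoch increments the stored
// epoch; an executor that sees a newer epoch stops claiming new batches,
// while work inside the current claimBatch call still runs to completion.
function runExecutor(
  executorEpoch: number,
  readEpoch: () => number,
  claimBatch: () => number, // processes one batch, returns tasks handled
): number {
  let total = 0;
  while (readEpoch() === executorEpoch) {
    const processed = claimBatch();
    if (processed === 0) break; // queue drained
    total += processed;
  }
  return total; // stale epoch or empty queue: exit without claiming more
}
```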

  • clearReplayQueue/clearTaskQueue: Shard-indexed bulk cleanup for operational recovery.

Benchmark results (20k real Claude Haiku workflows)

19,886 completed | 114 failed (0.57%) | 0 stuck

Timing:
  p50  = 12.1 min
  p90  = 20.8 min
  p99  = 22.1 min
  slow = 26.6 min

Priority ordering (creation-time deciles):
  Earliest 2k workflows: median 4.3 min to complete
  Latest 2k workflows:   median 20.1 min to complete

Test plan

  • 10k real Claude Haiku benchmark: 9,968 completed, 32 failed, 0 stuck
  • 20k real Claude Haiku benchmark: 19,886 completed, 114 failed, 0 stuck
  • Priority analysis confirms monotonically increasing completion time by creation order
  • failPendingTasks tested for operational cleanup of stale queues
  • bumpEpoch tested for stopping executors without starting new ones

🤖 Generated with Claude Code

Core executor performance improvements:

- CLAIM_LIMIT=50 (matches MAX_CONCURRENCY) so executors re-query the
  task queue frequently, picking up later steps for earlier workflows
  instead of always grabbing step-0 tasks for newer workflows.

- Task queue indexed by [shard, workflowCreatedAt] (ascending) so
  tasks for earlier-created workflows are always claimed first.
  This gives FIFO completion ordering: the first 2k of 20k workflows
  finish in ~4 min median while the last 2k take ~20 min.

- 50 concurrency × 100 shards = 5000 concurrent slots. This works
  because real-world throughput is gated by the LLM API (Anthropic),
  not local compute. Higher per-shard concurrency wastes V8 memory
  (64 MB limit) holding idle HTTP connections.

- Durable replay queue: replaces fire-and-forget scheduler.runAfter
  safety net with a persistent replayQueue table. Entries are inserted
  atomically with result recording and only deleted after successful
  replay. Eliminates permanently stuck workflows.

- failPendingTasks mutation: force-fails all queued tasks in a shard,
  marks steps as failed, and inserts replay entries so workflows
  complete (as failures) rather than getting stuck forever.

- bumpEpoch mutation: stops running executors without starting new
  ones. Executors detect the stale epoch and drain gracefully.

- clearReplayQueue/clearTaskQueue: shard-indexed bulk cleanup for
  operational recovery.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

sethconvex commented Feb 26, 2026

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
Learn more

This stack of pull requests is managed by Graphite. Learn more about stacking.


pkg-pr-new bot commented Feb 26, 2026


npm i https://pkg.pr.new/get-convex/workflow/@convex-dev/workflow@211

commit: a634363

coderabbitai bot commented Feb 26, 2026

Important

Review skipped: auto reviews are disabled on base/target branches other than the default branch. Check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

