Runbook: concurrent same-instance cell advance can double-execute side effects (resolve with advancer B6/B8) #403

Description

@glassBead-tc

Summary

NotebookEngineRuntime.executeInstanceCell (and the batch gate) advance a runbook instance with a non-atomic read → executeCell → append sequence:

  1. listCellExecutions(instanceId) (read snapshot)
  2. assertCellExecutable(...) (ordering check)
  3. executeCell(...) (side effects)
  4. appendCellExecution(record) with seq = max(snapshot)+1

Two concurrent advances of the same instance and cell can both pass step 2, both run step 3, and both compute the same seq. One append then fails on the (instance_id, seq) unique constraint. Result: the cell's side effects ran twice, but the durable log records only one execution.
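The race can be reproduced with a small in-memory model. This is a sketch under stated assumptions, not the runtime's code: `ExecutionLog`, `advance` and `runRace` are hypothetical names, the ordering check is reduced to "this cell has no row yet", and seq numbering is assumed to start at 1. Both calls read the same empty snapshot before either appends, so both run the side effect and only the unique check on the second append notices the collision.

```typescript
// Hypothetical model of read → check → executeCell → append; not the real runtime API.
interface CellExecution {
  instanceId: string;
  cellId: string;
  seq: number;
}

// In-memory stand-in for the append-only executions table with UNIQUE (instance_id, seq).
class ExecutionLog {
  private rows: CellExecution[] = [];

  async list(instanceId: string): Promise<CellExecution[]> {
    return this.rows.filter((r) => r.instanceId === instanceId).map((r) => ({ ...r }));
  }

  async append(record: CellExecution): Promise<void> {
    const clash = this.rows.some(
      (r) => r.instanceId === record.instanceId && r.seq === record.seq,
    );
    if (clash) throw new Error(`unique violation on (${record.instanceId}, ${record.seq})`);
    this.rows.push({ ...record });
  }

  count(instanceId: string): number {
    return this.rows.filter((r) => r.instanceId === instanceId).length;
  }
}

async function advance(
  log: ExecutionLog,
  instanceId: string,
  cellId: string,
  executeCell: () => Promise<void>,
): Promise<void> {
  const snapshot = await log.list(instanceId); // 1. read snapshot
  if (snapshot.some((r) => r.cellId === cellId)) {
    throw new Error(`cell ${cellId} already executed`); // 2. ordering check (simplified)
  }
  await executeCell(); // 3. side effects
  const seq = Math.max(0, ...snapshot.map((r) => r.seq)) + 1; // seq = max(snapshot) + 1
  await log.append({ instanceId, cellId, seq }); // 4. append
}

async function runRace() {
  const log = new ExecutionLog();
  let sideEffects = 0;
  const executeCell = async () => {
    sideEffects += 1;
  };
  const results = await Promise.allSettled([
    advance(log, "inst-1", "cell-a", executeCell),
    advance(log, "inst-1", "cell-a", executeCell),
  ]);
  return {
    sideEffects,
    rows: log.count("inst-1"),
    rejected: results.filter((r) => r.status === "rejected").length,
  };
}

// Both advances see an empty snapshot, so both run the side effect;
// only one row survives because the second append hits the unique check.
runRace().then((r) => console.log(r)); // { sideEffects: 2, rows: 1, rejected: 1 }
```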

Flagged in PR #402 by Codex P1 (comment 3401523245) and Greptile P2 (comment 3401254368).

Why it's deferred (not fixed in #402)

  • The durable record stays consistent: (instance_id, seq) uniqueness prevents double rows. Only the side effect can double-run.
  • The runbook tables are append-only (migration 20260612120000 revokes UPDATE/DELETE), so the textbook "reserve a running row, then finalize it" pattern is blocked — you cannot update a reserved row to its outcome.
  • Concurrent same-instance advance only arises once something advances instances automatically or contends for them, i.e. B6 (await↔claim binding) and B8 (the advancer), neither of which exists yet. The spec's v0 model is single-holder ("any agent holding the instance, or a cron tick"). Building a cross-replica reservation table now would be infrastructure for a scenario the system cannot yet produce.

v0 assumption (documented in code)

Instance advance is single-writer per instance. The (instance_id, seq) unique constraint is the durable backstop; cross-replica concurrent same-instance advance is unsupported in v0.
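Within a single process, the single-writer assumption could be approximated by chaining advances per instance. The sketch below is illustrative only: `InstanceSerializer` is a hypothetical helper, not something the runtime provides, and it does nothing across replicas, which is exactly the case the v0 assumption leaves to the unique constraint.

```typescript
// Hypothetical helper: serializes tasks per instanceId inside ONE process.
// It provides no cross-replica coordination.
class InstanceSerializer {
  private tails = new Map<string, Promise<unknown>>();

  run<T>(instanceId: string, task: () => Promise<T>): Promise<T> {
    const prev = this.tails.get(instanceId) ?? Promise.resolve();
    // Start only after the previous task for this instance settles, success or failure.
    const next = prev.then(task, task);
    // Keep a never-rejecting tail so one failed advance does not block later ones.
    const tail = next.catch(() => undefined);
    this.tails.set(instanceId, tail);
    // Drop the entry once this is still the last queued task, to avoid leaking keys.
    tail.then(() => {
      if (this.tails.get(instanceId) === tail) this.tails.delete(instanceId);
    });
    return next;
  }
}

async function demo(): Promise<string[]> {
  const serializer = new InstanceSerializer();
  const events: string[] = [];
  const step = (name: string) => async () => {
    events.push(`${name}:start`);
    await new Promise<void>((resolve) => setTimeout(resolve, 5)); // simulated executeCell
    events.push(`${name}:end`);
  };
  await Promise.all([
    serializer.run("inst-1", step("A")),
    serializer.run("inst-1", step("B")),
  ]);
  return events;
}

demo().then((events) => console.log(events.join(", "))); // A:start, A:end, B:start, B:end
```

Because B starts only after A has finished, the second advance would see A's appended row in its snapshot. Two replicas each running their own serializer would still race.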

Resolution

Design the cross-replica advance discipline together with B8 (Advancer v0), where contention originates. Options to weigh then: a mutable runbook_instance_cursor coordination table (separate from the append-only executions table) with an atomic compare-and-set claim before executeCell, or a Postgres advisory lock per instance.
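One possible shape for the compare-and-set option, sketched in memory. Assumptions, all hypothetical: the cursor stores the next seq per instance, a claim is a CAS from the value just read to value + 1, and `CursorStore` and `advanceWithClaim` are illustrative names. The ordering check is omitted for brevity. The point is that the claim happens before executeCell, so a loser never reaches the side effect.

```typescript
// Hypothetical model of a mutable runbook_instance_cursor row per instance.
// In Postgres the CAS could be a single conditional UPDATE, for example
//   UPDATE runbook_instance_cursor SET next_seq = next_seq + 1
//    WHERE instance_id = $1 AND next_seq = $2
// where the claim succeeds only if exactly one row was updated.
class CursorStore {
  private nextSeq = new Map<string, number>();

  async read(instanceId: string): Promise<number> {
    return this.nextSeq.get(instanceId) ?? 1;
  }

  // Atomic here because the body runs synchronously on one thread.
  async compareAndSet(instanceId: string, expected: number): Promise<boolean> {
    const current = this.nextSeq.get(instanceId) ?? 1;
    if (current !== expected) return false;
    this.nextSeq.set(instanceId, expected + 1);
    return true;
  }
}

async function advanceWithClaim(
  cursors: CursorStore,
  instanceId: string,
  executeCell: () => Promise<void>,
  appendCellExecution: (seq: number) => Promise<void>,
): Promise<"executed" | "lost-claim"> {
  const seq = await cursors.read(instanceId);
  // Claim the seq BEFORE any side effect; the loser returns without executing.
  if (!(await cursors.compareAndSet(instanceId, seq))) return "lost-claim";
  await executeCell();
  await appendCellExecution(seq);
  return "executed";
}

async function demo() {
  const cursors = new CursorStore();
  const seqs: number[] = [];
  let effects = 0;
  const exec = async () => {
    effects += 1;
  };
  const append = async (seq: number) => {
    seqs.push(seq);
  };
  const results = await Promise.all([
    advanceWithClaim(cursors, "inst-1", exec, append),
    advanceWithClaim(cursors, "inst-1", exec, append),
  ]);
  return { results, effects, seqs };
}

// effects: 1, seqs: [1], one "executed" and one "lost-claim"
demo().then((r) => console.log(JSON.stringify(r)));
```

A crash between the claim and the append would leave the cursor advanced with no execution row, so a real design would likely need a holder or lease column with expiry. The advisory-lock alternative (for example `pg_advisory_xact_lock`, released when the transaction ends) avoids the extra table but keeps a transaction open for the whole of executeCell's side effects.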

Spec: SPEC-AGX-SUBSTRATE units B5/B6/B8.

Labels: bug (Something isn't working), javascript (Pull requests that update javascript code)