diff --git a/CHANGELOG.md b/CHANGELOG.md
index 1e23cf6..a73674b 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -8,11 +8,19 @@ The format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/). The
 
 ### Added
 
+- **Per-instance fan-out resume contract** (proposal 0009, accepted in spec v0.18.0). The engine now writes a checkpoint record at every `completed` event inside a fan-out instance (in addition to the existing outermost-graph + subgraph-internal + fan-out node completion saves). On resume the engine consults the saved record's `fan_out_progress` field and treats each instance as `completed` (skip, contribution rolls forward), `in_flight` (re-run from subgraph entry), or `not_started` (dispatch normally). The `append` reducer's no-double-merge guarantee holds across resume because `completed` is a one-shot accumulator state.
+- **`FanOutProgress` and `FanOutInstanceProgress` public dataclasses** on `openarmature.checkpoint`. The `CheckpointRecord.fan_out_progress` field is now `tuple[FanOutProgress, ...]` (default empty tuple), with per-instance state, result, and `completed_inner_positions` observability. Was a `None` placeholder under proposal 0008.
+- **`FanOutInternalSaveBatching` config** on `InMemoryCheckpointer`. Backends MAY opt into batching scoped to fan-out instance internal saves to bound the write volume of high-instance-count fan-outs. Outermost-graph, subgraph-internal, and the fan-out node's own completion save remain synchronous regardless. Default off. Buffered-but-unflushed saves are lost on crash by design; on resume, instances whose `completed` state was only buffered revert and re-run. Surfaces a new optional `save_fan_out_internal` / `save_fan_out_in_flight_failure` Checkpointer Protocol seam; backends that don't implement either fall back to the standard `save`.
 - **Patterns docs section** at `docs/patterns/`, sibling to Concepts. Seeded with four recipes drawn from downstream usage and proposal 0008's alternatives section: parameterized entry point, tool-dispatch-as-node, session-as-checkpoint-resume, and bypass-if-output-exists. Patterns are user-level how-to recipes composing existing primitives, not framework contracts; new patterns can be added without spec coordination. Each page follows a problem / approach / snippet / when this is the right pattern / when it isn't / cross-references structure.
 
+### Changed
+
+- **Fan-out resume behavior** flipped from atomic restart (0008's v1 contract) to per-instance resume. A crash mid-fan-out used to re-run the entire fan-out on resume; now only the instances that did not complete-and-record their contribution re-run. The economics matter for large fan-outs of expensive work (LLM calls, long extractions): an 80% complete fan-out crash now restores 80% of its results rather than discarding them.
+- **`SQLiteCheckpointer` schema** picks up a new `fan_out_progress_blob` column (added via `ALTER TABLE` for backward compatibility with pre-0009 databases). Pre-0009 rows back-fill as NULL on load and round-trip as the empty-tuple default. Both `pickle` and `json` serialization modes round-trip the new field.
+
 ### Notes
 
-- **Pinned spec version bumped to v0.17.1.** Proposal 0019 (multi-provider wire-format extension) reframes llm-provider §8 as a catalog of wire-format mappings, with the existing OpenAI-compatible body nested under §8.1. Purely textual on the spec side — no behavioral change, no fixture changes. Code and doc references to §8.X updated to match the new structure (§8.1 → §8.1.1, §8.2 → §8.1.2, §8.3 → §8.1.3, §8.5.1 → §8.1.5.1, §8.1.1 → §8.1.1.1). All existing conformance fixtures continue to pass.
+- **Pinned spec version bumped from v0.17.0 to v0.18.1 over this Unreleased cycle.** Three spec versions absorbed: v0.17.1 (proposal 0019, multi-provider wire-format extension; purely textual reframe of llm-provider §8 as a catalog of wire-format mappings, OpenAI-compatible body nested under §8.1, code references updated to §8.1 / §8.1.1 / §8.1.2 / §8.1.3 / §8.1.5.1 / §8.1.1.1), v0.18.0 (proposal 0009, per-instance fan-out resume; pipeline-utilities §10.3 / §10.7 revised, §10.11 added with per-instance state machine plus composition rules plus configurable batching; the `append` reducer no-double-merge invariant from §10.11.1 is the load-bearing correctness story; see Added / Changed above), and v0.18.1 (fixture-only patch on `release/v0.18.1` correcting an off-by-one literal in fixture 052's expected `results`). All existing conformance fixtures continue to pass.
 
 ## [0.8.0] — 2026-05-23
 
diff --git a/docs/concepts/checkpointing.md b/docs/concepts/checkpointing.md
index 00427fd..9aaf838 100644
--- a/docs/concepts/checkpointing.md
+++ b/docs/concepts/checkpointing.md
@@ -26,9 +26,12 @@ graph = (
 ```
 
 The engine writes a record at every `completed` event for outermost-
-graph nodes and subgraph-internal nodes. **Fan-out instance internal
-events do NOT save** in the shipping version. Atomic-restart is the
-fan-out contract.
+graph nodes, subgraph-internal nodes, and fan-out instance internal
+nodes. **Per-instance fan-out resume** is the contract: on resume the
+engine re-runs only the instances that did not complete-and-record
+their contribution into the fan-out accumulator in the prior run;
+completed instances skip and their contributions roll forward to the
+fan-in step.
 
 ## Saves are synchronous-by-contract
 
@@ -79,8 +82,8 @@ class CheckpointRecord:
     completed_positions: tuple[NodePosition, ...]
     parent_states: tuple[Any, ...]
     last_saved_at: float
-    schema_version: str = CHECKPOINT_SCHEMA_VERSION
-    fan_out_progress: None = field(default=None)
+    schema_version: str = ""
+    fan_out_progress: tuple[FanOutProgress, ...] = field(default=())
 ```
 
 Field framing worth getting right:
@@ -103,9 +106,19 @@ Field framing worth getting right:
   Outermost first; empty for an outer-level save. Inner-node saves
   populate it so resume can re-enter a subgraph from the right
   depth without re-projecting.
-- **`fan_out_progress: None` is reserved** for a future per-instance
-  fan-out resume mode (planned, not yet shipped). In the shipping
-  version it's always `None`.
+- **`fan_out_progress` carries per-fan-out-node progress** when one
+  or more fan-outs are in flight at save time. Each `FanOutProgress`
+  entry records the fan-out's name, namespace, instance count, and a
+  per-instance state machine (`not_started` / `in_flight` /
+  `completed`) plus the recorded contribution for finalized
+  instances. On resume the engine consults this field to decide
+  which instances skip (their contributions roll forward) vs re-run
+  (re-execute from the inner subgraph's declared entry node). Empty
+  tuple when no fan-outs are in flight. See
+  [Resume semantics](fan-out.md#resume-semantics) on the fan-out
+  page for the full per-instance contract including reducer
+  composition, error_policy semantics, and the optional
+  fan-out-internal save batching.
 
 ## The Checkpointer Protocol
 
diff --git a/docs/concepts/fan-out.md b/docs/concepts/fan-out.md
index 40d9060..be0d58f 100644
--- a/docs/concepts/fan-out.md
+++ b/docs/concepts/fan-out.md
@@ -125,14 +125,59 @@ namespace.
 ## Resume semantics
 
 A fan-out node's `completed` event triggers a save like any other
-outermost-graph or subgraph-internal node. **Per-instance internal
-events do NOT save** in the shipping version; on resume, the
-fan-out re-runs end-to-end if it hadn't completed (atomic restart).
-
-A per-instance fan-out resume mode is planned but not yet shipped.
-The `fan_out_progress` field on `CheckpointRecord` is reserved for
-its eventual contents. Until it lands, atomic restart is the
-shipping behavior.
+outermost-graph or subgraph-internal node. Per-instance internal
+events also save, and the resume contract is **per-instance**: the
+engine consults the saved record's `fan_out_progress` entry for
+this fan-out and treats each instance as one of three states:
+
+- **`completed`**: the instance ran to completion in the prior run
+  and recorded its contribution into the accumulator. The engine
+  skips re-execution on resume; the contribution rolls forward to
+  the fan-in step.
+- **`in_flight`**: the instance began execution but its terminal
+  inner node had not yet fired `completed` at save time, so no
+  contribution was recorded. On resume the engine re-runs the
+  instance from the subgraph's declared entry node.
+  `completed_inner_positions` on the saved record are observational
+  only; they do NOT serve as a per-inner-node resume point.
+- **`not_started`**: the instance was not dispatched at save time.
+  On resume the engine dispatches it normally.
+
+The `append` reducer's no-double-merge guarantee holds because
+`completed` is a one-shot accumulator state: every completed
+instance's contribution rolls forward exactly once at fan-in.
+
+Under `error_policy: collect`, a failed instance's error record IS
+a `completed` contribution (the error rolls forward through the
+`errors_field` bucket rather than `target_field`). Under
+`error_policy: fail_fast`, a failed instance leaves the saved
+record with that instance in `in_flight` state; cancelled siblings
+are `in_flight` or `not_started`. None are `completed`, so resume
+re-runs them all.
+
+Per-instance saves can be high-volume in fan-outs with many
+instances or many inner nodes per instance. `Checkpointer` backends
+MAY opt into **configurable batching** scoped to fan-out instance
+internal saves; outermost-graph, subgraph-internal, and the fan-out
+node's own completion save remain synchronous. The in-memory
+backend exposes the knob via:
+
+```python
+from openarmature.checkpoint import (
+    InMemoryCheckpointer,
+    FanOutInternalSaveBatching,
+)
+
+cp = InMemoryCheckpointer(
+    fan_out_internal_save_batching=FanOutInternalSaveBatching(flush_every=10),
+)
+```
+
+Buffered-but-unflushed saves are lost on crash by design:
+instances whose `completed` state was only buffered revert to
+`in_flight` / `not_started` on resume and re-run. The trade-off is
+explicit (fewer writes per fan-out instance vs some redundant
+re-execution on crash recovery); default is no batching.
 
 ## When to reach for fan-out
 
diff --git a/docs/examples/05-fan-out-with-retry.md b/docs/examples/05-fan-out-with-retry.md
index 35911a0..7fffe0b 100644
--- a/docs/examples/05-fan-out-with-retry.md
+++ b/docs/examples/05-fan-out-with-retry.md
@@ -46,6 +46,28 @@ sentinel headline that always raises `ProviderUnavailable`; under
   `error_policy` at runtime. Inner-instance events carry
   `fan_out_index` but not the config.
 
+## Composing with checkpointing
+
+This example doesn't register a `Checkpointer`, but the fan-out
+pattern composes cleanly with checkpoint resume. When a fan-out
+runs under a registered backend, the resume contract is
+**per-instance**: instances that completed in the prior run skip
+re-execution and their contributions roll forward through the
+fan-in step; instances that were `in_flight` at save time re-run
+from the subgraph's entry node; not-started instances dispatch
+normally. The `append` reducer's no-double-merge guarantee holds
+across resume because `completed` is a one-shot accumulator state.
+
+Composition with `instance_middleware` (retry): on resume, an
+instance's `attempt_index` resets to 0 (a fresh retry budget) per
+spec graph-engine §6's resume semantics. So a retry-exhausted
+instance whose `in_flight` state was saved gets a fresh budget on
+the resumed run.
+
+See [Resume semantics in fan-out](../concepts/fan-out.md#resume-semantics)
+and the [Checkpointing concept page](../concepts/checkpointing.md)
+for the full contract.
+
 ## How to run
 
 ```bash
diff --git a/openarmature-spec b/openarmature-spec
index 5f8d25f..079a082 160000
--- a/openarmature-spec
+++ b/openarmature-spec
@@ -1 +1 @@
-Subproject commit 5f8d25fc3d6b97575e5aa4055550c36bcf83deee
+Subproject commit 079a082621bdaf277f9d803999830464370050e6
diff --git a/pyproject.toml b/pyproject.toml
index d3935c3..75a093a 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -48,7 +48,7 @@ Repository = "https://github.com/LunarCommand/openarmature-python"
 Specification = "https://github.com/LunarCommand/openarmature-spec"
 
 [tool.openarmature]
-spec_version = "0.17.1"
+spec_version = "0.18.1"
 
 [dependency-groups]
 dev = [
diff --git a/src/openarmature/__init__.py b/src/openarmature/__init__.py
index 65beec1..c5bf5cb 100644
--- a/src/openarmature/__init__.py
+++ b/src/openarmature/__init__.py
@@ -1,4 +1,4 @@
 """OpenArmature: workflow framework for LLM pipelines and tool-calling agents."""
 
 __version__ = "0.8.0"
-__spec_version__ = "0.17.1"
+__spec_version__ = "0.18.1"
diff --git a/src/openarmature/checkpoint/__init__.py b/src/openarmature/checkpoint/__init__.py
index ebb139e..b569af7 100644
--- a/src/openarmature/checkpoint/__init__.py
+++ b/src/openarmature/checkpoint/__init__.py
@@ -20,7 +20,7 @@
 restores from a prior record.
 """
 
-from .backends import InMemoryCheckpointer, SerializationMode, SQLiteCheckpointer
+from .backends import FanOutInternalSaveBatching, InMemoryCheckpointer, SerializationMode, SQLiteCheckpointer
 from .errors import (
     CheckpointError,
     CheckpointNotFound,
@@ -36,6 +36,8 @@
     CheckpointFilter,
     CheckpointRecord,
     CheckpointSummary,
+    FanOutInstanceProgress,
+    FanOutProgress,
     NodePosition,
 )
 
@@ -51,6 +53,9 @@
     "CheckpointStateMigrationMissing",
     "CheckpointSummary",
     "Checkpointer",
+    "FanOutInstanceProgress",
+    "FanOutInternalSaveBatching",
+    "FanOutProgress",
     "InMemoryCheckpointer",
     "MigrationRegistry",
     "NodePosition",
diff --git a/src/openarmature/checkpoint/backends/__init__.py b/src/openarmature/checkpoint/backends/__init__.py
index 9b985ad..f2dab84 100644
--- a/src/openarmature/checkpoint/backends/__init__.py
+++ b/src/openarmature/checkpoint/backends/__init__.py
@@ -17,10 +17,11 @@
     from openarmature.checkpoint import InMemoryCheckpointer, SQLiteCheckpointer
 """
 
-from .memory import InMemoryCheckpointer
+from .memory import FanOutInternalSaveBatching, InMemoryCheckpointer
 from .sqlite import SerializationMode, SQLiteCheckpointer
 
 __all__ = [
+    "FanOutInternalSaveBatching",
     "InMemoryCheckpointer",
     "SQLiteCheckpointer",
     "SerializationMode",
diff --git a/src/openarmature/checkpoint/backends/memory.py b/src/openarmature/checkpoint/backends/memory.py
index b2c0c2f..f6dc348 100644
--- a/src/openarmature/checkpoint/backends/memory.py
+++ b/src/openarmature/checkpoint/backends/memory.py
@@ -12,10 +12,34 @@
 
 import asyncio
 from collections.abc import Iterable
+from dataclasses import dataclass
 
 from ..protocol import CheckpointFilter, CheckpointRecord, CheckpointSummary
 
 
+@dataclass(frozen=True)
+class FanOutInternalSaveBatching:
+    """Per-Checkpointer-instance configuration for §10.11.4 fan-out
+    internal save batching.
+
+    Applies ONLY to fan-out instance internal saves. Outermost-graph,
+    subgraph-internal, and fan-out node completion saves remain
+    synchronous per §10.3.
+
+    - ``flush_every``: flush the buffer every N buffered saves. ``0``
+      / negative means batching is disabled (every save flushes
+      immediately). The buffered save count resets at each flush.
+
+    Buffered-but-unflushed saves are LOST on crash per §10.11.4;
+    on resume, instances whose completed state was buffered-only
+    revert to ``in_flight`` / ``not_started`` and re-run. The §10.11.1
+    reducer correctness holds because their contributions hadn't
+    durably committed.
+    """
+
+    flush_every: int = 0
+
+
 class InMemoryCheckpointer:
     """Dict-backed Checkpointer.
 
@@ -36,6 +60,16 @@ class InMemoryCheckpointer:
     reference, so a version mismatch on resume raises
     ``CheckpointRecordInvalid`` rather than consulting the
     migration registry.
+
+    **Fan-out internal save batching** (per spec §10.11.4): optional
+    via the ``fan_out_internal_save_batching`` constructor parameter.
+    Default is no batching (every save flushes immediately). When
+    enabled, fan-out instance internal saves buffer in memory and
+    flush every ``flush_every`` saves. Outermost-graph,
+    subgraph-internal, and fan-out node completion saves bypass the
+    buffer entirely (they remain synchronous). On crash, buffered
+    saves are lost — by design, per §10.11.4's documented cost
+    trade-off.
     """
 
     # Per spec §10.12.1: in-memory storage holds live typed-state
@@ -47,20 +81,115 @@ class InMemoryCheckpointer:
     # so Pyright accepts a class-attribute override here.
     supports_state_migration: bool = False
 
-    def __init__(self) -> None:
+    def __init__(
+        self,
+        *,
+        fan_out_internal_save_batching: FanOutInternalSaveBatching | None = None,
+    ) -> None:
         self._records: dict[str, CheckpointRecord] = {}
         self._lock = asyncio.Lock()
+        self._fan_out_batching = fan_out_internal_save_batching
+        # Buffered fan-out internal saves keyed by invocation_id. Each
+        # entry holds the latest buffered record for that invocation;
+        # subsequent buffered saves overwrite (the most recent state
+        # is what would have flushed). Per-invocation counts of
+        # buffered saves decide when to flush per ``flush_every``;
+        # keeping counts per-invocation isolates concurrent
+        # invocations that share the same checkpointer.
+        self._fan_out_buffer: dict[str, CheckpointRecord] = {}
+        self._fan_out_buffer_counts: dict[str, int] = {}
 
     async def save(self, invocation_id: str, record: CheckpointRecord) -> None:
         """Store ``record`` under ``invocation_id``, replacing any
         previous record for the same id. Not durable across process
-        restarts."""
+        restarts.
+
+        Per §10.11.4: outermost-graph, subgraph-internal, and
+        fan-out node completion saves are synchronous regardless of
+        the batching configuration. The engine routes fan-out
+        instance internal saves through :meth:`save_fan_out_internal`
+        instead; this method bypasses the buffer.
+        """
         async with self._lock:
+            # Flush any buffered fan-out internal saves for this
+            # invocation before recording the (synchronous) save —
+            # otherwise a fan-out node completion save could land in
+            # the persistent slot while a more-recent buffered
+            # in-flight save sits in the buffer, inverting the
+            # save order.
+            self._flush_invocation_buffer_locked(invocation_id)
             self._records[invocation_id] = record
 
+    async def save_fan_out_internal(self, invocation_id: str, record: CheckpointRecord) -> None:
+        """Buffer a fan-out instance internal save under the §10.11.4
+        batching policy. When batching is disabled (default), behaves
+        identically to :meth:`save` — every save is synchronously
+        durable. When ``flush_every`` is positive, the save is
+        buffered; the buffer flushes when the count reaches the
+        configured threshold.
+        """
+        if self._fan_out_batching is None or self._fan_out_batching.flush_every <= 0:
+            await self.save(invocation_id, record)
+            return
+        async with self._lock:
+            self._fan_out_buffer[invocation_id] = record
+            self._fan_out_buffer_counts[invocation_id] = self._fan_out_buffer_counts.get(invocation_id, 0) + 1
+            if self._fan_out_buffer_counts[invocation_id] >= self._fan_out_batching.flush_every:
+                self._flush_invocation_buffer_locked(invocation_id)
+
+    async def save_fan_out_in_flight_failure(
+        self,
+        invocation_id: str,
+        record: CheckpointRecord,
+    ) -> None:
+        """Buffer an "instance failed mid-execution" save under §10.11.4
+        batching. The failure save records the in_flight state of an
+        instance whose terminal inner node raised; this save closes the
+        in_flight observability gap (per §10.11) for instances whose
+        subgraphs have no sibling-completed save to piggyback on.
+
+        Under batching, this save buffers BUT does NOT count toward
+        the flush threshold. The rationale: this save logically
+        represents "the moment of crash" — a real crash wouldn't
+        complete an extra save first; the buffered records (and this
+        one) would simply be lost. The batching count-trigger mechanism
+        is meant for steady-state save flow, not the abort path.
+
+        Backends without batching route this to a synchronous
+        :meth:`save` — the failure save is durable in the non-batching
+        case (fixture 048's in_flight observability requirement).
+        """
+        if self._fan_out_batching is None or self._fan_out_batching.flush_every <= 0:
+            await self.save(invocation_id, record)
+            return
+        async with self._lock:
+            # Overwrite the buffer slot (the most-recent state is
+            # what the next flush would capture if one fires later)
+            # but DO NOT increment the count or trigger a flush.
+            # On crash, this record is lost along with the rest of
+            # the buffer — by design per §10.11.4.
+            self._fan_out_buffer[invocation_id] = record
+
+    def _flush_invocation_buffer_locked(self, invocation_id: str) -> None:
+        """Caller-holds-lock helper: flush this invocation's buffered
+        fan-out internal save (if any) into the persistent records
+        dict. Resets only this invocation's buffer count, leaving
+        other invocations' accounting untouched so concurrent
+        invocations sharing the checkpointer don't interfere with
+        each other's flush thresholds."""
+        buffered = self._fan_out_buffer.pop(invocation_id, None)
+        if buffered is not None:
+            self._records[invocation_id] = buffered
+        self._fan_out_buffer_counts.pop(invocation_id, None)
+
     async def load(self, invocation_id: str) -> CheckpointRecord | None:
         """Return the saved record for ``invocation_id`` or ``None``
-        if nothing has been saved under that id."""
+        if nothing has been saved under that id. Per §10.11.4:
+        buffered-but-unflushed fan-out internal saves are NOT visible
+        to ``load`` — that's the crash-loses-buffered contract. To
+        simulate a crash before the buffer flushes, drop the
+        Checkpointer reference; the buffer is in-memory only.
+        """
         async with self._lock:
             return self._records.get(invocation_id)
 
@@ -91,4 +220,4 @@ async def delete(self, invocation_id: str) -> None:
             self._records.pop(invocation_id, None)
 
 
-__all__ = ["InMemoryCheckpointer"]
+__all__ = ["FanOutInternalSaveBatching", "InMemoryCheckpointer"]
diff --git a/src/openarmature/checkpoint/backends/sqlite.py b/src/openarmature/checkpoint/backends/sqlite.py
index a004f22..3273e75 100644
--- a/src/openarmature/checkpoint/backends/sqlite.py
+++ b/src/openarmature/checkpoint/backends/sqlite.py
@@ -45,12 +45,23 @@
     CheckpointFilter,
     CheckpointRecord,
     CheckpointSummary,
+    FanOutInstanceProgress,
+    FanOutProgress,
     NodePosition,
 )
 
 SerializationMode = Literal["pickle", "json"]
 
 
+# Proposal 0009 / spec v0.18.0 sqlite serialization choice (Q3 in the
+# impl plan): the new ``fan_out_progress`` field on CheckpointRecord
+# gets a dedicated BLOB column (Plan B). Plan A from the impl plan
+# was JSON-blob expansion of an existing blob, but each existing blob
+# encodes one specific field (state, positions, parent_states); adding
+# fan_out_progress as a new column keeps the field-to-blob mapping
+# obvious and avoids smearing two semantically distinct fields into
+# one. The column is added via ALTER TABLE for backward compatibility
+# with databases written before this proposal landed.
 _SCHEMA_DDL = """
 CREATE TABLE IF NOT EXISTS checkpoints (
     invocation_id     TEXT PRIMARY KEY,
@@ -66,6 +77,66 @@
     ON checkpoints (correlation_id);
 """
 
+# Idempotent column add for the fan_out_progress blob. Older databases
+# created before proposal 0009 lack this column; SQLite has no
+# ``ADD COLUMN IF NOT EXISTS``, so we attempt the ADD and swallow the
+# duplicate-column error.
+_FAN_OUT_PROGRESS_COLUMN_DDL = "ALTER TABLE checkpoints ADD COLUMN fan_out_progress_blob BLOB"
+
+
+def _fan_out_progress_to_dict(fp: FanOutProgress) -> dict[str, Any]:
+    """Serialize a frozen :class:`FanOutProgress` entry to a dict shape
+    the configured serialization mode round-trips cleanly.
+
+    JSON mode walks tuples as lists already; pickle mode round-trips
+    dicts identically. The shape mirrors the dataclass fields one for
+    one with namespace/positions flattened to lists.
+    """
+    return {
+        "fan_out_node_name": fp.fan_out_node_name,
+        "namespace": list(fp.namespace),
+        "instance_count": fp.instance_count,
+        "instances": [
+            {
+                "state": inst.state,
+                "result": inst.result,
+                "completed_inner_positions": [asdict(p) for p in inst.completed_inner_positions],
+            }
+            for inst in fp.instances
+        ],
+    }
+
+
+def _fan_out_progress_from_dict(d: dict[str, Any]) -> FanOutProgress:
+    """Inverse of :func:`_fan_out_progress_to_dict` — rebuild a frozen
+    :class:`FanOutProgress` from its dict shape, restoring positions
+    as :class:`NodePosition` instances."""
+    instances: list[FanOutInstanceProgress] = []
+    for inst in cast("list[dict[str, Any]]", d["instances"]):
+        inner_positions = tuple(
+            NodePosition(
+                namespace=tuple(p["namespace"]),
+                node_name=p["node_name"],
+                step=p["step"],
+                attempt_index=p.get("attempt_index", 0),
+                fan_out_index=p.get("fan_out_index"),
+            )
+            for p in cast("list[dict[str, Any]]", inst.get("completed_inner_positions", []))
+        )
+        instances.append(
+            FanOutInstanceProgress(
+                state=inst["state"],
+                result=inst.get("result"),
+                completed_inner_positions=inner_positions,
+            )
+        )
+    return FanOutProgress(
+        fan_out_node_name=d["fan_out_node_name"],
+        namespace=tuple(d["namespace"]),
+        instance_count=d["instance_count"],
+        instances=tuple(instances),
+    )
+
 
 def _to_json_native(obj: Any) -> Any:
     """Walk ``obj`` converting Pydantic ``BaseModel`` instances to
@@ -125,6 +196,16 @@ def _connect(self) -> sqlite3.Connection:
     def _initialize_sync(self) -> None:
         with self._connect() as conn:
             conn.executescript(_SCHEMA_DDL)
+            # Add the fan_out_progress_blob column for databases written
+            # before proposal 0009. Idempotent: subsequent runs against
+            # an already-migrated database hit "duplicate column" and
+            # swallow it. New databases pick the column up via the
+            # initial table create + this ALTER, equivalent end state.
+            try:
+                conn.execute(_FAN_OUT_PROGRESS_COLUMN_DDL)
+            except sqlite3.OperationalError as exc:
+                if "duplicate column" not in str(exc).lower():
+                    raise
 
     async def _ensure_initialized(self) -> None:
         if self._initialized:
@@ -168,14 +249,21 @@ def _decode(self, blob: bytes, recorded_mode: str, invocation_id: str) -> Any:
 
     async def save(self, invocation_id: str, record: CheckpointRecord) -> None:
         """Upsert ``record`` under ``invocation_id``. The state,
-        completed positions, and parent-state stack are serialized via
-        the configured :class:`SerializationMode` and written in a
-        single statement. Writes are durable on return (WAL mode,
-        per-write fsync at the SQLite layer)."""
+        completed positions, parent-state stack, and (per proposal 0009)
+        per-fan-out-node progress are serialized via the configured
+        :class:`SerializationMode` and written in a single statement.
+        Writes are durable on return (WAL mode, per-write fsync at the
+        SQLite layer)."""
         await self._ensure_initialized()
         state_blob = self._encode(record.state)
         positions_blob = self._encode([asdict(p) for p in record.completed_positions])
         parent_states_blob = self._encode(list(record.parent_states))
+        # Per pipeline-utilities §10.11: serialize the per-fan-out-node
+        # progress sequence. Empty tuple is the common case (no fan-outs
+        # in flight at save time) and round-trips as an empty list.
+        fan_out_progress_blob = self._encode(
+            [_fan_out_progress_to_dict(fp) for fp in record.fan_out_progress]
+        )
         serialization_mode = self._serialization
 
         def _do() -> None:
@@ -185,16 +273,17 @@ def _do() -> None:
                     INSERT INTO checkpoints
                         (invocation_id, correlation_id, state_blob,
                          positions_blob, parent_states_blob, last_saved_at,
-                         schema_version, serialization)
-                    VALUES (?, ?, ?, ?, ?, ?, ?, ?)
+                         schema_version, serialization, fan_out_progress_blob)
+                    VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)
                     ON CONFLICT(invocation_id) DO UPDATE SET
-                        correlation_id     = excluded.correlation_id,
-                        state_blob         = excluded.state_blob,
-                        positions_blob     = excluded.positions_blob,
-                        parent_states_blob = excluded.parent_states_blob,
-                        last_saved_at      = excluded.last_saved_at,
-                        schema_version     = excluded.schema_version,
-                        serialization      = excluded.serialization
+                        correlation_id        = excluded.correlation_id,
+                        state_blob            = excluded.state_blob,
+                        positions_blob        = excluded.positions_blob,
+                        parent_states_blob    = excluded.parent_states_blob,
+                        last_saved_at         = excluded.last_saved_at,
+                        schema_version        = excluded.schema_version,
+                        serialization         = excluded.serialization,
+                        fan_out_progress_blob = excluded.fan_out_progress_blob
                     """,
                     (
                         invocation_id,
@@ -205,6 +294,7 @@ def _do() -> None:
                         record.last_saved_at,
                         record.schema_version,
                         serialization_mode,
+                        fan_out_progress_blob,
                     ),
                 )
 
@@ -225,7 +315,7 @@ def _do() -> tuple[Any, ...] | None:
                     """
                     SELECT correlation_id, state_blob, positions_blob,
                            parent_states_blob, last_saved_at,
-                           schema_version, serialization
+                           schema_version, serialization, fan_out_progress_blob
                     FROM checkpoints
                     WHERE invocation_id = ?
                     """,
@@ -245,6 +335,7 @@ def _do() -> tuple[Any, ...] | None:
             last_saved_at,
             schema_version,
             recorded_serialization,
+            fan_out_progress_blob,
         ) = row
         # Note: per spec §10.12 (proposal 0014), version mismatches
         # are no longer rejected at the backend boundary. The engine
@@ -265,6 +356,20 @@ def _do() -> tuple[Any, ...] | None:
             )
             for p in position_dicts
         )
+        # fan_out_progress_blob may be NULL on rows written before
+        # proposal 0009 (the column was added via ALTER TABLE and
+        # back-fills as NULL on pre-existing rows). Treat NULL as
+        # "no fan-outs in flight at save time" — the empty-tuple
+        # default on CheckpointRecord.
+        if fan_out_progress_blob is None:
+            fan_out_progress: tuple[FanOutProgress, ...] = ()
+        else:
+            fan_out_progress_dicts = self._decode(
+                fan_out_progress_blob,
+                recorded_serialization,
+                invocation_id,
+            )
+            fan_out_progress = tuple(_fan_out_progress_from_dict(fp) for fp in fan_out_progress_dicts)
         return CheckpointRecord(
             invocation_id=invocation_id,
             correlation_id=correlation_id,
@@ -273,6 +378,7 @@ def _do() -> tuple[Any, ...] | None:
             parent_states=tuple(parent_states),
             last_saved_at=last_saved_at,
             schema_version=schema_version,
+            fan_out_progress=fan_out_progress,
         )
 
     async def list(self, filter: CheckpointFilter | None = None) -> Iterable[CheckpointSummary]:
diff --git a/src/openarmature/checkpoint/protocol.py b/src/openarmature/checkpoint/protocol.py
index aaabc57..4e85e31 100644
--- a/src/openarmature/checkpoint/protocol.py
+++ b/src/openarmature/checkpoint/protocol.py
@@ -34,7 +34,7 @@
 
 from collections.abc import Iterable
 from dataclasses import dataclass, field
-from typing import Any, Protocol
+from typing import Any, Literal, Protocol
 
 
 # Spec: realizes pipeline-utilities §10.2 NodePosition. Field semantics
@@ -80,18 +80,84 @@ class NodePosition:
     fan_out_index: int | None = None
 
 
+# Spec: realizes pipeline-utilities §10.11 per-instance fan-out
+# progress. Promoted from a None placeholder under proposal 0008 to
+# populated structures under proposal 0009: each fan-out node that is
+# in flight at save time contributes one FanOutProgress entry with
+# per-instance state, and saves now fire at every fan-out instance
+# internal completed event (not only at fan-out node completion).
+@dataclass(frozen=True)
+class FanOutInstanceProgress:
+    """Per-instance progress entry inside a fan-out's
+    :attr:`FanOutProgress.instances` sequence.
+
+    Fields:
+
+    - ``state``: one of ``"completed"``, ``"in_flight"``,
+      ``"not_started"``. The ``completed`` state is the load-bearing
+      correctness contract: an instance marked ``completed`` MUST have
+      its contribution recorded into the accumulator AND that
+      contribution MUST be reflected in ``result``. Reducer composition
+      rules (§10.11.1) depend on this exactly-once guarantee.
+    - ``result``: for ``completed`` instances, the durable contribution
+      to the fan-out accumulator (a success value for the
+      ``target_field`` bucket, or under ``collect`` error policy an
+      error entry for the ``errors_field`` bucket). Typed per the
+      parent state schema's ``target_field`` / ``errors_field``
+      (representation is implementation-defined per §10.11; Python
+      stores as ``Any`` since dynamic typing absorbs the variance).
+      Unused for ``in_flight`` and ``not_started``.
+    - ``completed_inner_positions``: for ``in_flight`` instances, a
+      tuple of ``NodePosition`` entries captured at save time. Same
+      shape as :attr:`CheckpointRecord.completed_positions` but scoped
+      to this instance's inner subgraph rather than the outer graph.
+      Empty when the instance fired its first ``started`` event but
+      no inner ``completed`` event yet. Observational only:
+      ``in_flight`` instances re-enter at the subgraph entry node on
+      resume, not at any of these positions. Unused for ``completed``
+      and ``not_started``.
+    """
+
+    state: Literal["completed", "in_flight", "not_started"]
+    result: Any = None
+    completed_inner_positions: tuple[NodePosition, ...] = ()
+
+
+@dataclass(frozen=True)
+class FanOutProgress:
+    """Per-fan-out-node progress entry inside a
+    :attr:`CheckpointRecord.fan_out_progress` sequence.
+
+    Fields:
+
+    - ``fan_out_node_name``: the fan-out node's name in its containing
+      graph.
+    - ``namespace``: the chain of outer subgraph-node names enclosing
+      the fan-out (empty for outermost-graph fan-outs). Disambiguates
+      fan-outs of the same name in different nested-subgraph contexts.
+    - ``instance_count``: the resolved instance count for this fan-out
+      (per pipeline-utilities §9 count or items_field mode).
+    - ``instances``: a tuple of per-instance entries indexed by
+      ``fan_out_index`` (``instances[i]`` is the entry for
+      ``fan_out_index=i``). Length equals ``instance_count``.
+    """
+
+    fan_out_node_name: str
+    namespace: tuple[str, ...]
+    instance_count: int
+    instances: tuple[FanOutInstanceProgress, ...]
+
+
 # Spec: realizes pipeline-utilities §10.2 CheckpointRecord.
-# ``fan_out_progress`` is reserved for proposal 0009 (per-instance
-# fan-out resume); always ``None`` in the shipping version.
 @dataclass(frozen=True)
 class CheckpointRecord:
     """One invocation's progress at one save point.
 
     Frozen: backends MUST treat the record as immutable. The engine
     builds a fresh record per ``completed`` event rather than mutating
-    a shared one. The ``fan_out_progress`` field is reserved for a
-    future per-instance fan-out resume mode; in the shipping version
-    it is always ``None``.
+    a shared one. The ``fan_out_progress`` field (per §10.11) carries
+    per-fan-out-node entries when one or more fan-outs are in flight
+    at save time; an empty tuple means no fan-out progress to record.
     """
 
     invocation_id: str
@@ -105,7 +171,7 @@ class CheckpointRecord:
     # Empty-string sentinel for state classes that don't declare a
     # version; non-empty declares migration-eligibility.
     schema_version: str = ""
-    fan_out_progress: None = field(default=None)
+    fan_out_progress: tuple[FanOutProgress, ...] = field(default=())
 
 
 # Spec: realizes pipeline-utilities §10.1 CheckpointSummary. The four
@@ -224,5 +290,7 @@ async def delete(self, invocation_id: str) -> None:
     "CheckpointRecord",
     "CheckpointSummary",
     "Checkpointer",
+    "FanOutInstanceProgress",
+    "FanOutProgress",
     "NodePosition",
 ]
diff --git a/src/openarmature/graph/compiled.py b/src/openarmature/graph/compiled.py
index 1de9413..bc26f4a 100644
--- a/src/openarmature/graph/compiled.py
+++ b/src/openarmature/graph/compiled.py
@@ -26,7 +26,7 @@
 import asyncio
 import time
 import uuid
-from collections.abc import Callable, Iterable, Mapping
+from collections.abc import Callable, Iterable, Mapping, Sequence
 from dataclasses import dataclass, field
 from dataclasses import replace as dataclass_replace
 from typing import TYPE_CHECKING, Any, cast
@@ -57,6 +57,8 @@
 from openarmature.checkpoint.protocol import (
     Checkpointer,
     CheckpointRecord,
+    FanOutInstanceProgress,
+    FanOutProgress,
     NodePosition,
 )
 from openarmature.observability.correlation import (
@@ -96,6 +98,8 @@
     SubscribedObserver,
     _coerce_subscribed,
     _dispatch,
+    _FanOutExecutionState,
+    _FanOutInstanceState,
     _InvocationContext,
     _QueuedItem,
     deliver_loop,
@@ -263,6 +267,191 @@ def _no_op_finalize(_edge_error: RuntimeGraphError | None) -> None:
     silently per proposal 0012 + fixture 013."""
 
 
+# Helpers for the proposal 0009 per-instance fan-out resume contract.
+# The shared mutable ``fan_out_progress_state`` dict on
+# _InvocationContext is keyed by ``(namespace, fan_out_node_name)``;
+# these helpers locate / project / mutate it consistently.
+
+
+def _find_innermost_fan_out_instance_state(
+    context: _InvocationContext,
+) -> _FanOutInstanceState | None:
+    """Locate the per-instance state for the innermost active fan-out
+    relative to ``context``.
+
+    A node running inside fan-out instance ``i`` of fan-out ``F``
+    sees ``context.namespace_prefix`` ending with ``F``'s own name
+    and ``context.fan_out_index == i``. Walk the namespace prefix
+    back to find the longest matching key in ``fan_out_progress_state``
+    so nested fan-outs route to the right level.
+
+    Returns ``None`` when no match is found — defensive against an
+    inner node firing outside any registered fan-out (shouldn't
+    happen if ``FanOutNode.run_with_context`` correctly registers
+    each fan-out before descending). Callers that expect a hit
+    surface the missing-state case as a no-op rather than a crash.
+    """
+    if context.fan_out_index is None:
+        return None
+    prefix = context.namespace_prefix
+    state_dict = context.fan_out_progress_state
+    # Walk the prefix from longest to shortest. The innermost
+    # fan-out's full key is (namespace_before_fan_out, fan_out_name)
+    # where namespace_before_fan_out + (fan_out_name,) == prefix.
+    for split in range(len(prefix), 0, -1):
+        key = (prefix[: split - 1], prefix[split - 1])
+        if key in state_dict:
+            exec_state = state_dict[key]
+            idx = context.fan_out_index
+            if 0 <= idx < len(exec_state.instances):
+                return exec_state.instances[idx]
+    return None
+
+
+def _project_fan_out_progress(
+    state_dict: Mapping[tuple[tuple[str, ...], str], _FanOutExecutionState],
+) -> tuple[FanOutProgress, ...]:
+    """Project the engine-internal mutable per-fan-out state into the
+    frozen :class:`FanOutProgress` shape on a saved record.
+
+    Per §10.11's snapshot semantics, a save fires with ALL concurrent
+    fan-out instances' states captured at the moment of the save —
+    not just the one whose ``completed`` event triggered the save.
+    This projection enumerates the whole dict; the engine save site
+    calls it once per save regardless of which fan-out's inner node
+    fired the event.
+
+    Deterministic ordering: sort by (namespace, fan_out_node_name).
+    Two saves carrying the same logical state then serialize
+    byte-identically, which matters for backends that hash records.
+    """
+    out: list[FanOutProgress] = []
+    for (namespace, name), exec_state in sorted(state_dict.items()):
+        instances = tuple(
+            FanOutInstanceProgress(
+                state=inst.state,
+                result=inst.result,
+                completed_inner_positions=tuple(inst.completed_inner_positions),
+            )
+            for inst in exec_state.instances
+        )
+        out.append(
+            FanOutProgress(
+                fan_out_node_name=name,
+                namespace=namespace,
+                instance_count=exec_state.instance_count,
+                instances=instances,
+            )
+        )
+    return tuple(out)
+
+
+def _restore_fan_out_progress_state(
+    saved: Sequence[FanOutProgress],
+) -> dict[tuple[tuple[str, ...], str], _FanOutExecutionState]:
+    """Inverse projection of :func:`_project_fan_out_progress`. On resume
+    the loaded record's frozen ``fan_out_progress`` tuple gets unpacked
+    into the mutable per-fan-out tracking dict that ``FanOutNode``
+    consults to decide which instances to skip vs re-run.
+
+    Extra-output state isn't preserved across resume — the spec models
+    ``result`` as a single accumulator entry and is silent on
+    ``extra_outputs``. Reconstructing them would require either
+    serializing them on the record (a spec change) or recomputing them
+    (defeating the point of skip-on-resume). Fixtures don't exercise
+    ``extra_outputs`` on the resume path; if a future workload needs
+    them, surface as a follow-on.
+
+    ``result_is_error`` distinguishes success contributions from
+    collect-mode error contributions. The public ``FanOutInstanceProgress``
+    shape doesn't carry this flag (the spec presents ``result`` as a
+    single typed entry), so it's reconstructed by structural pattern-
+    matching: an error record is a ``dict`` with the engine's
+    canonical ``fan_out_index`` + ``category`` keys (per
+    ``_fan_in_collect``). Success values from the user's state schema
+    aren't expected to take this exact shape.
+    """
+    out: dict[tuple[tuple[str, ...], str], _FanOutExecutionState] = {}
+    for fp in saved:
+        instances: list[_FanOutInstanceState] = []
+        for inst in fp.instances:
+            result_is_error = _looks_like_error_record(inst.result)
+            instances.append(
+                _FanOutInstanceState(
+                    state=inst.state,
+                    result=inst.result,
+                    result_is_error=result_is_error,
+                    extra_outputs={},
+                    completed_inner_positions=list(inst.completed_inner_positions),
+                )
+            )
+        key = (fp.namespace, fp.fan_out_node_name)
+        out[key] = _FanOutExecutionState(
+            fan_out_node_name=fp.fan_out_node_name,
+            namespace=fp.namespace,
+            instance_count=fp.instance_count,
+            instances=instances,
+        )
+    return out
+
+
+def _looks_like_error_record(value: Any) -> bool:
+    """Heuristic: identify the engine's error_record shape from
+    ``_fan_in_collect`` (``dict[str, str]`` with ``fan_out_index`` +
+    ``category`` keys). Used to reconstruct
+    ``_FanOutInstanceState.result_is_error`` from the public
+    ``FanOutInstanceProgress.result`` on resume.
+    """
+    if not isinstance(value, dict):
+        return False
+    value_dict = cast("dict[str, Any]", value)
+    return "fan_out_index" in value_dict and "category" in value_dict
+
+
+async def _save_fan_out_internal(
+    checkpointer: Any,
+    invocation_id: str,
+    record: CheckpointRecord,
+) -> None:
+    """Route a fan-out-internal save through the checkpointer's
+    optional batching seam.
+
+    Per spec §10.11.4, Checkpointer backends MAY support batching
+    scoped to fan-out internal saves. When the backend exposes a
+    ``save_fan_out_internal`` coroutine, route there so it can buffer
+    or flush per its configuration. Otherwise, fall back to the
+    standard ``save`` — non-batching backends see no behavioral change.
+    """
+    saver = getattr(checkpointer, "save_fan_out_internal", None)
+    if saver is None:
+        await checkpointer.save(invocation_id, record)
+        return
+    await saver(invocation_id, record)
+
+
+async def _save_fan_out_in_flight_failure(  # pyright: ignore[reportUnusedFunction]
+    checkpointer: Any,
+    invocation_id: str,
+    record: CheckpointRecord,
+) -> None:
+    """Route an "instance failed mid-execution" save through the
+    checkpointer's failure-save seam (§10.11.4 + the in_flight
+    observability gap §10.11).
+
+    Backends that expose ``save_fan_out_in_flight_failure`` get the
+    save directly; under batching, the typical implementation
+    buffers without triggering the flush count (preserving the
+    "buffered saves lost on crash" model). Backends that don't
+    expose the hook fall back to ``save`` so non-batching backends
+    keep the failure save durable.
+    """
+    saver = getattr(checkpointer, "save_fan_out_in_flight_failure", None)
+    if saver is None:
+        await checkpointer.save(invocation_id, record)
+        return
+    await saver(invocation_id, record)
+
+
 @dataclass(frozen=True)
 class _MigrationSummary:
     """Per-resume migration-chain metadata threaded out of
@@ -670,6 +859,14 @@ async def invoke(
             # matches what the engine looks up at run time
             # (``context.namespace_prefix + (current,)``).
             resume_skip_set = frozenset(p.namespace + (p.node_name,) for p in completed_positions)
+            # Per spec §10.7 / §10.11: restore per-fan-out per-instance
+            # state from the loaded record. ``FanOutNode.run_with_context``
+            # consults this on re-dispatch — completed instances skip,
+            # in_flight / not_started instances re-execute. Empty tuple
+            # when no fan-outs were in flight at save time.
+            fan_out_progress_state = _restore_fan_out_progress_state(record.fan_out_progress)
+        else:
+            fan_out_progress_state = {}
 
         context = _InvocationContext(
             queue=queue,
@@ -682,6 +879,7 @@ async def invoke(
             resume_skip_set=resume_skip_set,
             pending_resume_states=pending_resume_states,
             resume_invocation=resume_invocation,
+            fan_out_progress_state=fan_out_progress_state,
         )
         # Spec observability §3.1: the correlation_id MUST be readable
         # from anywhere within the invocation's async call tree via the
@@ -1420,69 +1618,85 @@ async def innermost(s: Any) -> Mapping[str, Any]:
         dispatch_token = _set_active_dispatch(lambda event: _dispatch(context, event))
         namespace_token = _set_namespace_prefix(namespace)
         fan_out_token = _set_fan_out_index(context.fan_out_index)
+        # Per spec §10.11 the ``fan_out_progress`` entry is "in-flight
+        # only"; the fan-out's own completion save below is the last
+        # point where the entry is needed (proposal 0009: that save
+        # "also finalizes fan_out_progress to mark all instances
+        # complete"). Pop the entry after the save fires, regardless of
+        # whether the fan-out completed normally, short-circuited, or
+        # raised, so subsequent saves in this invocation do not carry
+        # stale fan-out progress and a retry middleware on the fan-out
+        # node sees a fresh tracked state on the second attempt.
+        fan_out_progress_key = (context.namespace_prefix, current)
         try:
             try:
-                final_partial = await chain(state)
-            except RuntimeGraphError:
-                raise
-            except Exception as e:
-                raise NodeException(node_name=current, cause=e, recoverable_state=state) from e
-        finally:
-            _reset_fan_out_index(fan_out_token)
-            _reset_namespace_prefix(namespace_token)
-            _reset_active_dispatch(dispatch_token)
-            _reset_active_observers(observers_token)
-        merged_outer = _merge_partial(state, final_partial, self.reducers, current)
-        # Spec §10.3 + §10.7: the fan-out's own completion DOES save —
-        # one record once the fan-out as a whole has finished and
-        # results have merged back. Per-instance internal saves are
-        # gated off by the fan-out instance descent setting
-        # ``checkpointer=None`` on the inner context. Per graph-engine
-        # §6 v0.16.1: the saved attempt_index reflects the wrapping
-        # retry MW's counter (sourced from ``deferred_info[0]`` which
-        # captured ``current_attempt_index()`` at the moment of the
-        # successful merge). Short-circuit case (middleware returned
-        # without invoking ``next``) records attempt_index=0.
-        info = deferred_info[0]
-        saved_attempt = info[0] if info is not None else 0
-        await self._maybe_save_checkpoint(
-            context,
-            node_name=current,
-            namespace=namespace,
-            step=step,
-            attempt_index=saved_attempt,
-            post_state=merged_outer,
-        )
+                try:
+                    final_partial = await chain(state)
+                except RuntimeGraphError:
+                    raise
+                except Exception as e:
+                    raise NodeException(node_name=current, cause=e, recoverable_state=state) from e
+            finally:
+                _reset_fan_out_index(fan_out_token)
+                _reset_namespace_prefix(namespace_token)
+                _reset_active_dispatch(dispatch_token)
+                _reset_active_observers(observers_token)
+            merged_outer = _merge_partial(state, final_partial, self.reducers, current)
+            # Spec §10.3 + §10.7 + proposal 0009 §10.11: the fan-out's
+            # own completion DOES save — one record once the fan-out as
+            # a whole has finished and results have merged back. The
+            # save also finalizes ``fan_out_progress`` (the projection
+            # at the save site captures every tracked instance's
+            # terminal state before the outer ``finally`` pops the
+            # entry). Per graph-engine §6 v0.16.1: the saved
+            # attempt_index reflects the wrapping retry MW's counter
+            # (sourced from ``deferred_info[0]`` which captured
+            # ``current_attempt_index()`` at the moment of the
+            # successful merge). Short-circuit case (middleware
+            # returned without invoking ``next``) records
+            # attempt_index=0.
+            info = deferred_info[0]
+            saved_attempt = info[0] if info is not None else 0
+            await self._maybe_save_checkpoint(
+                context,
+                node_name=current,
+                namespace=namespace,
+                step=step,
+                attempt_index=saved_attempt,
+                post_state=merged_outer,
+            )
 
-        if info is None:
-            return _StepResult(state=merged_outer, finalize_completed=_no_op_finalize)
-        final_attempt_index, final_pre_state, final_merged = info
+            if info is None:
+                return _StepResult(state=merged_outer, finalize_completed=_no_op_finalize)
+            final_attempt_index, final_pre_state, final_merged = info
 
-        def finalize_completed(edge_error: RuntimeGraphError | None) -> None:
-            if edge_error is None:
-                self._dispatch_completed(
-                    context,
-                    current,
-                    namespace,
-                    step,
-                    final_pre_state,
-                    post_state=final_merged,
-                    attempt_index=final_attempt_index,
-                    fan_out_config=fan_out_event_config,
-                )
-            else:
-                self._dispatch_completed(
-                    context,
-                    current,
-                    namespace,
-                    step,
-                    final_pre_state,
-                    error=edge_error,
-                    attempt_index=final_attempt_index,
-                    fan_out_config=fan_out_event_config,
-                )
+            def finalize_completed(edge_error: RuntimeGraphError | None) -> None:
+                if edge_error is None:
+                    self._dispatch_completed(
+                        context,
+                        current,
+                        namespace,
+                        step,
+                        final_pre_state,
+                        post_state=final_merged,
+                        attempt_index=final_attempt_index,
+                        fan_out_config=fan_out_event_config,
+                    )
+                else:
+                    self._dispatch_completed(
+                        context,
+                        current,
+                        namespace,
+                        step,
+                        final_pre_state,
+                        error=edge_error,
+                        attempt_index=final_attempt_index,
+                        fan_out_config=fan_out_event_config,
+                    )
 
-        return _StepResult(state=merged_outer, finalize_completed=finalize_completed)
+            return _StepResult(state=merged_outer, finalize_completed=finalize_completed)
+        finally:
+            context.fan_out_progress_state.pop(fan_out_progress_key, None)
 
     async def _step_parallel_branches_node(
         self,
@@ -1724,23 +1938,43 @@ async def _maybe_save_checkpoint(
         post_state: Any,
     ) -> None:
         """Fire a checkpoint save for the just-completed node, if a
-        backend is registered and we're not inside a fan-out instance.
+        backend is registered.
 
-        Per spec pipeline-utilities §10.3:
+        Per spec pipeline-utilities §10.3 (revised by proposal 0009 /
+        spec v0.18.0):
 
         - Save fires for outermost-graph nodes, subgraph-internal
-          nodes, AND the fan-out node's own completion (the parent
-          dispatch). All three have ``fan_out_index is None`` from
-          the context's perspective.
-        - Save does NOT fire for events from inside a fan-out
-          instance. The atomic-restart contract (§10.7) means
-          per-instance progress isn't recoverable in v1, so saving
-          inner-instance state is dead weight.
+          nodes, fan-out instance internal nodes, AND the fan-out
+          node's own completion (the parent dispatch).
+        - When the save fires from inside a fan-out instance
+          (``context.fan_out_index is not None``), the inner node's
+          position is recorded against the per-instance state on the
+          shared ``fan_out_progress_state`` rather than the outer
+          ``completed_positions`` list. The saved record's
+          ``fan_out_progress`` field projects this shared dict so
+          all concurrent instances' snapshots are captured atomically.
+
+        Atomicity contract (§10.11): the save-call site below
+        completes the "produce contribution + record into accumulator
+        + save" sequence the spec mandates. ``FanOutNode.run_with_context``
+        flips an instance's state to ``completed`` and stashes its
+        ``result`` BEFORE invoking the save that durably records the
+        transition. A crash between that state mutation and the save
+        below leaves the in-memory dict updated but the persisted
+        record showing ``in_flight``, so resume re-runs the instance
+        and the append/last_write_wins/merge reducer's exactly-once
+        guarantee per §10.11.1 holds.
+
+        Save also enumerates ALL concurrent fan-out instances when
+        building ``fan_out_progress`` (not just the one whose
+        ``completed`` event triggered this save) — the per-instance
+        snapshot is consistent across siblings, matching §10.11's
+        "captured when a sibling instance's ``completed`` event
+        triggers a save during this instance's execution" wording.
 
         After ``Checkpointer.save`` returns, dispatch a
         ``checkpoint_saved`` observer event (per §10.8 SHOULD-level
-        guidance) so observability backends — wired in Phase 6 — can
-        surface saves as spans.
+        guidance) so observability backends can surface saves as spans.
 
         Save failures raise ``CheckpointSaveFailed`` to the caller of
         ``invoke()`` immediately; saves are NOT retried by the engine.
@@ -1748,22 +1982,49 @@ async def _maybe_save_checkpoint(
         checkpointer = context.checkpointer
         if checkpointer is None:
             return
-        if context.fan_out_index is not None:
-            return
         # Per spec §10.2: NodePosition.namespace is the containing-
         # graph chain (outermost first), NOT including the node's
         # own name — distinct from NodeEvent.namespace which
         # includes it. The two are related by
         # NodeEvent.namespace == NodePosition.namespace +
         # (NodePosition.node_name,).
+        #
+        # Inner-position scoping (per §10.11.1, in-flight observability
+        # rules): a position from inside a fan-out instance is scoped
+        # to that instance's inner subgraph execution, NOT the outer
+        # graph. It accumulates on the per-instance state's
+        # ``completed_inner_positions`` list rather than the outer
+        # ``completed_positions`` list. The outer list keeps the outer
+        # graph's positions plus the fan-out node's own completion
+        # position (added by ``_step_fan_out_node`` after fan-in).
         position = NodePosition(
             namespace=context.namespace_prefix,
             node_name=node_name,
             step=step,
             attempt_index=attempt_index,
-            fan_out_index=None,
+            fan_out_index=context.fan_out_index,
         )
-        context.completed_positions.append(position)
+        if context.fan_out_index is not None:
+            # Locate the per-instance state for the innermost active
+            # fan-out (the one this node is running inside). The
+            # innermost fan-out's key has the longest namespace; the
+            # context's namespace_prefix at this depth is exactly that
+            # fan-out's full namespace prefix (namespace + name), so
+            # we walk the prefix back to find the matching key.
+            instance_state = _find_innermost_fan_out_instance_state(context)
+            if instance_state is not None:
+                instance_state.completed_inner_positions.append(position)
+        else:
+            context.completed_positions.append(position)
+        # Project the shared mutable per-fan-out tracking dict into the
+        # frozen ``FanOutProgress`` shape on the record. Per §10.11:
+        # enumerate every fan-out entry the engine has registered, not
+        # just the innermost one — concurrent fan-outs (nested or
+        # parallel) all contribute their state to the same save.
+        # Deterministic order: sort by (namespace, name) so two saves
+        # with identical state serialize identically (relevant for
+        # backends that hash records).
+        fan_out_progress = _project_fan_out_progress(context.fan_out_progress_state)
         record = CheckpointRecord(
             invocation_id=context.invocation_id,
             correlation_id=context.correlation_id,
@@ -1791,9 +2052,19 @@ async def _maybe_save_checkpoint(
             # (subclass schema_versions don't shadow the declared
             # graph schema).
             schema_version=self.state_cls.schema_version,
+            fan_out_progress=fan_out_progress,
         )
+        # Per §10.11.4: batching applies ONLY to fan-out instance
+        # internal saves. Outer-graph + subgraph-internal +
+        # fan-out-node-completion saves remain synchronous.
+        # ``checkpointer.save`` is invoked via the batching helper
+        # which falls back to direct ``save`` for non-fan-out-internal
+        # events even on a batching-enabled backend.
         try:
-            await checkpointer.save(context.invocation_id, record)
+            if context.fan_out_index is not None:
+                await _save_fan_out_internal(checkpointer, context.invocation_id, record)
+            else:
+                await checkpointer.save(context.invocation_id, record)
         except Exception as exc:
             raise CheckpointSaveFailed(context.invocation_id, exc) from exc
         # §10.8: dispatch a ``checkpoint_saved`` observer event so
diff --git a/src/openarmature/graph/fan_out.py b/src/openarmature/graph/fan_out.py
index 05ceb13..9422384 100644
--- a/src/openarmature/graph/fan_out.py
+++ b/src/openarmature/graph/fan_out.py
@@ -31,6 +31,7 @@
 from __future__ import annotations
 
 import asyncio
+import time
 from collections.abc import Callable, Mapping, Sequence
 from dataclasses import dataclass, field
 from typing import TYPE_CHECKING, Any, Literal, cast
@@ -42,6 +43,7 @@
     NodeException,
 )
 from .middleware import ChainCall, Middleware, compose_chain
+from .observer import _FanOutExecutionState, _FanOutInstanceState
 from .state import State
 
 if TYPE_CHECKING:
@@ -126,6 +128,18 @@ async def run_with_context(
         fan-in collected/extra fields, write count_field and
         errors_field if configured.
 
+        Per proposal 0009 / §10.11 per-instance resume contract: this
+        method registers a per-fan-out tracking entry on the shared
+        ``context.fan_out_progress_state`` dict before dispatching,
+        flips each instance's state through
+        ``not_started -> in_flight -> completed`` as the instance
+        progresses, and fires an explicit "instance completed" save
+        after the per-instance contribution has been recorded into
+        the accumulator. The atomicity contract from §10.11 is
+        observed: the per-instance state mutation precedes the save,
+        so a crash after mutation but before save leaves the saved
+        record showing ``in_flight`` (resume re-runs the instance).
+
         ``pre_resolved_count`` / ``pre_resolved_concurrency`` are the
         proposal-0013 v0.10.0 hooks: when the engine has already
         resolved the config eagerly to populate
@@ -152,10 +166,93 @@ async def run_with_context(
         else:
             max_concurrency = _resolve_concurrency(self.name, cfg, state)
 
+        # Register / reuse the per-fan-out tracking entry on the
+        # shared dict. Resume threads a pre-restored entry through
+        # ``context.fan_out_progress_state``; first-run constructs a
+        # fresh one with all instances ``not_started``.
+        key = (context.namespace_prefix, self.name)
+        exec_state = context.fan_out_progress_state.get(key)
+        if exec_state is None:
+            exec_state = _FanOutExecutionState(
+                fan_out_node_name=self.name,
+                namespace=context.namespace_prefix,
+                instance_count=instance_count,
+                instances=[_FanOutInstanceState() for _ in range(instance_count)],
+            )
+            context.fan_out_progress_state[key] = exec_state
+        else:
+            # Defensive: instance_count may have changed between runs if
+            # the items_field/count resolver returns a different value
+            # on resume. Trust the current count — pad or truncate the
+            # tracked instances to match. In practice users either
+            # configure deterministic counts or change them between
+            # runs deliberately; spec §10.5 idempotency says the work
+            # is the same on resume but is silent on count drift.
+            if len(exec_state.instances) < instance_count:
+                exec_state.instances.extend(
+                    _FanOutInstanceState() for _ in range(instance_count - len(exec_state.instances))
+                )
+            elif len(exec_state.instances) > instance_count:
+                del exec_state.instances[instance_count:]
+            exec_state.instance_count = instance_count
+
+        # Shared cancel signal for the fail_fast path. Defined here (not
+        # inside the fail_fast branch below) so ``run_instance`` can
+        # check it AFTER semaphore acquisition but BEFORE mutating
+        # tracked state — closes a race where a semaphore-blocked
+        # sibling would flip its tracked state to ``in_flight`` after a
+        # sibling failed and set the signal. In collect mode the signal
+        # is never set, so the check inside ``run_instance`` is a
+        # no-op there.
+        cancel_signal = asyncio.Event()
+
         # Per-instance task: build the instance_middleware chain, run
         # the subgraph against it, and return the per-instance partial
         # (collect_field + extra_outputs).
+        #
+        # Resume gating: an instance whose tracked state is
+        # ``completed`` is skipped (its result rolls forward from the
+        # accumulator entry). Instances tracked as ``in_flight`` or
+        # ``not_started`` dispatch normally with fresh per-instance
+        # state — per §10.7 the inner subgraph re-enters at its
+        # declared entry, not at any of the ``completed_inner_positions``
+        # captured in the prior run.
         async def run_instance(idx: int, instance_state: ChildT) -> Mapping[str, Any]:
+            tracked = exec_state.instances[idx]
+            if tracked.state == "completed":
+                if tracked.result_is_error:
+                    # Per §10.11.2 collect-mode resume: an error
+                    # contribution rolls forward through the
+                    # ``errors_field`` bucket, not ``target_field``.
+                    # Raise a categorized exception so the outer
+                    # gather captures it and ``_fan_in_collect``
+                    # routes it through error_records (with the same
+                    # ``category`` the original failure carried).
+                    raise _RolledForwardError(category=_extract_error_category(tracked.result))
+                # Roll the success contribution forward verbatim.
+                return _rolled_forward_partial(cfg, tracked)
+
+            # Cancel-signal check AFTER the resume rollforward branch
+            # but BEFORE the first tracked-state mutation. Covers the
+            # race where a sibling failed (setting the signal) while
+            # this task was blocked on the bounded-concurrency
+            # semaphore inside ``gated_run``; without this check, the
+            # task would acquire the semaphore and flip ``tracked.state``
+            # to ``in_flight``, contradicting §10.11.2's
+            # "not-yet-dispatched siblings end up not_started" contract.
+            if cancel_signal.is_set():
+                raise asyncio.CancelledError()
+
+            # Flip to in_flight BEFORE dispatching so a sibling-
+            # triggered save during this instance's execution observes
+            # the correct state. Reset completed_inner_positions to
+            # ensure resume re-execution doesn't accumulate against
+            # the prior run's prefix.
+            tracked.state = "in_flight"
+            tracked.completed_inner_positions.clear()
+            tracked.result = None
+            tracked.extra_outputs = {}
+
             child_context = context.descend_into_fan_out_instance(
                 fan_out_node_name=self.name,
                 parent_state=state,
@@ -171,34 +268,165 @@ async def innermost(s: ChildT) -> Mapping[str, Any]:
                 return _extract_instance_partial(cfg, final_inst_state)
 
             chain: ChainCall = compose_chain(cfg.instance_middleware, innermost)
-            return await chain(instance_state)
+            try:
+                partial = await chain(instance_state)
+            except Exception as exc:
+                if cfg.error_policy == "collect":
+                    # Per §10.11.2 collect mode: the failure becomes a
+                    # ``completed`` contribution with the error record
+                    # as ``result``. Mutate state BEFORE saving so the
+                    # save durably reflects the completion (atomicity
+                    # contract per §10.11). The re-raise hands the
+                    # exception back to the outer gather so the
+                    # ``_fan_in_collect`` path builds the parent
+                    # ``errors_field`` from raw_results.
+                    error_record: dict[str, str] = {
+                        "fan_out_index": str(idx),
+                        "category": getattr(exc, "category", type(exc).__name__),
+                    }
+                    tracked.result = error_record
+                    tracked.result_is_error = True
+                    tracked.extra_outputs = {}
+                    tracked.state = "completed"
+                    await _save_instance_completed(state, context)
+                    raise
+                # Per §10.11 in_flight observability under fail_fast:
+                # if no sibling completion fired a save during this
+                # instance's execution (the serial-execution +
+                # first-node-fails case), the saved record would not
+                # otherwise reflect this instance's in_flight
+                # transition. Fire an explicit "instance failed" save
+                # so the per-instance in_flight observation reaches
+                # the saved record. Tracked state stays ``in_flight``
+                # (no accumulator write happens on failure under
+                # fail_fast) per §10.11.2. Re-raise after the save so
+                # the fail_fast cancellation path stays intact.
+                await _save_instance_in_flight(state, context)
+                raise
+
+            # Atomicity contract (§10.11): produce contribution -> record
+            # into accumulator -> save. The accumulator update below
+            # happens BEFORE the explicit "instance completed" save so a
+            # crash between accumulator write and save leaves the saved
+            # record showing ``in_flight`` and resume re-runs the
+            # instance. The ``append`` reducer's no-double-merge guarantee
+            # (§10.11.1) depends on this ordering.
+            tracked.result = partial.get(cfg.collect_field)
+            tracked.result_is_error = False
+            tracked.extra_outputs = {
+                parent_field: partial[parent_field]
+                for parent_field in cfg.extra_outputs
+                if parent_field in partial
+            }
+            tracked.state = "completed"
+
+            # Fire an explicit "instance completed" save so the saved
+            # record durably reflects the completed state. Without this
+            # save, only the terminal inner node's intrinsic save fires
+            # (which executed BEFORE the accumulator mutation above and
+            # therefore showed the instance as ``in_flight``). The
+            # explicit save closes the atomicity gap. Routed through
+            # the fan-out-internal batching seam per §10.11.4.
+            await _save_instance_completed(state, context)
+
+            return partial
 
         gated_run = _bounded_runner(run_instance, max_concurrency)
 
         if cfg.error_policy == "fail_fast":
-            tasks = [gated_run(idx, st) for idx, st in enumerate(instance_states)]
+            # ``cancel_signal`` is defined above (before ``run_instance``)
+            # so the in-instance check can read it. The wrapper below
+            # adds a fast-path check before semaphore acquisition for
+            # tasks that haven't entered ``run_instance`` yet; the
+            # in-instance check covers the race after semaphore
+            # acquisition. The explicit signal closes the
+            # bounded-concurrency-semaphore cancellation gap that
+            # ``asyncio.gather`` / ``asyncio.wait`` don't enforce
+            # strongly enough on their own.
+
+            async def signaled_run(idx: int, st: Any) -> Mapping[str, Any]:
+                # Check before any work — if a sibling already failed,
+                # exit immediately so this instance's tracked state
+                # stays at its default not_started.
+                if cancel_signal.is_set():
+                    raise asyncio.CancelledError()
+                try:
+                    return await gated_run(idx, st)
+                except Exception:
+                    # Set the signal so siblings about to run see it
+                    # before they enter run_instance and mutate
+                    # tracked state. The first task to raise wins.
+                    cancel_signal.set()
+                    raise
+
+            tasks: list[tuple[int, asyncio.Task[Mapping[str, Any]]]] = [
+                (idx, asyncio.create_task(signaled_run(idx, st))) for idx, st in enumerate(instance_states)
+            ]
             try:
-                results = await asyncio.gather(*tasks)
-            except Exception as exc:
-                # Per spec §9.5: the propagated exception is the
-                # offending instance's, wrapped in a node_exception
-                # with recoverable_state set to the parent's pre-fan-out
-                # snapshot. Sibling cancellations are infrastructure
-                # (asyncio.gather already cancelled them) and don't
-                # produce additional node_exception per cancelled
-                # instance.
-                raise NodeException(
-                    node_name=self.name,
-                    cause=exc,
-                    recoverable_state=state,
-                ) from exc
-            return _fan_in_fail_fast(cfg, results)
+                await asyncio.wait(
+                    [t for _, t in tasks],
+                    return_when=asyncio.FIRST_EXCEPTION,
+                )
+            except BaseException:
+                for _, t in tasks:
+                    t.cancel()
+                await asyncio.gather(*(t for _, t in tasks), return_exceptions=True)
+                raise
+
+            # Iterate all completed-not-cancelled tasks to retrieve
+            # each exception via ``t.exception()`` (otherwise asyncio
+            # warns "Task exception was never retrieved" on GC for any
+            # task that failed before fail_fast cancelled its
+            # siblings). ``failed_cause`` still captures only the
+            # FIRST real exception — NodeException's ``cause`` should
+            # be the originating instance's error, not a later
+            # sibling's. CancelledErrors are siblings we cancelled, so
+            # ignore them.
+            failed_cause: BaseException | None = None
+            for _, t in tasks:
+                if t.done() and not t.cancelled():
+                    exc = t.exception()
+                    if exc is not None and not isinstance(exc, asyncio.CancelledError):
+                        if failed_cause is None:
+                            failed_cause = exc
+
+            # Cancel any still-pending tasks; drain to absorb
+            # CancelledError so it doesn't propagate as unhandled.
+            for _, t in tasks:
+                if not t.done():
+                    t.cancel()
+            await asyncio.gather(*(t for _, t in tasks if not t.done()), return_exceptions=True)
+
+            if failed_cause is None:
+                # All tasks finished without raising. Collect results
+                # in instance-index order and fan-in.
+                results = [t.result() for _, t in tasks]
+                return _fan_in_fail_fast(cfg, results)
+
+            # Per spec §9.5: the propagated exception is the offending
+            # instance's, wrapped in a node_exception with
+            # recoverable_state set to the parent's pre-fan-out
+            # snapshot. Per §10.11.2 the failed instance's tracked
+            # state is ``in_flight`` (no accumulator entry was
+            # recorded because the contribution -> mutation -> save
+            # sequence raised before the mutation; resume re-runs).
+            raise NodeException(
+                node_name=self.name,
+                cause=failed_cause,
+                recoverable_state=state,
+            ) from failed_cause
 
         # collect — run all instances; capture per-instance exceptions
-        # rather than propagate.
-        tasks = [gated_run(idx, st) for idx, st in enumerate(instance_states)]
-        raw = await asyncio.gather(*tasks, return_exceptions=True)
-        return _fan_in_collect(cfg, raw, instance_count)
+        # rather than propagate. Per §10.11.2 a collect-mode failure
+        # produces a ``completed`` instance whose ``result`` is the
+        # error record contributed to ``errors_field``. Per-instance
+        # promotion happens inside ``run_instance`` so the
+        # ``completed`` save fires before sibling instances dispatch
+        # (the §10.11 atomicity contract still holds and the abort_-
+        # after_instance harness directive sees the right state).
+        collect_tasks = [gated_run(idx, st) for idx, st in enumerate(instance_states)]
+        raw_results = await asyncio.gather(*collect_tasks, return_exceptions=True)
+        return _fan_in_collect(cfg, raw_results, instance_count)
 
 
 # ---------------------------------------------------------------------------
@@ -333,6 +561,188 @@ def _extract_instance_partial(cfg: FanOutConfig, final_state: Any) -> Mapping[st
     return partial
 
 
+class _RolledForwardError(Exception):
+    """Exception raised by ``run_instance`` to signal that a
+    collect-mode resume is rolling forward a recorded error
+    contribution. Carries the original failure's ``category`` so
+    the resumed fan-in path can record an error entry with the
+    same category that the prior run produced. Internal — never
+    propagates out of the fan-out's run_with_context.
+    """
+
+    def __init__(self, *, category: str) -> None:
+        super().__init__(f"rolled-forward error ({category})")
+        self.category = category
+
+
+def _extract_error_category(error_record: Any) -> str:
+    """Pull the ``category`` field from an error_record dict the engine
+    stored as a tracked instance's ``result``. Falls back to
+    ``node_exception`` when the field isn't present (defensive — the
+    engine always sets ``category`` per ``_fan_in_collect``)."""
+    if isinstance(error_record, dict):
+        result_dict = cast("dict[str, Any]", error_record)
+        category = result_dict.get("category", "node_exception")
+        if isinstance(category, str):
+            return category
+    return "node_exception"
+
+
+def _rolled_forward_partial(cfg: FanOutConfig, tracked: _FanOutInstanceState) -> Mapping[str, Any]:
+    """Reconstruct the per-instance partial for a ``completed`` instance
+    being skipped on resume. The accumulator entry rolls forward
+    verbatim — same shape as :func:`_extract_instance_partial` would
+    have produced on the original run, sourced from the per-instance
+    tracked state instead of a freshly-computed inner state."""
+    partial: dict[str, Any] = {cfg.collect_field: tracked.result}
+    for parent_field in cfg.extra_outputs:
+        if parent_field in tracked.extra_outputs:
+            partial[parent_field] = tracked.extra_outputs[parent_field]
+    return partial
+
+
+async def _save_instance_in_flight(
+    parent_state: Any,
+    context: _InvocationContext,
+) -> None:
+    """Fire an explicit save when an instance fails before any sibling
+    triggered a save during its execution. Without this save, the
+    instance's in_flight transition would not be observable on the
+    saved record under serial execution: no sibling completion fires
+    during a serial instance's run, and the instance's own inner-node
+    save only fires on successful merge (failure path skips it).
+
+    Routes through the checkpointer's ``save_fan_out_in_flight_failure``
+    seam (when present) per §10.11.4. Batching backends typically
+    buffer this save WITHOUT triggering a flush — the "crash" the
+    failure represents would lose the buffer, including this save,
+    in a real-world scenario. Non-batching backends route it through
+    the synchronous ``save`` path so the in_flight observability of
+    fixture 048 holds.
+    """
+    from openarmature.checkpoint.errors import CheckpointSaveFailed  # noqa: PLC0415
+    from openarmature.checkpoint.protocol import CheckpointRecord  # noqa: PLC0415
+
+    from .compiled import (  # noqa: PLC0415
+        _project_fan_out_progress,
+        _save_fan_out_in_flight_failure,
+    )
+
+    checkpointer = context.checkpointer
+    if checkpointer is None:
+        return
+    fan_out_progress = _project_fan_out_progress(context.fan_out_progress_state)
+    # Per spec §10.2: ``schema_version`` is the OUTERMOST graph state's
+    # version — the record represents the whole invocation tree. For a
+    # nested fan-out (a fan-out inside a subgraph), ``parent_state`` is
+    # the subgraph's state, not the outermost; read from
+    # ``context.parent_states_prefix[0]`` (the outermost state lives at
+    # index 0 of the parent chain) when non-empty, else from
+    # ``parent_state`` directly (which IS the outermost state for an
+    # outermost fan-out).
+    if context.parent_states_prefix:
+        outermost_cls = cast("type[Any]", type(context.parent_states_prefix[0]))
+    else:
+        outermost_cls = cast("type[Any]", type(parent_state))
+    schema_version = cast("str", getattr(outermost_cls, "schema_version", ""))
+    record = CheckpointRecord(
+        invocation_id=context.invocation_id,
+        correlation_id=context.correlation_id,
+        state=parent_state,
+        completed_positions=tuple(context.completed_positions),
+        parent_states=context.parent_states_prefix,
+        last_saved_at=time.time(),
+        schema_version=schema_version,
+        fan_out_progress=fan_out_progress,
+    )
+    try:
+        await _save_fan_out_in_flight_failure(checkpointer, context.invocation_id, record)
+    except Exception as exc:
+        raise CheckpointSaveFailed(context.invocation_id, exc) from exc
+
+
+async def _save_instance_completed(
+    parent_state: Any,
+    context: _InvocationContext,
+) -> None:
+    """Fire the explicit "instance completed" save closing the §10.11
+    atomicity gap. The per-instance state has already been flipped to
+    ``completed`` with ``result`` populated; this save durably records
+    that transition so resume can skip the instance.
+
+    Routed through the fan-out-internal batching seam per §10.11.4 —
+    backends opting into batching may buffer the save; non-batching
+    backends call ``save`` directly. On crash with buffered-but-
+    unflushed saves, the instance reverts to ``in_flight`` /
+    ``not_started`` on resume and re-runs (contributing for the first
+    time, no double-merge per §10.11.1).
+    """
+    # Lazy imports: ``compiled`` and ``checkpoint.protocol`` would
+    # create textual cycles at module-load. Function-scope keeps the
+    # import cheap (cached after first call) and the cycle off the
+    # static analyzer's graph.
+    from openarmature.checkpoint.errors import CheckpointSaveFailed  # noqa: PLC0415
+    from openarmature.checkpoint.protocol import CheckpointRecord  # noqa: PLC0415
+
+    from .compiled import (  # noqa: PLC0415
+        _project_fan_out_progress,
+        _save_fan_out_internal,
+    )
+
+    checkpointer = context.checkpointer
+    if checkpointer is None:
+        return
+    # The "instance completed" save records the post-merge outer state
+    # via ``parent_state`` (the snapshot of outer state at fan-out
+    # dispatch time). ``parent_states`` carries any enclosing subgraph
+    # chain. This save shape mirrors a top-level "outer node completed"
+    # save: ``state`` = outer; ``parent_states`` = enclosing chain
+    # (empty for outermost fan-outs). The inner-node saves fired during
+    # the instance's execution have a different shape (state = inner,
+    # parent_states includes outer) — both shapes are valid checkpoint
+    # records and the resume path handles either based on
+    # ``parent_states`` length.
+    fan_out_progress = _project_fan_out_progress(context.fan_out_progress_state)
+    # Per spec §10.2: ``schema_version`` is the OUTERMOST graph state's
+    # version (the record represents the whole invocation tree). For a
+    # nested fan-out (a fan-out inside a subgraph), ``parent_state`` is
+    # the subgraph's state, not the outermost — read from
+    # ``context.parent_states_prefix[0]`` (the outermost state lives at
+    # index 0 of the parent chain) when non-empty, else from
+    # ``parent_state`` directly (which IS the outermost state for an
+    # outermost fan-out). Mirrors ``_maybe_save_checkpoint``'s
+    # ``self.state_cls.schema_version`` read in ``compiled.py``.
+    if context.parent_states_prefix:
+        outermost_cls = cast("type[Any]", type(context.parent_states_prefix[0]))
+    else:
+        outermost_cls = cast("type[Any]", type(parent_state))
+    schema_version = cast("str", getattr(outermost_cls, "schema_version", ""))
+    record = CheckpointRecord(
+        invocation_id=context.invocation_id,
+        correlation_id=context.correlation_id,
+        state=parent_state,
+        completed_positions=tuple(context.completed_positions),
+        parent_states=context.parent_states_prefix,
+        last_saved_at=time.time(),
+        schema_version=schema_version,
+        fan_out_progress=fan_out_progress,
+    )
+    try:
+        await _save_fan_out_internal(checkpointer, context.invocation_id, record)
+    except Exception as exc:
+        raise CheckpointSaveFailed(context.invocation_id, exc) from exc
+    # Per §10.8: the explicit "instance completed" save is a save like
+    # any other and SHOULD emit a ``checkpoint_saved`` observer event.
+    # However the engine's primary save call site
+    # (``_maybe_save_checkpoint``) already dispatches the event for
+    # every save it owns, and the explicit save here is conceptually
+    # part of the same save-point: the inner node's intrinsic save
+    # already fired ``checkpoint_saved`` for this fan-out instance's
+    # progress. Adding a second event would double-count for backends
+    # that surface them as spans. Suppress to keep the event stream
+    # node-aligned.
+
+
 def _fan_in_fail_fast(
     cfg: FanOutConfig,
     results: Sequence[Mapping[str, Any]],
diff --git a/src/openarmature/graph/observer.py b/src/openarmature/graph/observer.py
index f139695..44c7917 100644
--- a/src/openarmature/graph/observer.py
+++ b/src/openarmature/graph/observer.py
@@ -32,7 +32,7 @@
 import warnings
 from collections.abc import Iterable
 from dataclasses import dataclass, field
-from typing import Any, Protocol
+from typing import Any, Literal, Protocol
 
 from .events import NodeEvent
 from .state import State
@@ -212,6 +212,69 @@ class _QueuedItem:
 _DRAIN_SENTINEL = None
 
 
+# Spec: realizes pipeline-utilities §10.11 per-instance progress
+# tracking in the engine. These are the MUTABLE internal-state
+# counterparts to the FROZEN public ``FanOutProgress`` /
+# ``FanOutInstanceProgress`` shapes the saved CheckpointRecord exposes.
+# ``_maybe_save_checkpoint`` projects this mutable state into the
+# frozen public shape when building a record.
+@dataclass
+class _FanOutInstanceState:
+    """Mutable per-instance state inside a fan-out, updated by the
+    engine as the instance progresses. ``state`` transitions
+    not_started -> in_flight -> completed.
+
+    - ``result`` holds the per-instance contribution to the fan-out
+      accumulator, set when ``state == "completed"``. Per spec
+      §10.11 this is "the value contributed to the ``target_field``
+      bucket" (success path) or "the error entry contributed to the
+      ``errors_field`` bucket" (collect-mode failure). The harness
+      projects this into the frozen ``FanOutInstanceProgress.result``
+      verbatim.
+    - ``result_is_error`` distinguishes success contributions
+      (``False``) from collect-mode error contributions (``True``).
+      Internal flag — not exposed on the public
+      ``FanOutInstanceProgress`` shape because the spec presents
+      ``result`` as a single typed entry per the parent state schema.
+      ``FanOutNode.run_with_context`` consults this on resume to
+      route the rolled-forward contribution through the
+      ``errors_field`` bucket rather than ``target_field``.
+    - ``extra_outputs`` holds the per-instance values for the fan-out's
+      ``extra_outputs`` mapping (parent-field -> sub-field) so that
+      per-instance resume preserves the FULL per-instance contribution
+      (not just the ``target_field`` slice). Internal — not exposed on
+      the public ``FanOutInstanceProgress`` shape because the spec
+      describes ``result`` as a single accumulator entry.
+    - ``completed_inner_positions`` accumulates ``NodePosition`` entries
+      from inner nodes that complete inside this instance's subgraph
+      execution. Captures the instance's progress for observational
+      purposes when an in_flight save snapshot fires; not used as a
+      resume re-entry point (the instance re-enters at its subgraph's
+      declared entry node per §10.7).
+    """
+
+    state: Literal["completed", "in_flight", "not_started"] = "not_started"
+    result: Any = None
+    result_is_error: bool = False
+    extra_outputs: dict[str, Any] = field(default_factory=dict[str, Any])
+    completed_inner_positions: list[Any] = field(default_factory=list[Any])  # list[NodePosition]
+
+
+@dataclass
+class _FanOutExecutionState:
+    """Mutable per-fan-out execution state. One entry per in-flight
+    fan-out node in the invocation; lives on
+    ``_InvocationContext.fan_out_progress_state`` keyed by
+    ``(namespace, fan_out_node_name)``. The namespace component
+    disambiguates same-named fan-outs in different subgraph descents.
+    """
+
+    fan_out_node_name: str
+    namespace: tuple[str, ...]
+    instance_count: int
+    instances: list[_FanOutInstanceState]
+
+
 @dataclass
 class _InvocationContext:
     """Per-invocation state threaded through the engine and into subgraphs.
@@ -279,6 +342,17 @@ class _InvocationContext:
     # descents at the same depth project as usual. Shared mutable dict
     # propagates across descents.
     pending_resume_states: dict[int, Any] = field(default_factory=dict[int, Any])
+    # Per spec §10.11: mutable per-fan-out progress tracking. Keyed by
+    # ``(namespace, fan_out_node_name)`` — disambiguates same-named
+    # fan-outs in different subgraph descents. ``FanOutNode`` populates
+    # entries before descending into instances; updates state as
+    # instances progress; the entry stays in the dict for the duration
+    # of the fan-out so concurrent saves see consistent sibling state.
+    # ``_maybe_save_checkpoint`` projects this into the frozen
+    # ``FanOutProgress`` shape on the saved CheckpointRecord.
+    fan_out_progress_state: dict[tuple[tuple[str, ...], str], _FanOutExecutionState] = field(
+        default_factory=dict[tuple[tuple[str, ...], str], _FanOutExecutionState]
+    )
 
     def full_observers(self) -> tuple[SubscribedObserver, ...]:
         """Return the ordered observer list to deliver for events from
@@ -320,6 +394,7 @@ def descend_into_subgraph(
             resume_skip_set=self.resume_skip_set,
             pending_resume_states=self.pending_resume_states,
             resume_invocation=self.resume_invocation,
+            fan_out_progress_state=self.fan_out_progress_state,
         )
 
     def descend_into_fan_out_instance(
@@ -335,15 +410,16 @@ def descend_into_fan_out_instance(
         index onto the new context so every inner-node event carries it.
         Per spec §9 the index is the instance's 0-based position.
 
-        Per pipeline-utilities §10.3 / §10.7: fan-out instance internal
-        events do NOT produce checkpoint saves in v1. We achieve that
-        by clearing ``checkpointer`` to None on the descent so the
-        save gate inside the inner _step_function_node is False; the
-        rest of the checkpoint context (invocation_id, correlation_id,
-        etc.) still propagates so observability spans inside the
-        instance can correlate. ``resume_skip_set`` is also dropped:
-        a resumed invocation re-runs the entire fan-out from scratch
-        per §10.7 atomic-restart.
+        Per pipeline-utilities §10.3 (revised by proposal 0009): fan-out
+        instance internal nodes DO produce checkpoint saves. The
+        checkpointer reference propagates unchanged so an inner node's
+        ``completed`` event triggers a save; the engine's save path
+        projects the shared ``fan_out_progress_state`` into the record's
+        per-instance progress field. ``resume_skip_set`` is dropped:
+        inner-position skipping is governed by the per-instance
+        ``completed_inner_positions`` field on the loaded record's
+        ``fan_out_progress`` entry, not by the outer skip-set (which
+        would conflate inner and outer positions otherwise).
         """
         return _InvocationContext(
             queue=self.queue,
@@ -355,13 +431,15 @@ def descend_into_fan_out_instance(
             fan_out_index=fan_out_index,
             invocation_id=self.invocation_id,
             correlation_id=self.correlation_id,
-            checkpointer=None,
+            checkpointer=self.checkpointer,
             completed_positions=self.completed_positions,
             resume_skip_set=frozenset(),
-            # Fan-out instances are atomic-restart per §10.7 — no
-            # saved inner state to thread in. Drop the map.
             pending_resume_states={},
             resume_invocation=self.resume_invocation,
+            # Propagate the shared per-fan-out tracking dict so an
+            # inner-instance node can update its own entry and so the
+            # outer save sees consistent sibling state.
+            fan_out_progress_state=self.fan_out_progress_state,
         )
 
     def descend_into_parallel_branch(
@@ -406,6 +484,7 @@ def descend_into_parallel_branch(
             resume_skip_set=frozenset(),
             pending_resume_states={},
             resume_invocation=self.resume_invocation,
+            fan_out_progress_state=self.fan_out_progress_state,
         )
 
     def take_step(self) -> int:
@@ -542,6 +621,8 @@ async def deliver_loop(queue: asyncio.Queue[_QueuedItem | None]) -> None:
     # imported by `compiled.py` and `subgraph.py`). The underscore prefix
     # is the user-facing "don't import these" signal.
     "_DRAIN_SENTINEL",
+    "_FanOutExecutionState",
+    "_FanOutInstanceState",
     "_InvocationContext",
     "_QueuedItem",
     "_coerce_subscribed",
diff --git a/tests/conformance/adapter.py b/tests/conformance/adapter.py
index e5c1ba0..847bb1c 100644
--- a/tests/conformance/adapter.py
+++ b/tests/conformance/adapter.py
@@ -67,6 +67,13 @@ def _parse_type(s: str) -> Any:
         return dict[str, Any]
     if s == "list<dict>":
         return list[dict[str, Any]]
+    # proposal-0009 fixture 052: ``error_entry`` is the spec's shorthand
+    # for the per-instance error record contributed to ``errors_field``
+    # under collect mode. The exact shape is implementation-defined per
+    # §9.5; the engine ships dict[str, str] with at least
+    # ``fan_out_index`` and ``category`` keys.
+    if s == "list<error_entry>":
+        return list[dict[str, str]]
     if s.startswith("list<") and s.endswith(">"):
         return list[_parse_type(s[5:-1])]
     if s.startswith("dict<") and s.endswith(">"):
@@ -171,14 +178,40 @@ def _make_pure_update_fn(
     update: Mapping[str, Any],
     trace: list[str],
 ) -> Callable[[Any], Awaitable[Mapping[str, Any]]]:
-    """`update_pure` test seam — same as `update` but explicitly tagged as
-    state-independent. Used by fan-out fixtures whose worker subgraphs
-    apply a fixed update that doesn't depend on the input state."""
+    """`update_pure` test seam — applies a fixed update.
+
+    Two shapes coexist across the spec fixtures:
+
+    - Literal values (e.g. ``update_pure: {a_ran: true, count: 0}``)
+      — most common, the snapshot is the partial verbatim.
+    - Field references (e.g. fixture 050 ``update_pure: {stage1: input}``)
+      — when a value is a string AND the state has a field of that
+      name, treat the string as a field-name reference and resolve
+      to ``state.<input>`` at call time. This handles fixtures that
+      use ``update_pure`` to copy one inner field to another without
+      a ``multiplier`` (which would route through ``update_from_field``).
+
+    The disambiguation is deliberately lax: a literal-string update
+    (e.g. ``update_pure: {label: "foo"}``) accidentally matching a
+    state field name would resolve incorrectly. Real fixtures don't
+    exercise this overlap; if a future fixture needs both shapes
+    disambiguated, prefer ``update_pure_from_state`` for the
+    field-reference case and keep ``update_pure`` strictly literal.
+    """
     snapshot = dict(update)
 
-    async def fn(_state: Any) -> Mapping[str, Any]:
+    async def fn(state: Any) -> Mapping[str, Any]:
         trace.append(node_name)
-        return copy.deepcopy(snapshot)
+        resolved: dict[str, Any] = {}
+        state_cls = cast("type[Any]", type(state))
+        model_fields = cast("dict[str, Any]", getattr(state_cls, "model_fields", {}))
+        state_field_names = set(model_fields.keys())
+        for k, v in snapshot.items():
+            if isinstance(v, str) and v in state_field_names:
+                resolved[k] = getattr(state, v)
+            else:
+                resolved[k] = copy.deepcopy(v)
+        return resolved
 
     return fn
 
@@ -309,6 +342,89 @@ async def fn(state: Any) -> Mapping[str, Any]:
     return fn
 
 
+def _make_flaky_per_index_fn(
+    node_name: str,
+    cfg: Mapping[str, Any],
+    trace: list[str],
+    *,
+    instance_attempt_recorder: dict[int, list[int]] | None = None,
+) -> Callable[[Any], Awaitable[Mapping[str, Any]]]:
+    """Build a flaky-per-index node body. Two failure-injection shapes
+    (per proposal-0009 fixture set 048-054):
+
+    - ``fail_first_run_indices: [int, ...]`` — instances with these
+      indices fail on the FIRST CALL EVER (the first-run path); all
+      subsequent calls (resume) succeed. The closure tracks "have I
+      ever failed" via a shared flag that flips on the first raise.
+
+    - ``always_fail_indices: [int, ...]`` — instances with these
+      indices fail on EVERY call. Used by collect-mode fixtures (052)
+      where the failure becomes an error contribution on the saved
+      record and rolls forward verbatim on resume.
+
+    Both forms share ``success_compute`` for the success-path state
+    update.
+
+    Reads ``current_fan_out_index()`` to determine which fan-out
+    instance is currently executing. Returns the success_compute output
+    for non-failing indices.
+
+    ``instance_attempt_recorder`` (optional): when supplied, the closure
+    appends each call's ``current_attempt_index()`` to
+    ``instance_attempt_recorder[idx]`` so the test driver can later
+    assert per-instance retry-count expectations
+    (``instance_N_attempt_index_on_resume`` /
+    ``instance_N_resume_attempt_count`` directives).
+    """
+    from openarmature.observability.correlation import (  # noqa: PLC0415
+        current_attempt_index,
+        current_fan_out_index,
+    )
+
+    fail_first_run_indices = set(cfg.get("fail_first_run_indices") or [])
+    always_fail_indices = set(cfg.get("always_fail_indices") or [])
+    success_compute = dict(cfg.get("success_compute", {}))
+    # Per-index tracking of which ``fail_first_run_indices`` instances
+    # have already failed once. The earlier single-flag shape failed
+    # only the first index in dispatch order when the list named
+    # multiple indices; per-index tracking matches the directive's
+    # "fail on FIRST CALL EVER" wording.
+    already_failed_indices: set[int] = set()
+    traced = [False]
+
+    async def fn(state: Any) -> Mapping[str, Any]:
+        if not traced[0]:
+            trace.append(node_name)
+            traced[0] = True
+        idx = current_fan_out_index()
+        if idx is None:
+            # Defensive — flaky_per_index only makes sense inside a
+            # fan-out instance. Surface as a categorized failure so
+            # mismatched fixture wiring is loud rather than silent.
+            raise _CategorizedException(
+                message=f"flaky_per_index({node_name}) called outside a fan-out instance",
+                category="node_exception",
+            )
+        if instance_attempt_recorder is not None:
+            instance_attempt_recorder.setdefault(idx, []).append(current_attempt_index())
+        if idx in always_fail_indices:
+            raise _CategorizedException(
+                message=f"flaky_per_index({node_name}) always-fail at idx={idx}",
+                category="node_exception",
+            )
+        if idx in fail_first_run_indices and idx not in already_failed_indices:
+            already_failed_indices.add(idx)
+            raise _CategorizedException(
+                message=f"flaky_per_index({node_name}) first-run failure at idx={idx}",
+                category="node_exception",
+            )
+        if success_compute:
+            return _resolve_success_compute(success_compute, state)
+        return {}
+
+    return fn
+
+
 def _make_flaky_fn(
     node_name: str,
     flaky: Mapping[str, Any],
@@ -504,6 +620,7 @@ def build_graph(
     graph_middleware: Sequence[Any] | None = None,
     fan_out_instance_middleware: Mapping[str, Sequence[Any]] | None = None,
     parallel_branches_branch_middleware: Mapping[str, Mapping[str, Sequence[Any]]] | None = None,
+    flaky_per_index_attempt_recorders: dict[str, dict[int, list[int]]] | None = None,
 ) -> BuiltGraph:
     """Translate a graph-shaped fixture block into a `BuiltGraph`.
 
@@ -584,6 +701,16 @@ def build_graph(
             body = _make_flaky_by_index_fn(node_name, node_spec["flaky_by_index"], trace)
         elif "flaky_instance_only" in node_spec:
             body = _make_flaky_instance_only_fn(node_name, node_spec["flaky_instance_only"], trace)
+        elif "flaky_per_index" in node_spec:
+            recorder: dict[int, list[int]] | None = None
+            if flaky_per_index_attempt_recorders is not None:
+                recorder = flaky_per_index_attempt_recorders.setdefault(node_name, {})
+            body = _make_flaky_per_index_fn(
+                node_name,
+                node_spec["flaky_per_index"],
+                trace,
+                instance_attempt_recorder=recorder,
+            )
         elif "update" in node_spec:
             body = _make_update_fn(node_name, node_spec["update"], trace)
         elif "update_pure" in node_spec:
@@ -594,8 +721,8 @@ def build_graph(
             raise ValueError(
                 f"node {node_name!r} has no recognized directive "
                 "(update / update_pure / update_from_field / raises / flaky / "
-                "flaky_by_index / flaky_instance_only / fan_out / parallel_branches / "
-                "subgraph)"
+                "flaky_by_index / flaky_instance_only / flaky_per_index / fan_out / "
+                "parallel_branches / subgraph)"
             )
 
         sleep_ms = node_spec.get("sleep_ms")
@@ -734,7 +861,16 @@ def _add_fan_out_node(
     elif count_raw is not None:
         count = int(count_raw)
 
-    conc_raw = cfg.get("concurrency", 10)
+    # ``concurrent_mode: serial`` (proposal-0009 fixture set 048-054)
+    # is harness sugar for ``concurrency=1`` — forces deterministic
+    # per-instance completion ordering for resume-correctness assertions.
+    # Takes precedence over an explicit ``concurrency`` value if both
+    # are present.
+    concurrent_mode = cfg.get("concurrent_mode")
+    if concurrent_mode == "serial":
+        conc_raw: Any = 1
+    else:
+        conc_raw = cfg.get("concurrency", 10)
     conc: int | Callable[[Any], int | None] | None
     if isinstance(conc_raw, dict):
         conc = cast(
diff --git a/tests/conformance/harness/directives.py b/tests/conformance/harness/directives.py
index dd20d2c..dc4f368 100644
--- a/tests/conformance/harness/directives.py
+++ b/tests/conformance/harness/directives.py
@@ -15,7 +15,7 @@
 
 from __future__ import annotations
 
-from typing import Annotated, Any, Literal
+from typing import Annotated, Any, Literal, cast
 
 from pydantic import (
     BaseModel,
@@ -159,10 +159,23 @@ class FlakyByIndexSpec(_AllowExtras):
 
 
 class FlakyPerIndexSpec(_AllowExtras):
-    """Checkpoint-resume variant: indices in ``fail_first_run_indices`` fail
-    on the first invocation; everyone succeeds on subsequent runs."""
+    """Checkpoint-resume variant. Two failure-injection shapes:
+
+    - ``fail_first_run_indices``: indices fail on the first invocation
+      only; everyone succeeds on subsequent runs (used by 048, 049,
+      050, 051, 053, 054 to simulate "abort, then resume succeeds").
+    - ``always_fail_indices``: indices fail on EVERY invocation
+      (deterministic failure). Used by 052 (collect mode): the failure
+      gets recorded as an error contribution and the instance is
+      ``completed`` on the saved record; on resume the recorded error
+      rolls forward verbatim without re-running.
+
+    Both forms share ``success_compute`` for the success-path state
+    update.
+    """
 
-    fail_first_run_indices: list[int]
+    fail_first_run_indices: list[int] | None = None
+    always_fail_indices: list[int] | None = None
     success_compute: dict[str, Any]
 
 
@@ -233,6 +246,63 @@ class FanOutSpec(_AllowExtras):
     on_empty: Literal["raise", "noop"] | None = None
     errors_field: str | None = None
     instance_middleware: list[MiddlewareSpec] | None = None
+    # proposal-0009 fixtures (048-054): ``concurrent_mode: serial``
+    # forces deterministic completion ordering for resume-correctness
+    # assertions. The adapter translates this into ``concurrency=1``;
+    # other modes (the implicit "concurrent" default) fall through to
+    # the configured ``concurrency`` value.
+    concurrent_mode: Literal["serial", "concurrent"] | None = None
+    # proposal-0009 fixture 052 (collect mode): ``abort_after_instance: N``
+    # is a harness directive — after instance N's completion save fires,
+    # the harness aborts to simulate a crash with that prefix flushed.
+    # The engine never sees this directive; the conformance test driver
+    # interprets it.
+    abort_after_instance: int | None = None
+
+    @model_validator(mode="before")
+    @classmethod
+    def _normalize_instance_middleware(cls, data: Any) -> Any:
+        """Normalize the two YAML shapes for ``instance_middleware`` entries.
+
+        Existing fixtures (e.g., 021) use the explicit-tag form::
+
+            instance_middleware:
+              - type: retry
+                max_attempts: 3
+
+        proposal-0009 fixture 053 uses the key-as-tag form::
+
+            instance_middleware:
+              - retry:
+                  max_attempts: 3
+
+        Both are valid YAML descriptions of the same configuration.
+        Rewrite the second shape into the first BEFORE the
+        discriminated union validates so downstream parsing sees a
+        uniform shape.
+        """
+        if not isinstance(data, dict):
+            return data
+        data_dict: dict[str, Any] = cast("dict[str, Any]", data)
+        im: Any = data_dict.get("instance_middleware")
+        if not isinstance(im, list):
+            return data_dict
+        normalized: list[Any] = []
+        for entry in cast("list[Any]", im):
+            if isinstance(entry, dict):
+                entry_dict = cast("dict[str, Any]", entry)
+                if "type" not in entry_dict and len(entry_dict) == 1:
+                    only_key = next(iter(entry_dict))
+                    inner = entry_dict[only_key]
+                    if isinstance(inner, dict):
+                        inner_dict = cast("dict[str, Any]", inner)
+                        flat: dict[str, Any] = {"type": only_key}
+                        flat.update(inner_dict)
+                        normalized.append(flat)
+                        continue
+            normalized.append(entry)
+        data_dict["instance_middleware"] = normalized
+        return data_dict
 
 
 class ParallelBranchSpec(_AllowExtras):
diff --git a/tests/conformance/harness/fixtures.py b/tests/conformance/harness/fixtures.py
index 71010c0..c7dc1c9 100644
--- a/tests/conformance/harness/fixtures.py
+++ b/tests/conformance/harness/fixtures.py
@@ -118,8 +118,11 @@ class CaseSpec(BaseModel):
     # llm-provider sub-cases.
     call: LlmCallSpec | None = None
     expected_wire_request: dict[str, Any] | None = None
-    # Checkpointing fixtures (024–031).
-    checkpointer: str | None = None
+    # Checkpointing fixtures (024-031, 048-054). Two shapes:
+    #   - ``str`` (e.g. ``"in_memory"``): backend kind selector.
+    #   - ``dict``: backend kind + config knobs (e.g. fixture 054's
+    #     ``{kind: in_memory_batched, fan_out_internal_save_batching: ...}``).
+    checkpointer: str | dict[str, Any] | None = None
     first_run_expected_error: dict[str, Any] | None = None
     saved_record_assertions: dict[str, Any] | None = None
     latest_record_assertions: dict[str, Any] | None = None
diff --git a/tests/conformance/test_checkpoint.py b/tests/conformance/test_checkpoint.py
index 5da607b..9e72b6e 100644
--- a/tests/conformance/test_checkpoint.py
+++ b/tests/conformance/test_checkpoint.py
@@ -1,29 +1,36 @@
-"""Run every spec checkpoint conformance fixture (024-031) against the engine.
+"""Run every spec checkpoint conformance fixture (024-031, 048-054)
+against the engine.
 
-Phase 5 scope: pipeline-utilities §10 (proposal 0008). Drives the real
-:class:`InMemoryCheckpointer` through the engine's save+resume path
-end-to-end, asserting against the fixture's ``expected.checkpoint_saves``
-+ ``invariants`` + resume expectations.
+Phase 5 / proposal-0009 scope: pipeline-utilities §10. Drives the real
+:class:`InMemoryCheckpointer` (with optional fan-out internal save
+batching per §10.11.4) through the engine's save+resume path end-to-end,
+asserting against the fixture's ``saved_record_assertions`` (including
+``fan_out_progress`` matchers), ``expected.checkpoint_saves``,
+``invariants``, and resume expectations (including per-instance
+``instances_executed_during_resume`` / ``instances_skipped_during_resume``
+and per-instance attempt-count assertions from proposal 0009 fixtures).
 
-Fixture-by-fixture status (Phase 5):
+Fixture-by-fixture status:
 
 - 024 save-on-every-completed-event — supported.
 - 025 resume-from-completed-position — supported.
 - 026 record-shape — supported.
 - 027 attempt-index-resets-on-resume — needs a resume-aware
   ``flaky_resume_aware`` test seam in the adapter; deferred.
-- 028 fan-out-atomic-restart — needs a resume-aware
-  ``flaky_per_index`` test seam; deferred.
+- 028 fan-out-atomic-restart — REMOVED in spec v0.18.0 (replaced by
+  per-instance resume contract). The fixture file no longer exists.
 - 029 subgraph-resume — supported (uses plain ``flaky``).
 - 030 checkpoint-not-found — supported.
 - 031 correlation-id-preserved-across-resume — record-level
   assertions supported here; the OTel span/log assertions are
   gated until Phase 6 lands the observability mapping.
+- 048-054 per-instance fan-out resume contract (proposal 0009) —
+  supported.
 """
 
 from __future__ import annotations
 
-from collections.abc import Mapping
+from collections.abc import Iterable, Mapping
 from pathlib import Path
 from typing import Any, cast
 
@@ -35,6 +42,9 @@
     CheckpointError,
     CheckpointNotFound,
     CheckpointRecord,
+    FanOutInstanceProgress,
+    FanOutInternalSaveBatching,
+    FanOutProgress,
     InMemoryCheckpointer,
 )
 from openarmature.graph import (
@@ -48,18 +58,20 @@
     Path(__file__).resolve().parents[2] / "openarmature-spec" / "spec" / "pipeline-utilities" / "conformance"
 )
 
-# Phase 5 fixture range: 024-031 are the proposal-0008 conformance set.
-_CHECKPOINT_FIXTURE_RANGE = range(24, 32)
+# Conformance fixture range: 024-031 minus 028 are the proposal-0008
+# set; 048-054 are the proposal-0009 per-instance-resume set. 028
+# (fan-out atomic-restart) was REMOVED in spec v0.18.0 when proposal
+# 0009 superseded its contract, so it is explicitly excluded from the
+# set rather than relying on the test runner's file-glob to filter
+# the missing fixture out.
+_CHECKPOINT_FIXTURE_NUMBERS: frozenset[int] = frozenset((set(range(24, 32)) - {28}) | set(range(48, 55)))
 
 # Fixtures that need resume-aware test seams the conformance adapter
 # doesn't yet translate. Skipped here with a clear reason — the engine
-# plumbing they'd verify (retry-budget reset on resume, fan-out
-# atomic restart) is independently covered by unit tests in
-# tests/unit/test_checkpoint.py.
+# plumbing they'd verify is independently covered by unit tests.
 _DEFERRED_FIXTURES = frozenset(
     {
         "027-checkpoint-attempt-index-resets-on-resume",
-        "028-checkpoint-fan-out-atomic-restart",
     }
 )
 
@@ -71,7 +83,7 @@ def _fixture_paths() -> list[Path]:
             number = int(p.stem.split("-", 1)[0])
         except ValueError:
             continue
-        if number in _CHECKPOINT_FIXTURE_RANGE:
+        if number in _CHECKPOINT_FIXTURE_NUMBERS:
             out.append(p)
     return out
 
@@ -90,19 +102,89 @@ def _load(path: Path) -> dict[str, Any]:
 # ---------------------------------------------------------------------------
 
 
+class _AbortAfterInstance(Exception):  # noqa: N818
+    """Sentinel exception raised by the capturing wrapper to simulate a
+    crash after the configured instance's "instance completed" save
+    has fired.
+
+    Under collect mode, this exception fires from inside a
+    per-instance save and gets captured by
+    ``asyncio.gather(..., return_exceptions=True)``. The test driver
+    sees the captured-but-not-surfaced abort by inspecting the
+    wrapper's ``_aborted`` flag after the invoke returns.
+    """
+
+
 class _CapturingCheckpointer:
     """Wraps an :class:`InMemoryCheckpointer` and records every save
     in order so the harness can assert against the fixture's
     ``expected.checkpoint_saves`` block. Implements the
-    :class:`Checkpointer` Protocol shape."""
-
-    def __init__(self) -> None:
-        self._inner = InMemoryCheckpointer()
+    :class:`Checkpointer` Protocol shape AND the optional
+    ``save_fan_out_internal`` hook (per §10.11.4 batching) so the
+    engine routes inner-instance saves here.
+
+    ``abort_after_instance``: when set, the wrapper raises
+    :class:`_AbortAfterInstance` AFTER the save that just transitioned
+    the named instance index from ``not_started`` / ``in_flight`` to
+    ``completed``. Simulates a crash at that exact point — used by
+    fixture 052 to test collect-mode error-record rollforward.
+    """
+
+    def __init__(
+        self,
+        *,
+        fan_out_internal_save_batching: FanOutInternalSaveBatching | None = None,
+        abort_after_instance: int | None = None,
+    ) -> None:
+        self._inner = InMemoryCheckpointer(
+            fan_out_internal_save_batching=fan_out_internal_save_batching,
+        )
         self.saves: list[CheckpointRecord] = []
+        self._abort_after_instance = abort_after_instance
+        self._aborted = False
 
     async def save(self, invocation_id: str, record: CheckpointRecord) -> None:
+        self._raise_if_post_abort()
         self.saves.append(record)
         await self._inner.save(invocation_id, record)
+        self._maybe_abort(record)
+
+    async def save_fan_out_internal(self, invocation_id: str, record: CheckpointRecord) -> None:
+        self._raise_if_post_abort()
+        self.saves.append(record)
+        await self._inner.save_fan_out_internal(invocation_id, record)
+        self._maybe_abort(record)
+
+    async def save_fan_out_in_flight_failure(self, invocation_id: str, record: CheckpointRecord) -> None:
+        self._raise_if_post_abort()
+        self.saves.append(record)
+        await self._inner.save_fan_out_in_flight_failure(invocation_id, record)
+        self._maybe_abort(record)
+
+    def _raise_if_post_abort(self) -> None:
+        """Once the abort has fired, any subsequent save call raises
+        immediately — modelling a process-level crash after the
+        target instance's completion. Without this, gather would
+        continue dispatching sibling instances whose saves would
+        complete normally and pollute the loaded record."""
+        if self._aborted:
+            raise _AbortAfterInstance("post-abort save call")
+
+    def _maybe_abort(self, record: CheckpointRecord) -> None:
+        """Check whether this save was the one transitioning the
+        configured ``abort_after_instance`` to ``completed``. If so,
+        raise the sentinel after the save has been recorded (so the
+        record is durably persisted before the simulated crash)."""
+        if self._abort_after_instance is None or self._aborted:
+            return
+        target_idx = self._abort_after_instance
+        for fp in record.fan_out_progress:
+            if target_idx < len(fp.instances) and fp.instances[target_idx].state == "completed":
+                # Subsequent instances must NOT be completed — otherwise
+                # we'd abort after a later instance's save instead.
+                if all(inst.state != "completed" for inst in fp.instances[target_idx + 1 :]):
+                    self._aborted = True
+                    raise _AbortAfterInstance(f"simulated crash after instance {target_idx} completed save")
 
     async def load(self, invocation_id: str) -> CheckpointRecord | None:
         return await self._inner.load(invocation_id)
@@ -132,20 +214,107 @@ async def test_checkpoint_fixture(fixture_path: Path) -> None:
         for case in cast("list[dict[str, Any]]", spec["cases"]):
             case_name = case.get("name", "<unnamed>")
             try:
-                await _run_one_case(case)
+                await _run_one_case(case, top_level=spec)
             except AssertionError as e:
                 raise AssertionError(f"case {case_name!r}: {e}") from e
         return
-    await _run_one_case(spec)
+    await _run_one_case(spec, top_level=spec)
+
+
+def _build_capturing(spec: Mapping[str, Any]) -> _CapturingCheckpointer:
+    """Build the capturing checkpointer for a case, honoring the
+    optional batching / abort directives from the fixture.
+
+    The fixture's ``checkpointer`` field accepts two shapes:
+    - ``"in_memory"``: default no-batching backend.
+    - ``{kind: in_memory_batched, fan_out_internal_save_batching: {flush_every: N}}``:
+      the §10.11.4 batched backend with N-save flush interval.
+
+    The fixture's fan-out node may also carry ``abort_after_instance: N``
+    — a harness-level directive that simulates a crash after the named
+    instance's "instance completed" save fires. Surface that here so
+    the capturing wrapper can raise the sentinel.
+    """
+    checkpointer_cfg = spec.get("checkpointer")
+    batching: FanOutInternalSaveBatching | None = None
+    if isinstance(checkpointer_cfg, dict):
+        cfg_dict = cast("dict[str, Any]", checkpointer_cfg)
+        kind = cfg_dict.get("kind")
+        if kind == "in_memory_batched":
+            batching_cfg = cast(
+                "Mapping[str, Any]",
+                cfg_dict.get("fan_out_internal_save_batching") or {},
+            )
+            flush_every = int(batching_cfg.get("flush_every", 0))
+            batching = FanOutInternalSaveBatching(flush_every=flush_every)
+    abort_after = _find_abort_after_instance(spec)
+    return _CapturingCheckpointer(
+        fan_out_internal_save_batching=batching,
+        abort_after_instance=abort_after,
+    )
 
 
-async def _run_one_case(spec: Mapping[str, Any]) -> None:
+def _find_abort_after_instance(spec: Mapping[str, Any]) -> int | None:
+    """Locate the ``abort_after_instance`` directive (if any) on a
+    fan-out node config inside the case spec. Returns the int idx or
+    None if no fan-out node declares the directive. Used by fixture 052.
+    """
+    for node_spec in cast("dict[str, dict[str, Any]]", spec.get("nodes", {})).values():
+        if "fan_out" in node_spec:
+            fan_out = cast("Mapping[str, Any]", node_spec["fan_out"])
+            if "abort_after_instance" in fan_out:
+                return int(fan_out["abort_after_instance"])
+    return None
+
+
+def _strip_abort_directive(spec: Mapping[str, Any]) -> Mapping[str, Any]:
+    """Return a fresh spec dict with any ``abort_after_instance``
+    directive removed from fan-out nodes. The engine doesn't recognize
+    the directive; the wrapper checkpointer interprets it on the
+    harness side. Strip before passing to ``build_graph`` so the
+    underlying fan-out config doesn't carry the unknown key into the
+    builder."""
+    nodes_raw = spec.get("nodes")
+    if not isinstance(nodes_raw, dict):
+        return spec
+    nodes = cast("dict[str, dict[str, Any]]", nodes_raw)
+    new_nodes: dict[str, dict[str, Any]] = {}
+    changed = False
+    for node_name, node_spec in nodes.items():
+        if "fan_out" in node_spec:
+            fan_out = cast("dict[str, Any]", node_spec["fan_out"])
+            if "abort_after_instance" in fan_out:
+                new_fan_out = {k: v for k, v in fan_out.items() if k != "abort_after_instance"}
+                new_nodes[node_name] = {**node_spec, "fan_out": new_fan_out}
+                changed = True
+                continue
+        new_nodes[node_name] = node_spec
+    if not changed:
+        return spec
+    return {**spec, "nodes": new_nodes}
+
+
+async def _run_one_case(spec: Mapping[str, Any], *, top_level: Mapping[str, Any]) -> None:
     """Run one fixture or one case from a cases-shape fixture."""
-    capturing = _CapturingCheckpointer()
-    subgraphs = _build_subgraphs(spec)
+    capturing = _build_capturing(spec)
+    # Shared recorders so flaky_per_index nodes inside subgraphs feed
+    # the same per-instance attempt table the resume assertions consult.
+    # Subgraphs and the outer graph both contribute keyed by node name.
+    flaky_per_index_recorders: dict[str, dict[int, list[int]]] = {}
+    subgraphs = _build_subgraphs_for(
+        spec,
+        top_level,
+        flaky_per_index_recorders=flaky_per_index_recorders,
+    )
 
     trace: list[str] = []
-    built = build_graph(spec, subgraphs=subgraphs, trace=trace)
+    sanitized_spec = _strip_abort_directive(spec)
+    built = build_graph(
+        sanitized_spec,
+        subgraphs=subgraphs,
+        trace=trace,
+        flaky_per_index_attempt_recorders=flaky_per_index_recorders,
+    )
     builder = built.builder
     builder.with_checkpointer(cast("Checkpointer", capturing))
     compiled = builder.compile()
@@ -155,16 +324,57 @@ async def _run_one_case(spec: Mapping[str, Any]) -> None:
     first_run_expected_error = spec.get("first_run_expected_error")
     invocation_id_first_run: str | None = None
     final_first_run: State | None = None
+    trace.clear()
     try:
         final_first_run = await compiled.invoke(
             initial_state,
             correlation_id=spec.get("correlation_id"),
         )
-        if first_run_expected_error is not None:
+        # Under collect mode, the abort_after_instance sentinel fires
+        # from inside a per-instance save and is captured by gather's
+        # return_exceptions=True. The invoke returns "successfully"
+        # from gather's perspective. Detect the simulated crash by
+        # the wrapper's ``_aborted`` flag and treat it like a
+        # node_exception per the fixture's first_run_expected_error
+        # contract.
+        if capturing._aborted:  # noqa: SLF001 — test driver intentional
+            if first_run_expected_error is None:
+                raise AssertionError("abort_after_instance fired but no first_run_expected_error declared")
+            expected_category = first_run_expected_error["category"]
+            assert expected_category == "node_exception", (
+                f"abort_after_instance simulates node_exception; fixture asserts {expected_category!r}"
+            )
+        elif first_run_expected_error is not None:
             raise AssertionError(
                 f"expected first run to fail with category "
                 f"{first_run_expected_error!r} but it returned successfully"
             )
+    except _AbortAfterInstance:
+        # Simulated crash from the abort_after_instance directive
+        # for fail_fast-style flows where the sentinel propagates
+        # out of the engine. Treat as a node_exception at the
+        # fan-out level — that's the fixture's
+        # ``first_run_expected_error: node_exception`` shape.
+        if first_run_expected_error is None:
+            raise
+        expected_category = first_run_expected_error["category"]
+        assert expected_category == "node_exception", (
+            f"abort_after_instance simulates node_exception; fixture asserts {expected_category!r}"
+        )
+    except CheckpointError:
+        # When abort_after_instance fires during a subsequent
+        # post-abort save (instance dispatched after the target's
+        # save), the engine wraps the abort sentinel as
+        # ``CheckpointSaveFailed`` and propagates it out. Treat
+        # the wrapped abort the same way as a direct sentinel
+        # propagation when the fixture declares it as a
+        # ``node_exception`` first-run failure.
+        if first_run_expected_error is None or not capturing._aborted:  # noqa: SLF001
+            raise
+        expected_category = first_run_expected_error["category"]
+        assert expected_category == "node_exception", (
+            f"abort_after_instance simulates node_exception; fixture asserts {expected_category!r}"
+        )
     except RuntimeGraphError as e:
         if first_run_expected_error is None:
             raise
@@ -179,9 +389,24 @@ async def _run_one_case(spec: Mapping[str, Any]) -> None:
     if capturing.saves:
         invocation_id_first_run = capturing.saves[-1].invocation_id
 
-    # ----- Saved record assertions (fixture 029) -----
-    if "saved_record_assertions" in spec:
-        _assert_saved_record(cast("Mapping[str, Any]", spec["saved_record_assertions"]), capturing)
+    # Track per-instance attempts observed in the first run (used by
+    # proposal-0009 resume-side assertions). Snapshot before the
+    # resume run clears the recorder.
+    first_run_attempts = _snapshot_attempt_recorders(flaky_per_index_recorders)
+    _ = first_run_attempts  # reserved for cross-run assertions; not used directly yet
+
+    # ----- Saved record assertions -----
+    # Source the assertion against the LOADED record, not the
+    # last-recorded save call. For batching backends (fixture 054)
+    # the two differ: the in-memory ``saves`` list captures every
+    # call including buffered-not-flushed ones, but ``load`` only
+    # returns durably-flushed state. Per §10.11.4, the spec's
+    # ``saved record`` is the loaded record.
+    if "saved_record_assertions" in spec and invocation_id_first_run is not None:
+        loaded_record = await capturing.load(invocation_id_first_run)
+        if loaded_record is None:
+            raise AssertionError(f"saved_record_assertions: load({invocation_id_first_run!r}) returned None")
+        _assert_saved_record_from(cast("Mapping[str, Any]", spec["saved_record_assertions"]), loaded_record)
 
     # ----- Single-run expected assertions -----
     expected = cast("Mapping[str, Any]", spec.get("expected") or {})
@@ -202,7 +427,7 @@ async def _run_one_case(spec: Mapping[str, Any]) -> None:
             await compiled.invoke(initial_state, resume_invocation=ghost)
         return
 
-    # ----- Resume path (fixtures 025, 029, 031) -----
+    # ----- Resume path (fixtures 025, 029, 031, 048-054) -----
     resume_block = spec.get("resume")
     if resume_block is None or not resume_block.get("from_first_run"):
         return
@@ -210,6 +435,19 @@ async def _run_one_case(spec: Mapping[str, Any]) -> None:
         raise AssertionError("resume requested but no invocation_id captured (no saves fired)")
     saves_before_resume = list(capturing.saves)
     capturing.saves.clear()
+    # Clear per-instance attempt recorders so the resume run's
+    # entries are isolated for ``instance_N_attempt_index_on_resume``
+    # and ``instance_N_resume_attempt_count`` assertions.
+    for recorder in flaky_per_index_recorders.values():
+        recorder.clear()
+    # Reset the abort gate so the resume run completes normally.
+    # ``_aborted`` being False disables the ``_raise_if_post_abort``
+    # pre-flight check; clearing ``_abort_after_instance`` ensures
+    # ``_maybe_abort`` is also a no-op on the resume path.
+    capturing._aborted = False  # noqa: SLF001 — test driver intentional
+    capturing._abort_after_instance = None  # noqa: SLF001
+    # Clear the trace so post-resume execution capture is isolated.
+    trace.clear()
     try:
         final_resume = await compiled.invoke(
             initial_state,
@@ -217,10 +455,48 @@ async def _run_one_case(spec: Mapping[str, Any]) -> None:
         )
     except CheckpointError:
         raise
+    _ = trace  # trace clearing/inspection deferred; recorder map is canonical
     resume_expected = cast("Mapping[str, Any]", resume_block.get("expected") or {})
     if "final_state" in resume_expected:
         _assert_state_matches(final_resume, cast("Mapping[str, Any]", resume_expected["final_state"]))
 
+    # proposal-0009 instances_executed_during_resume /
+    # instances_skipped_during_resume — assert against the
+    # per-instance attempt recorders (each instance whose body ran
+    # appears in the recorder).
+    if "instances_executed_during_resume" in resume_expected:
+        expected_executed = sorted(
+            int(i) for i in cast(Iterable[Any], resume_expected["instances_executed_during_resume"])
+        )
+        actual_executed = sorted(_flatten_executed_instances(flaky_per_index_recorders))
+        assert actual_executed == expected_executed, (
+            f"instances_executed_during_resume mismatch: "
+            f"actual={actual_executed}, expected={expected_executed}"
+        )
+    if "instances_skipped_during_resume" in resume_expected:
+        expected_skipped = sorted(
+            int(i) for i in cast(Iterable[Any], resume_expected["instances_skipped_during_resume"])
+        )
+        actual_executed_set = set(_flatten_executed_instances(flaky_per_index_recorders))
+        # An instance is "skipped" if its body did NOT run during resume.
+        # We can validate by asserting it's not in the executed set —
+        # the fixtures specify the disjoint partitioning explicitly.
+        for skipped_idx in expected_skipped:
+            assert skipped_idx not in actual_executed_set, (
+                f"instance {skipped_idx} expected to be skipped on resume "
+                f"but its body ran (recorded attempts: {actual_executed_set})"
+            )
+
+    if "invariants" in resume_expected or "invariants" in resume_block:
+        # Resume-block invariants land on either resume.expected.invariants
+        # or resume.invariants depending on fixture style. Read both.
+        invariants_block: dict[str, Any] = {}
+        if "invariants" in resume_block:
+            invariants_block.update(cast("dict[str, Any]", resume_block["invariants"]))
+        if "invariants" in resume_expected:
+            invariants_block.update(cast("dict[str, Any]", resume_expected["invariants"]))
+        _assert_resume_invariants(invariants_block, final_resume, flaky_per_index_recorders)
+
     # Fixture 031: assert correlation_id preserved + invocation_id
     # changed. Span/log assertions deferred to Phase 6 — observability
     # isn't wired yet. Skip those cleanly here.
@@ -249,9 +525,40 @@ async def _run_one_case(spec: Mapping[str, Any]) -> None:
 # ---------------------------------------------------------------------------
 
 
-def _build_subgraphs(spec: Mapping[str, Any]) -> dict[str, Any]:
+def _build_subgraphs_for(
+    spec: Mapping[str, Any],
+    top_level: Mapping[str, Any],
+    *,
+    flaky_per_index_recorders: dict[str, dict[int, list[int]]] | None = None,
+) -> dict[str, Any]:
+    """Build subgraphs from either the case's own ``subgraph`` /
+    ``subgraphs`` block or the cases-fixture's top-level shared
+    ``subgraph`` block. Each case may declare local subgraphs OR
+    inherit from the top level.
+
+    ``flaky_per_index_recorders`` (when supplied) threads through to
+    inner-subgraph build so per-instance flaky bodies inside subgraphs
+    populate the same recorder map the resume assertions read.
+    """
+    return _build_subgraphs(
+        {**dict(top_level), **dict(spec)},
+        flaky_per_index_recorders=flaky_per_index_recorders,
+    )
+
+
+def _build_subgraphs(
+    spec: Mapping[str, Any],
+    *,
+    flaky_per_index_recorders: dict[str, dict[int, list[int]]] | None = None,
+) -> dict[str, Any]:
     """Build any subgraphs (`subgraph:` or `subgraphs:`) the fixture
-    declares. Returns a registry the adapter consumes by name."""
+    declares. Returns a registry the adapter consumes by name.
+
+    Inner subgraphs may declare flaky_per_index nodes (fixture 048+:
+    the failing/succeeding scorer node lives in the inner subgraph,
+    not the outer graph). Thread the recorders through so those
+    flaky bodies populate the same per-instance attempt table.
+    """
     subgraph_specs: dict[str, Any] = {}
     if "subgraph" in spec:
         single = cast("Mapping[str, Any]", spec["subgraph"])
@@ -263,18 +570,25 @@ def _build_subgraphs(spec: Mapping[str, Any]) -> dict[str, Any]:
     compiled_subgraphs: dict[str, Any] = {}
     for name, sub_spec in subgraph_specs.items():
         sub_trace: list[str] = []
-        sub_built = build_graph(sub_spec, trace=sub_trace)
+        sub_built = build_graph(
+            sub_spec,
+            trace=sub_trace,
+            flaky_per_index_attempt_recorders=flaky_per_index_recorders,
+        )
         compiled_subgraphs[name] = sub_built.builder.compile()
     return compiled_subgraphs
 
 
-def _assert_saved_record(
+def _assert_saved_record_from(
     block: Mapping[str, Any],
-    capturing: _CapturingCheckpointer,
+    record: CheckpointRecord,
 ) -> None:
-    if not capturing.saves:
-        raise AssertionError("saved_record_assertions: no saves were recorded")
-    record = capturing.saves[-1]
+    """Assert ``block`` against ``record``. Same semantics as
+    :func:`_assert_saved_record` but the caller supplies the record
+    directly (used for fixtures where the assertion targets the
+    loaded record rather than the last in-memory save call —
+    e.g., the §10.11.4 batching case where buffered saves are
+    invisible to ``load``)."""
     if "completed_positions" in block:
         expected_positions = cast("list[Mapping[str, Any]]", block["completed_positions"])
         actual = [
@@ -293,6 +607,240 @@ def _assert_saved_record(
         assert record.parent_states, "expected parent_states to be populated; got empty tuple"
     if block.get("parent_states_outermost_first"):
         assert record.parent_states[0] is not None
+    if "fan_out_progress" in block:
+        _assert_fan_out_progress(
+            cast("Mapping[str, Any]", block["fan_out_progress"]),
+            record.fan_out_progress,
+        )
+    if "fan_out_node_in_completed_positions" in block:
+        expected_present = bool(block["fan_out_node_in_completed_positions"])
+        actual_present = any(
+            p.node_name in {fp.fan_out_node_name for fp in record.fan_out_progress}
+            for p in record.completed_positions
+        )
+        assert actual_present == expected_present, (
+            f"fan_out_node_in_completed_positions mismatch: "
+            f"actual={actual_present}, expected={expected_present}"
+        )
+
+
+def _assert_fan_out_progress(
+    expected: Mapping[str, Any],
+    actual: tuple[FanOutProgress, ...],
+) -> None:
+    """Assert against a ``fan_out_progress`` block in the fixture.
+
+    Block shape:
+
+        fan_out_progress:
+          <node_name>:
+            instance_count: int
+            instances:
+              - state: completed | in_flight | not_started
+                result: <any>                 # optional, scalar matches
+                result_kind: error            # optional, asserts result is an error dict
+                state_one_of: [in_flight, not_started]  # optional alternation
+                completed_inner_positions:    # optional list-of-dicts matchers
+                  - {node_name: step_a, attempt_index: 0}
+    """
+    by_name = {fp.fan_out_node_name: fp for fp in actual}
+    for node_name, fp_expected in expected.items():
+        fp_expected_dict = cast("Mapping[str, Any]", fp_expected)
+        if node_name not in by_name:
+            raise AssertionError(
+                f"fan_out_progress: no entry for fan-out node {node_name!r}; "
+                f"actual entries: {sorted(by_name)}"
+            )
+        fp = by_name[node_name]
+        if "instance_count" in fp_expected_dict:
+            assert fp.instance_count == fp_expected_dict["instance_count"], (
+                f"fan_out_progress[{node_name!r}].instance_count: "
+                f"actual={fp.instance_count}, expected={fp_expected_dict['instance_count']}"
+            )
+        if "instances" in fp_expected_dict:
+            instances_expected = cast("list[Mapping[str, Any]]", fp_expected_dict["instances"])
+            assert len(fp.instances) == len(instances_expected), (
+                f"fan_out_progress[{node_name!r}].instances length: "
+                f"actual={len(fp.instances)}, expected={len(instances_expected)}"
+            )
+            for idx, (inst_expected, inst_actual) in enumerate(
+                zip(instances_expected, fp.instances, strict=True)
+            ):
+                _assert_fan_out_instance(node_name, idx, inst_expected, inst_actual)
+
+
+def _assert_fan_out_instance(
+    node_name: str,
+    idx: int,
+    expected: Mapping[str, Any],
+    actual: FanOutInstanceProgress,
+) -> None:
+    """Assert one entry inside a fan_out_progress.instances list."""
+    if "state" in expected:
+        assert actual.state == expected["state"], (
+            f"fan_out_progress[{node_name!r}].instances[{idx}].state: "
+            f"actual={actual.state!r}, expected={expected['state']!r}"
+        )
+    if "state_one_of" in expected:
+        allowed = set(cast("Iterable[str]", expected["state_one_of"]))
+        assert actual.state in allowed, (
+            f"fan_out_progress[{node_name!r}].instances[{idx}].state: "
+            f"actual={actual.state!r}, expected one of {allowed!r}"
+        )
+    if "result" in expected:
+        assert actual.result == expected["result"], (
+            f"fan_out_progress[{node_name!r}].instances[{idx}].result: "
+            f"actual={actual.result!r}, expected={expected['result']!r}"
+        )
+    if expected.get("result_kind") == "error":
+        # Spec §10.11.2: collect-mode error contributions are recorded
+        # as the per-instance result entry. The engine ships
+        # ``dict[str, str]`` with ``fan_out_index`` and ``category``.
+        raw_result: Any = actual.result
+        assert isinstance(raw_result, dict), (
+            f"fan_out_progress[{node_name!r}].instances[{idx}].result: "
+            f"expected dict (error_record), got {type(raw_result).__name__}"
+        )
+        result_dict = cast("dict[str, Any]", raw_result)
+        assert "category" in result_dict, (
+            f"fan_out_progress[{node_name!r}].instances[{idx}].result: "
+            f"expected error_record with 'category' key, got {result_dict!r}"
+        )
+    if "completed_inner_positions" in expected:
+        positions_expected = cast("list[Mapping[str, Any]]", expected["completed_inner_positions"])
+        # Compare by node_name + attempt_index per the spec; namespace
+        # and step are engine-internal details fixture authors don't
+        # always include.
+        actual_min = [
+            {"node_name": p.node_name, "attempt_index": p.attempt_index}
+            for p in actual.completed_inner_positions
+        ]
+        expected_min = [
+            {"node_name": p["node_name"], "attempt_index": p.get("attempt_index", 0)}
+            for p in positions_expected
+        ]
+        assert actual_min == expected_min, (
+            f"fan_out_progress[{node_name!r}].instances[{idx}].completed_inner_positions: "
+            f"actual={actual_min}, expected={expected_min}"
+        )
+
+
+def _assert_resume_invariants(
+    block: Mapping[str, Any],
+    final_state: State | None,
+    recorders: Mapping[str, dict[int, list[int]]],
+) -> None:
+    """Assert resume-side invariants — list-length, no-duplicate,
+    per-instance attempt counts."""
+    final_dict: dict[str, Any] = final_state.model_dump() if final_state is not None else {}
+    for key, value in block.items():
+        if key == "no_duplicate_results":
+            if not value:
+                continue
+            results = final_dict.get("results")
+            if isinstance(results, list):
+                results_list = cast("list[Any]", results)
+                assert len(set(_hashable(r) for r in results_list)) == len(results_list), (
+                    f"results list has duplicate entries: {results_list}"
+                )
+        elif key == "results_list_length":
+            results = final_dict.get("results")
+            assert isinstance(results, list), f"results_list_length: results is not a list ({results!r})"
+            results_list = cast("list[Any]", results)
+            assert len(results_list) == value, (
+                f"results_list_length: actual={len(results_list)}, expected={value}"
+            )
+        elif key == "errors_list_length":
+            errors = final_dict.get("errors")
+            assert isinstance(errors, list), f"errors_list_length: errors is not a list ({errors!r})"
+            errors_list = cast("list[Any]", errors)
+            assert len(errors_list) == value, (
+                f"errors_list_length: actual={len(errors_list)}, expected={value}"
+            )
+        elif key == "no_duplicate_error_entries":
+            if not value:
+                continue
+            errors = final_dict.get("errors")
+            if isinstance(errors, list):
+                errors_list = cast("list[Any]", errors)
+                hashes = [_hashable(e) for e in errors_list]
+                assert len(set(hashes)) == len(hashes), f"errors list has duplicates: {errors_list}"
+        elif key.startswith("instance_") and key.endswith("_attempt_index_on_resume"):
+            # Extract instance index from ``instance_<N>_attempt_index_on_resume``.
+            parts = key.split("_")
+            try:
+                idx = int(parts[1])
+            except ValueError:
+                continue
+            # Per §10.6: every retry budget resets to 0 on resume.
+            # Assert the first attempt observed on the resume run for
+            # the named instance is attempt_index 0.
+            for recorder in recorders.values():
+                if idx in recorder:
+                    attempts = recorder[idx]
+                    if attempts:
+                        assert attempts[0] == value, (
+                            f"instance {idx} first-resume attempt_index: "
+                            f"actual={attempts[0]}, expected={value}"
+                        )
+        elif key.startswith("instance_") and key.endswith("_resume_attempt_count"):
+            parts = key.split("_")
+            try:
+                idx = int(parts[1])
+            except ValueError:
+                continue
+            for recorder in recorders.values():
+                if idx in recorder:
+                    attempts = recorder[idx]
+                    assert len(attempts) == value, (
+                        f"instance {idx} resume attempt count: actual={len(attempts)}, expected={value}"
+                    )
+        elif key.startswith("instance_") and "_executes_step_" in key and key.endswith("_on_resume"):
+            # Fixture 050 directive: ``instance_1_executes_step_a_on_resume: true``.
+            # Verified indirectly by ``instances_executed_during_resume``
+            # — the instance ran, so its inner subgraph re-entered at
+            # the entry node. The harness doesn't yet introspect which
+            # specific inner nodes fired; the broader executed-set
+            # assertion covers the same correctness invariant.
+            continue
+        elif key == "batching_scoped_to_fan_out_internal_saves_only":
+            # Structural invariant — verified across the fixture suite
+            # rather than per-fixture (every non-fan-out save runs
+            # synchronously regardless of the batching config). The
+            # fixture restates it as a reminder; no per-test action.
+            continue
+
+
+def _hashable(value: Any) -> Any:
+    """Make a value hashable for set-based duplicate detection. Lists
+    and dicts get rendered as tuples of (key, value) pairs."""
+    if isinstance(value, dict):
+        return tuple(sorted((k, _hashable(v)) for k, v in cast("dict[str, Any]", value).items()))
+    if isinstance(value, list):
+        return tuple(_hashable(v) for v in cast("list[Any]", value))
+    return value
+
+
+def _snapshot_attempt_recorders(
+    recorders: Mapping[str, dict[int, list[int]]],
+) -> dict[str, dict[int, list[int]]]:
+    """Deep-copy the per-flaky-node attempt recorder map."""
+    out: dict[str, dict[int, list[int]]] = {}
+    for node_name, idx_map in recorders.items():
+        out[node_name] = {idx: list(attempts) for idx, attempts in idx_map.items()}
+    return out
+
+
+def _flatten_executed_instances(
+    recorders: Mapping[str, dict[int, list[int]]],
+) -> list[int]:
+    """Union of instance indices observed across every flaky_per_index
+    recorder. An instance whose body fired at least once during this
+    run appears."""
+    seen: set[int] = set()
+    for idx_map in recorders.values():
+        seen.update(idx for idx, attempts in idx_map.items() if attempts)
+    return sorted(seen)
 
 
 def _assert_checkpoint_saves(
@@ -331,6 +879,24 @@ def _assert_state_matches(actual: Any, expected: Mapping[str, Any]) -> None:
     else:
         raise AssertionError(f"unexpected actual state type {type(actual).__name__}")
     for k, v in expected.items():
+        # Magic-key length assertions can appear inside ``final_state``
+        # alongside literal field assertions (e.g. fixture 052 has
+        # ``errors_list_length: 1`` as a sibling of the literal
+        # ``results`` and ``items`` fields). Route the magic keys to
+        # length checks against the corresponding list field; literal
+        # keys take the standard equality path.
+        if k == "errors_list_length":
+            errors = actual_dict.get("errors")
+            assert isinstance(errors, list), f"errors_list_length: errors is not a list ({errors!r})"
+            errors_list = cast("list[Any]", errors)
+            assert len(errors_list) == v, f"errors_list_length: actual={len(errors_list)}, expected={v}"
+            continue
+        if k == "results_list_length":
+            results = actual_dict.get("results")
+            assert isinstance(results, list), f"results_list_length: results is not a list ({results!r})"
+            results_list = cast("list[Any]", results)
+            assert len(results_list) == v, f"results_list_length: actual={len(results_list)}, expected={v}"
+            continue
         assert actual_dict.get(k) == v, f"state field {k!r}: actual={actual_dict.get(k)!r}, expected={v!r}"
 
 
diff --git a/tests/test_smoke.py b/tests/test_smoke.py
index 49b37ba..38e838b 100644
--- a/tests/test_smoke.py
+++ b/tests/test_smoke.py
@@ -9,7 +9,7 @@
 
 def test_package_versions() -> None:
     assert openarmature.__version__ == "0.8.0"
-    assert openarmature.__spec_version__ == "0.17.1"
+    assert openarmature.__spec_version__ == "0.18.1"
 
 
 def test_spec_version_matches_pyproject() -> None:
diff --git a/tests/unit/test_checkpoint.py b/tests/unit/test_checkpoint.py
index ae3fafb..e4defed 100644
--- a/tests/unit/test_checkpoint.py
+++ b/tests/unit/test_checkpoint.py
@@ -14,7 +14,7 @@
 from __future__ import annotations
 
 from pathlib import Path
-from typing import Any
+from typing import Any, ClassVar
 
 import pytest
 from pydantic import Field
@@ -95,7 +95,7 @@ def test_checkpoint_record_default_schema_version() -> None:
     # §10.2 (proposal 0014): records carry the user's state-schema
     # version, which is "" until the state class declares one.
     assert record.schema_version == ""
-    assert record.fan_out_progress is None
+    assert record.fan_out_progress == ()
 
 
 # ---------------------------------------------------------------------------
@@ -454,7 +454,7 @@ async def test_save_failure_raises_to_invoke_caller() -> None:
 
 
 # ---------------------------------------------------------------------------
-# Fan-out internal saves are gated off (§10.7 atomic restart)
+# Per-instance fan-out resume contract (proposal 0009 / spec v0.18.0)
 # ---------------------------------------------------------------------------
 
 
@@ -463,12 +463,20 @@ class _CapturingCheckpointer:
 
     def __init__(self) -> None:
         self.saves: list[CheckpointRecord] = []
+        self._records: dict[str, CheckpointRecord] = {}
 
     async def save(self, invocation_id: str, record: CheckpointRecord) -> None:
         self.saves.append(record)
+        self._records[invocation_id] = record
+
+    async def save_fan_out_internal(self, invocation_id: str, record: CheckpointRecord) -> None:
+        await self.save(invocation_id, record)
+
+    async def save_fan_out_in_flight_failure(self, invocation_id: str, record: CheckpointRecord) -> None:
+        await self.save(invocation_id, record)
 
     async def load(self, invocation_id: str) -> CheckpointRecord | None:
-        return None
+        return self._records.get(invocation_id)
 
     async def list(self, filter: Any = None) -> Any:
         return []
@@ -491,11 +499,13 @@ async def _scorer(s: _ItemState) -> dict[str, int]:
     return {"out": s.item + 100}
 
 
-async def test_fan_out_internal_saves_are_gated_off() -> None:
-    """Spec §10.3 + §10.7: per-instance internal completed events do
-    NOT produce saves in v1. Only the fan-out node's own completion
-    (the parent dispatch) saves, with ``fan_out_index is None`` on
-    every recorded position."""
+async def test_fan_out_internal_saves_fire_per_instance() -> None:
+    """Per spec §10.3 (revised by proposal 0009 / v0.18.0): fan-out
+    instance internal nodes DO produce saves. Each per-instance
+    completion emits at least one save with ``fan_out_index``
+    populated on the inner-node position, plus an explicit "instance
+    completed" save that flips the instance's ``fan_out_progress``
+    state to ``completed``."""
     inner = (
         GraphBuilder(_ItemState)
         .add_node("scorer", _scorer)
@@ -521,15 +531,219 @@ async def test_fan_out_internal_saves_are_gated_off() -> None:
         .compile()
     )
     await parent.invoke(_ParentState(items=[1, 2, 3]))
-    # Exactly one save: the fan-out's own completion (parent dispatch).
-    assert len(cp.saves) == 1
-    record = cp.saves[0]
-    # Every position in the record is at the parent level — no
-    # fan_out_index populated, ever.
-    for pos in record.completed_positions:
-        assert pos.fan_out_index is None
-    # And no save fired with a position from inside an instance.
-    assert all(p.namespace == () for p in record.completed_positions)
+    # Deterministic save breakdown for this test (3 instances,
+    # 1-node inner subgraph):
+    #   3 inner-node saves (per-instance scorer completion via
+    #     ``_maybe_save_checkpoint`` with ``fan_out_index`` set)
+    #   3 explicit "instance completed" saves (per-instance via
+    #     ``_save_instance_completed`` in ``fan_out.py``)
+    #   1 fan-out node completion save (via ``_maybe_save_checkpoint``
+    #     in ``_step_fan_out_node`` after fan-in)
+    # Total: 7. Use ``>=`` rather than ``==`` so future engine
+    # internals can add additional saves without breaking the test.
+    assert len(cp.saves) >= 7, (
+        f"expected >= 7 saves (3 inner-node + 3 instance-completed + "
+        f"1 fan-out completion), got {len(cp.saves)}"
+    )
+    # At least one save carries an inner position with fan_out_index
+    # populated — that's the inner-node save inside an instance,
+    # recorded against the per-instance ``completed_inner_positions``
+    # field on ``fan_out_progress`` (per spec §10.11).
+    saves_with_inner_positions = [
+        s
+        for s in cp.saves
+        for fp in s.fan_out_progress
+        for inst in fp.instances
+        if inst.completed_inner_positions
+    ]
+    assert saves_with_inner_positions, "expected at least one save with per-instance inner positions"
+    # The terminal save (fan-out node's own completion) carries the
+    # outer "fan" position with fan_out_index=None.
+    last_save = cp.saves[-1]
+    fan_positions = [p for p in last_save.completed_positions if p.node_name == "fan"]
+    assert len(fan_positions) == 1
+    assert fan_positions[0].fan_out_index is None
+
+
+# ---------------------------------------------------------------------------
+# Q4 from the spec impl review: focused unit test on fail_fast fast-cancel
+# ensuring the failed instance lands as in_flight (no result) on the
+# saved record after cancellation completes.
+# ---------------------------------------------------------------------------
+
+
+class _FailingItemState(State):
+    item: int = 0
+    out: int = 0
+
+
+class _FailingParentState(State):
+    items: list[int] = Field(default_factory=list[int])
+    results: list[int] = Field(default_factory=list[int])
+
+
+async def _failing_scorer(s: _FailingItemState) -> dict[str, int]:
+    # Fail when item == 999 (sentinel). All others succeed and
+    # contribute ``out = item``. The sentinel is positioned in the
+    # items list to trigger fail_fast cancellation of siblings.
+    if s.item == 999:
+        raise RuntimeError(f"intentional failure for item {s.item}")
+    return {"out": s.item}
+
+
+async def test_fail_fast_cancellation_leaves_failed_instance_in_flight() -> None:
+    """Per §10.11.2 fail_fast cancellation contract: the failed
+    instance's ``fan_out_progress`` state on the saved record is
+    ``in_flight`` (no ``result`` recorded), and cancelled siblings
+    are also ``in_flight`` or ``not_started`` — never ``completed``
+    for the failed slot. Closes the spec impl-review Q4 follow-on."""
+    inner = (
+        GraphBuilder(_FailingItemState)
+        .add_node("scorer", _failing_scorer)
+        .add_edge("scorer", END)
+        .set_entry("scorer")
+        .compile()
+    )
+    cp = _CapturingCheckpointer()
+    parent = (
+        GraphBuilder(_FailingParentState)
+        .add_fan_out_node(
+            "fan",
+            subgraph=inner,
+            collect_field="out",
+            target_field="results",
+            items_field="items",
+            item_field="item",
+            concurrency=1,  # serial so the failure ordering is deterministic
+            error_policy="fail_fast",
+        )
+        .add_edge("fan", END)
+        .set_entry("fan")
+        .with_checkpointer(cp)
+        .compile()
+    )
+    # Items: [10, 20, 999, 40] — instance 2 (item 999) fails. The
+    # engine wraps the raw RuntimeError as ``NodeException``.
+    with pytest.raises(NodeException):
+        await parent.invoke(_FailingParentState(items=[10, 20, 999, 40]))
+    # Locate the latest save's fan_out_progress for the "fan" node.
+    assert cp.saves, "expected at least one save to fire"
+    latest = cp.saves[-1]
+    fan_progress = next(
+        (fp for fp in latest.fan_out_progress if fp.fan_out_node_name == "fan"),
+        None,
+    )
+    assert fan_progress is not None, "expected fan_out_progress entry for the 'fan' node"
+    # Per §10.11.2: failed instance (idx 2) state is ``in_flight``
+    # (no ``result`` recorded). Successful preceding instances
+    # (0, 1) are ``completed``; cancelled siblings (3) are
+    # ``in_flight`` or ``not_started``.
+    assert fan_progress.instances[0].state == "completed"
+    assert fan_progress.instances[1].state == "completed"
+    assert fan_progress.instances[2].state == "in_flight", (
+        f"failed instance state should be in_flight, got {fan_progress.instances[2].state!r}"
+    )
+    assert fan_progress.instances[2].result is None, (
+        f"failed instance result should be None, got {fan_progress.instances[2].result!r}"
+    )
+    assert fan_progress.instances[3].state in {"in_flight", "not_started"}, (
+        f"cancelled sibling state should be in_flight or not_started, got {fan_progress.instances[3].state!r}"
+    )
+
+
+# ---------------------------------------------------------------------------
+# Nested fan-out: schema_version read from outermost state class
+# ---------------------------------------------------------------------------
+
+
+class _NestedSchemaOuterState(State):
+    schema_version: ClassVar[str] = "outer-v1"
+    items: list[int] = Field(default_factory=list[int])
+    results: list[int] = Field(default_factory=list[int])
+
+
+class _NestedSchemaMiddleState(State):
+    schema_version: ClassVar[str] = "middle-v1"
+    items: list[int] = Field(default_factory=list[int])
+    results: list[int] = Field(default_factory=list[int])
+
+
+class _NestedSchemaInnerState(State):
+    item: int = 0
+    out: int = 0
+
+
+async def _nested_schema_scorer(s: _NestedSchemaInnerState) -> dict[str, int]:
+    return {"out": s.item}
+
+
+async def test_nested_fan_out_records_outermost_schema_version() -> None:
+    """Per spec §10.2: a ``CheckpointRecord``'s ``schema_version`` is the
+    outermost graph state's declared version (the record represents the
+    whole invocation tree). For a fan-out inside a subgraph, the
+    engine's ``_save_instance_completed`` / ``_save_instance_in_flight``
+    helpers read from the outermost state via
+    ``context.parent_states_prefix[0]`` rather than the subgraph state's
+    class. Closes the spec impl-review observation (a) on nested
+    fan-out schema_version drift."""
+    inner = (
+        GraphBuilder(_NestedSchemaInnerState)
+        .add_node("scorer", _nested_schema_scorer)
+        .add_edge("scorer", END)
+        .set_entry("scorer")
+        .compile()
+    )
+    middle = (
+        GraphBuilder(_NestedSchemaMiddleState)
+        .add_fan_out_node(
+            "fan",
+            subgraph=inner,
+            collect_field="out",
+            target_field="results",
+            count=3,
+            error_policy="fail_fast",
+        )
+        .add_edge("fan", END)
+        .set_entry("fan")
+        .compile()
+    )
+    cp = _CapturingCheckpointer()
+    outer = (
+        GraphBuilder(_NestedSchemaOuterState)
+        .add_subgraph_node("dispatch", middle)
+        .add_edge("dispatch", END)
+        .set_entry("dispatch")
+        .with_checkpointer(cp)
+        .compile()
+    )
+    await outer.invoke(_NestedSchemaOuterState())
+    # The fan-out's explicit "instance completed" saves fire from inside
+    # the subgraph context. Identify them by:
+    #   - ``parent_states`` has length 1 — only the outermost state, since
+    #     the fan-out lives one descent in (in the middle subgraph).
+    #     Inner-instance node saves have length-2 parent_states
+    #     ``(outer, middle)`` and use the INNER graph's
+    #     ``self.state_cls.schema_version`` via ``_maybe_save_checkpoint``,
+    #     which is a separate code path not targeted by this test.
+    #   - ``completed_positions`` is empty. The fan-out node's own
+    #     completion save (fired by ``_maybe_save_checkpoint`` after
+    #     fan-in) appends the ``fan`` position to completed_positions;
+    #     ``_save_instance_completed`` doesn't.
+    instance_completed_saves = [
+        s for s in cp.saves if len(s.parent_states) == 1 and not s.completed_positions
+    ]
+    assert instance_completed_saves, (
+        "expected at least one '_save_instance_completed' save "
+        "(non-empty parent_states + empty completed_positions); "
+        f"got saves with shapes: "
+        f"{[(len(s.parent_states), len(s.completed_positions)) for s in cp.saves]}"
+    )
+    for save in instance_completed_saves:
+        assert save.schema_version == "outer-v1", (
+            f"_save_instance_completed save's schema_version should be "
+            f"outermost's ('outer-v1'), got {save.schema_version!r} "
+            f"(subgraph state class declares 'middle-v1')"
+        )
 
 
 # ---------------------------------------------------------------------------