Skip to content
Merged
Show file tree
Hide file tree
Changes from 6 commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
7 changes: 6 additions & 1 deletion CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,9 @@ The format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/). The

### Added

- **State migration for checkpointed graphs (proposal 0014, introduced in spec v0.15.0; refined by proposal 0018 in spec v0.16.0).** Saved checkpoints whose `schema_version` doesn't match the current state class now route through a registered migration chain instead of failing on resume. Surface: `State.schema_version: ClassVar[str] = ""` (declare a non-empty value to opt in), `GraphBuilder.with_state_migration(from_version, to_version, migrate)` and `with_state_migrations(*migrations)` for registration, `StateMigration` and `MigrationRegistry` types exported from `openarmature.checkpoint`. Chain resolution is BFS over the registered edges; the shortest path wins. Three new error categories: `CheckpointStateMigrationChainAmbiguous` (proposal 0018: duplicate `(from, to)` pair at registration time, or multiple distinct shortest paths between the saved and current versions at resume time), `CheckpointStateMigrationMissing` (no chain bridges the versions), and `CheckpointStateMigrationFailed` (a migration function raised). All non-transient. Post-migration deserialization failures still route to `CheckpointRecordInvalid` per §10.12.4. The same chain applies to each entry in `parent_states` in lockstep with the outer state per §10.12.2. Routing precedence per §10.10 (v0.16.0): chain-ambiguous → missing → failed → record-invalid.
- **`Checkpointer.supports_state_migration` Protocol attribute.** Marks whether a backend can expose the structural intermediate form (a plain dict, JSON tree) the migration registry consumes. `SQLiteCheckpointer(serialization="json")` opts in; `SQLiteCheckpointer(serialization="pickle")` and `InMemoryCheckpointer` opt out. On version mismatch against a non-migration-eligible backend the engine raises `CheckpointRecordInvalid` per spec §10.12.1.
- **`openarmature.checkpoint.migrate` OTel span (proposal 0014 §6 cross-ref).** Versioned resumes whose migration chain runs emit a zero-duration `openarmature.checkpoint.migrate` span on the OTel observer, parented under the invocation root span. Attributes: `openarmature.checkpoint.migrate.from_version`, `openarmature.checkpoint.migrate.to_version` (the final target), `openarmature.checkpoint.migrate.chain_length`. The §10.12.3 fast path (versions match, registry not consulted) emits no span. Engine-side: a synthetic `checkpoint_migrated` observer phase carries a `_MigrationSummary` payload from `_migrate_record` through to the OTel observer; the new phase is gated off default subscriptions (observers opt in explicitly via `phases={..., "checkpoint_migrated"}`).
- **Prompt-management capability (proposal 0017, introduced in spec v0.15.0).** New `openarmature.prompts` subpackage. `PromptManager` composes one or more `PromptBackend`s, exposes `fetch` / `render` / `get`, applies the §8 fallback semantics (`prompt_store_unavailable` continues to the next backend; `prompt_not_found` stops the chain), and renders templates with Jinja2's `StrictUndefined` per §7. `Prompt` / `PromptResult` / `PromptGroup` are Pydantic models matching spec §3 / §4 / §9. Three error categories (`PromptNotFound`, `PromptRenderError`, `PromptStoreUnavailable`) with `PROMPT_TRANSIENT_CATEGORIES` exported for retry-middleware classifiers. `FilesystemPromptBackend` is the minimum local-filesystem reference backend (layout: `<root>/<label>/<name>.j2`; `version` derived from the first 16 hex chars of `template_hash`). New runtime dependency: `jinja2>=3.1`.
- **`openarmature.prompts.context` — observability propagation per spec §11.** `with_active_prompt(result)` and `with_active_prompt_group(group)` context managers + `current_prompt_result()` / `current_prompt_group()` inspectors. When the OTel observer is active and an LLM call fires inside `with_active_prompt`, the `openarmature.llm.complete` span carries the normative `openarmature.prompt.*` attributes (`name`, `version`, `label`, `template_hash`, `rendered_hash`, `group_name`). Nesting is innermost-wins.
- **Image content blocks for user messages (proposal 0015, introduced in spec v0.13.0).** `UserMessage.content` now accepts `str | list[ContentBlock]`. The block surface introduces `TextBlock`, `ImageBlock`, `ImageSourceURL`, `ImageSourceInline`, and the `ContentBlock` / `ImageSource` discriminated unions over the block / source `type` field. `ImageBlock` carries a `media_type` (required for inline sources; ignored for URL sources; typed as `str | None` so callers MAY pass any `image/*` type the bound model supports) and an optional `detail` hint (`"auto"` / `"low"` / `"high"`; `None` default omits the field from the wire so providers apply their own default). System, assistant, and tool messages stay text-string-only; image inputs are user-only in v1.
Expand All @@ -22,7 +25,9 @@ The format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/). The

### Changed

- **Pinned spec version: 0.10.0 → 0.15.0.** Adopts the skip-ahead governance principle: the submodule jumps across v0.11.0–v0.15.0 (proposals 0009, 0011, 0014, 0015, 0016, 0017) in one bump. Only the surface introduced by proposal 0016 is implemented in this changelog entry; fixtures from 0011 / 0014 / 0015 / 0017 are marked deferred-skip in the conformance suite and unmark as their respective PRs land.
- **Pinned spec version: 0.10.0 → 0.16.0.** Adopts the skip-ahead governance principle: the submodule jumps across v0.11.0–v0.16.0 (proposals 0009, 0011, 0014, 0015, 0016, 0017, 0018) in one bump. Only the surfaces introduced by proposals 0014–0017 are implemented in the batch's release; fixtures from 0011 are deferred-skip in the conformance suite and unmark with PR-5.
- **`CheckpointRecord.schema_version` semantic shift (proposal 0014).** Previously a backend-internal record-shape version (`CHECKPOINT_SCHEMA_VERSION = "1"` constant), now the user-facing state-schema version per spec §10.2. The framework reads `type(state).schema_version` at save time. Pre-PR-4 records carrying `"1"` are reinterpreted as user-facing v1 identifiers; users with such records either declare `schema_version="1"` on their state class or discard the pre-PR-4 records. `SQLiteCheckpointer` no longer rejects records with non-default `schema_version` at the backend boundary; version-mismatch routing is now an engine concern at resume time. The `CHECKPOINT_SCHEMA_VERSION` module constant is removed; future record-shape evolution can add backend-private metadata fields if needed.
- **`NodeEvent.pre_state` typed `Any` (was `State`).** Required by the new `checkpoint_migrated` phase which carries a `_MigrationSummary` payload rather than a `State` instance. Observer authors who type-narrowed `pre_state` to `State` should treat it as `Any` and narrow per-phase (e.g., `if event.phase == "completed": ...`). The `checkpoint_saved` phase already carried a State-flavored shape (not necessarily a typed `State` subclass instance), so this widens the declared type to match runtime reality rather than introducing a new constraint.

### Notes

Expand Down
128 changes: 128 additions & 0 deletions docs/concepts/checkpointing.md
Original file line number Diff line number Diff line change
Expand Up @@ -149,6 +149,134 @@ multi-process), S3 (cross-region durability). For event-sourced
runtimes (Temporal, DBOS, Restate, Inngest) the Protocol is the
adapter layer.

## State migrations

When a checkpoint was saved against an earlier version of your state
schema and the code has since evolved, the engine consults a
**migration registry** to bridge the saved record into the current
shape. Without migrations, a schema change invalidates every prior
checkpoint; with one short registration per change, you keep your
saved records working across releases.

The wire-up is two pieces: declare a version on your state class,
and register one migration per version bump.

```python
from typing import ClassVar
from openarmature.graph import State, GraphBuilder
from openarmature.checkpoint import SQLiteCheckpointer


class MyState(State):
schema_version: ClassVar[str] = "v2"
x: int = 0
new_field: str = "default" # added in v2


def add_new_field_default(state: dict) -> dict:
return {**state, "new_field": "default"}


graph = (
GraphBuilder(MyState)
.add_node(...)
.with_checkpointer(SQLiteCheckpointer("ck.db", serialization="json"))
.with_state_migration("v1", "v2", add_new_field_default)
.compile()
)
```

On resume, the engine reads the saved record's `schema_version`. If
it equals `MyState.schema_version`, the record loads via the §10.4
fast path (no migration consulted). If it differs, the engine
resolves a chain through the registry (BFS for the shortest path),
applies each migration in order to the record's state, then
deserializes the result into your current state class.

### Chain resolution

Registered migrations form a directed graph. Each
`with_state_migration(a, b, fn)` is an edge from `a` to `b`. Chain
resolution finds the shortest path between the saved version and the
current version. Branching is fine: a v1 record can have one
migration leading to v2 and another leading to v2-experimental;
chain resolution picks the path that ends at the current declared
version.

Two ambiguity cases are configuration errors. Both surface as
`CheckpointStateMigrationChainAmbiguous`:

- **Duplicate edges.** Registering two migrations with the same
`(from_version, to_version)` pair raises at registration time so
the configuration error surfaces before any resume attempt.
Either delete one or pick distinct version identifiers.
- **Multiple shortest paths.** A diamond like
`v1 → v2 → v4` and `v1 → v3 → v4` is ambiguous: both paths have
length 2. The engine raises during resume so the user can
register fewer migrations or pick a single canonical route.

### The three new error categories

- **`CheckpointStateMigrationChainAmbiguous`**: the registered
migration graph is ambiguous (duplicate `(from, to)` pair at
registration time, OR multiple distinct shortest paths between
the saved and current versions at resume time). Surfaces before
any migration function runs. Carries `from_version` and
`to_version` when known.
- **`CheckpointStateMigrationMissing`**: the saved version doesn't
match the current version, and no chain bridges them. Carries
`from_version`, `to_version`, a count of registered migrations,
and a human-readable `registry_description` so operators see what
IS available.
- **`CheckpointStateMigrationFailed`**: a user-supplied migration
function raised. Subsequent migrations in the chain don't run;
the resume fails. The migration's exception rides `__cause__`.

Routing precedence on resume: chain-ambiguous → missing → failed →
record-invalid.

A third category, `CheckpointRecordInvalid`, continues to cover the
**post**-migration case: a migration ran cleanly but produced
output that the current state class can't deserialize (missing a
required field, wrong type, etc.). The three categories are
mutually exclusive on any given resume.

### Backend support

Not every backend can migrate. Migration needs the backend to expose
a **structural intermediate form** of the loaded state (a plain
dict, JSON tree, or similar) that's independent of the current
state class.

- **`SQLiteCheckpointer(serialization="json")`** can. JSON-encoded
state loads to a dict; the migration function operates on the
dict directly.
- **`SQLiteCheckpointer(serialization="pickle")`** can NOT. Pickle
holds class identity and round-trips back to typed instances.
- **`InMemoryCheckpointer`** can NOT. It holds live typed-state
references by reference; there's no serialization step.

On version mismatch against a non-migration-eligible backend, the
engine raises `CheckpointRecordInvalid` (not
`CheckpointStateMigrationMissing`): the registry has no chance to
bridge.

### Parent-state migration

Subgraph saves carry a `parent_states` chain of the outer-graph
state captured at the moment of the inner save. On resume, the same
migration chain applies to each entry in `parent_states` in lockstep
with the outer state. The spec treats `parent_states` as carrying
the same `schema_version` as the outer record (no per-parent
version metadata in v1).

### Migrations MUST be pure

A migration function MUST be deterministic, with no I/O, no implicit
state, no random or wall-clock-derived output. The framework
doesn't enforce purity, but violating it breaks determinism
guarantees for resume.

## When NOT to use checkpointing

- **Pure pipelines that complete in seconds.** Restart-from-entry is
Expand Down
2 changes: 1 addition & 1 deletion openarmature-spec
Submodule openarmature-spec updated 44 files
+76 −0 .github/workflows/docs.yml
+8 −1 .gitignore
+53 −37 CHANGELOG.md
+2 −2 README.md
+1 −0 docs/capabilities/graph-engine.md
+1 −0 docs/capabilities/llm-provider.md
+1 −0 docs/capabilities/observability.md
+1 −0 docs/capabilities/pipeline-utilities.md
+1 −0 docs/capabilities/prompt-management.md
+1 −0 docs/changelog.md
+1 −0 docs/governance.md
+157 −0 docs/index.md
+17 −0 docs/javascripts/header-link.js
+6 −0 docs/javascripts/tablesort.js
+6 −0 docs/javascripts/tablesort.min.js
+0 −34 docs/openarmature.md
+29 −0 docs/proposals.md
+1 −0 docs/proposals/0001-graph-engine-foundation.md
+1 −0 docs/proposals/0002-subgraph-explicit-mapping.md
+1 −0 docs/proposals/0003-node-boundary-observer-hooks.md
+1 −0 docs/proposals/0004-pipeline-utilities-middleware.md
+1 −0 docs/proposals/0005-pipeline-utilities-parallel-fan-out.md
+1 −0 docs/proposals/0006-llm-provider-core.md
+1 −0 docs/proposals/0007-observability-otel-span-mapping.md
+1 −0 docs/proposals/0008-pipeline-utilities-checkpointing.md
+1 −0 docs/proposals/0009-pipeline-utilities-per-instance-fan-out-resume.md
+1 −0 docs/proposals/0010-drain-timeout.md
+1 −0 docs/proposals/0011-pipeline-utilities-parallel-branches.md
+1 −0 docs/proposals/0012-graph-engine-completed-event-after-edges.md
+1 −0 docs/proposals/0013-fan-out-config-on-node-event.md
+1 −0 docs/proposals/0014-pipeline-utilities-state-migration.md
+1 −0 docs/proposals/0015-llm-provider-multimodal-images.md
+1 −0 docs/proposals/0016-llm-provider-structured-output.md
+1 −0 docs/proposals/0017-prompt-management-core.md
+1 −0 docs/proposals/0018-state-migration-chain-ambiguity.md
+353 −0 docs/stylesheets/extra.css
+132 −0 mkdocs.yml
+42 −0 mkdocs_hooks.py
+231 −0 proposals/0018-state-migration-chain-ambiguity.md
+18 −0 pyproject.toml
+65 −0 spec/pipeline-utilities/conformance/047-state-migration-chain-ambiguous.md
+96 −0 spec/pipeline-utilities/conformance/047-state-migration-chain-ambiguous.yaml
+39 −16 spec/pipeline-utilities/spec.md
+739 −0 uv.lock
2 changes: 1 addition & 1 deletion pyproject.toml
Original file line number Diff line number Diff line change
Expand Up @@ -48,7 +48,7 @@ Repository = "https://github.com/LunarCommand/openarmature-python"
Specification = "https://github.com/LunarCommand/openarmature-spec"

[tool.openarmature]
spec_version = "0.15.0"
spec_version = "0.16.0"

[dependency-groups]
dev = [
Expand Down
2 changes: 1 addition & 1 deletion src/openarmature/__init__.py
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
"""OpenArmature — workflow framework for LLM pipelines and tool-calling agents."""

__version__ = "0.5.0"
__spec_version__ = "0.15.0"
__spec_version__ = "0.16.0"
11 changes: 9 additions & 2 deletions src/openarmature/checkpoint/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -26,9 +26,12 @@
CheckpointNotFound,
CheckpointRecordInvalid,
CheckpointSaveFailed,
CheckpointStateMigrationChainAmbiguous,
CheckpointStateMigrationFailed,
CheckpointStateMigrationMissing,
)
from .migration import MigrationRegistry, StateMigration
from .protocol import (
CHECKPOINT_SCHEMA_VERSION,
Checkpointer,
CheckpointFilter,
CheckpointRecord,
Expand All @@ -37,17 +40,21 @@
)

__all__ = [
"CHECKPOINT_SCHEMA_VERSION",
"CheckpointError",
"CheckpointFilter",
"CheckpointNotFound",
"CheckpointRecord",
"CheckpointRecordInvalid",
"CheckpointSaveFailed",
"CheckpointStateMigrationChainAmbiguous",
"CheckpointStateMigrationFailed",
"CheckpointStateMigrationMissing",
"CheckpointSummary",
"Checkpointer",
"InMemoryCheckpointer",
"MigrationRegistry",
"NodePosition",
"SQLiteCheckpointer",
"SerializationMode",
"StateMigration",
]
17 changes: 17 additions & 0 deletions src/openarmature/checkpoint/backends/memory.py
Original file line number Diff line number Diff line change
Expand Up @@ -28,8 +28,25 @@ class InMemoryCheckpointer:
Pydantic state instance the engine produces is what comes back
from :meth:`load` — no serialization round-trip. (This is the
feature: tests can assert on the saved state's identity.)

**State-migration eligibility:** none. Per spec §10.12.1, a
backend supports migration only when it can expose a structural
intermediate form of the loaded state independent of the current
state class. This backend holds live typed instances by
reference, so a version mismatch on resume raises
``CheckpointRecordInvalid`` rather than consulting the
migration registry.
"""

# Per spec §10.12.1: in-memory storage holds live typed-state
# references, so there's no class-independent intermediate form
# the migration registry could consume. Declared at the class
# level (not as a per-instance attribute) since the answer is
# constructor-independent; the Protocol declaration in
# ``protocol.py`` types this as ``bool`` (not ``ClassVar[bool]``)
# so Pyright accepts a class-attribute override here.
supports_state_migration: bool = False

def __init__(self) -> None:
self._records: dict[str, CheckpointRecord] = {}
self._lock = asyncio.Lock()
Expand Down
20 changes: 13 additions & 7 deletions src/openarmature/checkpoint/backends/sqlite.py
Original file line number Diff line number Diff line change
Expand Up @@ -42,7 +42,6 @@

from ..errors import CheckpointRecordInvalid
from ..protocol import (
CHECKPOINT_SCHEMA_VERSION,
CheckpointFilter,
CheckpointRecord,
CheckpointSummary,
Expand Down Expand Up @@ -109,6 +108,13 @@ def __init__(
self._serialization: SerializationMode = serialization
self._lock = asyncio.Lock()
self._initialized = False
# Per spec §10.12.1, a backend supports state migration only
# when it can expose a structural intermediate form of the
# loaded state that is independent of the current state
# class. JSON serialization satisfies this (loads to dicts);
# pickle holds class identity and round-trips to typed
# instances, so it cannot bridge a schema-version mismatch.
self.supports_state_migration: bool = serialization == "json"

def _connect(self) -> sqlite3.Connection:
conn = sqlite3.connect(self._path)
Expand Down Expand Up @@ -230,12 +236,12 @@ def _do() -> tuple[Any, ...] | None:
schema_version,
recorded_serialization,
) = row
if schema_version != CHECKPOINT_SCHEMA_VERSION:
raise CheckpointRecordInvalid(
invocation_id,
f"persisted schema_version={schema_version!r} does not match "
f"current {CHECKPOINT_SCHEMA_VERSION!r}",
)
# Note: per spec §10.12 (proposal 0014), version mismatches
# are no longer rejected at the backend boundary. The engine
# routes mismatches through the migration registry on resume
# (CheckpointStateMigrationMissing if no chain, else applies
# the chain). The backend just round-trips the version
# identifier as opaque data.
state = self._decode(state_blob, recorded_serialization, invocation_id)
position_dicts = self._decode(positions_blob, recorded_serialization, invocation_id)
parent_states = self._decode(parent_states_blob, recorded_serialization, invocation_id)
Expand Down
Loading