feat(prompts): prompt-management core (proposal 0017) #45
Establishes the prompt-management subpackage with the three canonical error categories from spec §10:
- PromptNotFound (non-transient): no prompt matches (name, label).
- PromptRenderError (non-transient): undefined variable, template parse error, or variable-coercion failure.
- PromptStoreUnavailable (transient): backend infrastructure failure (network, I/O, vendor API).

Exports PROMPT_TRANSIENT_CATEGORIES mirroring the TRANSIENT_CATEGORIES frozenset in openarmature.llm.errors, so retry-middleware classifiers can identify transient prompt-management failures by category.
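A minimal sketch of how a retry classifier might consult the exported set. The stand-in classes, the `category` class attribute, and the `is_transient` helper are assumptions for illustration; only the two category strings and the frozenset shape are taken from this PR.

```python
# Sketch only: stand-in classes, not the real openarmature.prompts.errors module.
PROMPT_TRANSIENT_CATEGORIES = frozenset({"prompt_store_unavailable"})


class PromptError(Exception):
    """Hypothetical base class carrying a category string."""

    category = "unknown"


class PromptNotFound(PromptError):
    category = "prompt_not_found"


class PromptStoreUnavailable(PromptError):
    category = "prompt_store_unavailable"


def is_transient(exc: BaseException) -> bool:
    """Classify by category string rather than by exception type."""
    return getattr(exc, "category", None) in PROMPT_TRANSIENT_CATEGORIES
```

Classifying on the category string means a retry middleware would not need to import the prompt exception classes at all.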
Pydantic models for the prompt-management capability shapes from spec §3, §4, and §9. Prompt carries the raw template source string plus identity metadata (name, version, label, template_hash, fetched_at, optional metadata). The raw-string representation keeps Prompt serializable and engine-agnostic; compilation happens on render. PromptResult propagates identity from the source Prompt and carries the rendered messages list (compatible with openarmature.llm.Message and directly consumable by Provider.complete()), the variables used, rendered_hash, and rendered_at. PromptGroup wraps an ordered N>=2 sequence of PromptResult instances with a stable group_name. The validator rejects empty and single-member groups per §9 (single-prompt tagging is already served by per-prompt observability attributes). Hashing helpers compute SHA-256 over UTF-8 bytes (template) and over a canonical JSON serialization with sort_keys + minimal separators (rendered). Both prefixed with 'sha256:' so future algorithm changes are self-describing.
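The two hashing rules above can be sketched with the standard library alone. The signatures below (a plain string in, a list of plain dicts in) are assumptions; the real helpers may accept model instances and may serialize with different JSON options beyond `sort_keys` and minimal separators.

```python
import hashlib
import json


def compute_template_hash(template_source: str) -> str:
    """SHA-256 over the UTF-8 bytes of the raw template source."""
    digest = hashlib.sha256(template_source.encode("utf-8")).hexdigest()
    return f"sha256:{digest}"


def compute_rendered_hash(messages: list[dict]) -> str:
    """SHA-256 over a canonical JSON form (sorted keys, minimal separators)."""
    canonical = json.dumps(messages, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f"sha256:{digest}"
```

Because keys are sorted before hashing, two message dicts that differ only in key order produce the same rendered hash.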
PromptBackend is a runtime-checkable Protocol with a single async fetch(name, label) method, matching the openarmature.llm.Provider pattern. The docstring restates the §5 contract: reentrant, no mutation, raises PromptNotFound / PromptStoreUnavailable, and the rule that cached results MUST preserve the original fetched_at.

PromptManager composes one or more PromptBackends and exposes:
- fetch: §8 fallback semantics. First successful fetch wins; PromptNotFound STOPS the chain (logical absence MUST NOT silently substitute); PromptStoreUnavailable continues to the next backend; all-exhausted raises PromptStoreUnavailable with the last unavailable chained as __cause__. WARN-level log on each fallback per §8.
- render: synchronous string transform via Jinja2 with StrictUndefined per §7. Produces a single UserMessage in v1 (multi-message decomposition deferred). UndefinedError and TemplateError both map to PromptRenderError carrying the prompt's identity + the variables + a description. Pydantic ValidationError on the UserMessage(content=rendered_text) construction (empty-string render case) also maps to PromptRenderError per §10's 'variable's value not coercible' framing.
- get: convenience equivalent to render(await fetch(...), variables).

Adds jinja2>=3.1 to runtime dependencies.
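The fallback rules for fetch can be sketched as a short loop. This is an illustration under stated assumptions, not the PromptManager code: the exception classes are stand-ins, `fetch_with_fallback` is a hypothetical free function, and the logger name is invented.

```python
import asyncio
import logging

logger = logging.getLogger("prompts.sketch")  # hypothetical logger name


class PromptNotFound(Exception):
    """Stand-in for the non-transient 'no such prompt' error."""


class PromptStoreUnavailable(Exception):
    """Stand-in for the transient backend-failure error."""


async def fetch_with_fallback(backends, name, label="production"):
    """First success wins; unavailable falls through; not-found stops the chain."""
    if not backends:
        raise ValueError("at least one backend is required")
    last_unavailable = None
    for index, backend in enumerate(backends):
        try:
            return await backend.fetch(name, label)
        except PromptStoreUnavailable as exc:
            # Transient failure: remember it, warn, and try the next backend.
            last_unavailable = exc
            logger.warning("backend %d unavailable for (%r, %r)", index, name, label)
        # PromptNotFound is deliberately not caught, so it propagates at once.
    raise PromptStoreUnavailable("all backends unavailable") from last_unavailable
```

Leaving PromptNotFound uncaught is what keeps a logical absence in the primary store from being silently papered over by a stale copy in a secondary one.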
FilesystemPromptBackend reads prompts from <root>/<label>/<name>.j2.
The subdirectory-per-label layout keeps name-collisions across
labels distinct without prefix-escape concerns. version is
derived from the first 12 hex chars of the template_hash so two
file contents map deterministically to two distinct versions
without needing a sidecar metadata file (spec §3 lets backends
pick any stable identifier). The docstring notes that future
caching backends MUST preserve the original fetched_at on
returned Prompts per spec §3.
Adds the context-variable propagation mechanism for spec §11
LLM-call span attributes:
- openarmature.prompts.context module exposes
with_active_prompt(result) and with_active_prompt_group(group)
context managers plus current_prompt_result() /
current_prompt_group() inspectors.
- OTelObserver._on_llm_event reads the two ContextVars at LLM-
call span start and surfaces:
openarmature.prompt.name
openarmature.prompt.version
openarmature.prompt.label
openarmature.prompt.template_hash
openarmature.prompt.rendered_hash
openarmature.prompt.group_name
- Nesting is innermost-wins (matches Python's natural ContextVar
token-stacking behavior; spec §11 doesn't mandate a policy).
The attribute names match spec §11's normative list. The
mechanism (context variables) is one of the two example
mechanisms §11 names; bundling it now keeps the §11 surface
discoverable from the moment prompt-management lands.
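A minimal sketch of the ContextVar mechanism and its innermost-wins nesting. The function names mirror the ones listed above, but the bodies are assumptions; the real module also has a second variable for the prompt group.

```python
from contextlib import contextmanager
from contextvars import ContextVar

# Sketch only: a single module-level variable holding the active result.
_active_prompt: ContextVar = ContextVar("active_prompt", default=None)


@contextmanager
def with_active_prompt(result):
    """Set the active prompt for the duration of the block, then restore."""
    token = _active_prompt.set(result)
    try:
        yield result
    finally:
        # reset() restores whatever value was active before this block.
        _active_prompt.reset(token)


def current_prompt_result():
    """Return the innermost active prompt result, or None outside any block."""
    return _active_prompt.get()
```

Each `set` returns a token for the previous value, so nested blocks unwind in order and the innermost block always determines what an observer reads.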
Adds prompt-management as the fifth conformance capability:
- harness/prompt_management.py: typed YAML models for the new fixture shape (backends + manager + calls with target / operation / capture_as, plus per-call and top-level expected blocks for raises / result_equivalence / prompt_group / rendered_hash_equal / rendered_hash_different).
- harness/fixtures.py: PromptManagementFixture added to the discriminated union; the discriminator recognizes top-level 'backends:' (without 'mock_provider:') as the prompt-management shape.
- harness/loader.py: 'prompt-management' added to CAPABILITIES so test_fixture_parsing.py discovers and parses the new fixtures.

test_prompt_management.py drives all 12 spec fixtures (001-fetch-success through 012-prompt-result-rendered-hash-stability) against the real PromptManager + a MockPromptBackend that implements the protocol with optional simulate_unavailable + preloaded prompts + a call_count for fixtures that assert fallback chain visits. All 12 fixtures pass.
Adds tests/unit/test_prompts.py (25 tests) covering gaps the conformance fixtures don't exercise directly:
- error categories match spec §10 strings; PROMPT_TRANSIENT_CATEGORIES contains only prompt_store_unavailable.
- error attribute carriage (PromptNotFound name/label/backend, PromptRenderError name/version/label/variables/description).
- template_hash / rendered_hash determinism, prefix, and length; divergence for different inputs.
- Prompt extra-field rejection; PromptGroup 0/1-member rejection and 2+ acceptance.
- PromptManager construction (zero-backend rejection).
- Empty-string render output boundary wrap (the spec-agent's concern about Jinja2 cleanly rendering '' but UserMessage rejecting empty content; verified to surface as PromptRenderError).
- Identity-field propagation from Prompt to PromptResult on render.
- FilesystemPromptBackend disk I/O: success path, missing file raises PromptNotFound, OSError that isn't FileNotFoundError raises PromptStoreUnavailable.
- Context-var propagation: with_active_prompt / _prompt_group set + reset, innermost-wins nesting, async-task visibility.
- PromptManager fallback gaps: first-match short-circuits later backends; render returns a UserMessage carrying the rendered text.

Adds two OTel observer tests under tests/unit/test_observability_otel.py:
- Active prompt + active prompt group propagates the six openarmature.prompt.* span attributes (name, version, label, template_hash, rendered_hash, group_name) on the openarmature.llm.complete span.
- Without an active prompt, the LLM-call span carries no openarmature.prompt.* attributes.
docs/concepts/prompts.md walks through the prompt-management capability: the fetch + render split (and why both, not just get()), Prompt identity fields, strict-by-default variables, composite-backend fallback (PromptStoreUnavailable continues, PromptNotFound stops), the three error categories, PromptGroup for tracing related prompts, observability propagation via with_active_prompt and the six normative openarmature.prompt.* attributes, determinism + content-addressed caching, a minimal example, and what's out of scope (vendor backends, versioning workflows, cache invalidation, multi-message decomposition).

docs/reference/prompts.md is an mkdocstrings autodoc page in the same shape as docs/reference/llm.md. mkdocs.yml gains the two new pages in the Concepts and Reference nav sections.

CHANGELOG.md adds two entries under [Unreleased]:
- the new openarmature.prompts subpackage with PromptManager, the three error categories, FilesystemPromptBackend, and the jinja2>=3.1 runtime dependency.
- the observability propagation surface in openarmature.prompts.context plus the OTel observer wiring.
Pull request overview
Implements proposal 0017 (prompt-management core) in a new openarmature.prompts subpackage: typed Prompt / PromptResult / PromptGroup models, a PromptBackend protocol, a PromptManager that composes backends with §8 fallback semantics and renders via Jinja2 StrictUndefined, three canonical error categories, a FilesystemPromptBackend reference implementation, and with_active_prompt / with_active_prompt_group context managers wired into the OTel observer for §11 span-attribute propagation. Adds a conformance harness shape for the 12 new prompt-management fixtures.
Changes:
- New `openarmature.prompts` subpackage (models, errors, manager, FS backend, hashing helpers, context vars) with `jinja2>=3.1` as a new runtime dep.
- OTel observer surfaces six `openarmature.prompt.*` attributes on `openarmature.llm.complete` spans when called inside the new context managers.
- Conformance harness extended with `PromptManagementFixture`, a YAML model layer, and a parametrized test runner using an in-process `MockPromptBackend`.
Reviewed changes
Copilot reviewed 22 out of 23 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| `src/openarmature/prompts/__init__.py` | Public API re-exports for the new subpackage. |
| `src/openarmature/prompts/prompt.py` | Prompt / PromptResult Pydantic models with `extra="forbid"`. |
| `src/openarmature/prompts/group.py` | PromptGroup with N≥2 member validator. |
| `src/openarmature/prompts/errors.py` | Three canonical error classes + category constants + transient frozenset. |
| `src/openarmature/prompts/backend.py` | Runtime-checkable PromptBackend Protocol. |
| `src/openarmature/prompts/manager.py` | Fallback fetch, Jinja2 strict render with empty-string boundary wrap, get convenience. |
| `src/openarmature/prompts/hashing.py` | SHA-256 helpers (`sha256:` prefix) over template source and canonical message JSON. |
| `src/openarmature/prompts/context.py` | ContextVar-backed `with_active_prompt` / `with_active_prompt_group`. |
| `src/openarmature/prompts/backends/filesystem.py` | Reference FS backend; reads `<root>/<label>/<name>.j2`, derives version from hash prefix. |
| `src/openarmature/prompts/backends/__init__.py` | Backend re-export. |
| `src/openarmature/observability/otel/observer.py` | LLM-span attribute propagation reading the prompt/group context vars. |
| `pyproject.toml`, `uv.lock` | Add `jinja2>=3.1` runtime dependency. |
| `tests/conformance/harness/prompt_management.py` | Typed YAML fixture models for the new shape. |
| `tests/conformance/harness/fixtures.py` | Discriminated-union registration of PromptManagementFixture. |
| `tests/conformance/harness/loader.py` | Adds `prompt-management` to known capabilities. |
| `tests/conformance/test_prompt_management.py` | Fixture runner with MockPromptBackend + per-call / top-level assertions. |
| `tests/unit/test_prompts.py` | Unit coverage for types, hashing, FS backend, context vars, manager edge cases. |
| `tests/unit/test_observability_otel.py` | OTel-attribute propagation present/absent cases. |
| `docs/concepts/prompts.md`, `docs/reference/prompts.md`, `mkdocs.yml` | Concept page, API reference, nav update. |
| `CHANGELOG.md` | `[Unreleased]` entries for the prompt-management capability and observability propagation. |
Comments suppressed due to low confidence (2)
src/openarmature/prompts/backends/filesystem.py:53
`path.read_text(encoding="utf-8")` can raise `UnicodeDecodeError` if a template file contains invalid UTF-8. `UnicodeDecodeError` is a subclass of `ValueError`, not `OSError`, so it bypasses both `except` branches and propagates as an uncategorized exception out of `fetch()`. This means a corrupt prompt file surfaces as a generic Python exception instead of the spec's canonical `PromptStoreUnavailable` (or arguably `PromptNotFound`), defeating the §8 fallback chain (`PromptManager.fetch` won't fall through to the next backend). Consider catching `UnicodeDecodeError` explicitly (or `Exception` narrowly around `read_text`) and mapping to `PromptStoreUnavailable`.

    try:
        template_source = await asyncio.to_thread(path.read_text, encoding="utf-8")
    except FileNotFoundError as exc:
        raise PromptNotFound(
            f"prompt ({name!r}, {label!r}) not found under {self._root}",
            name=name,
            label=label,
            backend=str(self._root),
        ) from exc
    except OSError as exc:
        raise PromptStoreUnavailable(
            f"filesystem I/O error reading ({name!r}, {label!r}): {exc}"
        ) from exc
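One way the suggestion above might look, as a synchronous sketch. The exception classes are stand-ins with simplified constructors, and `read_template` is a hypothetical helper rather than the backend's actual method; whether a corrupt file should map to `PromptStoreUnavailable` or to something else is the reviewer's open question, and this sketch picks the former.

```python
from pathlib import Path


class PromptNotFound(Exception):
    """Stand-in for the real error class (constructor simplified)."""


class PromptStoreUnavailable(Exception):
    """Stand-in for the real error class (constructor simplified)."""


def read_template(path: Path) -> str:
    """Read a template, mapping every read failure to a canonical category."""
    try:
        return path.read_text(encoding="utf-8")
    except FileNotFoundError as exc:
        raise PromptNotFound(f"no template at {path}") from exc
    except (OSError, UnicodeDecodeError) as exc:
        # UnicodeDecodeError derives from ValueError, so it must be named
        # explicitly; a bare `except OSError` would let it escape.
        raise PromptStoreUnavailable(f"cannot read {path}: {exc}") from exc
```

`FileNotFoundError` has to stay in its own earlier branch because it is itself an `OSError` subclass and would otherwise be swallowed by the broader handler.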
src/openarmature/prompts/backends/filesystem.py:49
`name` and `label` are passed into the backend without any path-component validation. If a caller (or upstream code that forwards user input) passes a value containing `..` or an absolute path, `self._root / label / f"{name}.j2"` will silently traverse outside the configured root (an absolute component discards everything before it, so `Path("/srv/prompts") / "/etc" / "passwd.j2"` yields `/etc/passwd.j2`, and `..` segments are not normalized away by `/`). Even though prompt names are typically developer-controlled, this is a reference backend and the public API contract on `PromptBackend.fetch` doesn't forbid externally-sourced names. Consider rejecting names/labels containing path separators or `..` components, or resolving and verifying `path.is_relative_to(self._root.resolve())` before reading.

    async def fetch(self, name: str, label: str = "production") -> Prompt:
        path = self._root / label / f"{name}.j2"
        try:
            template_source = await asyncio.to_thread(path.read_text, encoding="utf-8")
        except FileNotFoundError as exc:
            raise PromptNotFound(
                f"prompt ({name!r}, {label!r}) not found under {self._root}",
                name=name,
                label=label,
                backend=str(self._root),
            ) from exc
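A sketch of the two checks the comment proposes, combined. `resolve_prompt_path` is a hypothetical helper and raising `ValueError` is an assumption; the review does not say which error category a rejected name should map to.

```python
from pathlib import Path


def resolve_prompt_path(root: Path, name: str, label: str) -> Path:
    """Build <root>/<label>/<name>.j2, refusing anything that escapes root."""
    for part in (name, label):
        # Reject empty values, dot segments, and embedded path separators.
        if not part or part in {".", ".."} or "/" in part or "\\" in part:
            raise ValueError(f"invalid path component: {part!r}")
    root_resolved = root.resolve()
    candidate = (root_resolved / label / f"{name}.j2").resolve()
    # Second line of defense: a symlink inside root could still point outside.
    if not candidate.is_relative_to(root_resolved):
        raise ValueError(f"{candidate} is outside {root_resolved}")
    return candidate
```

The component check catches the obvious inputs cheaply, while the resolved-path check covers cases the string test cannot see, such as a symlinked label directory. `Path.is_relative_to` requires Python 3.9 or later.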
- manager.py: hoist Jinja2 Environment to module-level singleton (stateless config; thread-safe for compile + render; avoids re-parsing config on every render call), keep the autoescape-disabled-by-design comment.
- errors.py: PromptStoreUnavailable carries optional name / label / backends_tried for operator diagnosability; PromptManager's aggregate raise populates backends_tried with the ordered list of consulted backends. PromptRenderError docstring documents spec §10's non-transient mandate.
- backends/filesystem.py: widen the version-prefix length from 12 to 16 hex chars (~64 bits; birthday-paradox boundary at ~4B templates), document the rationale + the wider-prefix / alternative-identifier guidance for higher-scale backends. Also carries name / label on PromptStoreUnavailable raises.
- observability/otel/observer.py: hoist prompts.context import to module top-level (no longer optional; cost off the per-event hot path).
- harness/fixtures.py: tighten the prompt-management discriminator from `backends:` alone to `backends:` co-occurring with `calls:` AND absence of graph-shape keys; avoids silently misrouting future fixtures that introduce a backends list for some other purpose.
- test_prompt_management.py: lift per-call call-count assertions out of the raises branch so they apply on both success and error paths; add internal-consistency check that a fixture's fields_must_match and fields_may_differ sets don't overlap.
- test_prompts.py: mock Path.read_text for the OSError-routing test instead of relying on platform-dependent NotADirectoryError behavior; update the version-prefix length assertion to match the widened 16-char prefix.
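The "~64 bits; ~4B templates" figure for the widened prefix can be checked with a few lines of arithmetic. Both helper names are hypothetical, and the probability uses the standard birthday approximation p ≈ 1 - exp(-n² / 2N), not anything taken from the codebase.

```python
import math


def birthday_boundary(hex_chars: int) -> int:
    """Item count at which collisions become likely: about sqrt(2**bits)."""
    bits = 4 * hex_chars  # each hex character carries 4 bits
    return 2 ** (bits // 2)


def collision_probability(n_items: int, hex_chars: int) -> float:
    """Approximate chance that any two of n_items share a truncated hash."""
    space = 2 ** (4 * hex_chars)
    return -math.expm1(-(n_items ** 2) / (2 * space))


print(birthday_boundary(12))  # 16777216 (about 16.8 million)
print(birthday_boundary(16))  # 4294967296 (about 4.3 billion)
```

At the 16-char boundary the approximate collision probability is about 0.39, so the boundary marks where collisions stop being negligible rather than where they become certain.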
Memory rule: no em dashes in user-facing copy. Reworded the new docs/concepts/prompts.md to use colons, semicolons, parens, or sentence restructuring in place of em dashes.
Sweep of leftover em dashes from PR-1/PR-2 docs that slipped past the no-em-dashes-in-user-facing-copy rule. Same substitutions as the prompts.md cleanup (colons, semicolons, parens, or sentence restructuring).
- CHANGELOG.md: update 12 → 16 hex chars to match the widened FilesystemPromptBackend.version derivation.
- prompt.py: PromptResult.messages gains Field(min_length=1) so the spec §4 'Ordered non-empty sequence' mandate is enforced at the type boundary, not just by the construction path.
- errors.py: PromptStoreUnavailable gains an optional causes list[BaseException] attribute carrying per-backend exceptions index-aligned to backends_tried.
- manager.py: aggregate raise populates causes with the per-backend exceptions in fallback order, while keeping the __cause__ chain pointing at the last unavailable for stack-trace continuity.
- manager.py: PromptManager carries a per-instance dict[str, jinja2.Template] keyed by template_hash. Render consults the cache and only re-parses on miss. Unbounded for v1 (typical apps have O(10) prompts; an LRU follow-on can land if benchmarks show memory pressure). template_hash is content-derived, so cache invalidation is automatic when a backend returns updated content.
- test_prompts.py: new tests for empty-messages rejection and for the compiled-template cache hit behavior.
- harness/prompt_management.py: fix misleading comment on FixtureExpectedRaises.carries (secondary_backend_call_count is a sibling field on FixtureExpectedPerCall, not inside carries).
- manager.py: replace 'assert causes' with an explicit 'if not causes: raise RuntimeError(...)' guard so the invariant holds under 'python -O' (asserts stripped) and surfaces as a clear RuntimeError rather than an opaque IndexError if a future change ever silently swallows an exception in the fallback loop.
- test_prompts.py: rewrite the active-prompt-in-nested-async-function test to spawn via asyncio.create_task so it actually exercises context-copy across the task boundary, matching the function name's implied claim. The previous form's await ran in the same context where ContextVar propagation is trivially expected.
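The difference between a plain await and `asyncio.create_task` that the last bullet relies on can be sketched as follows. The variable and function names are illustrative and not taken from the test file.

```python
import asyncio
from contextvars import ContextVar

active_prompt: ContextVar = ContextVar("active_prompt", default=None)


async def read_in_child():
    return active_prompt.get()


async def overwrite_in_child():
    # Runs inside the task's own copy of the context.
    active_prompt.set("overwritten-in-task")


async def main():
    token = active_prompt.set("greeting@v1")
    try:
        # Plain await: runs in the caller's context, so visibility is trivial.
        direct = await read_in_child()
        # create_task: the task receives a copy of the context at creation.
        spawned = await asyncio.create_task(read_in_child())
        # A set() inside the task changes only that copy, not the parent.
        await asyncio.create_task(overwrite_in_child())
        after = active_prompt.get()
    finally:
        active_prompt.reset(token)
    return direct, spawned, after
```

Only the `create_task` path exercises the context copy; the third value shows that the copy is one-way, since the child's `set` never reaches the parent.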
Consolidated release for the five-PR batch:
- Structured output (proposal 0016, PR #42)
- Image content blocks (proposal 0015, PR #44)
- Prompt management (proposal 0017, PR #45)
- State migration for checkpoints (proposal 0014, PR #46)
- Parallel branches (proposal 0011, PR #47)

Bumps:
- ``pyproject.toml`` project.version: 0.5.0 → 0.6.0
- ``__version__`` in src/openarmature/__init__.py
- ``uv.lock`` editable package version
- ``tests/test_smoke.py`` version assertion

Flips CHANGELOG ``[Unreleased]`` to ``[0.6.0] — 2026-05-16``, drops the release-gate Notes entry, and tightens the pre-1.0 MINOR note to list the two behavioral changes (retry-MW attempt-index propagation, CheckpointRecord.schema_version semantic shift) instead of the structured-output-specific note carried over from PR-1. Pinned spec stays at v0.16.1 (set in PR #47).
Summary
- `openarmature.prompts` subpackage. PR-3 of the five-PR batch following PR-1 (#42, proposal 0016) and PR-2 (#44, proposal 0015).
- `PromptManager` composes one or more `PromptBackend`s, exposes `fetch` / `render` / `get`, applies the §8 fallback contract (`prompt_store_unavailable` falls through to the next backend; `prompt_not_found` stops the chain), and renders templates through Jinja2's `StrictUndefined` per §7.
- `Prompt` / `PromptResult` / `PromptGroup` Pydantic models match spec §3 / §4 / §9 exactly. `PromptGroup` requires `len(members) >= 2`.
- Three canonical error classes (`PromptNotFound`, `PromptRenderError`, `PromptStoreUnavailable`) with category-string constants and `PROMPT_TRANSIENT_CATEGORIES = frozenset({"prompt_store_unavailable"})` exported in the same shape as `openarmature.llm.errors.TRANSIENT_CATEGORIES`.
- `FilesystemPromptBackend` is the minimum local-filesystem reference backend (layout: `<root>/<label>/<name>.j2`; `version` derived from the first 12 hex chars of the template's SHA-256 hash).
- `with_active_prompt(result)` and `with_active_prompt_group(group)` context managers + `current_prompt_result()` / `current_prompt_group()` inspectors. When an LLM call fires inside one of those contexts, the OTel observer surfaces the six normative `openarmature.prompt.*` attributes on the `openarmature.llm.complete` span. Nesting is innermost-wins.

What's new
- `openarmature.prompts`: new subpackage (`__init__` re-exports the full public API).
- `Prompt`, `PromptResult`, `PromptGroup`: Pydantic models.
- `PromptBackend`: runtime-checkable Protocol with a single async `fetch(name, label)` method.
- `PromptManager`: `fetch` (with fallback) / `render` (sync, Jinja2 strict) / `get` convenience.
- `FilesystemPromptBackend`: reads `<root>/<label>/<name>.j2` from disk.
- `PromptNotFound` / `PromptRenderError` / `PromptStoreUnavailable`: the three canonical error classes.
- `PROMPT_TRANSIENT_CATEGORIES`: frozenset of transient category strings.
- `compute_template_hash`, `compute_rendered_hash`: SHA-256 helpers with the `"sha256:"` prefix.
- `with_active_prompt`, `with_active_prompt_group`: ContextVar-backed context managers.
- `OTelObserver`: surfaces the `openarmature.prompt.*` attributes on LLM-call spans.
- `jinja2>=3.1`: new runtime dependency.
- `docs/concepts/prompts.md` + `docs/reference/prompts.md`: concept page and API reference.

Release gate
PR-3 of a five-PR batch (`0016` → `0015` → `0017` → `0014` → `0011`). Do not tag a release until all five land; the CHANGELOG `[Unreleased]` Notes section carries the gate from PR-1.

Commits
- feat(prompts): error classes and category constants
- feat(prompts): Prompt, PromptResult, PromptGroup types
- feat(prompts): PromptBackend protocol, PromptManager, jinja2 dep
- feat(prompts): FilesystemPromptBackend + OTel attribute propagation
- test(conformance): prompt-management harness and 12 fixtures
- test(unit): prompts subpackage + OTel attribute propagation
- docs: prompts concept page, API reference, changelog

Notable implementation details
- Empty-string render boundary: rendering `""` through Jinja2 (e.g., `{{ x if x else '' }}` with `x=None`) would construct `UserMessage(content="")`, which Pydantic rejects. The render path catches that `ValidationError` and re-raises as `PromptRenderError` per §10's "variable's value not coercible" framing. Surfaces the prompt's identity + the variables + the underlying description.
- The six span attribute names (`name`, `version`, `label`, `template_hash`, `rendered_hash`, `group_name`) match the normative §11 list.
- When both `with_active_prompt` and `with_active_prompt_group` are active, the per-prompt attributes AND the `group_name` attribute fire on the span.
- Caching backends MUST preserve the original `fetched_at` on returned Prompts per §3; the FS backend's docstring notes the rule applies to caching backends even though FS doesn't cache.
- v1 render produces `[UserMessage(content=rendered_text)]`. Multi-message split convention deferred to a follow-on if real patterns surface.

Conformance harness extensions
- `tests/conformance/harness/prompt_management.py` adds typed YAML models for the new fixture shape (`backends:` + `manager:` + `calls:` with `target` / `operation` / `capture_as`, plus per-call and top-level `expected` blocks for `raises` / `result_equivalence` / `prompt_group` / `rendered_hash_equal` / `rendered_hash_different`).
- `harness/fixtures.py` registers `PromptManagementFixture` in the discriminated union; the discriminator recognizes top-level `backends:` (without `mock_provider:`) as the prompt-management shape.
- `harness/loader.py` adds `prompt-management` to `CAPABILITIES` so `test_fixture_parsing.py` discovers and parses the 12 new fixtures.
- `tests/conformance/test_prompt_management.py` parametrizes over the 12 fixtures and drives them against the real `PromptManager` + an in-process `MockPromptBackend`. No I/O.

Test plan
- `uv run pytest`: 602 pass, 79 skipped (up from 73; +6 new docs example snippets in `prompts.md`), 0 failed.
- `uv run pyright`: clean.
- `uv run ruff check` + `uv run ruff format`: clean.
- `uv run --group docs mkdocs build --strict`: clean.
- All 12 conformance fixtures (`001-fetch-success` through `012-prompt-result-rendered-hash-stability`) pass.
- Unit tests in `tests/unit/test_prompts.py` covering construction, error attribute carriage, hashing determinism, FilesystemPromptBackend disk I/O, context-var propagation, empty-string render boundary wrap, and PromptManager fallback semantics.
- OTel observer tests in `tests/unit/test_observability_otel.py` covering attribute propagation (both present-with-context and absent-without-context cases).
- Check `docs/concepts/prompts.md` end-to-end against `mkdocs serve` to confirm rendering of code blocks and cross-links.

Pre-1.0 SemVer
Additive change. New subpackage; no existing surface modified except the OTel observer (which gained a no-op-when-no-active-prompt branch). Existing callers see no behavior change. Jinja2 is added to required runtime dependencies; downstream `openarmature` consumers will pick it up on next `pip install -U` / `uv sync`.