Skip to content
Merged
Show file tree
Hide file tree
Changes from 8 commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,8 @@ The format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/). The

### Added

- **Prompt-management capability (proposal 0017, introduced in spec v0.15.0).** New `openarmature.prompts` subpackage. `PromptManager` composes one or more `PromptBackend`s, exposes `fetch` / `render` / `get`, applies the §8 fallback semantics (`prompt_store_unavailable` continues to the next backend; `prompt_not_found` stops the chain), and renders templates with Jinja2's `StrictUndefined` per §7. `Prompt` / `PromptResult` / `PromptGroup` are Pydantic models matching spec §3 / §4 / §9. Three error categories (`PromptNotFound`, `PromptRenderError`, `PromptStoreUnavailable`) with `PROMPT_TRANSIENT_CATEGORIES` exported for retry-middleware classifiers. `FilesystemPromptBackend` is the minimum local-filesystem reference backend (layout: `<root>/<label>/<name>.j2`; `version` derived from the first 12 chars of `template_hash`). New runtime dependency: `jinja2>=3.1`.
Comment thread
chris-colinsky marked this conversation as resolved.
Outdated
- **`openarmature.prompts.context` — observability propagation per spec §11.** `with_active_prompt(result)` and `with_active_prompt_group(group)` context managers + `current_prompt_result()` / `current_prompt_group()` inspectors. When the OTel observer is active and an LLM call fires inside `with_active_prompt`, the `openarmature.llm.complete` span carries the normative `openarmature.prompt.*` attributes (`name`, `version`, `label`, `template_hash`, `rendered_hash`, `group_name`). Nesting is innermost-wins.
- **Image content blocks for user messages (proposal 0015, introduced in spec v0.13.0).** `UserMessage.content` now accepts `str | list[ContentBlock]`. The block surface introduces `TextBlock`, `ImageBlock`, `ImageSourceURL`, `ImageSourceInline`, and the `ContentBlock` / `ImageSource` discriminated unions over the block / source `type` field. `ImageBlock` carries a `media_type` (required for inline sources; ignored for URL sources; typed as `str | None` so callers MAY pass any `image/*` type the bound model supports) and an optional `detail` hint (`"auto"` / `"low"` / `"high"`; `None` default omits the field from the wire so providers apply their own default). System, assistant, and tool messages stay text-string-only; image inputs are user-only in v1.
- **`OpenAIProvider` content-array wire mapping.** When `UserMessage.content` is a content-block sequence, the wire body uses OpenAI's `content` array per §8.1.1. `TextBlock → {type: "text", text}`. `ImageBlock` with a URL source maps to `{type: "image_url", image_url: {url, detail?}}`. `ImageBlock` with an inline source constructs an RFC 2397 `data:<media_type>;base64,<base64_data>` URI and goes through the same `image_url` entry shape. Inline bytes pass through unchanged — no inspection, transcoding, or re-encoding.
- **New error category `ProviderUnsupportedContentBlock` (non-transient).** Raised when the bound model rejects a content block type / media variant. Distinct from `ProviderInvalidRequest` (which covers spec-shape malformation): this category surfaces a *capability* mismatch, letting callers route differently (e.g., fall back to a multimodal-capable provider) without overloading the malformed-request category. Carries `block_type` ("image" / "audio" / "video") and `reason` (provider's human-readable message) when those are recoverable from the rejection. `OpenAIProvider` detects content rejection via HTTP 400 bodies — heuristic on `error.code` (known set: `image_content_not_supported`, `unsupported_image_media_type`, `audio_content_not_supported`, etc.), `error.type` (`image_parse_error`), and `error.message` ("does not support" + image/audio/video).
Expand Down
278 changes: 278 additions & 0 deletions docs/concepts/prompts.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,278 @@
# Prompts

Named, versioned, content-addressed prompts. OpenArmature's
prompt-management capability separates *fetching* a template
from *rendering* it, lets you compose multiple backends with
explicit fallback, and propagates prompt identity to your
observability backend so trace UIs can pivot on the prompt
that produced a call.

Skip ahead to [a minimal example](#a-minimal-example) if you
want code first.

## The two halves: fetch and render

A `PromptBackend` knows how to find a template by `name` and
`label`; nothing more. A `PromptManager` composes one or more
backends and adds rendering on top:

```python
from openarmature.prompts import PromptManager, FilesystemPromptBackend

manager = PromptManager(FilesystemPromptBackend("./prompts"))

# Fetch returns a Prompt (the raw template + identity metadata).
prompt = await manager.fetch("greeting", "production")

# Render applies variables and returns a PromptResult (the
# rendered messages plus a content-addressed identity).
result = manager.render(prompt, {"user": "Alice"})

# Or do both in one shot:
result = await manager.get("greeting", "production", {"user": "Alice"})
```

Why two operations instead of one? Three reasons:

- **Inspect templates without binding variables.** Schema
validation, prompt diffing, tooling that walks the prompt
catalogue.
- **Cache templates separately from rendered output.** The
fetch step is the I/O step; rendering is pure local
computation.
- **Render the same template with different variables in
tight loops.** Map-reduce over chunks, batch evaluation,
fan-out fixtures.

The convenience `get()` operation gives you the single-call
shape when you want it without removing the separability.

## Prompt identity

Every `Prompt` carries five identity fields:

- `name` — your stable identifier (`"greeting"`).
- `version` — the backend's version string. Implementation-defined:
a backend MAY use semver, monotonic integers, content
hashes, git short-SHAs, or any stable identifier. The
filesystem backend derives it from the template content
hash.
- `label` — the slot the prompt was fetched from
(`"production"`, `"latest"`, `"variant-a"`). The label is
part of the query.
- `template_hash` — SHA-256 of the raw template source.
Two prompts with different content always have different
hashes.
- `fetched_at` — when the prompt was fetched. Cached
backends preserve the original fetch time, not the
cache-hit time.

The `name + version + label` triple identifies the prompt;
the `template_hash` lets you tell two prompts apart by
*content*, which matters when a vendor backend serves
different content under the same `latest` label over time.

A `PromptResult` propagates all of those, plus:

- `rendered_hash` — SHA-256 over the rendered messages.
Same template + same variables → same hash. This is the
cache-key value a memoization layer wants.
- `messages` — the rendered output as an LLM-ready
`list[Message]`. Directly consumable by
`Provider.complete()`.
- `variables` — what was applied. Audit-trail friendly.
- `rendered_at` — when the render happened. Distinct from
`fetched_at`.

## Strict variables by default

A template that references a variable not in the mapping
raises `PromptRenderError`:

```python
prompt = await manager.fetch("greeting", "production") # "Hello, {{ user }}! Today is {{ day }}."
manager.render(prompt, {"user": "Alice"}) # raises — "day" is undefined
```

This is intentional. Silently substituting empty strings for
missing variables masks bugs: a typo'd variable name produces
a working-but-wrong prompt, often invisibly. If you need
lenient behavior, wrap your variables in your own defaulting
layer before passing them to `render()`.

The Python implementation uses Jinja2's `StrictUndefined`.

## Composite backends and fallback

A manager constructed with multiple backends consults them in
order. The fallback rule distinguishes infrastructure failure
from logical absence:

```python
from openarmature.prompts import PromptManager
from openarmature_langfuse import LangfusePromptBackend # hypothetical sibling

manager = PromptManager(
LangfusePromptBackend(api_key=...),
FilesystemPromptBackend("./prompts"), # local fallback
)
```

- **`PromptStoreUnavailable` from a backend → try the next.**
Network's down, vendor API is 5xx-ing, filesystem hiccupped —
the manager falls back. This is the "Langfuse is degraded,
use the local copy" case.
- **`PromptNotFound` from a backend → STOP the chain.** The
error propagates. This is the "operator deliberately
deleted the prompt from Langfuse to retire it" case —
falling back here would silently resurface a stale local
copy under a name the operator wanted gone.
- **All backends `PromptStoreUnavailable` → manager raises
`PromptStoreUnavailable`.** Everything's down.

The two error categories have different operational
meanings; the manager keeps them separated.

## Errors

Three categories cover every failure mode:

| Error | When | Transient |
| ------------------------- | ------------------------------------------------------------------- | --------- |
| `PromptNotFound` | No prompt matches `(name, label)` in any backend (after §8 rules) | No |
| `PromptRenderError` | Undefined variable, template parse error, coercion failure | No |
| `PromptStoreUnavailable` | Backend infrastructure failure (network, I/O, vendor API) | Yes |

`PROMPT_TRANSIENT_CATEGORIES` is exported as a frozenset for
retry-middleware classifiers — the same pattern
`openarmature.llm` uses with its `TRANSIENT_CATEGORIES`.

## PromptGroup — tracing related prompts together

A `PromptGroup` is a structural grouping of two or more
`PromptResult` instances under a stable `group_name`. The
group itself doesn't execute anything; it gives observability
a shared name to render related calls under.

```python
from openarmature.prompts import PromptGroup, with_active_prompt_group

classify = await manager.get("classify", variables={"input": user_query})
answer = await manager.get("answer", variables={"input": user_query, ...})

group = PromptGroup(group_name="classifier_chain", members=[classify, answer])
with with_active_prompt_group(group):
# Every LLM call in this scope carries
# openarmature.prompt.group_name="classifier_chain".
classification = await provider.complete(classify.messages, ...)
final = await provider.complete(answer.messages, ...)
```

Canonical patterns the primitive covers:

- **Multi-stage classification** — `[coarse, fine, answer]`.
- **RAG with reranking** — `[query_rewrite, retrieve, rerank, answer]`.
- **Self-correction loops** — `[generate, critique, revise]`.
- **Map-reduce over chunks** — `[chunk_classify_1..N, synthesize]`.

The N=2 case ("classifier + follow-up") is the simplest;
larger groups work under the same primitive. The group rejects
empty and single-member shapes — single-prompt tagging is
already served by the per-prompt observability attributes
below.

## Observability propagation

When an LLM call fires inside `with_active_prompt(result)` (or
`with_active_prompt_group(group)`), the OTel observer surfaces
six normative attributes on the `openarmature.llm.complete`
span:

- `openarmature.prompt.name`
- `openarmature.prompt.version`
- `openarmature.prompt.label`
- `openarmature.prompt.template_hash`
- `openarmature.prompt.rendered_hash`
- `openarmature.prompt.group_name`

Pattern:

```python
result = await manager.get("greeting", "production", {"user": "Alice"})
with with_active_prompt(result):
response = await provider.complete(result.messages, ...)
```

Trace UIs can then pivot on `prompt.name`, filter on
`prompt.template_hash` to find every call that used a given
template version, or surface `prompt.group_name` to group
related calls into a single workflow view.

Nesting is innermost-wins. If you activate a result inside
another active result, the inner one wins for the duration
of the inner block.

## Determinism and content-addressed caching

`render` is deterministic: same `Prompt`, same `variables` →
bytewise-identical `messages` and `rendered_hash` across
calls. This is the cache-key contract — `rendered_hash`
gives a downstream memoization layer the right equivalence
relation for free.

Templates MAY reference user-supplied variables that capture
nondeterministic values (`now=datetime.utcnow()`); the
determinism contract applies to the render operation given
fixed inputs, not to user-supplied variable content.

## A minimal example

```python
import asyncio
from pathlib import Path

from openarmature.prompts import FilesystemPromptBackend, PromptManager


async def main() -> None:
manager = PromptManager(FilesystemPromptBackend(Path("./prompts")))
result = await manager.get(
"greeting",
"production",
variables={"user": "Alice"},
)
print(result.messages[0].content) # rendered text
print(result.rendered_hash) # cache key


asyncio.run(main())
```
Comment thread
chris-colinsky marked this conversation as resolved.

The filesystem backend layout is
`<root>/<label>/<name>.j2` — for the example above,
`./prompts/production/greeting.j2`.

## What's out of scope (for now)

- **Specific vendor backends** — Langfuse, PromptLayer, etc.,
ship as sibling packages (`openarmature-langfuse`, …). The
core ships the protocol + a filesystem reference.
- **Prompt versioning workflows** — how versions are assigned,
promoted, pinned. Per project. The spec defines the
`version` field; the discipline is yours.
- **Cache invalidation policies** — `template_hash` and
`rendered_hash` are the keys; the cache itself is a
separate concern.
- **Prompt linting / evaluation** — quality checks belong to
separate tools (or the future eval capability).
- **Multi-message render decomposition** — v1 emits a single
`UserMessage` carrying the rendered text. If you need
`system + user` splits, construct the messages list
manually outside `render()` for now.

## Where to next

- **[Model Providers](../model-providers/index.md)** —
what to pass `result.messages` into.
- **[API reference: `openarmature.prompts`](../reference/prompts.md)** —
the full public surface.
7 changes: 7 additions & 0 deletions docs/reference/prompts.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
# openarmature.prompts

::: openarmature.prompts
options:
show_root_heading: false
show_source: false
heading_level: 2
2 changes: 2 additions & 0 deletions mkdocs.yml
Original file line number Diff line number Diff line change
Expand Up @@ -95,6 +95,7 @@ nav:
- Composition: concepts/composition.md
- Fan-out: concepts/fan-out.md
- LLMs: concepts/llms.md
- Prompts: concepts/prompts.md
- Observability: concepts/observability.md
- Checkpointing: concepts/checkpointing.md
- Model Providers:
Expand All @@ -104,6 +105,7 @@ nav:
- reference/index.md
- openarmature.graph: reference/graph.md
- openarmature.llm: reference/llm.md
- openarmature.prompts: reference/prompts.md
- openarmature.checkpoint: reference/checkpoint.md
- openarmature.observability: reference/observability.md

Expand Down
1 change: 1 addition & 0 deletions pyproject.toml
Original file line number Diff line number Diff line change
Expand Up @@ -25,6 +25,7 @@ dependencies = [
"pydantic>=2.7",
"httpx>=0.27",
"jsonschema>=4.0",
"jinja2>=3.1",
]

[project.optional-dependencies]
Expand Down
15 changes: 15 additions & 0 deletions src/openarmature/observability/otel/observer.py
Original file line number Diff line number Diff line change
Expand Up @@ -93,6 +93,8 @@
)
from opentelemetry.trace.propagation import set_span_in_context

from openarmature.prompts.context import current_prompt_group, current_prompt_result

if TYPE_CHECKING:
from openarmature.graph.events import NodeEvent

Expand Down Expand Up @@ -487,6 +489,19 @@ def _handle_llm_event(self, event: NodeEvent) -> None:
cid = current_correlation_id()
if cid is not None:
attrs["openarmature.correlation_id"] = cid
# Per prompt-management spec §11, surface prompt identity
# on the LLM-call span when the call fired inside a
# with_active_prompt / with_active_prompt_group context.
active_prompt = current_prompt_result()
if active_prompt is not None:
attrs["openarmature.prompt.name"] = active_prompt.name
attrs["openarmature.prompt.version"] = active_prompt.version
attrs["openarmature.prompt.label"] = active_prompt.label
attrs["openarmature.prompt.template_hash"] = active_prompt.template_hash
attrs["openarmature.prompt.rendered_hash"] = active_prompt.rendered_hash
active_group = current_prompt_group()
if active_group is not None:
attrs["openarmature.prompt.group_name"] = active_group.group_name
Comment thread
chris-colinsky marked this conversation as resolved.
span = self._tracer.start_span(
name="openarmature.llm.complete",
context=cast("Any", parent_ctx),
Expand Down
47 changes: 47 additions & 0 deletions src/openarmature/prompts/__init__.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,47 @@
"""Prompt-management capability — fetch, render, and trace named prompts."""

from .backend import PromptBackend
from .backends import FilesystemPromptBackend
from .context import (
current_prompt_group,
current_prompt_result,
with_active_prompt,
with_active_prompt_group,
)
from .errors import (
PROMPT_NOT_FOUND,
PROMPT_RENDER_ERROR,
PROMPT_STORE_UNAVAILABLE,
PROMPT_TRANSIENT_CATEGORIES,
PromptError,
PromptNotFound,
PromptRenderError,
PromptStoreUnavailable,
)
from .group import PromptGroup
from .hashing import compute_rendered_hash, compute_template_hash
from .manager import PromptManager
from .prompt import Prompt, PromptResult

__all__ = [
"PROMPT_NOT_FOUND",
"PROMPT_RENDER_ERROR",
"PROMPT_STORE_UNAVAILABLE",
"PROMPT_TRANSIENT_CATEGORIES",
"FilesystemPromptBackend",
"Prompt",
"PromptBackend",
"PromptError",
"PromptGroup",
"PromptManager",
"PromptNotFound",
"PromptRenderError",
"PromptResult",
"PromptStoreUnavailable",
"compute_rendered_hash",
"compute_template_hash",
"current_prompt_group",
"current_prompt_result",
"with_active_prompt",
"with_active_prompt_group",
]
Loading