Skip to content
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
97 changes: 97 additions & 0 deletions docs/concepts/observability.md
Original file line number Diff line number Diff line change
Expand Up @@ -586,3 +586,100 @@ appear dropped. Two workarounds:
- Use `SimpleSpanProcessor` instead of `BatchSpanProcessor` in
tests; it exports synchronously and is unaffected by teardown
timing.

## Langfuse mapping (opt-in)

A second sibling observer maps the same `NodeEvent` stream onto
Langfuse's native Trace + Observation data model — Traces at the
top, Span observations for graph nodes, Generation observations for
LLM calls. Use it instead of (or alongside) the OTel observer when
your trace UI is Langfuse and you want first-class Generation
rendering without going through Langfuse's OTLP ingest.

```python
from openarmature.observability.langfuse import (
InMemoryLangfuseClient,
LangfuseObserver,
)

client = InMemoryLangfuseClient() # or langfuse.Langfuse(...) in prod
observer = LangfuseObserver(client=client)
graph.attach_observer(observer)
```

The `client` is anything matching the `LangfuseClient` Protocol —
the bundled `InMemoryLangfuseClient` (used by the conformance
harness, useful for unit tests), or a real `langfuse.Langfuse()`
instance from the [Langfuse Python SDK](https://github.com/langfuse/langfuse-python).
The Protocol declares only the methods the observer calls, so SDK
versions whose shape matches drop in directly. SDK versions whose
shape diverges (renamed kwargs, return-type quirks) plug in via a
small adapter; see
[`examples/10-langfuse-observability`](../examples/10-langfuse-observability.md)
for the runnable demo plus the adapter shape.

### What Langfuse sees

- **Trace ID = invocation ID.** The Trace's `id` is the OA
`invocation_id` verbatim, so cross-system lookup by invocation_id
finds the Langfuse Trace directly (spec §8.4.1).
- **Trace name.** Defaults to the entry-node name (spec §8.6
fallback). Caller-supplied invocation labels land in PR 4
(proposal 0034).
- **Per-observation metadata.** Each Span / Generation carries
`namespace`, `step`, `attempt_index`, optional `fan_out_index` /
`branch_name`, and the `correlation_id` cross-cutting join key
(spec §8.5).
- **Generation fields.** LLM calls become Generation observations
with `model`, `model_parameters` (the `gen_ai.request.*` request
parameters lifted by inclusion per §8.4.3), `usage` (input /
output / total tokens), and `metadata.finish_reason` /
`system` / `response_model` / `response_id`.

### Payload + truncation

`disable_llm_payload` mirrors the OTel observer's flag — defaults
to `True` for the same privacy reason. Flip to `False` to populate
`generation.input` / `output` / `metadata.request_extras` from the
LLM event payload.

```python
observer = LangfuseObserver(
client=client,
disable_llm_payload=False,
payload_byte_cap=65536,
)
```

When a payload exceeds `payload_byte_cap`, the observer emits the
serialized form with the §5.5.5 truncation marker
(`…[truncated, M bytes total]`) verbatim as a raw string instead of
parsing back to native shape. The unparseable JSON IS the
truncation signal in the Langfuse UI.

### Prompt linkage

When a Prompt's source backend exposes a Langfuse Prompt entity
reference under `Prompt.observability_entities['langfuse_prompt']`,
the Generation observation links to that entity natively (spec
§8.4.4 case 1). Backends that don't surface a Langfuse reference
(filesystem, in-memory, etc.) leave the Generation with
`metadata.prompt` populated but no entity link (case 2).

### Composition with OTel

The two observers are independent §6 event consumers and can be
attached together. They share the `correlation_id` as the
cross-backend join key — find a slow Generation in Langfuse, search
for its `correlation_id` in OTel logs, see the surrounding
infrastructure activity.

```python
otel_observer = OTelObserver(span_processor=...)
langfuse_observer = LangfuseObserver(client=langfuse_client)
graph.attach_observer(otel_observer)
graph.attach_observer(langfuse_observer)
```

Each observer's `disable_llm_spans` / `disable_llm_payload` flag is
independent; one MAY emit while the other suppresses.
159 changes: 159 additions & 0 deletions docs/examples/10-langfuse-observability.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,159 @@
# 10 - Langfuse observability

Send LLM call observability to Langfuse natively — Trace at the top,
Span observations for graph nodes, Generation observations with input,
output, token usage, model parameters, and a native link back to the
prompt entity the call rendered from.

## Overview

A mission-briefing assistant answers questions about Apollo and Artemis
missions. The pipeline fetches a versioned prompt template, renders it
with the user's question, sends the rendered messages to the model,
and stores the response. The Langfuse observer captures the full call
shape as the graph runs.

The demo's prompt backend stubs a Langfuse-source by attaching a
sentinel `langfuse_prompt` reference to the rendered prompt. The
Generation observation reads that reference and links back to the
prompt entity — exactly what you'd see in a production Langfuse
dashboard threading "this generation came from prompt v7" without any
manual wiring at the call site.

## What it teaches

- [`LangfuseObserver`](../concepts/observability.md#langfuse-mapping-opt-in)
attaches like any other observer; nothing in the node code knows or
cares about which backend is recording.
- The `LangfuseClient` Protocol decouples the observer from the SDK.
The bundled `InMemoryLangfuseClient` recorder is the test/demo
shape; production passes a real `langfuse.Langfuse()` instance (or
a thin adapter — see [Reading the output](#reading-the-output)
below).
- Prompt linkage through
[`Prompt.observability_entities`](../concepts/prompts.md#backend-keyed-observability-entity-references):
a prompt backend that exposes a Langfuse Prompt entity reference
surfaces it on every Generation that renders from that prompt.
Filesystem / in-memory backends without that reference work too,
they just produce metadata-only linkage.
- `disable_llm_payload=False` opt-in for capturing input messages +
output content on Generation observations. Default-off is the
privacy posture; the demo deliberately flips it.
- `correlation_id` cross-cutting metadata on the Trace and every
Observation — the join key if you're also running an OTel observer
alongside.

## How to run

```bash
uv sync --group examples
LLM_API_KEY=sk-... uv run python examples/10-langfuse-observability/main.py \
"what year did Apollo 11 land"
```

The first positional arg becomes the question. The demo uses an
in-memory recorder so no Langfuse account is needed.

## The graph

```mermaid
flowchart TD
start([start])
answer[answer_briefing]
stop([end])

start --> answer --> stop
```

A single-node graph: fetch the prompt, render with the question, call
the LLM under `with_active_prompt(...)`, store the response. The
single node is deliberate — the value is in the captured Trace shape,
not the graph topology.

## Reading the output

After the answer prints, the script renders the captured Langfuse
Trace + Observation tree:

```
question: what year did Apollo 11 land
answer: Apollo 11 landed on the Moon on July 20, 1969 ...
prompt: mission-briefing v7

─── captured Langfuse trace ─────────────────────────────────
Trace id=01234567-89ab-...
name='answer_briefing'
metadata={correlation_id='...', entry_node='answer_briefing', spec_version='0.26.0'}
[span] 'answer_briefing' level=DEFAULT
metadata={attempt_index=0, correlation_id='...', namespace=['answer_briefing'], step=0}
[generation] 'openarmature.llm.complete' level=DEFAULT
metadata={correlation_id='...', finish_reason='stop', prompt={...},
response_id='...', response_model='gpt-4o-mini-2024-...',
system='openai'}
model='gpt-4o-mini'
usage=input:48 output:32 total:80
prompt_entity_link='lf-prompt-mission-briefing-v7'
output='Apollo 11 landed on the Moon on July 20, 1969 ...'
```

- **Trace name = entry node name** by default. The caller-supplied
invocation-label path (a per-`invoke()` argument that overrides the
default) ships with proposal 0034's caller-metadata work.
- **Span observation per node.** `answer_briefing` is the only node
here; a multi-node graph would produce a tree of nested Span
observations under the Trace.
- **Generation observation per LLM call.** Carries `model`, `usage`,
`output`, and the prompt-identity metadata. In a production Langfuse
dashboard this is what the "Generation" detail view renders.
- **`prompt_entity_link`** is the value `Prompt.observability_entities['langfuse_prompt']`
carried — a sentinel string in this demo, a real Langfuse SDK Prompt
object in production. When the backend doesn't surface the reference
(e.g., a filesystem backend), the link is absent but the
`metadata.prompt` map (name, version, label, hashes) still appears
for traceability.

## Swapping to a real Langfuse SDK

The observer's `client` parameter is `LangfuseClient`-Protocol-typed,
so any structurally-compatible value works:

```python
from langfuse import Langfuse

client = Langfuse(
public_key="pk-lf-...",
secret_key="sk-lf-...",
host="https://cloud.langfuse.com",
)
observer = LangfuseObserver(client=client, disable_llm_payload=False)
```

If the installed SDK version's `trace` / `span` / `generation` method
signatures match the Protocol exactly, this is the whole change. If
they diverge (renamed kwargs, return-type quirks), wrap the SDK in a
small adapter class that implements `LangfuseClient` and delegates to
the SDK call-by-call. The Protocol surface is narrow — four methods —
so the adapter is on the order of 40 lines.

For prompt linkage: in production, the
`Prompt.observability_entities['langfuse_prompt']` value is the SDK's
own Prompt-entity object (returned by `langfuse_client.get_prompt(...)`)
rather than the sentinel string this demo uses. The observer passes
that value straight through to the SDK's `generation(..., prompt=...)`
argument, which is what the SDK uses to establish the native link.

## Composition with OTel

Both observers consume the same `NodeEvent` stream and can be attached
together:

```python
graph.attach_observer(OTelObserver(span_processor=batch))
graph.attach_observer(LangfuseObserver(client=langfuse_client))
```

Their `disable_llm_spans` / `disable_llm_payload` flags are
independent. The `correlation_id` cross-cutting attribute is the join
key — find a slow Generation in Langfuse, search for the
`correlation_id` in OTel logs to see the surrounding infrastructure
activity.
Loading