Add Langfuse observer (proposal 0031)#80
Conversation
New openarmature.observability.langfuse subpackage maps the observer event stream onto Langfuse's native Trace + Observation data model: invocation -> Trace, node/subgraph/fan-out -> Span observation, LLM provider call -> Generation observation. The LangfuseObserver consumes the event stream and emits through a narrow LangfuseClient Protocol with four methods (trace, span, generation, update_trace). Two concrete clients: - InMemoryLangfuseClient bundled for tests and the conformance harness; captures every Trace + Observation as plain dataclass records inspectable by assertions. - A real langfuse.Langfuse() SDK instance is Protocol-compatible and drops in unchanged for production use. SDK versions whose shape diverges plug in via a small adapter the user writes; the shape is documented in examples/10-langfuse-observability. Trace id equals the framework-minted invocation_id verbatim so cross-system lookup by invocation_id finds the Langfuse Trace directly. correlation_id surfaces on both trace.metadata and every observation.metadata for cross-backend join. Trace name sources from the entry-node name; the caller-supplied invocation-label path lands in proposal 0034 (PR 4). Generation rendering follows the §8.7 contract: input/output/ request_extras appear only when disable_llm_payload=False; the §5.5.5 truncation marker is preserved verbatim as a raw string when the serialized payload exceeds payload_byte_cap. Prompt linkage follows §8.4.4 case discrimination: reads Prompt.observability_entities['langfuse_prompt'] (the field added in proposal 0033) to establish a native Prompt-entity link when the prompt's source exposes one. Backends without that reference (filesystem, in-memory) produce metadata-only linkage. Three conformance fixtures pass: 022 basic trace, 023 Generation rendering plus truncation case, 024 prompt linkage both cases. Seven new unit tests cover payload-cap validation, in-memory recorder field handling, and observation parent walking. Example 10-langfuse-observability is a runnable moon-themed demo: a lunar mission Q&A pipeline with a mock Langfuse-source prompt backend, the LangfuseObserver wired to the in-memory recorder, and a pretty-printer for the captured Trace tree at the end. Real SDK swap is a one-line constructor change. Third of 6 PRs in the v0.10.0 batch.
There was a problem hiding this comment.
Pull request overview
Adds a Langfuse-native observer (proposal 0031, spec §8) as a new openarmature.observability.langfuse subpackage. The observer consumes the existing NodeEvent stream and emits Trace + Span/Generation entities through a narrow LangfuseClient Protocol, with a bundled InMemoryLangfuseClient recorder for tests and the conformance harness. Mirrors the OTel observer's per-invocation state isolation, payload-cap minimum, and truncation-marker semantics, while keeping the SDK dependency optional via structural Protocol matching.
Changes:
- New
LangfuseObservermapping invocation→Trace, node/subgraph→Span, LLM call→Generation with prompt-entity linkage per §8.4.4. - New
LangfuseClientProtocol +InMemoryLangfuseClientrecorder withLangfuseTrace/LangfuseObservation/LangfuseUsagedataclasses. - Conformance harness for fixtures 022-024, focused unit tests, runnable example, mkdocs/observability docs, and AGENTS.md cross-reference.
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| src/openarmature/observability/langfuse/init.py | Public exports for the new subpackage. |
| src/openarmature/observability/langfuse/client.py | LangfuseClient Protocol, dataclass records, in-memory recorder. |
| src/openarmature/observability/langfuse/observer.py | Observer mapping events to Traces/Spans/Generations with truncation, prompt linkage, error mapping. |
| tests/conformance/test_observability_langfuse.py | Drives Langfuse spec fixtures 022-024 with mock LLM/prompt backends. |
| tests/conformance/test_fixture_parsing.py | Updates skip rationale for Langfuse fixtures now that harness exists. |
| tests/unit/test_observability_langfuse.py | Targeted unit tests for cap validation, truncation, recorder fields, parent walking. |
| tests/test_examples_smoke.py | Registers example 10 for smoke testing. |
| examples/10-langfuse-observability/main.py | Runnable lunar Q&A demo using InMemoryLangfuseClient + active prompt. |
| docs/examples/10-langfuse-observability.md | Example documentation with expected captured-trace shape. |
| docs/concepts/observability.md | Adds Langfuse mapping section parallel to OTel guidance. |
| mkdocs.yml | Adds example 10 to the Examples nav. |
| src/openarmature/AGENTS.md | Adds example 10 cross-reference. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Four doc-only additions from PR review: - _resolve_parent_observation_id now explicitly notes the longest-prefix-first outer-loop walk and calls out the first-match approximation when multiple open observations share a namespace prefix. The proper subgraph_observations / fan_out_instance_observations / detached_roots maps land in a follow-on PR; the current resolver covers the linear-graph and basic-LLM cases the v0.23.0 conformance fixtures exercise. - _maybe_truncate_for_input and _maybe_truncate_for_extras docstrings document the intentional list|str and dict|str union return types. When the serialized payload exceeds payload_byte_cap the helpers return the marker-bearing string verbatim per spec §8.7; the unparseable JSON IS the truncation signal and the Langfuse UI renders the string view in that case. - LangfuseClient.update_trace gets a comment explaining why it's declared in the Protocol despite the current observer not invoking it. The caller-supplied invocation-label path (proposal 0034, PR 4 of the v0.10.0 batch) may need to swap the trace name after the first node event opens the Trace; SDK adapters implement it for forward compatibility with that wiring. - Concepts and example docs now disclose that no specific langfuse SDK version is validated in CI for this release. A follow-on release pins a tested [langfuse] extras range and ships a runtime Protocol-conformance check; until then production wire-up is a "verify in your own environment" path. Behavior unchanged. AGENTS.md regenerated to pick up the example doc note.
Summary
Implements proposal 0031 (spec v0.23.0) — the Langfuse-native observer mapping. Third of 6 PRs in the v0.10.0 batch.
New subpackage
openarmature.observability.langfuse:LangfuseObserverconsumes theNodeEventstream and emits Trace + Span/Generation observations through aLangfuseClientProtocol. Mapping per §8.3: invocation → Trace, node/subgraph/fan-out → Span observation, LLM provider → Generation observation.LangfuseClientProtocol declares four methods (trace,span,generation,update_trace) the observer calls.InMemoryLangfuseClientis the bundled recorder used by the conformance harness and useful for unit tests; captures every Trace + Observation as plain dataclass records inspectable from test code.langfuse.Langfuse()SDK instance is structurally Protocol-compatible and drops in unchanged for production. SDK versions whose shape diverges wrap in a small adapter; example 10 documents the shape.Behavior highlights:
idequals the framework-mintedinvocation_idverbatim (§8.4.1) so cross-system lookup by invocation_id lands directly.correlation_idsurfaces ontrace.metadataand everyobservation.metadataas the cross-backend join key (§8.5).input/output/metadata.request_extrasappear only whendisable_llm_payload=False; the §5.5.5 truncation marker is preserved verbatim as a raw string when the serialized payload exceedspayload_byte_cap.Prompt.observability_entities['langfuse_prompt'](the field landed by proposal 0033 in PR 2) to establish a native Prompt-entity link when present; backends without the reference produce metadata-only linkage.Tests:
tests/conformance/test_observability_langfuse.pyagainstInMemoryLangfuseClient.tests/unit/test_observability_langfuse.py.Example:
examples/10-langfuse-observability/main.pyis a runnable moon-themed demo: a lunar mission Q&A pipeline that fetches a versioned prompt template (with a mock Langfuse-source backend), calls the LLM underwith_active_prompt, and prints the captured Trace + Generation tree at the end. The example doc covers the full Langfuse-shape output, the SDK-adapter shape for non-Protocol-compatible SDK versions, and composition with the OTel observer.docs/concepts/observability.mdgrows a new "Langfuse mapping (opt-in)" section parallel to the existing OTel section.conformance.tomlproposal 0031 staysnot-yetuntil the release PR (PR 6) flips it toimplemented since = "0.10.0".Test plan
build_graph()instantiates cleanly)LLM_API_KEY=sk-... uv run python examples/10-langfuse-observability/main.py "what year did Apollo 11 land"— prints the captured Trace with prompt entity linkpython3 scripts/check_conformance_manifest.pyexits 0 (30 accepted, 30 manifest entries)