Skip to content

Add Langfuse observer (proposal 0031)#80

Merged
chris-colinsky merged 2 commits into
mainfrom
feature/langfuse-observer
May 27, 2026
Merged

Add Langfuse observer (proposal 0031)#80
chris-colinsky merged 2 commits into
mainfrom
feature/langfuse-observer

Conversation

@chris-colinsky
Copy link
Copy Markdown
Member

@chris-colinsky chris-colinsky commented May 27, 2026

Summary

Implements proposal 0031 (spec v0.23.0) — the Langfuse-native observer mapping. Third of 6 PRs in the v0.10.0 batch.

New subpackage openarmature.observability.langfuse:

  • LangfuseObserver consumes the NodeEvent stream and emits Trace + Span/Generation observations through a LangfuseClient Protocol. Mapping per §8.3: invocation → Trace, node/subgraph/fan-out → Span observation, LLM provider → Generation observation.
  • LangfuseClient Protocol declares four methods (trace, span, generation, update_trace) the observer calls.
  • InMemoryLangfuseClient is the bundled recorder used by the conformance harness and useful for unit tests; captures every Trace + Observation as plain dataclass records inspectable from test code.
  • A real langfuse.Langfuse() SDK instance is structurally Protocol-compatible and drops in unchanged for production. SDK versions whose shape diverges wrap in a small adapter; example 10 documents the shape.

Behavior highlights:

  • Trace id equals the framework-minted invocation_id verbatim (§8.4.1) so cross-system lookup by invocation_id lands directly.
  • correlation_id surfaces on trace.metadata and every observation.metadata as the cross-backend join key (§8.5).
  • Trace name sources from the entry-node name (§8.6 fallback). Caller-supplied invocation-label path lands in proposal 0034 (PR 4).
  • Generation rendering (§8.7): input / output / metadata.request_extras appear only when disable_llm_payload=False; the §5.5.5 truncation marker is preserved verbatim as a raw string when the serialized payload exceeds payload_byte_cap.
  • Prompt linkage (§8.4.4): reads Prompt.observability_entities['langfuse_prompt'] (the field landed by proposal 0033 in PR 2) to establish a native Prompt-entity link when present; backends without the reference produce metadata-only linkage.

Tests:

  • Three conformance fixtures pass: 022 basic trace, 023 Generation rendering + truncation case, 024 prompt linkage both cases. Driven from a new tests/conformance/test_observability_langfuse.py against InMemoryLangfuseClient.
  • Seven new unit tests in tests/unit/test_observability_langfuse.py.

Example:

examples/10-langfuse-observability/main.py is a runnable moon-themed demo: a lunar mission Q&A pipeline that fetches a versioned prompt template (with a mock Langfuse-source backend), calls the LLM under with_active_prompt, and prints the captured Trace + Generation tree at the end. The example doc covers the full Langfuse-shape output, the SDK-adapter shape for non-Protocol-compatible SDK versions, and composition with the OTel observer.

docs/concepts/observability.md grows a new "Langfuse mapping (opt-in)" section parallel to the existing OTel section.

conformance.toml proposal 0031 stays not-yet until the release PR (PR 6) flips it to implemented since = "0.10.0".

Test plan

  • CI green (lint, format, types, conformance, unit, smoke, agents-md drift)
  • Three Langfuse conformance fixtures pass: 022, 023 (both happy + truncated cases), 024 (both Langfuse-reference and filesystem cases)
  • Example smoke test passes (build_graph() instantiates cleanly)
  • Run the example: LLM_API_KEY=sk-... uv run python examples/10-langfuse-observability/main.py "what year did Apollo 11 land" — prints the captured Trace with prompt entity link
  • python3 scripts/check_conformance_manifest.py exits 0 (30 accepted, 30 manifest entries)

New openarmature.observability.langfuse subpackage maps the observer
event stream onto Langfuse's native Trace + Observation data model:
invocation -> Trace, node/subgraph/fan-out -> Span observation, LLM
provider call -> Generation observation.

The LangfuseObserver consumes the event stream and emits through a
narrow LangfuseClient Protocol with four methods (trace, span,
generation, update_trace). Two concrete clients:
- InMemoryLangfuseClient bundled for tests and the conformance
  harness; captures every Trace + Observation as plain dataclass
  records inspectable by assertions.
- A real langfuse.Langfuse() SDK instance is Protocol-compatible
  and drops in unchanged for production use. SDK versions whose
  shape diverges plug in via a small adapter the user writes; the
  shape is documented in examples/10-langfuse-observability.

Trace id equals the framework-minted invocation_id verbatim so
cross-system lookup by invocation_id finds the Langfuse Trace
directly. correlation_id surfaces on both trace.metadata and every
observation.metadata for cross-backend join. Trace name sources
from the entry-node name; the caller-supplied invocation-label
path lands in proposal 0034 (PR 4).

Generation rendering follows the §8.7 contract: input/output/
request_extras appear only when disable_llm_payload=False; the
§5.5.5 truncation marker is preserved verbatim as a raw string
when the serialized payload exceeds payload_byte_cap.

Prompt linkage follows §8.4.4 case discrimination: reads
Prompt.observability_entities['langfuse_prompt'] (the field added
in proposal 0033) to establish a native Prompt-entity link when
the prompt's source exposes one. Backends without that reference
(filesystem, in-memory) produce metadata-only linkage.

Three conformance fixtures pass: 022 basic trace, 023 Generation
rendering plus truncation case, 024 prompt linkage both cases.
Seven new unit tests cover payload-cap validation, in-memory
recorder field handling, and observation parent walking.

Example 10-langfuse-observability is a runnable moon-themed demo:
a lunar mission Q&A pipeline with a mock Langfuse-source prompt
backend, the LangfuseObserver wired to the in-memory recorder,
and a pretty-printer for the captured Trace tree at the end. Real
SDK swap is a one-line constructor change.

Third of 6 PRs in the v0.10.0 batch.
Copilot AI review requested due to automatic review settings May 27, 2026 15:24
Comment thread src/openarmature/observability/langfuse/client.py
Comment thread src/openarmature/observability/langfuse/client.py
Comment thread src/openarmature/observability/langfuse/client.py
Comment thread src/openarmature/observability/langfuse/client.py
Comment thread src/openarmature/observability/langfuse/client.py
Comment thread src/openarmature/observability/langfuse/client.py
Comment thread src/openarmature/observability/langfuse/client.py
Comment thread src/openarmature/observability/langfuse/client.py Fixed
Comment thread src/openarmature/observability/langfuse/client.py Fixed
Comment thread src/openarmature/observability/langfuse/client.py Fixed
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a Langfuse-native observer (proposal 0031, spec §8) as a new openarmature.observability.langfuse subpackage. The observer consumes the existing NodeEvent stream and emits Trace + Span/Generation entities through a narrow LangfuseClient Protocol, with a bundled InMemoryLangfuseClient recorder for tests and the conformance harness. Mirrors the OTel observer's per-invocation state isolation, payload-cap minimum, and truncation-marker semantics, while keeping the SDK dependency optional via structural Protocol matching.

Changes:

  • New LangfuseObserver mapping invocation→Trace, node/subgraph→Span, LLM call→Generation with prompt-entity linkage per §8.4.4.
  • New LangfuseClient Protocol + InMemoryLangfuseClient recorder with LangfuseTrace/LangfuseObservation/LangfuseUsage dataclasses.
  • Conformance harness for fixtures 022-024, focused unit tests, runnable example, mkdocs/observability docs, and AGENTS.md cross-reference.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated no comments.

Show a summary per file
File Description
src/openarmature/observability/langfuse/init.py Public exports for the new subpackage.
src/openarmature/observability/langfuse/client.py LangfuseClient Protocol, dataclass records, in-memory recorder.
src/openarmature/observability/langfuse/observer.py Observer mapping events to Traces/Spans/Generations with truncation, prompt linkage, error mapping.
tests/conformance/test_observability_langfuse.py Drives Langfuse spec fixtures 022-024 with mock LLM/prompt backends.
tests/conformance/test_fixture_parsing.py Updates skip rationale for Langfuse fixtures now that harness exists.
tests/unit/test_observability_langfuse.py Targeted unit tests for cap validation, truncation, recorder fields, parent walking.
tests/test_examples_smoke.py Registers example 10 for smoke testing.
examples/10-langfuse-observability/main.py Runnable lunar Q&A demo using InMemoryLangfuseClient + active prompt.
docs/examples/10-langfuse-observability.md Example documentation with expected captured-trace shape.
docs/concepts/observability.md Adds Langfuse mapping section parallel to OTel guidance.
mkdocs.yml Adds example 10 to the Examples nav.
src/openarmature/AGENTS.md Adds example 10 cross-reference.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Four doc-only additions from PR review:

- _resolve_parent_observation_id now explicitly notes the
  longest-prefix-first outer-loop walk and calls out the first-match
  approximation when multiple open observations share a namespace
  prefix. The proper subgraph_observations /
  fan_out_instance_observations / detached_roots maps land in a
  follow-on PR; the current resolver covers the linear-graph and
  basic-LLM cases the v0.23.0 conformance fixtures exercise.

- _maybe_truncate_for_input and _maybe_truncate_for_extras docstrings
  document the intentional list|str and dict|str union return types.
  When the serialized payload exceeds payload_byte_cap the helpers
  return the marker-bearing string verbatim per spec §8.7; the
  unparseable JSON IS the truncation signal and the Langfuse UI
  renders the string view in that case.

- LangfuseClient.update_trace gets a comment explaining why it's
  declared in the Protocol despite the current observer not invoking
  it. The caller-supplied invocation-label path (proposal 0034, PR 4
  of the v0.10.0 batch) may need to swap the trace name after the
  first node event opens the Trace; SDK adapters implement it for
  forward compatibility with that wiring.

- Concepts and example docs now disclose that no specific langfuse
  SDK version is validated in CI for this release. A follow-on
  release pins a tested [langfuse] extras range and ships a runtime
  Protocol-conformance check; until then production wire-up is a
  "verify in your own environment" path.

Behavior unchanged. AGENTS.md regenerated to pick up the example
doc note.
Comment thread src/openarmature/observability/langfuse/client.py
Comment thread src/openarmature/observability/langfuse/client.py
Comment thread src/openarmature/observability/langfuse/client.py
@chris-colinsky chris-colinsky merged commit 885fee5 into main May 27, 2026
6 checks passed
@chris-colinsky chris-colinsky deleted the feature/langfuse-observer branch May 27, 2026 17:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants