observability: phase 6.1 concurrency-safe observer (PR-A)#22
Merged
Conversation
Phase 6.0 left three architectural limitations: OTelObserver state
collided across concurrent invocations on a shared instance, OTel
attach tokens spanned event boundaries (LIFO violations under
interleaved fan-out, suppressed by try/except guards in round-7),
and LLM spans had no calling-node identity threaded through, so
parent resolution fell back to the OTel current-span context and
mis-attributed under concurrent fan-out + retry. Spec agent's ack
folded the three into one coherent refactor — they share the same
primitive ("parents come from the observer's internal maps within
a single event handler's scope") and split would ship intermediate
states that fail their own correctness story.
Three new ContextVars in observability/correlation.py carry the
calling-node identity: current_namespace_prefix (default ()),
current_fan_out_index (default None), current_attempt_index
(default 0). The engine sets the first two in the outer per-node
scope of _step_function_node / _step_subgraph_node /
_step_fan_out_node; attempt_index lives inside innermost so retry
middleware that re-enters the inner closure bumps the value per
attempt. _LlmEventState gains calling_* fields populated from the
ContextVars at dispatch time.
OTelObserver state is now outer-keyed by invocation_id, not
correlation_id. Resume preserves the cid across runs (per spec
§5.6 cross-run join), so a cid-keyed map would have the resumed
run's spans inherit the prior invocation's trace_id, violating
§5.1's "each invocation has its own trace_id". Caught during impl
by test_phase5_fixture_031_span_assertions. The cid is still set
as openarmature.correlation_id on every span — only the state-
scoping key changed.
Parent resolution moves entirely to internal maps. _OpenSpan no
longer carries an OTel context token; each event handler reads
the parent from per-invocation open_spans / subgraph_spans /
detached_roots and starts the span with
context=set_span_in_context(parent). The round-7 try/except
ValueError guards on cross-event detach are deleted (no tokens
cross events). The close-prior-correlation_id branch in
_handle_started is deleted (per-invocation scoping makes it
unreachable). "Concurrency model — NOT safe" warning and
cross-event attach-token notes come out of the class docstring.
LLM-span parent resolution uses calling-node identity directly:
_resolve_llm_parent looks up open_spans by the full _StackKey
(calling_namespace_prefix, calling_attempt_index,
calling_fan_out_index), then walks ancestors for subgraph dispatch
or detached roots, then falls back to the invocation span. Under
concurrent fan-out + retry the LLM span lands under exactly the
right calling node regardless of dispatch ordering or which
sibling instance's span happens to be on the OTel current-span
stack.
Tests added in tests/unit/test_observability_otel.py:
- shared-observer concurrent invocations (asyncio.gather × 5):
N distinct trace_ids and clean per-invocation span trees.
- concurrent fan-out (4 instances) each calling LLM: every LLM
span parents under its own calling instance, not a sibling.
- LIFO regression check under warnings.catch_warnings("error"):
the path that previously needed try/except guards stays quiet
under the new architecture rather than being suppressed.
- LLM call inside retried node (max_attempts=3, two failures +
one success): each LLM span parents under its own attempt's
span, not a hardcoded attempt 0.
387 tests pass (4 new). Pyright clean.
There was a problem hiding this comment.
Pull request overview
This PR implements Phase 6.1 of the observability refactor by making OTelObserver safe to share across concurrent invocations (scoping internal span state by invocation_id), removing cross-event OpenTelemetry context attach/detach tokens, and adding calling-node identity propagation via new ContextVars so §5.5 LLM spans parent under the correct node attempt under fan-out + retry.
Changes:
- Refactor
OTelObserverto store all per-run span tracking in per-invocation_idstate containers and resolve parents from internal maps (no cross-event OTel tokens). - Add
ContextVars (current_namespace_prefix,current_fan_out_index,current_attempt_index) set by the engine to carry calling-node identity into LLM provider events. - Add/expand unit tests covering concurrent invocations, fan-out interleaving, LLM parent attribution under fan-out, and per-attempt attribution under retry.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/unit/test_observability_otel.py | Adds Phase 6.1 regression + concurrency tests for span isolation and correct LLM parenting. |
| src/openarmature/observability/otel/observer.py | Refactors OTel observer to per-invocation state, removes attach/detach, and implements LLM parent resolution via calling identity. |
| src/openarmature/observability/correlation.py | Introduces new calling-node ContextVars and exports them. |
| src/openarmature/observability/init.py | Re-exports the new correlation/identity accessors. |
| src/openarmature/llm/providers/openai.py | Extends _LlmEventState with calling-node identity and populates it from the new ContextVars. |
| src/openarmature/graph/compiled.py | Sets/resets calling-node identity ContextVars around node execution and per-attempt attempt_index scope. |
Comments suppressed due to low confidence (1)
src/openarmature/observability/otel/observer.py:339
checkpoint_savedspans are currently parented via_resolve_parent_context, but the triggering node span has already been popped frominv_state.open_spansin_handle_completed. As a result,openarmature.checkpoint.savewill fall back to the invocation/subgraph span instead of the node that triggered the save (contradicting this method’s docstring). Consider storing the just-completed node span context (or span_id) long enough for the subsequentcheckpoint_savedevent to attach correctly, e.g., keep a short-lived map keyed by_StackKeyor emit the save span directly from_handle_completedwhen appropriate.
invocation_id = current_invocation_id()
if invocation_id is None:
return
inv_state = self._inv_states.get(invocation_id)
if inv_state is None:
return
parent_ctx = self._resolve_parent_context(inv_state, invocation_id, event)
attrs: dict[str, Any] = {
"openarmature.checkpoint.save_node": event.node_name,
}
cid = current_correlation_id()
if cid is not None:
attrs["openarmature.correlation_id"] = cid
span = self._tracer.start_span(
name="openarmature.checkpoint.save",
context=cast("Any", parent_ctx),
kind=SpanKind.INTERNAL,
attributes=attrs,
)
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Fix _fan_out_node_span_key hardcoding attempt_index=0. Fan-out nodes wrapped with retry middleware open at the bumped attempt_index, so the lookup missed under retry and the detached-instance Link wasn't added to the §4.4 cross-trace navigation. Replaced with _find_fan_out_node_span scan that finds the in-flight fan-out span at prefix regardless of attempt_index (only one such entry is open at a time). - Update close_invocation docstring to be honest about the invocation_id sourcing constraint (CompiledGraph.invoke doesn't return it; the ContextVar is reset before control returns) and recommend shutdown() as the typical-production lifecycle hook. The deeper UX gap (long-lived observers without periodic shutdown accumulate per-invocation residue) is tracked in phase-6-1-conformance-fillin.md for follow-up. - Fix stale "now-cid-scoped" wording in openai.py — observer state is invocation_id-keyed in the merged code.
5 tasks
4 tasks
chris-colinsky
added a commit
that referenced
this pull request
May 29, 2026
* Bump spec pin to v0.31.0 + 0039/0041 fixtures Bumps the spec submodule from v0.27.1 to v0.31.0 (proposals 0037 Anthropic, 0039 caller invocation_id, 0040 open-span, 0041 reserved keys) and wires conformance to python's current state for v0.11.0. conformance.toml: spec_pin to v0.31.0 + entries for 0037 / 0039 / 0040 / 0041. 0039 and 0041 are implemented since 0.11.0; 0037 and 0040 stay not-yet (Anthropic provider + #22 open-span are out of scope for this PR). __spec_version__ in src and pyproject likewise bumps; AGENTS.md regenerated. Conformance harness updates: defer Anthropic fixtures (llm-provider/ 033-043) in both the cross-capability parser and the llm-provider runner; defer observability/034 (waits on #22); defer observability/ 035-036 from the cross-capability parser (the langfuse_trace shape isn't modeled; the derivation is pinned by unit tests against the same vector). The _run_fixture_028 runner recognises both the 0034 prefix-rejection and 0041 exact-name rejection patterns; the mid-invocation augment_metadata case waits on #22's harness primitive. Full suite green: 962 passed, 170 skipped. * Align conformance.toml note + visible 028 skip From PR #95 review: - Rewrite the conformance.toml convention block to acknowledge that feature PRs which bump the spec submodule pin update this file too, setting `since` to the upcoming release version (matches the v0.10.0 cycle's pattern in PR #85 / #88, and how this PR works). - Emit warnings.warn for 028's deferred augment_metadata case so pytest's end-of-run summary surfaces the deferred coverage by name rather than silently passing.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
PR-A of Phase 6.1, the architectural refactor that closes the three
limitations Phase 6.0 explicitly deferred:
OTelObserverinternalspan maps are outer-keyed by
invocation_id(per spec §5.1) soa single observer instance is safe to share across concurrent
invocations. The original plan-of-record said correlation_id;
that broke
test_phase5_fixture_031_span_assertionsduringimpl because resume preserves the cid across runs (per §5.6) and
a cid-keyed map would have resumed runs inherit the prior
invocation's trace_id, violating §5.1's "each invocation owns
its own trace_id" contract. Switching to invocation_id closes
the gap; the cid stays as the cross-run join attribute on every
span.
entirely to the observer's internal maps within each event
handler's scope.
_OpenSpanno longer carries an OTel contexttoken; spans open with
context=set_span_in_context(parent_span)directly. Theround-7
try/except ValueErrorguards on cross-eventdetach()are deleted because the underlying hazard goesaway — tokens never cross event boundaries.
observability/correlation.py(current_namespace_prefix,current_fan_out_index,current_attempt_index) carry thecalling node's identity to
_make_llm_event;_LlmEventStategains correspondingcalling_*fields; theobserver's
_resolve_llm_parentlooks up the calling node'sspan by the full
_StackKeytriple. Under concurrent fan-outnode regardless of dispatch ordering or which sibling instance
happens to be on the OTel current-span stack.
The three are landed together because they share the same
primitive: parents come from the observer's internal maps within a
single event handler's scope. Splitting would ship intermediate
states that fail their own correctness story (see the spec agent's
ack in
openarmature-coord/threads/phase-6-1-observability-conformance-fillin/02-spec-plan-review.mdfor the precondition framing).
Tests
Four new tests in
tests/unit/test_observability_otel.py:test_shared_observer_concurrent_invocations_dont_collide—asyncio.gather([invoke()] * 5)on a single shared observer;asserts N distinct
trace_idvalues and that each trace'sspan set is exactly the expected three nodes (invocation +
node_a + node_b).
test_concurrent_fan_out_no_lifo_violation— the explicitregression check spec asked for. Drives 5-item fan-out under
warnings.catch_warnings("error")so the path that previouslyneeded the round-7
try/except ValueError: passguardssurfaces as a fail rather than a silent swallow if anything in
the new architecture were to trigger it. Stays quiet — the
underlying hazard is gone.
test_concurrent_fan_out_llm_spans_parent_under_calling_instance—4-instance fan-out, each instance's body calls
OpenAIProvider.complete()(mocked viahttpx.MockTransport);asserts every LLM span parents under its own
askinstance,never a sibling, with N distinct parent span_ids for N calls.
test_llm_call_inside_retried_node_parents_per_attempt—the test from spec's attempt_index flag in
04-spec-attempt-index-flag.md. Node wrapped withRetryMiddleware(max_attempts=3); body callscomplete()oneach attempt (two failures + one success); asserts each LLM span
parents under THAT attempt's
flakyspan, not a hardcodedattempt_index=0.Test plan
uv run pytest -q— 387 passed, 9 skippeduv run pyright— 0 errorsuv run ruff check . && uv run ruff format— clean