observability: phase 6.1 concurrency-safe observer (PR-A) by chris-colinsky · Pull Request #22 · LunarCommand/openarmature-python

chris-colinsky · 2026-05-08T07:02:39Z

Summary

PR-A of Phase 6.1, the architectural refactor that closes the three
limitations Phase 6.0 explicitly deferred:

Concurrency-safe state scoping. OTelObserver internal
span maps are outer-keyed by invocation_id (per spec §5.1) so
a single observer instance is safe to share across concurrent
invocations. The original plan-of-record said correlation_id;
that broke test_phase5_fixture_031_span_assertions during
impl because resume preserves the cid across runs (per §5.6) and
a cid-keyed map would have resumed runs inherit the prior
invocation's trace_id, violating §5.1's "each invocation owns
its own trace_id" contract. Switching to invocation_id closes
the gap; the cid stays as the cross-run join attribute on every
span.
Single-handler attach/detach. Parent resolution moves
entirely to the observer's internal maps within each event
handler's scope. _OpenSpan no longer carries an OTel context
token; spans open with
context=set_span_in_context(parent_span) directly. The
round-7 try/except ValueError guards on cross-event
detach() are deleted because the underlying hazard goes
away — tokens never cross event boundaries.
§5.5 LLM-span parent attribution. Three new ContextVars in
observability/correlation.py (current_namespace_prefix,
current_fan_out_index, current_attempt_index) carry the
calling node's identity to _make_llm_event;
_LlmEventState gains corresponding calling_* fields; the
observer's _resolve_llm_parent looks up the calling node's
span by the full _StackKey triple. Under concurrent fan-out
- retry the LLM span parents under exactly the right calling
  node regardless of dispatch ordering or which sibling instance
  happens to be on the OTel current-span stack.

The three are landed together because they share the same
primitive: parents come from the observer's internal maps within a
single event handler's scope. Splitting would ship intermediate
states that fail their own correctness story (see the spec agent's
ack in
openarmature-coord/threads/phase-6-1-observability-conformance-fillin/02-spec-plan-review.md
for the precondition framing).

Tests

Four new tests in tests/unit/test_observability_otel.py:

test_shared_observer_concurrent_invocations_dont_collide —
asyncio.gather([invoke()] * 5) on a single shared observer;
asserts N distinct trace_id values and that each trace's
span set is exactly the expected three nodes (invocation +
node_a + node_b).
test_concurrent_fan_out_no_lifo_violation — the explicit
regression check spec asked for. Drives 5-item fan-out under
warnings.catch_warnings("error") so the path that previously
needed the round-7 try/except ValueError: pass guards
surfaces as a fail rather than a silent swallow if anything in
the new architecture were to trigger it. Stays quiet — the
underlying hazard is gone.
test_concurrent_fan_out_llm_spans_parent_under_calling_instance —
4-instance fan-out, each instance's body calls
OpenAIProvider.complete() (mocked via httpx.MockTransport);
asserts every LLM span parents under its own ask instance,
never a sibling, with N distinct parent span_ids for N calls.
test_llm_call_inside_retried_node_parents_per_attempt —
the test from spec's attempt_index flag in
04-spec-attempt-index-flag.md. Node wrapped with
RetryMiddleware(max_attempts=3); body calls complete() on
each attempt (two failures + one success); asserts each LLM span
parents under THAT attempt's flaky span, not a hardcoded
attempt_index=0.

Test plan

uv run pytest -q — 387 passed, 9 skipped
uv run pyright — 0 errors
uv run ruff check . && uv run ruff format — clean
Existing 31 unit tests + 5 conformance fixtures preserved

Phase 6.0 left three architectural limitations: OTelObserver state collided across concurrent invocations on a shared instance, OTel attach tokens spanned event boundaries (LIFO violations under interleaved fan-out, suppressed by try/except guards in round-7), and LLM spans had no calling-node identity threaded through, so parent resolution fell back to the OTel current-span context and mis-attributed under concurrent fan-out + retry. Spec agent's ack folded the three into one coherent refactor — they share the same primitive ("parents come from the observer's internal maps within a single event handler's scope") and split would ship intermediate states that fail their own correctness story. Three new ContextVars in observability/correlation.py carry the calling-node identity: current_namespace_prefix (default ()), current_fan_out_index (default None), current_attempt_index (default 0). The engine sets the first two in the outer per-node scope of _step_function_node / _step_subgraph_node / _step_fan_out_node; attempt_index lives inside innermost so retry middleware that re-enters the inner closure bumps the value per attempt. _LlmEventState gains calling_* fields populated from the ContextVars at dispatch time. OTelObserver state is now outer-keyed by invocation_id, not correlation_id. Resume preserves the cid across runs (per spec §5.6 cross-run join), so a cid-keyed map would have the resumed run's spans inherit the prior invocation's trace_id, violating §5.1's "each invocation has its own trace_id". Caught during impl by test_phase5_fixture_031_span_assertions. The cid is still set as openarmature.correlation_id on every span — only the state- scoping key changed. Parent resolution moves entirely to internal maps. _OpenSpan no longer carries an OTel context token; each event handler reads the parent from per-invocation open_spans / subgraph_spans / detached_roots and starts the span with context=set_span_in_context(parent). The round-7 try/except ValueError guards on cross-event detach are deleted (no tokens cross events). The close-prior-correlation_id branch in _handle_started is deleted (per-invocation scoping makes it unreachable). "Concurrency model — NOT safe" warning and cross-event attach-token notes come out of the class docstring. LLM-span parent resolution uses calling-node identity directly: _resolve_llm_parent looks up open_spans by the full _StackKey (calling_namespace_prefix, calling_attempt_index, calling_fan_out_index), then walks ancestors for subgraph dispatch or detached roots, then falls back to the invocation span. Under concurrent fan-out + retry the LLM span lands under exactly the right calling node regardless of dispatch ordering or which sibling instance's span happens to be on the OTel current-span stack. Tests added in tests/unit/test_observability_otel.py: - shared-observer concurrent invocations (asyncio.gather × 5): N distinct trace_ids and clean per-invocation span trees. - concurrent fan-out (4 instances) each calling LLM: every LLM span parents under its own calling instance, not a sibling. - LIFO regression check under warnings.catch_warnings("error"): the path that previously needed try/except guards stays quiet under the new architecture rather than being suppressed. - LLM call inside retried node (max_attempts=3, two failures + one success): each LLM span parents under its own attempt's span, not a hardcoded attempt 0. 387 tests pass (4 new). Pyright clean.

Copilot

Pull request overview

This PR implements Phase 6.1 of the observability refactor by making OTelObserver safe to share across concurrent invocations (scoping internal span state by invocation_id), removing cross-event OpenTelemetry context attach/detach tokens, and adding calling-node identity propagation via new ContextVars so §5.5 LLM spans parent under the correct node attempt under fan-out + retry.

Changes:

Refactor OTelObserver to store all per-run span tracking in per-invocation_id state containers and resolve parents from internal maps (no cross-event OTel tokens).
Add ContextVars (current_namespace_prefix, current_fan_out_index, current_attempt_index) set by the engine to carry calling-node identity into LLM provider events.
Add/expand unit tests covering concurrent invocations, fan-out interleaving, LLM parent attribution under fan-out, and per-attempt attribution under retry.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
tests/unit/test_observability_otel.py	Adds Phase 6.1 regression + concurrency tests for span isolation and correct LLM parenting.
src/openarmature/observability/otel/observer.py	Refactors OTel observer to per-invocation state, removes attach/detach, and implements LLM parent resolution via calling identity.
src/openarmature/observability/correlation.py	Introduces new calling-node `ContextVar`s and exports them.
src/openarmature/observability/init.py	Re-exports the new correlation/identity accessors.
src/openarmature/llm/providers/openai.py	Extends `_LlmEventState` with calling-node identity and populates it from the new `ContextVar`s.
src/openarmature/graph/compiled.py	Sets/resets calling-node identity `ContextVar`s around node execution and per-attempt attempt_index scope.

Comments suppressed due to low confidence (1)

src/openarmature/observability/otel/observer.py:339

checkpoint_saved spans are currently parented via _resolve_parent_context, but the triggering node span has already been popped from inv_state.open_spans in _handle_completed. As a result, openarmature.checkpoint.save will fall back to the invocation/subgraph span instead of the node that triggered the save (contradicting this method’s docstring). Consider storing the just-completed node span context (or span_id) long enough for the subsequent checkpoint_saved event to attach correctly, e.g., keep a short-lived map keyed by _StackKey or emit the save span directly from _handle_completed when appropriate.

        invocation_id = current_invocation_id()
        if invocation_id is None:
            return
        inv_state = self._inv_states.get(invocation_id)
        if inv_state is None:
            return
        parent_ctx = self._resolve_parent_context(inv_state, invocation_id, event)
        attrs: dict[str, Any] = {
            "openarmature.checkpoint.save_node": event.node_name,
        }
        cid = current_correlation_id()
        if cid is not None:
            attrs["openarmature.correlation_id"] = cid
        span = self._tracer.start_span(
            name="openarmature.checkpoint.save",
            context=cast("Any", parent_ctx),
            kind=SpanKind.INTERNAL,
            attributes=attrs,
        )

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

- Fix _fan_out_node_span_key hardcoding attempt_index=0. Fan-out nodes wrapped with retry middleware open at the bumped attempt_index, so the lookup missed under retry and the detached-instance Link wasn't added to the §4.4 cross-trace navigation. Replaced with _find_fan_out_node_span scan that finds the in-flight fan-out span at prefix regardless of attempt_index (only one such entry is open at a time). - Update close_invocation docstring to be honest about the invocation_id sourcing constraint (CompiledGraph.invoke doesn't return it; the ContextVar is reset before control returns) and recommend shutdown() as the typical-production lifecycle hook. The deeper UX gap (long-lived observers without periodic shutdown accumulate per-invocation residue) is tracked in phase-6-1-conformance-fillin.md for follow-up. - Fix stale "now-cid-scoped" wording in openai.py — observer state is invocation_id-keyed in the merged code.

* Bump spec pin to v0.31.0 + 0039/0041 fixtures Bumps the spec submodule from v0.27.1 to v0.31.0 (proposals 0037 Anthropic, 0039 caller invocation_id, 0040 open-span, 0041 reserved keys) and wires conformance to python's current state for v0.11.0. conformance.toml: spec_pin to v0.31.0 + entries for 0037 / 0039 / 0040 / 0041. 0039 and 0041 are implemented since 0.11.0; 0037 and 0040 stay not-yet (Anthropic provider + #22 open-span are out of scope for this PR). __spec_version__ in src and pyproject likewise bumps; AGENTS.md regenerated. Conformance harness updates: defer Anthropic fixtures (llm-provider/ 033-043) in both the cross-capability parser and the llm-provider runner; defer observability/034 (waits on #22); defer observability/ 035-036 from the cross-capability parser (the langfuse_trace shape isn't modeled; the derivation is pinned by unit tests against the same vector). The _run_fixture_028 runner recognises both the 0034 prefix-rejection and 0041 exact-name rejection patterns; the mid-invocation augment_metadata case waits on #22's harness primitive. Full suite green: 962 passed, 170 skipped. * Align conformance.toml note + visible 028 skip From PR #95 review: - Rewrite the conformance.toml convention block to acknowledge that feature PRs which bump the spec submodule pin update this file too, setting `since` to the upcoming release version (matches the v0.10.0 cycle's pattern in PR #85 / #88, and how this PR works). - Emit warnings.warn for 028's deferred augment_metadata case so pytest's end-of-run summary surfaces the deferred coverage by name rather than silently passing.

Copilot AI review requested due to automatic review settings May 8, 2026 07:02

Copilot started reviewing on behalf of chris-colinsky May 8, 2026 07:03 View session

Copilot AI reviewed May 8, 2026

View reviewed changes

Comment thread src/openarmature/observability/otel/observer.py

Comment thread src/openarmature/observability/otel/observer.py Outdated

Comment thread src/openarmature/llm/providers/openai.py Outdated

chris-colinsky merged commit c2c1eda into main May 8, 2026
5 checks passed

chris-colinsky deleted the feature/phase-6-1-concurrency-safe-observer branch May 8, 2026 07:30

chris-colinsky mentioned this pull request May 10, 2026

observability: phase 6.1 PR-C.3 — prepare_sync + fixture 010 #27

Merged

5 tasks

chris-colinsky mentioned this pull request May 29, 2026

Bump spec pin to v0.31.0 + 0039/0041 fixtures #95

Merged

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

observability: phase 6.1 concurrency-safe observer (PR-A)#22

observability: phase 6.1 concurrency-safe observer (PR-A)#22
chris-colinsky merged 2 commits into
mainfrom
feature/phase-6-1-concurrency-safe-observer

chris-colinsky commented May 8, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

chris-colinsky commented May 8, 2026

Summary

Tests

Test plan

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants