Skip to content

observability: phase 6.1 concurrency-safe observer (PR-A)#22

Merged
chris-colinsky merged 2 commits into
mainfrom
feature/phase-6-1-concurrency-safe-observer
May 8, 2026
Merged

observability: phase 6.1 concurrency-safe observer (PR-A)#22
chris-colinsky merged 2 commits into
mainfrom
feature/phase-6-1-concurrency-safe-observer

Conversation

@chris-colinsky
Copy link
Copy Markdown
Member

Summary

PR-A of Phase 6.1, the architectural refactor that closes the three
limitations Phase 6.0 explicitly deferred:

  1. Concurrency-safe state scoping. OTelObserver internal
    span maps are outer-keyed by invocation_id (per spec §5.1) so
    a single observer instance is safe to share across concurrent
    invocations. The original plan-of-record said correlation_id;
    that broke test_phase5_fixture_031_span_assertions during
    impl because resume preserves the cid across runs (per §5.6) and
    a cid-keyed map would have resumed runs inherit the prior
    invocation's trace_id, violating §5.1's "each invocation owns
    its own trace_id" contract. Switching to invocation_id closes
    the gap; the cid stays as the cross-run join attribute on every
    span.
  2. Single-handler attach/detach. Parent resolution moves
    entirely to the observer's internal maps within each event
    handler's scope. _OpenSpan no longer carries an OTel context
    token; spans open with
    context=set_span_in_context(parent_span) directly. The
    round-7 try/except ValueError guards on cross-event
    detach() are deleted because the underlying hazard goes
    away — tokens never cross event boundaries.
  3. §5.5 LLM-span parent attribution. Three new ContextVars in
    observability/correlation.py (current_namespace_prefix,
    current_fan_out_index, current_attempt_index) carry the
    calling node's identity to _make_llm_event;
    _LlmEventState gains corresponding calling_* fields; the
    observer's _resolve_llm_parent looks up the calling node's
    span by the full _StackKey triple. Under concurrent fan-out
    • retry the LLM span parents under exactly the right calling
      node regardless of dispatch ordering or which sibling instance
      happens to be on the OTel current-span stack.

The three are landed together because they share the same
primitive: parents come from the observer's internal maps within a
single event handler's scope. Splitting would ship intermediate
states that fail their own correctness story (see the spec agent's
ack in
openarmature-coord/threads/phase-6-1-observability-conformance-fillin/02-spec-plan-review.md
for the precondition framing).

Tests

Four new tests in tests/unit/test_observability_otel.py:

  • test_shared_observer_concurrent_invocations_dont_collide
    asyncio.gather([invoke()] * 5) on a single shared observer;
    asserts N distinct trace_id values and that each trace's
    span set is exactly the expected three nodes (invocation +
    node_a + node_b).
  • test_concurrent_fan_out_no_lifo_violation — the explicit
    regression check spec asked for. Drives 5-item fan-out under
    warnings.catch_warnings("error") so the path that previously
    needed the round-7 try/except ValueError: pass guards
    surfaces as a fail rather than a silent swallow if anything in
    the new architecture were to trigger it. Stays quiet — the
    underlying hazard is gone.
  • test_concurrent_fan_out_llm_spans_parent_under_calling_instance
    4-instance fan-out, each instance's body calls
    OpenAIProvider.complete() (mocked via httpx.MockTransport);
    asserts every LLM span parents under its own ask instance,
    never a sibling, with N distinct parent span_ids for N calls.
  • test_llm_call_inside_retried_node_parents_per_attempt
    the test from spec's attempt_index flag in
    04-spec-attempt-index-flag.md. Node wrapped with
    RetryMiddleware(max_attempts=3); body calls complete() on
    each attempt (two failures + one success); asserts each LLM span
    parents under THAT attempt's flaky span, not a hardcoded
    attempt_index=0.

Test plan

  • uv run pytest -q — 387 passed, 9 skipped
  • uv run pyright — 0 errors
  • uv run ruff check . && uv run ruff format — clean
  • Existing 31 unit tests + 5 conformance fixtures preserved

Phase 6.0 left three architectural limitations: OTelObserver state
collided across concurrent invocations on a shared instance, OTel
attach tokens spanned event boundaries (LIFO violations under
interleaved fan-out, suppressed by try/except guards in round-7),
and LLM spans had no calling-node identity threaded through, so
parent resolution fell back to the OTel current-span context and
mis-attributed under concurrent fan-out + retry. Spec agent's ack
folded the three into one coherent refactor — they share the same
primitive ("parents come from the observer's internal maps within
a single event handler's scope") and split would ship intermediate
states that fail their own correctness story.

Three new ContextVars in observability/correlation.py carry the
calling-node identity: current_namespace_prefix (default ()),
current_fan_out_index (default None), current_attempt_index
(default 0). The engine sets the first two in the outer per-node
scope of _step_function_node / _step_subgraph_node /
_step_fan_out_node; attempt_index lives inside innermost so retry
middleware that re-enters the inner closure bumps the value per
attempt. _LlmEventState gains calling_* fields populated from the
ContextVars at dispatch time.

OTelObserver state is now outer-keyed by invocation_id, not
correlation_id. Resume preserves the cid across runs (per spec
§5.6 cross-run join), so a cid-keyed map would have the resumed
run's spans inherit the prior invocation's trace_id, violating
§5.1's "each invocation has its own trace_id". Caught during impl
by test_phase5_fixture_031_span_assertions. The cid is still set
as openarmature.correlation_id on every span — only the state-
scoping key changed.

Parent resolution moves entirely to internal maps. _OpenSpan no
longer carries an OTel context token; each event handler reads
the parent from per-invocation open_spans / subgraph_spans /
detached_roots and starts the span with
context=set_span_in_context(parent). The round-7 try/except
ValueError guards on cross-event detach are deleted (no tokens
cross events). The close-prior-correlation_id branch in
_handle_started is deleted (per-invocation scoping makes it
unreachable). "Concurrency model — NOT safe" warning and
cross-event attach-token notes come out of the class docstring.

LLM-span parent resolution uses calling-node identity directly:
_resolve_llm_parent looks up open_spans by the full _StackKey
(calling_namespace_prefix, calling_attempt_index,
calling_fan_out_index), then walks ancestors for subgraph dispatch
or detached roots, then falls back to the invocation span. Under
concurrent fan-out + retry the LLM span lands under exactly the
right calling node regardless of dispatch ordering or which
sibling instance's span happens to be on the OTel current-span
stack.

Tests added in tests/unit/test_observability_otel.py:

- shared-observer concurrent invocations (asyncio.gather × 5):
  N distinct trace_ids and clean per-invocation span trees.
- concurrent fan-out (4 instances) each calling LLM: every LLM
  span parents under its own calling instance, not a sibling.
- LIFO regression check under warnings.catch_warnings("error"):
  the path that previously needed try/except guards stays quiet
  under the new architecture rather than being suppressed.
- LLM call inside retried node (max_attempts=3, two failures +
  one success): each LLM span parents under its own attempt's
  span, not a hardcoded attempt 0.

387 tests pass (4 new). Pyright clean.
Copilot AI review requested due to automatic review settings May 8, 2026 07:02
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR implements Phase 6.1 of the observability refactor by making OTelObserver safe to share across concurrent invocations (scoping internal span state by invocation_id), removing cross-event OpenTelemetry context attach/detach tokens, and adding calling-node identity propagation via new ContextVars so §5.5 LLM spans parent under the correct node attempt under fan-out + retry.

Changes:

  • Refactor OTelObserver to store all per-run span tracking in per-invocation_id state containers and resolve parents from internal maps (no cross-event OTel tokens).
  • Add ContextVars (current_namespace_prefix, current_fan_out_index, current_attempt_index) set by the engine to carry calling-node identity into LLM provider events.
  • Add/expand unit tests covering concurrent invocations, fan-out interleaving, LLM parent attribution under fan-out, and per-attempt attribution under retry.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
tests/unit/test_observability_otel.py Adds Phase 6.1 regression + concurrency tests for span isolation and correct LLM parenting.
src/openarmature/observability/otel/observer.py Refactors OTel observer to per-invocation state, removes attach/detach, and implements LLM parent resolution via calling identity.
src/openarmature/observability/correlation.py Introduces new calling-node ContextVars and exports them.
src/openarmature/observability/init.py Re-exports the new correlation/identity accessors.
src/openarmature/llm/providers/openai.py Extends _LlmEventState with calling-node identity and populates it from the new ContextVars.
src/openarmature/graph/compiled.py Sets/resets calling-node identity ContextVars around node execution and per-attempt attempt_index scope.
Comments suppressed due to low confidence (1)

src/openarmature/observability/otel/observer.py:339

  • checkpoint_saved spans are currently parented via _resolve_parent_context, but the triggering node span has already been popped from inv_state.open_spans in _handle_completed. As a result, openarmature.checkpoint.save will fall back to the invocation/subgraph span instead of the node that triggered the save (contradicting this method’s docstring). Consider storing the just-completed node span context (or span_id) long enough for the subsequent checkpoint_saved event to attach correctly, e.g., keep a short-lived map keyed by _StackKey or emit the save span directly from _handle_completed when appropriate.
        invocation_id = current_invocation_id()
        if invocation_id is None:
            return
        inv_state = self._inv_states.get(invocation_id)
        if inv_state is None:
            return
        parent_ctx = self._resolve_parent_context(inv_state, invocation_id, event)
        attrs: dict[str, Any] = {
            "openarmature.checkpoint.save_node": event.node_name,
        }
        cid = current_correlation_id()
        if cid is not None:
            attrs["openarmature.correlation_id"] = cid
        span = self._tracer.start_span(
            name="openarmature.checkpoint.save",
            context=cast("Any", parent_ctx),
            kind=SpanKind.INTERNAL,
            attributes=attrs,
        )

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/openarmature/observability/otel/observer.py
Comment thread src/openarmature/observability/otel/observer.py Outdated
Comment thread src/openarmature/llm/providers/openai.py Outdated
- Fix _fan_out_node_span_key hardcoding attempt_index=0. Fan-out
  nodes wrapped with retry middleware open at the bumped
  attempt_index, so the lookup missed under retry and the
  detached-instance Link wasn't added to the §4.4 cross-trace
  navigation. Replaced with _find_fan_out_node_span scan that
  finds the in-flight fan-out span at prefix regardless of
  attempt_index (only one such entry is open at a time).
- Update close_invocation docstring to be honest about the
  invocation_id sourcing constraint (CompiledGraph.invoke doesn't
  return it; the ContextVar is reset before control returns) and
  recommend shutdown() as the typical-production lifecycle hook.
  The deeper UX gap (long-lived observers without periodic
  shutdown accumulate per-invocation residue) is tracked in
  phase-6-1-conformance-fillin.md for follow-up.
- Fix stale "now-cid-scoped" wording in openai.py — observer
  state is invocation_id-keyed in the merged code.
@chris-colinsky chris-colinsky merged commit c2c1eda into main May 8, 2026
5 checks passed
@chris-colinsky chris-colinsky deleted the feature/phase-6-1-concurrency-safe-observer branch May 8, 2026 07:30
chris-colinsky added a commit that referenced this pull request May 29, 2026
* Bump spec pin to v0.31.0 + 0039/0041 fixtures

Bumps the spec submodule from v0.27.1 to v0.31.0 (proposals 0037
Anthropic, 0039 caller invocation_id, 0040 open-span, 0041 reserved
keys) and wires conformance to python's current state for v0.11.0.

conformance.toml: spec_pin to v0.31.0 + entries for 0037 / 0039 /
0040 / 0041. 0039 and 0041 are implemented since 0.11.0; 0037 and
0040 stay not-yet (Anthropic provider + #22 open-span are out of
scope for this PR). __spec_version__ in src and pyproject likewise
bumps; AGENTS.md regenerated.

Conformance harness updates: defer Anthropic fixtures (llm-provider/
033-043) in both the cross-capability parser and the llm-provider
runner; defer observability/034 (waits on #22); defer observability/
035-036 from the cross-capability parser (the langfuse_trace shape
isn't modeled; the derivation is pinned by unit tests against the
same vector). The _run_fixture_028 runner recognises both the 0034
prefix-rejection and 0041 exact-name rejection patterns; the
mid-invocation augment_metadata case waits on #22's harness
primitive.

Full suite green: 962 passed, 170 skipped.

* Align conformance.toml note + visible 028 skip

From PR #95 review:

- Rewrite the conformance.toml convention block to acknowledge that
  feature PRs which bump the spec submodule pin update this file
  too, setting `since` to the upcoming release version (matches the
  v0.10.0 cycle's pattern in PR #85 / #88, and how this PR works).
- Emit warnings.warn for 028's deferred augment_metadata case so
  pytest's end-of-run summary surfaces the deferred coverage by
  name rather than silently passing.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants