Which component is this bug for?
Langchain Instrumentation
Description
The LangChain instrumentation library has multiple code paths where context_api.attach() is called without a corresponding context_api.detach(), leaving orphaned contexts on the stack. This corrupts the OpenTelemetry context, causing subsequent code to lose access to the active trace context.
Impact
After LangChain/LangGraph execution completes, trace.get_current_span() returns an ended span (with is_recording() == False) instead of the parent span that should be active. This causes:
- Missing trace IDs in logs: logging systems that check span.is_recording() before adding trace context see False and skip adding trace_id/span_id.
- Broken span hierarchy: child spans created after LangChain execution may attach to the wrong parent.
- Inconsistent tracing: the same application has some logs with trace context and others without.
Locations of Orphaned Attaches
File: opentelemetry/instrumentation/langchain/callback_handler.py
1. _create_span() method (lines 263-269)

```python
if metadata is not None:
    current_association_properties = (
        context_api.get_value("association_properties") or {}
    )
    try:
        context_api.attach(  # No detach, token not saved
            context_api.set_value(
                "association_properties",
                {**current_association_properties, **sanitized_metadata},
            )
        )
    except Exception:
        pass
```

2. on_chain_end() method (lines 462-471)

```python
self._end_span(span, run_id)
if parent_run_id is None:
    try:
        context_api.attach(  # No detach, token not saved
            context_api.set_value(
                SUPPRESS_LANGUAGE_MODEL_INSTRUMENTATION_KEY, False
            )
        )
    except Exception:
        pass
```

Workaround We're Using
We implemented a context manager that saves the OpenTelemetry context before calling LangChain/LangGraph and restores it afterward:
```python
from collections.abc import Generator
from contextlib import contextmanager

from opentelemetry import context as otel_context


@contextmanager
def preserve_otel_context() -> Generator[None, None, None]:
    """
    Preserve the OpenTelemetry context across operations that may leave
    contexts attached without detaching them.
    """
    # Attaching the current context returns a token that remembers it.
    token = otel_context.attach(otel_context.get_current())
    try:
        yield
    finally:
        # Resetting to the token discards anything attached (and never
        # detached) inside the block.
        otel_context.detach(token)
```

Usage:

```python
with preserve_otel_context():
    result = await agent.ainvoke(...)
# Context is now correctly restored
```

This workaround doesn't fix the library's context leaks, but it allows us to restore the correct parent span after execution.
Reproduction steps
- Set up OpenTelemetry with LangChain instrumentation:

```python
from opentelemetry import trace
from opentelemetry.instrumentation.langchain import LangchainInstrumentor
from opentelemetry.sdk.trace import TracerProvider

# An SDK tracer provider is needed so spans record and expose .name.
trace.set_tracer_provider(TracerProvider())
LangchainInstrumentor().instrument()
tracer = trace.get_tracer(__name__)
```

- Create a parent span and invoke a LangChain chain inside it:

```python
# Runs inside an async function; chain and input_data are any LangChain
# runnable (or LangGraph graph) and its input.
with tracer.start_as_current_span("parent_operation"):
    # Before LangChain
    span_before = trace.get_current_span()
    print(f"Before: {span_before.name}, recording={span_before.is_recording()}")
    # Invoke any LangChain chain or LangGraph
    result = await chain.ainvoke(input_data)
    # After LangChain
    span_after = trace.get_current_span()
    print(f"After: {span_after.name}, recording={span_after.is_recording()}")
```

- Observe the output:

```
Before: parent_operation, recording=True
After: some_langchain_span, recording=False  # Wrong span, already ended
```
Expected behavior
After LangChain/LangGraph execution completes:
- trace.get_current_span() should return the parent span (parent_operation).
- The parent span should have is_recording() == True.
- Subsequent logging or span creation should use the correct parent context.
Actual Behavior with Screenshots
FYI, I am including logs instead of screenshots.
After LangChain/LangGraph execution, the context points to an ended LangChain instrumentation span instead of the parent:
```
[TRACE DEBUG] Before ainvoke - Span: _Span(name="case_worker.process", context=SpanContext(trace_id=0x1c60dc422d09a853510b4c8f688decc7, span_id=0x7349095e5220669a, trace_flags=0x01, trace_state=[], is_remote=False)), Recording: True, Trace ID: 1c60dc422d09a853510b4c8f688decc7
[TRACE DEBUG] Inside preserve_otel_context, before ainvoke - Span: _Span(name="case_worker.process", context=SpanContext(trace_id=0x1c60dc422d09a853510b4c8f688decc7, span_id=0x7349095e5220669a, trace_flags=0x01, trace_state=[], is_remote=False)), Recording: True, Trace ID: 1c60dc422d09a853510b4c8f688decc7
# After LangGraph ainvoke() returns:
[TRACE DEBUG] Inside preserve_otel_context, after ainvoke - Span: _Span(name="judge.task", context=SpanContext(trace_id=0x1c60dc422d09a853510b4c8f688decc7, span_id=0x12ed093291f1de81, trace_flags=0x01, trace_state=[], is_remote=False)), Recording: False, Trace ID: None
# With our workaround, context is restored:
[TRACE DEBUG] After exiting preserve_otel_context - Span: _Span(name="case_worker.process", context=SpanContext(trace_id=0x1c60dc422d09a853510b4c8f688decc7, span_id=0x7349095e5220669a, trace_flags=0x01, trace_state=[], is_remote=False)), Recording: True, Trace ID: 1c60dc422d09a853510b4c8f688decc7
```
Key observations:
- After ainvoke(), the current span changed from case_worker.process to judge.task.
- The judge.task span has Recording: False (it was ended).
- Without our workaround, subsequent logs would have no trace_id because the current span is not recording.
Python Version
3.13.7
Provide any additional context for the Bug.
Versions
- opentelemetry-instrumentation-langchain: 0.50.1
- opentelemetry-api: 1.29.0
- langchain-core: 0.3.29
- langgraph: 0.2.63
Have you spent some time to check if this bug has been raised before?
- I checked and didn't find a similar issue
Are you willing to submit PR?
Yes I am willing to submit a PR!