fix(grpc): Fix AttributeError when instrumenting with OTel #4405

Merged
merged 6 commits into from
May 20, 2025
19 changes: 18 additions & 1 deletion sentry_sdk/integrations/grpc/__init__.py
@@ -6,6 +6,7 @@
 from grpc.aio import Server as AsyncServer

 from sentry_sdk.integrations import Integration
+from sentry_sdk.utils import parse_version

 from .client import ClientInterceptor
 from .server import ServerInterceptor
@@ -41,6 +42,8 @@ def __getitem__(self, _):

 P = ParamSpec("P")

+GRPC_VERSION = parse_version(grpc.__version__)
+

 def _wrap_channel_sync(func: Callable[P, Channel]) -> Callable[P, Channel]:
     "Wrapper for synchronous secure and insecure channel."
@@ -127,7 +130,21 @@ def patched_aio_server(  # type: ignore
         **kwargs: P.kwargs,
     ) -> Server:
         server_interceptor = AsyncServerInterceptor()
-        interceptors = (server_interceptor, *(interceptors or []))
+        interceptors = [
+            server_interceptor,
+            *(interceptors or []),
+        ]  # type: Sequence[grpc.ServerInterceptor]
+
+        try:
+            # We prefer interceptors as a list because of compatibility with
+            # opentelemetry https://github.com/getsentry/sentry-python/issues/4389
+            # However, prior to grpc 1.42.0, only tuples were accepted, so we
+            # have no choice there.
+            if GRPC_VERSION is not None and GRPC_VERSION < (1, 42, 0):
Member

Going with a list instead of the tuple we have been using successfully so far seems like a potentially risky change to me. We could see unexpected consequences, especially if something else that expects a tuple (other than our SDK and OTel) is patching gRPC as well.

Other potential solution

I'd prefer instead that we keep the existing behavior by default. We could then wrap the call to the original function in a try/except, calling it again with a list if there is an AttributeError.

There's a strong argument that we should not change anything in the SDK

There is a strong argument that we should not attempt to fix this bug in our SDK, as this problem seems to arise due to a bug in the OpenTelemetry instrumentation.

The reason I say this is that the grpcio library types the interceptors parameter as Optional[Sequence[Any]], and Sequence is the abstract base class for immutable sequences. The OpenTelemetry instrumentation, however, makes the mistake of assuming that the interceptors array is mutable.

My conclusion

So, perhaps, we should keep our SDK as is. Tuples are instances of collections.abc.Sequence, so we are respecting gRPC's API; if something else is patching gRPC in a way that does not respect its API, then that other thing should get fixed. It should be an easy fix in OTel; perhaps we can just close this PR and contribute a fix over there?

Although I guess we can also make a fix in our SDK – I'd just recommend we try a lower-risk way of fixing the problem than universally switching over to using an array.

Contributor Author

We're already using an array in the wrapper for the sync server which does essentially the same thing (compare the two). Yes, this is technically on OTel, but if we make folks' lives better with a small fix (and there's even precedent for it), why not do it.

I don't understand the typing argument -- a list is also a Sequence.

Re: trying to call the original function and calling it again if there's an AttributeError: we'd be changing the behavior of the original program with that. Imagine if the original function changes the state of something at the beginning, and only runs into the AttributeError at the end. We'd be re-executing everything before the AttributeError with the second call.
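The re-execution concern above can be illustrated with a minimal sketch. The names `create_server` and `call_with_retry` are hypothetical stand-ins and not gRPC, OTel, or Sentry APIs; the sketch assumes the wrapped function performs a side effect before it tries to mutate the interceptors.

```python
from typing import Any, Callable, List, Sequence

calls: List[str] = []  # records side effects so a repeated execution is visible


def create_server(interceptors: Sequence[Any]) -> List[Any]:
    """Hypothetical stand-in for a patched server factory (not a real grpc API)."""
    calls.append("side effect")  # state change that happens before the failure
    interceptors.insert(0, "otel")  # type: ignore[attr-defined]  # fails on a tuple
    return list(interceptors)


def call_with_retry(
    func: Callable[[Sequence[Any]], List[Any]], interceptors: Sequence[Any]
) -> List[Any]:
    """Try with a tuple first, then retry with a list on AttributeError."""
    try:
        return func(tuple(interceptors))
    except AttributeError:
        # Everything func did before raising has already happened once.
        return func(list(interceptors))


result = call_with_retry(create_server, ["sentry"])
print(result)
print(calls)
```

In this sketch the retry succeeds, but `calls` ends up with two entries, which is the changed program behavior the comment warns about.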

Member

"I don't understand the typing argument -- a list is also a Sequence."

Yes, a list is also a Sequence, so it is valid to pass a list to a function which expects a Sequence. However, it is also valid to pass any other type of Sequence to that function, including, for example, a tuple. So, the code in the function must restrict itself to APIs available on Sequence; it cannot assume that it is receiving a list and call methods available on list and not on Sequence. This is why this is an OTel bug: OTel is making the assumption that it is receiving a list, when the API has only guaranteed a Sequence (which can be a list, but can also be a tuple).
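The typing argument can be shown with a small sketch. Both helper functions are hypothetical illustrations, not the actual OTel or Sentry code: one assumes a mutable list, the other relies only on what Sequence guarantees and builds a new list.

```python
from collections.abc import MutableSequence, Sequence
from typing import Any, List, Optional


def add_interceptor_mutating(interceptors: Any, extra: Any) -> Any:
    """Hypothetical helper that wrongly assumes a mutable sequence."""
    interceptors.insert(0, extra)  # insert() is not part of the Sequence API
    return interceptors


def add_interceptor_safe(
    interceptors: Optional[Sequence[Any]], extra: Any
) -> List[Any]:
    """Hypothetical helper that only iterates, so any Sequence works."""
    return [extra, *(interceptors or ())]


as_tuple = ("sentry",)
# A tuple is a Sequence but not a MutableSequence; a list is both.
assert isinstance(as_tuple, Sequence)
assert not isinstance(as_tuple, MutableSequence)
assert isinstance(["sentry"], MutableSequence)

try:
    add_interceptor_mutating(as_tuple, "otel")
    mutating_failed = False
except AttributeError:
    mutating_failed = True

safe_result = add_interceptor_safe(as_tuple, "otel")
print(mutating_failed, safe_result)
```

The non-mutating variant accepts a tuple, a list, or None, which is the kind of fix the comment suggests would be appropriate on the instrumentation side.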


Regarding this point:

"but if we make folks' lives better with a small fix (and there's even precedent for it), why not do it"

I'd dispute the idea of precedent; the sync code path is a completely separate code path; we don't know whether some other library patches the async version in a way that would break with the changes here.

More fundamentally, my issue is that the Sentry SDK should not be in the business of maintaining compatibility with every possible way a library can be patched. This would be impossible, since Python allows patching anything arbitrarily. Yes, it is our responsibility to ensure the SDK does not break a user's app. But in this case, OTel has improperly patched something and therefore it is OTel which has broken the app.

It should be a very simple fix on OTel's side. I think we should just contribute there (regardless of whether we fix it in the Sentry SDK), since we want to be more involved in the OTel community anyway.

Contributor Author

@sentrivana sentrivana May 20, 2025

I'm aware that this is an OTel bug. :) And I'll also submit a PR there. That's not mutually exclusive with also making this change here.

If this were a huge diff, I'd be on board with fixing this in OTel only. But it's not, and the time we're spending discussing this is honestly already disproportionate to the size and scope of the change.

Contributor

@szokeasaurusrex please have a look at our SDK principles, especially this one https://develop.sentry.dev/sdk/philosophy/#prioritize-customer-convenience-over-correctness

While it's worth pointing out that this can and should (also) be fixed upstream, we need to make sure to unblock our users first and foremost. Same is true for unblocking other maintainers, by helping them with a review that focusses on what matters most.

Also, remember https://google.github.io/eng-practices/review/reviewer/standard.html

"In general, reviewers should favor approving a CL once it is in a state where it definitely improves the overall code health of the system being worked on, even if the CL isn’t perfect."

That is the senior principle among all of the code review guidelines.

Member

Fair point, thanks @stephanie-anderson!

+                interceptors = tuple(interceptors)
+        except Exception:
+            pass
+
         return func(*args, interceptors=interceptors, **kwargs)  # type: ignore

     return patched_aio_server  # type: ignore
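
The version gate in this hunk can be sketched in isolation as follows. This is a simplified illustration, not the SDK's code: `parse_version_tuple` is a hypothetical stand-in for `sentry_sdk.utils.parse_version`, and `build_interceptors` is an assumed helper name. The 1.42.0 threshold comes from the code comment in the diff.

```python
from typing import Any, List, Optional, Sequence, Tuple, Union


def parse_version_tuple(version: str) -> Optional[Tuple[int, ...]]:
    """Hypothetical simplified parser; returns None if a part is not numeric."""
    try:
        return tuple(int(part) for part in version.split(".")[:3])
    except ValueError:
        return None


def build_interceptors(
    own: Any,
    existing: Optional[Sequence[Any]],
    grpc_version: Optional[Tuple[int, ...]],
) -> Union[List[Any], Tuple[Any, ...]]:
    """Prepend our interceptor; use a tuple only for grpc older than 1.42.0."""
    interceptors = [own, *(existing or [])]
    if grpc_version is not None and grpc_version < (1, 42, 0):
        return tuple(interceptors)
    # Unknown or newer versions keep the list form.
    return interceptors


old = build_interceptors("sentry", None, parse_version_tuple("1.41.1"))
new = build_interceptors("sentry", ["other"], parse_version_tuple("1.71.0"))
unknown = build_interceptors("sentry", None, parse_version_tuple("not.a.version"))
print(type(old).__name__, type(new).__name__, type(unknown).__name__)
```

Falling back to the list when the version cannot be parsed mirrors the `GRPC_VERSION is not None` check in the diff.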