Skip to content

fix(grpc): Fix AttributeError when instrumenting with OTel #4405

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
May 20, 2025

Conversation

sentrivana
Copy link
Contributor

@sentrivana sentrivana commented May 20, 2025

OTel also wraps grpc.aio.server, but expects the interceptors arg to be a list, while Sentry turns it into a tuple.

Non-tuple sequences are actually only supported starting in grpc 1.42.0. So for older versions we need to still use a tuple.

Fixes #4389

@sentrivana sentrivana requested a review from a team as a code owner May 20, 2025 09:31
Copy link

codecov bot commented May 20, 2025

Codecov Report

Attention: Patch coverage is 75.00000% with 2 lines in your changes missing coverage. Please review.

Project coverage is 80.60%. Comparing base (9582a48) to head (08385e2).
Report is 1 commits behind head on master.

⚠️ Parser warning

The parser emitted a warning. Please review your JUnit XML file:

Warning while parsing testcase attributes: Limit of string is 1000 chars, for name, we got 2301 at 1:223767 in /home/runner/work/sentry-python/sentry-python/.junitxml
Files with missing lines Patch % Lines
sentry_sdk/integrations/grpc/__init__.py 75.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4405      +/-   ##
==========================================
- Coverage   80.65%   80.60%   -0.05%     
==========================================
  Files         142      142              
  Lines       15966    15973       +7     
  Branches     2726     2727       +1     
==========================================
- Hits        12877    12875       -2     
- Misses       2235     2241       +6     
- Partials      854      857       +3     
Files with missing lines Coverage Δ
sentry_sdk/integrations/grpc/__init__.py 90.12% <75.00%> (-1.77%) ⬇️

... and 9 files with indirect coverage changes

@sentrivana sentrivana marked this pull request as draft May 20, 2025 09:36
@sentrivana sentrivana marked this pull request as ready for review May 20, 2025 10:11
Copy link
Member

@szokeasaurusrex szokeasaurusrex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left a long comment, I think we should consider fixing this problem upstream in Otel rather than in our SDK

# opentelemetry https://github.com/getsentry/sentry-python/issues/4389
# However, prior to grpc 1.42.0, only tuples were accepted, so we
# have no choice there.
if GRPC_VERSION is not None and GRPC_VERSION < (1, 42, 0):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going with a list instead of the tuple we have been using successfully so far seems like a potentially risky change to me. We could see unexpected consequences, especially if something other than our SDK and OTel, which expects a tuple, is patching gRPC as well.

Other potential solution

I'd prefer instead that we keep the existing behavior by default. We could then wrap the call to the original function in a try/except, calling it again with a list if there is an AttributeError.

There's a strong argument that we should not change anything in the SDK

There is a strong argument that we should not attempt to fix this bug in our SDK, as this problem seems to arise due to a bug in the OpenTelemetry instrumentation.

The reason I say this is that the grpcio library types the interceptors parameter as Optional[Sequence[Any]], and Sequence is the abstract base class for immutable sequences. The OpenTelemetry instrumentation, however, makes the mistake of assuming that the interceptors array is mutable.

My conclusion

So, perhaps, we should keep our SDK as is. Tuples are instances of collections.abc.Sequence, so we are respecting gRPC's API; if something else is patching gRPC in a way that does not expect its API, then that other thing should get fixed. It should be an easy fix in Otel, perhaps we can just close this PR and contribute a fix over there?

Although I guess we can also make a fix in our SDK – I'd just recommend we try a lower-risk way of fixing the problem then universally switching over to using an array.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're already using an array in the wrapper for the sync server which does essentially the same thing (compare the two). Yes, this is technically on OTel, but if we make folks' lives better with a small fix (and there's even precedent for it), why not do it.

I don't understand the typing argument -- a list is also a Sequence.

Re: trying to call the original function and calling it again if there's an AttributeError: we'd be changing the behavior of the original program with that. Imagine if the original function changes the state of something at the beginning, and only runs into the AttributeError at the end. We'd be re-executing everything before the AttributeError with the second call.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the typing argument -- a list is also a Sequence.

Yes, a list is also a Sequence, so it is valid to pass a list to a function which expects a Sequence. However, it is also valid to pass any other type of Sequence to that function, including, for example, a tuple. So, the code in the function must restrict itself to APIs available on Sequence; it cannot assume that it is receiving a list and call methods available on list and not on Sequence. This is why this is an Otel bug: Otel is making the assumption that it is receiving a list, when the API has only guaranteed a Sequence (which can be list objects, but can also be a tuple).


Regarding this point:

but if we make folks' lives better with a small fix (and there's even precedent for it), why not do it

I'd dispute the idea of precedent; the sync code path is a completely separate code path; we don't know whether some other library patches the async version in a way that would break with the changes here.

More fundamentally, my issue is that the Sentry SDK should not be in the business of maintaining compatibility with every different possible way a library can be patched. This would be impossible, since Python allows patching anything arbitrarily. Yes, it is our responsibility to ensure the SDK does not break a users app. But in this case, Otel has improperly patched something and therefore it is Otel which has broken the app.

It should be a very simple fix in Otel's side, I think we should just contribute there (regardless of whether we fix in the Sentry SDK), since we anyways want to be more involved in the Otel community.

Copy link
Contributor Author

@sentrivana sentrivana May 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm aware that this is an OTel bug. :) And I'll also submit a PR there. That's not mutually exclusive to also making this change here.

If this was a huge diff, I'd be on board with you with fixing this in OTel only. But it's not, and the time we're spending discussing this is honestly already disproportionate to the size and scope of the change.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@szokeasaurusrex please have a look at our SDK principles, especially this one https://develop.sentry.dev/sdk/philosophy/#prioritize-customer-convenience-over-correctness

While it's worth pointing out that this can and should (also) be fixed upstream, we need to make sure to unblock our users first and foremost. Same is true for unblocking other maintainers, by helping them with a review that focusses on what matters most.

Also, remember https://google.github.io/eng-practices/review/reviewer/standard.html

In general, reviewers should favor approving a CL once it is in a state where it definitely improves the overall code health of the system being worked on, even if the CL isn’t perfect.

That is the senior principle among all of the code review guidelines.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point, thanks @stephanie-anderson!

@sentrivana sentrivana merged commit 385c668 into master May 20, 2025
136 of 137 checks passed
@sentrivana sentrivana deleted the ivana/fix-grpc branch May 20, 2025 13:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

OpenTelemetry instrumentation with Sentry GRPCIntegration raises an exception
4 participants