-
Notifications
You must be signed in to change notification settings - Fork 547
fix(grpc): Fix AttributeError when instrumenting with OTel #4405
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov ReportAttention: Patch coverage is
|
Files with missing lines | Patch % | Lines |
---|---|---|
sentry_sdk/integrations/grpc/__init__.py | 75.00% | 2 Missing |
Additional details and impacted files
@@ Coverage Diff @@
## master #4405 +/- ##
==========================================
- Coverage 80.65% 80.60% -0.05%
==========================================
Files 142 142
Lines 15966 15973 +7
Branches 2726 2727 +1
==========================================
- Hits 12877 12875 -2
- Misses 2235 2241 +6
- Partials 854 857 +3
Files with missing lines | Coverage Δ | |
---|---|---|
sentry_sdk/integrations/grpc/__init__.py | 90.12% <75.00%> (-1.77%) |
⬇️ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
left a long comment, I think we should consider fixing this problem upstream in Otel rather than in our SDK
# opentelemetry https://github.com/getsentry/sentry-python/issues/4389 | ||
# However, prior to grpc 1.42.0, only tuples were accepted, so we | ||
# have no choice there. | ||
if GRPC_VERSION is not None and GRPC_VERSION < (1, 42, 0): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Going with a list instead of the tuple
we have been using successfully so far seems like a potentially risky change to me. We could see unexpected consequences, especially if something other than our SDK and OTel, which expects a tuple, is patching gRPC as well.
Other potential solution
I'd prefer instead that we keep the existing behavior by default. We could then wrap the call to the original function in a try
/except
, calling it again with a list if there is an AttributeError
.
There's a strong argument that we should not change anything in the SDK
There is a strong argument that we should not attempt to fix this bug in our SDK, as this problem seems to arise due to a bug in the OpenTelemetry instrumentation.
The reason I say this is that the grpcio
library types the interceptors
parameter as Optional[Sequence[Any]]
, and Sequence
is the abstract base class for immutable sequences. The OpenTelemetry instrumentation, however, makes the mistake of assuming that the interceptors
array is mutable.
My conclusion
So, perhaps, we should keep our SDK as is. Tuples are instances of collections.abc.Sequence
, so we are respecting gRPC's API; if something else is patching gRPC in a way that does not expect its API, then that other thing should get fixed. It should be an easy fix in Otel, perhaps we can just close this PR and contribute a fix over there?
Although I guess we can also make a fix in our SDK – I'd just recommend we try a lower-risk way of fixing the problem then universally switching over to using an array.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're already using an array in the wrapper for the sync server which does essentially the same thing (compare the two). Yes, this is technically on OTel, but if we make folks' lives better with a small fix (and there's even precedent for it), why not do it.
I don't understand the typing argument -- a list
is also a Sequence
.
Re: trying to call the original function and calling it again if there's an AttributeError
: we'd be changing the behavior of the original program with that. Imagine if the original function changes the state of something at the beginning, and only runs into the AttributeError
at the end. We'd be re-executing everything before the AttributeError
with the second call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand the typing argument -- a list is also a Sequence.
Yes, a list
is also a Sequence
, so it is valid to pass a list
to a function which expects a Sequence
. However, it is also valid to pass any other type of Sequence
to that function, including, for example, a tuple
. So, the code in the function must restrict itself to APIs available on Sequence
; it cannot assume that it is receiving a list
and call methods available on list
and not on Sequence
. This is why this is an Otel bug: Otel is making the assumption that it is receiving a list
, when the API has only guaranteed a Sequence
(which can be list
objects, but can also be a tuple
).
Regarding this point:
but if we make folks' lives better with a small fix (and there's even precedent for it), why not do it
I'd dispute the idea of precedent; the sync code path is a completely separate code path; we don't know whether some other library patches the async version in a way that would break with the changes here.
More fundamentally, my issue is that the Sentry SDK should not be in the business of maintaining compatibility with every different possible way a library can be patched. This would be impossible, since Python allows patching anything arbitrarily. Yes, it is our responsibility to ensure the SDK does not break a users app. But in this case, Otel has improperly patched something and therefore it is Otel which has broken the app.
It should be a very simple fix in Otel's side, I think we should just contribute there (regardless of whether we fix in the Sentry SDK), since we anyways want to be more involved in the Otel community.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm aware that this is an OTel bug. :) And I'll also submit a PR there. That's not mutually exclusive to also making this change here.
If this was a huge diff, I'd be on board with you with fixing this in OTel only. But it's not, and the time we're spending discussing this is honestly already disproportionate to the size and scope of the change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@szokeasaurusrex please have a look at our SDK principles, especially this one https://develop.sentry.dev/sdk/philosophy/#prioritize-customer-convenience-over-correctness
While it's worth pointing out that this can and should (also) be fixed upstream, we need to make sure to unblock our users first and foremost. Same is true for unblocking other maintainers, by helping them with a review that focusses on what matters most.
Also, remember https://google.github.io/eng-practices/review/reviewer/standard.html
In general, reviewers should favor approving a CL once it is in a state where it definitely improves the overall code health of the system being worked on, even if the CL isn’t perfect.
That is the senior principle among all of the code review guidelines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair point, thanks @stephanie-anderson!
OTel also wraps
grpc.aio.server
, but expects theinterceptors
arg to be a list, while Sentry turns it into a tuple.Non-tuple sequences are actually only supported starting in grpc 1.42.0. So for older versions we need to still use a tuple.
Fixes #4389