[EXPORTER] Support handling retry-able errors for OTLP/gRPC #3219
Codecov Report
All modified and coverable lines are covered by tests ✅
Coverage diff (main vs. #3219):
            main     #3219    +/-
Coverage    87.78%   87.77%   -0.01%
Files       198      198
Lines       6308     6308
Hits        5537     5536     -1
Misses      771      772      +1
{
  TestTraceService(std::vector<grpc::StatusCode> status_codes) : status_codes_(status_codes) {}

  inline grpc::Status Export(
It does not seem possible, at least not in a truthful manner, to test this by mocking Export() on the client side.
Instead, this test uses the real exporter and observes the client's retry behavior directly on the server side.
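To illustrate the idea of observing retries from the server side, here is a minimal sketch that does not use gRPC at all. `FakeStatus`, `ScriptedServer`, and `ExportWithRetry` are hypothetical names invented for this example; the real test in this PR uses an actual gRPC service and the real exporter, and the retry loop below is only a stand-in for whatever the gRPC library does internally.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Stand-in for grpc::StatusCode; only the values this sketch needs (assumption).
enum class FakeStatus
{
  kOk,
  kUnavailable,
  kInvalidArgument
};

// Hypothetical scripted server: answers each call with the next status in the
// script and counts how many calls it has received.
class ScriptedServer
{
public:
  explicit ScriptedServer(std::vector<FakeStatus> script) : script_(std::move(script)) {}

  FakeStatus Export()
  {
    ++calls_;
    if (script_.empty())
    {
      return FakeStatus::kOk;
    }
    // Once the script is exhausted, keep repeating its last entry.
    const std::size_t index = (calls_ - 1 < script_.size()) ? calls_ - 1 : script_.size() - 1;
    return script_[index];
  }

  std::size_t calls() const { return calls_; }

private:
  std::vector<FakeStatus> script_;
  std::size_t calls_ = 0;
};

// Hypothetical client loop: retries only on kUnavailable, up to max_attempts calls.
FakeStatus ExportWithRetry(ScriptedServer &server, std::size_t max_attempts)
{
  FakeStatus status = FakeStatus::kUnavailable;
  for (std::size_t attempt = 0; attempt < max_attempts; ++attempt)
  {
    status = server.Export();
    if (status != FakeStatus::kUnavailable)
    {
      break;  // success or a non-retryable failure ends the loop
    }
  }
  return status;
}
```

The assertion of interest is on `server.calls()`: the number of requests the server saw is what tells the test whether the client retried, which is something a mocked client-side `Export()` could not show.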
@@ -357,6 +363,205 @@ TEST_F(OtlpGrpcExporterTestPeer, ConfigUnknownInsecureFromEnv)
}
# endif

# ifndef NO_GETENV
TEST_F(OtlpGrpcExporterTestPeer, ConfigRetryDefaultValues)
The defaults match what is currently implemented in other flavors of OTel (I verified dotnet, java, and js). I don't believe rust has a retry policy in place, though.
std::uint32_t retry_policy_max_attempts{};

/** The initial backoff delay between retry attempts, random between (0, initial_backoff). */
float retry_policy_initial_backoff{};
Why not use std::chrono::duration<> here?
I had that as a chrono duration initially, but it was not really of any use for OTLP/gRPC, since the value just gets passed down to the service config. The chrono type was therefore moved to OTLP/HTTP, where it is required for the backoff computations.
FYI, the implementation in the previous commit looked like this: cb14857
As I understand it, exporting over either OTLP/HTTP or OTLP/gRPC costs far more CPU than the type conversion here. I think it is more important to make clear what this parameter means (the name and comment alone tell us neither its meaning nor its unit). A float is also imprecise and subject to rounding error.
What do you think?
IMHO, precision is fairly subjective here: examples of normal use cases are limited to a single decimal place (which is how the value is formatted before the config settings are passed to the gRPC library). That seems logical, given that measuring backoff in tens of milliseconds or less is probably a very niche requirement.
I agree that a chrono duration makes the type more descriptive. Part of the reason I went back to float was that I could not find a common place to alias it to a more descriptive name without repeating the alias in at least one more header file (for instance, otlp_environment.h and http_client.h).
For now, I will revert/update this in #3223 until it is approved and merged, to avoid duplicating these work-in-progress changes to common code.
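As a rough sketch of the trade-off discussed in this thread, the option could be held as a `std::chrono` duration and converted to the single-decimal seconds string only at the point where the service config is built. `ToGrpcDuration` is a hypothetical helper written for this illustration and is not taken from the PR; the `"<seconds>s"` output shape is an assumption based on the duration strings used in gRPC service config JSON.

```cpp
#include <chrono>
#include <cstdio>
#include <string>

// Hypothetical helper: renders a duration as seconds with one decimal place,
// e.g. 1500 ms becomes "1.5s". Values finer than 0.1 s are rounded by printf.
std::string ToGrpcDuration(std::chrono::milliseconds delay)
{
  // Convert to floating-point seconds; the unit is explicit in the type.
  const double seconds = std::chrono::duration<double>(delay).count();

  char buffer[32];
  std::snprintf(buffer, sizeof(buffer), "%.1fs", seconds);
  return std::string(buffer);
}
```

With this shape, callers write `ToGrpcDuration(std::chrono::seconds(1))` or `ToGrpcDuration(std::chrono::milliseconds(1500))`, so the unit is visible at the call site, while the conversion cost stays negligible next to the export itself.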
Great work, thanks.
More comments to follow.
Please merge from main to resolve conflicts.
LGTM, thanks for the feature.
I'd like to merge #3248 first, so that the maintainers' tests cover gRPC in functional tests. This ensures that the changes from this PR work properly in functional tests.
[EXPORTER] Support handling retry-able errors for OTLP/gRPC (open-telemetry#3219)
Fixes #2049
Changes
This change introduces a retry policy in the OTLP/gRPC exporter for select failures, configured via the gRPC service config mechanism.
Retry support for the OTLP/HTTP exporter is addressed in #3223.
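For readers unfamiliar with the service config mechanism, the sketch below shows roughly what a retry policy looks like when rendered as gRPC service config JSON. `RetryPolicySketch` and `BuildRetryServiceConfig` are hypothetical names for this illustration, not the exporter's actual types, and the numeric values used in the usage note are placeholders rather than the PR's confirmed defaults. The status code list reflects, to my understanding, the codes the OTLP specification treats as retryable; the exact list used by this PR is not shown on this page.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical settings holder; field names only loosely mirror the diff above.
struct RetryPolicySketch
{
  std::uint32_t max_attempts;
  float initial_backoff_s;   // seconds
  float max_backoff_s;       // seconds
  float backoff_multiplier;
};

// Renders the policy as a gRPC service config JSON document.
// "name": [{}] applies the policy to every method on the channel.
std::string BuildRetryServiceConfig(const RetryPolicySketch &policy)
{
  char buffer[1024];
  std::snprintf(buffer, sizeof(buffer),
                R"json({
  "methodConfig": [{
    "name": [{}],
    "retryPolicy": {
      "maxAttempts": %u,
      "initialBackoff": "%.1fs",
      "maxBackoff": "%.1fs",
      "backoffMultiplier": %.1f,
      "retryableStatusCodes": ["CANCELLED", "DEADLINE_EXCEEDED", "ABORTED", "OUT_OF_RANGE", "DATA_LOSS", "UNAVAILABLE"]
    }
  }]
})json",
                static_cast<unsigned>(policy.max_attempts),
                static_cast<double>(policy.initial_backoff_s),
                static_cast<double>(policy.max_backoff_s),
                static_cast<double>(policy.backoff_multiplier));
  return std::string(buffer);
}
```

A call such as `BuildRetryServiceConfig({5, 1.0f, 5.0f, 1.5f})` yields a JSON string that, in grpcpp, would typically be handed to the channel through `grpc::ChannelArguments::SetServiceConfigJSON`. The retries are then performed by the gRPC library itself, which is why the exporter only needs to pass the values through.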
For significant contributions please make sure you have completed the following items:
CHANGELOG.md updated for non-trivial changes