From 0f54745451b769ac96ae6199754c34c2a92c32a3 Mon Sep 17 00:00:00 2001
From: Alan West <3676547+alanwest@users.noreply.github.com>
Date: Wed, 9 Aug 2023 17:41:26 -0700
Subject: [PATCH 1/6] Define transient error in OTLP exporter specification

---
 specification/protocol/exporter.md | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/specification/protocol/exporter.md b/specification/protocol/exporter.md
index c9d7a8cd239..57e2de1d758 100644
--- a/specification/protocol/exporter.md
+++ b/specification/protocol/exporter.md
@@ -162,7 +162,17 @@ The `OTEL_EXPORTER_OTLP_HEADERS`, `OTEL_EXPORTER_OTLP_TRACES_HEADERS`, `OTEL_EXP
 
 Transient errors MUST be handled with a retry strategy. This retry strategy MUST implement an exponential back-off with jitter to avoid overwhelming the destination until the network is restored or the destination has recovered.
 
-For OTLP/HTTP, the errors `408 (Request Timeout)` and `5xx (Server Errors)` are defined as transient, detailed information about errors can be found in the [HTTP failures section](https://github.com/open-telemetry/opentelemetry-proto/blob/main/docs/specification.md#failures). For the OTLP/gRPC, the full list of the gRPC retryable status codes can be found in the [gRPC response section](https://github.com/open-telemetry/opentelemetry-proto/blob/main/docs/specification.md#otlpgrpc-response).
+Transient errors are defined by a set of retryable response status codes
+received from the server. Refer to the
+[protocol specification][protocol-spec] for the set of retryable status codes:
+
+* [OTLP/HTTP retryable status codes][retryable-http-status-codes]
+* [OTLP/gRPC retryable status codes][retryable-grpc-status-codes]
+
+For OTLP/HTTP, the following scenarios are also considered a transient error:
+
+* The server disconnects without returning a response.
+* The exporter cannot connect to the server.
 
 ## User Agent
 
@@ -177,3 +187,6 @@ The format of the header SHOULD follow [RFC 7231][rfc-7231]. The conventions use
 [resource-semconv]: ../resource/semantic_conventions/README.md#telemetry-sdk
 [otlphttp-req]: https://github.com/open-telemetry/opentelemetry-proto/blob/main/docs/specification.md#otlphttp-request
 [rfc-7231]: https://datatracker.ietf.org/doc/html/rfc7231#section-5.5.3
+[protocol-spec]: (https://github.com/open-telemetry/opentelemetry-proto/blob/main/docs/specification.md)
+[retryable-grpc-status-codes]: (https://github.com/open-telemetry/opentelemetry-proto/blob/main/docs/specification.md#failures)
+[retryable-http-status-codes]: (https://github.com/open-telemetry/opentelemetry-proto/blob/main/docs/specification.md#failures-1)

From 20ab240a4272f1c725ed4e5d4a455224d1fff62f Mon Sep 17 00:00:00 2001
From: Alan West <3676547+alanwest@users.noreply.github.com>
Date: Wed, 9 Aug 2023 17:57:54 -0700
Subject: [PATCH 2/6] Update changelog

---
 CHANGELOG.md | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index c38ae70674d..e4e903e8230 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -33,6 +33,11 @@ release.
 
 ### Resource
 
+### Protocol
+
+- Fix and clarify definition of "transient error" in the OTLP exporter specification.
+  ([#3653](https://github.com/open-telemetry/opentelemetry-specification/pull/3653))
+
 ### Compatibility
 
 - Prometheus exporters SHOULD provide configuration to disable the addition of `_total` suffixes.

From b632fee4a5ad40708574334727ad2721f88c2e53 Mon Sep 17 00:00:00 2001
From: Alan West <3676547+alanwest@users.noreply.github.com>
Date: Wed, 9 Aug 2023 18:02:21 -0700
Subject: [PATCH 3/6] Fix links

---
 specification/protocol/exporter.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/specification/protocol/exporter.md b/specification/protocol/exporter.md
index 57e2de1d758..3842a75745a 100644
--- a/specification/protocol/exporter.md
+++ b/specification/protocol/exporter.md
@@ -187,6 +187,6 @@ The format of the header SHOULD follow [RFC 7231][rfc-7231]. The conventions use
 [resource-semconv]: ../resource/semantic_conventions/README.md#telemetry-sdk
 [otlphttp-req]: https://github.com/open-telemetry/opentelemetry-proto/blob/main/docs/specification.md#otlphttp-request
 [rfc-7231]: https://datatracker.ietf.org/doc/html/rfc7231#section-5.5.3
-[protocol-spec]: (https://github.com/open-telemetry/opentelemetry-proto/blob/main/docs/specification.md)
-[retryable-grpc-status-codes]: (https://github.com/open-telemetry/opentelemetry-proto/blob/main/docs/specification.md#failures)
-[retryable-http-status-codes]: (https://github.com/open-telemetry/opentelemetry-proto/blob/main/docs/specification.md#failures-1)
+[protocol-spec]: https://github.com/open-telemetry/opentelemetry-proto/blob/main/docs/specification.md
+[retryable-grpc-status-codes]: https://github.com/open-telemetry/opentelemetry-proto/blob/main/docs/specification.md#failures
+[retryable-http-status-codes]: https://github.com/open-telemetry/opentelemetry-proto/blob/main/docs/specification.md#failures-1

From aaa46e56467293bafcc560686c329b05eb73bd24 Mon Sep 17 00:00:00 2001
From: Alan West <3676547+alanwest@users.noreply.github.com>
Date: Mon, 21 Aug 2023 11:32:27 -0700
Subject: [PATCH 4/6] PR feedback

---
 specification/protocol/exporter.md | 23 +++++++++++++++--------
 1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/specification/protocol/exporter.md b/specification/protocol/exporter.md
index 3842a75745a..3d8f881b8f7 100644
--- a/specification/protocol/exporter.md
+++ b/specification/protocol/exporter.md
@@ -162,17 +162,22 @@ The `OTEL_EXPORTER_OTLP_HEADERS`, `OTEL_EXPORTER_OTLP_TRACES_HEADERS`, `OTEL_EXP
 
 Transient errors MUST be handled with a retry strategy. This retry strategy MUST implement an exponential back-off with jitter to avoid overwhelming the destination until the network is restored or the destination has recovered.
 
-Transient errors are defined by a set of retryable response status codes
-received from the server. Refer to the
-[protocol specification][protocol-spec] for the set of retryable status codes:
+### Transient errors
 
-* [OTLP/HTTP retryable status codes][retryable-http-status-codes]
-* [OTLP/gRPC retryable status codes][retryable-grpc-status-codes]
+Transient errors are defined by the
+[OTLP protocol specification][protocol-spec].
 
-For OTLP/HTTP, the following scenarios are also considered a transient error:
+For [OTLP/gRPC](otlpgrpc), transient errors are defined by a set of
+[retryable gRPC status codes][retryable-grpc-status-codes].
 
-* The server disconnects without returning a response.
-* The exporter cannot connect to the server.
+For [OTLP/HTTP](otlphttp), transient errors are defined by:
+
+1. A set of [retryable HTTP status codes][retryable-http-status-codes] received
+   from the server.
+2. The scenarios described in:
+   [All other responses](https://github.com/open-telemetry/opentelemetry-proto/blob/main/docs/specification.md#all-other-responses)
+   and
+   [OTLP/HTTP Connection](https://github.com/open-telemetry/opentelemetry-proto/blob/main/docs/specification.md#otlphttp-connection).
 
 ## User Agent
 
@@ -188,5 +193,7 @@ The format of the header SHOULD follow [RFC 7231][rfc-7231]. The conventions use
 [otlphttp-req]: https://github.com/open-telemetry/opentelemetry-proto/blob/main/docs/specification.md#otlphttp-request
 [rfc-7231]: https://datatracker.ietf.org/doc/html/rfc7231#section-5.5.3
 [protocol-spec]: https://github.com/open-telemetry/opentelemetry-proto/blob/main/docs/specification.md
+[oltpgrpc]: https://github.com/open-telemetry/opentelemetry-proto/blob/main/docs/specification.md#otlpgrpc
+[otlphttp]: https://github.com/open-telemetry/opentelemetry-proto/blob/main/docs/specification.md#otlphttp
 [retryable-grpc-status-codes]: https://github.com/open-telemetry/opentelemetry-proto/blob/main/docs/specification.md#failures
 [retryable-http-status-codes]: https://github.com/open-telemetry/opentelemetry-proto/blob/main/docs/specification.md#failures-1

From 4a3e7a6e7f58f6b9b898a46e0a515f20d38b749c Mon Sep 17 00:00:00 2001
From: Alan West <3676547+alanwest@users.noreply.github.com>
Date: Mon, 21 Aug 2023 13:35:15 -0700
Subject: [PATCH 5/6] Fix links

---
 specification/protocol/exporter.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/specification/protocol/exporter.md b/specification/protocol/exporter.md
index 3d8f881b8f7..e63e7a64eef 100644
--- a/specification/protocol/exporter.md
+++ b/specification/protocol/exporter.md
@@ -167,10 +167,10 @@ Transient errors MUST be handled with a retry strategy. This retry strategy MUST
 Transient errors are defined by the
 [OTLP protocol specification][protocol-spec].
 
-For [OTLP/gRPC](otlpgrpc), transient errors are defined by a set of
+For [OTLP/gRPC](otlp-grpc), transient errors are defined by a set of
 [retryable gRPC status codes][retryable-grpc-status-codes].
 
-For [OTLP/HTTP](otlphttp), transient errors are defined by:
+For [OTLP/HTTP](otlp-http), transient errors are defined by:
 
 1. A set of [retryable HTTP status codes][retryable-http-status-codes] received
    from the server.
@@ -193,7 +193,7 @@ The format of the header SHOULD follow [RFC 7231][rfc-7231]. The conventions use
 [otlphttp-req]: https://github.com/open-telemetry/opentelemetry-proto/blob/main/docs/specification.md#otlphttp-request
 [rfc-7231]: https://datatracker.ietf.org/doc/html/rfc7231#section-5.5.3
 [protocol-spec]: https://github.com/open-telemetry/opentelemetry-proto/blob/main/docs/specification.md
-[oltpgrpc]: https://github.com/open-telemetry/opentelemetry-proto/blob/main/docs/specification.md#otlpgrpc
-[otlphttp]: https://github.com/open-telemetry/opentelemetry-proto/blob/main/docs/specification.md#otlphttp
+[otlp-grpc]: https://github.com/open-telemetry/opentelemetry-proto/blob/main/docs/specification.md#otlpgrpc
+[otlp-http]: https://github.com/open-telemetry/opentelemetry-proto/blob/main/docs/specification.md#otlphttp
 [retryable-grpc-status-codes]: https://github.com/open-telemetry/opentelemetry-proto/blob/main/docs/specification.md#failures
 [retryable-http-status-codes]: https://github.com/open-telemetry/opentelemetry-proto/blob/main/docs/specification.md#failures-1

From 567d7412f526a09bd699d26efedc81e6969e71e2 Mon Sep 17 00:00:00 2001
From: Alan West <3676547+alanwest@users.noreply.github.com>
Date: Wed, 6 Sep 2023 17:23:58 -0700
Subject: [PATCH 6/6] Fix links

---
 specification/protocol/exporter.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/specification/protocol/exporter.md b/specification/protocol/exporter.md
index e63e7a64eef..30d601ce65d 100644
--- a/specification/protocol/exporter.md
+++ b/specification/protocol/exporter.md
@@ -167,10 +167,10 @@ Transient errors MUST be handled with a retry strategy. This retry strategy MUST
 Transient errors are defined by the
 [OTLP protocol specification][protocol-spec].
 
-For [OTLP/gRPC](otlp-grpc), transient errors are defined by a set of
+For [OTLP/gRPC][otlp-grpc], transient errors are defined by a set of
 [retryable gRPC status codes][retryable-grpc-status-codes].
 
-For [OTLP/HTTP](otlp-http), transient errors are defined by:
+For [OTLP/HTTP][otlp-http], transient errors are defined by:
 
 1. A set of [retryable HTTP status codes][retryable-http-status-codes] received
    from the server.
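
For readers of this series, the retry requirement touched by these patches (transient errors retried with exponential back-off and jitter) could be sketched roughly as follows. This is a minimal illustration, not code from the specification or from any SDK. The names `is_transient`, `backoff_delays`, and `export_with_retry` are hypothetical, as are the default timings. The status-code set is an assumed placeholder; the authoritative list is the one linked from the protocol specification. "Full jitter" (a uniform draw between zero and the exponential ceiling) is one possible jitter scheme, chosen here only for brevity.

```python
import random

# Assumed placeholder set of retryable HTTP status codes. The authoritative
# list lives in the OTLP protocol specification referenced by the patch.
ASSUMED_RETRYABLE_HTTP_STATUS = frozenset({429, 502, 503, 504})


def is_transient(status_code=None, connection_failed=False):
    """Return True when an export outcome should be retried.

    connection_failed covers the two non-status scenarios named in the
    patch: the exporter cannot connect, or the server disconnects
    without returning a response.
    """
    if connection_failed:
        return True
    return status_code in ASSUMED_RETRYABLE_HTTP_STATUS


def backoff_delays(max_attempts=5, base=1.0, cap=30.0, rng=random.random):
    """Yield one delay per retry: exponential ceiling, scaled by jitter."""
    for attempt in range(max_attempts):
        ceiling = min(cap, base * (2 ** attempt))
        # rng() is expected to return a float in [0, 1).
        yield rng() * ceiling


def export_with_retry(send, sleep, max_attempts=5, base=1.0, cap=30.0,
                      rng=random.random):
    """Call send() until it stops failing transiently or retries run out.

    send() is a hypothetical callable returning
    (status_code, connection_failed). The last status seen is returned.
    """
    delays = backoff_delays(max_attempts, base, cap, rng)
    while True:
        status, connection_failed = send()
        if not is_transient(status, connection_failed):
            return status
        delay = next(delays, None)
        if delay is None:
            # Retry budget exhausted: give up and report the last outcome.
            return status
        sleep(delay)
```

Passing `sleep` and `rng` as parameters keeps the sketch deterministic under test. A real exporter would more likely call `time.sleep` and `random.random` directly, and would also need to honor any server-provided retry delay, which this sketch leaves out.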