Define transient error in OTLP exporter specification (#3653)
Fixes #3652

This PR attempts to clearly define `transient error` in the OTLP
exporter specification.

It also fixes a bug: the set of retryable status codes in the OTLP
exporter spec is not aligned with the set in the protocol spec.
alanwest authored Sep 7, 2023
1 parent 5d2958e commit e5b3109
Showing 2 changed files with 26 additions and 1 deletion.
5 changes: 5 additions & 0 deletions CHANGELOG.md
@@ -35,6 +35,11 @@ release.

### Resource

### Protocol

- Fix and clarify definition of "transient error" in the OTLP exporter specification.
([#3653](https://github.com/open-telemetry/opentelemetry-specification/pull/3653))

### Compatibility

- OpenTracing Shim: Allow invalid but sampled SpanContext to be returned.
22 changes: 21 additions & 1 deletion specification/protocol/exporter.md
@@ -162,7 +162,22 @@ The `OTEL_EXPORTER_OTLP_HEADERS`, `OTEL_EXPORTER_OTLP_TRACES_HEADERS`, `OTEL_EXP

Transient errors MUST be handled with a retry strategy. This retry strategy MUST implement an exponential back-off with jitter to avoid overwhelming the destination until the network is restored or the destination has recovered.

For OTLP/HTTP, the errors `408 (Request Timeout)` and `5xx (Server Errors)` are defined as transient, detailed information about errors can be found in the [HTTP failures section](https://github.com/open-telemetry/opentelemetry-proto/blob/main/docs/specification.md#failures). For the OTLP/gRPC, the full list of the gRPC retryable status codes can be found in the [gRPC response section](https://github.com/open-telemetry/opentelemetry-proto/blob/main/docs/specification.md#otlpgrpc-response).
### Transient errors

Transient errors are defined by the
[OTLP protocol specification][protocol-spec].

For [OTLP/gRPC][otlp-grpc], transient errors are defined by a set of
[retryable gRPC status codes][retryable-grpc-status-codes].

For [OTLP/HTTP][otlp-http], transient errors are defined by:

1. A set of [retryable HTTP status codes][retryable-http-status-codes] received
from the server.
2. The scenarios described in:
[All other responses](https://github.com/open-telemetry/opentelemetry-proto/blob/main/docs/specification.md#all-other-responses)
and
[OTLP/HTTP Connection](https://github.com/open-telemetry/opentelemetry-proto/blob/main/docs/specification.md#otlphttp-connection).

## User Agent

@@ -177,3 +192,8 @@ The format of the header SHOULD follow [RFC 7231][rfc-7231]. The conventions use
[resource-semconv]: ../resource/semantic_conventions/README.md#telemetry-sdk
[otlphttp-req]: https://github.com/open-telemetry/opentelemetry-proto/blob/main/docs/specification.md#otlphttp-request
[rfc-7231]: https://datatracker.ietf.org/doc/html/rfc7231#section-5.5.3
[protocol-spec]: https://github.com/open-telemetry/opentelemetry-proto/blob/main/docs/specification.md
[otlp-grpc]: https://github.com/open-telemetry/opentelemetry-proto/blob/main/docs/specification.md#otlpgrpc
[otlp-http]: https://github.com/open-telemetry/opentelemetry-proto/blob/main/docs/specification.md#otlphttp
[retryable-grpc-status-codes]: https://github.com/open-telemetry/opentelemetry-proto/blob/main/docs/specification.md#failures
[retryable-http-status-codes]: https://github.com/open-telemetry/opentelemetry-proto/blob/main/docs/specification.md#failures-1
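
To illustrate the behavior the changed section requires for OTLP/HTTP, here is a minimal Python sketch of classifying a response as transient and retrying with exponential back-off and jitter. Everything in it is hypothetical: the function names, the `send` callable, the default timings, and in particular the status-code set, which is an assumption for illustration only. The authoritative list is the one in the linked protocol specification. The sketch covers only status codes returned by the server, not the connection-level scenarios in item 2.

```python
import random
import time

# Assumed for illustration only; the authoritative set of retryable HTTP
# status codes is defined by the OTLP protocol specification.
ASSUMED_RETRYABLE_HTTP_STATUS = frozenset({429, 502, 503, 504})


def is_transient_http(status, retryable=ASSUMED_RETRYABLE_HTTP_STATUS):
    """Return True if the HTTP status code is in the retryable set."""
    return status in retryable


def backoff_delays(max_retries, base=1.0, cap=32.0, rng=random.random):
    """Yield one delay in seconds per retry.

    The ceiling doubles on each retry up to `cap`; the actual delay is a
    random fraction of that ceiling (often called "full jitter").
    """
    for attempt in range(max_retries):
        ceiling = min(cap, base * (2 ** attempt))
        yield rng() * ceiling


def export_with_retry(send, max_retries=5, sleep=time.sleep):
    """Call send() until it returns a non-transient status or retries run out.

    `send` is a hypothetical callable that performs one export request and
    returns the HTTP status code. The last status seen is returned.
    """
    status = send()
    for delay in backoff_delays(max_retries):
        if not is_transient_http(status):
            break
        sleep(delay)
        status = send()
    return status
```

Passing `sleep` and `rng` as parameters keeps the sketch testable without real waiting. A real exporter would also need to honor any server-provided retry delay and treat connection failures as transient, as the protocol specification describes.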
