Describe the bug
For OpenTelemetry spans, Data Prepper uses the span id as the document id in OpenSearch. An OpenTelemetry span id is specified as an 8-byte array with at least one non-zero byte. Data Prepper hex-encodes this array and uses the result as the document id for indexing in OpenSearch. Nothing in that id refers to the trace, so two spans in different traces that share a span id get the same document id and collide.
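To make the collision concrete, here is a minimal Python sketch of span-id-only document ids. The helper span_only_doc_id is hypothetical and only mirrors the behavior described above (hex-encoding the 8-byte span id); it is not Data Prepper's actual code, and the trace and span ids are arbitrary sample values.

```python
# Hypothetical sketch: not Data Prepper's implementation, just the
# "hex-encode the span id and use it as _id" behavior from this issue.

def span_only_doc_id(span_id: bytes) -> str:
    """Return the hex-encoded span id, ignoring the trace it belongs to."""
    # An OpenTelemetry span id is 8 bytes with at least one non-zero byte.
    if len(span_id) != 8 or not any(span_id):
        raise ValueError("invalid span id")
    return span_id.hex()


# Arbitrary sample ids: two different traces whose spans share a span id.
trace_a = bytes.fromhex("4bf92f3577b34da6a3ce929d0e0e4736")
trace_b = bytes.fromhex("0af7651916cd43dd8448eb211c80319c")
shared_span_id = bytes.fromhex("00f067aa0ba902b7")

spans = [
    {"traceId": trace_a, "spanId": shared_span_id},
    {"traceId": trace_b, "spanId": shared_span_id},
]

# The trace id never enters the document id, so both spans map to one _id.
doc_ids = [span_only_doc_id(s["spanId"]) for s in spans]
print(doc_ids)  # ['00f067aa0ba902b7', '00f067aa0ba902b7']
```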
To Reproduce
Run one of the tracing examples and ingest some spans. Query the span data from OpenSearch and compare the _id and spanId fields of each document: both contain the same hex value.
Expected behavior
The document id should uniquely identify a span without collisions across different traces. The document id used should either be random or derived from both the traceId and the spanId of the corresponding span.
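A minimal Python sketch of the two options, assuming a 16-byte traceId and an 8-byte spanId as in OpenTelemetry. Both helper names are hypothetical and the ids are arbitrary sample values. The deterministic variant uses the traceId-spanId hex pattern mentioned under Additional context.

```python
import uuid


def random_doc_id() -> str:
    """Option 1: a random document id, unrelated to the span's contents."""
    return str(uuid.uuid4())


def trace_span_doc_id(trace_id: bytes, span_id: bytes) -> str:
    """Option 2: a deterministic id from both ids, in the traceId-spanId pattern."""
    return f"{trace_id.hex()}-{span_id.hex()}"


trace_a = bytes.fromhex("4bf92f3577b34da6a3ce929d0e0e4736")
trace_b = bytes.fromhex("0af7651916cd43dd8448eb211c80319c")
shared_span_id = bytes.fromhex("00f067aa0ba902b7")

# The same span id in two different traces now yields two different ids.
print(trace_span_doc_id(trace_a, shared_span_id))
# 4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7
print(trace_span_doc_id(trace_b, shared_span_id))
# 0af7651916cd43dd8448eb211c80319c-00f067aa0ba902b7
print(random_doc_id())  # differs on every call
```

One trade-off between the two: the deterministic id maps a re-sent span to the same document, while a random id would index a re-sent span as a new document.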
Screenshots
Environment (please complete the following information):
Docker setup from examples folder
Data Prepper version 2.10
Additional context
There is a work-around: specify a document_id in the OpenSearch sink, e.g. traceId-spanId. I could not find where the current behavior of using just spanId is implemented.
I tested the pattern traceId-spanId successfully. Since both ids are arbitrary byte arrays, any shorter combination of the two will lose entropy. Another option is encode_base64(traceId + spanId). Concatenating the 16-byte traceId and the 8-byte spanId gives a 24-byte array, and Base64-encoding it results in a 32-character document_id.
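A minimal Python sketch of the Base64 variant, assuming a 16-byte traceId and an 8-byte spanId. The helper base64_doc_id is hypothetical, and encode_base64 is stood in for by Python's base64.urlsafe_b64encode. Choosing the URL-safe alphabet ("-" and "_" instead of "+" and "/") is my assumption, since the id may end up in REST paths. Because 24 bytes is a multiple of 3, the output has no "=" padding.

```python
import base64


def base64_doc_id(trace_id: bytes, span_id: bytes) -> str:
    """Hypothetical helper: Base64 of traceId + spanId (24 bytes -> 32 chars)."""
    if len(trace_id) != 16 or len(span_id) != 8:
        raise ValueError("expected a 16-byte trace id and an 8-byte span id")
    combined = trace_id + span_id  # 24-byte concatenation
    # URL-safe alphabet (assumption); 24 bytes encode to exactly 32 characters.
    return base64.urlsafe_b64encode(combined).decode("ascii")


trace_id = bytes.fromhex("4bf92f3577b34da6a3ce929d0e0e4736")
span_id = bytes.fromhex("00f067aa0ba902b7")
doc_id = base64_doc_id(trace_id, span_id)
print(doc_id, len(doc_id))
```

Decoding the id gives back both original ids, so unlike a truncated or hashed combination, no entropy is lost.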
I think the limit for the document id is 512 bytes. Still, keeping it as short as possible also leads to smaller index sizes. The longer pattern traceId-spanId (or maybe traceId/spanId) takes 49 characters with hex encoding, which is not much overhead. I would be fine with either approach.
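The two character counts can be checked with a few lines of Python. The constant names are mine, and the 512-byte limit is the value stated in this comment. Hex and Base64 output is pure ASCII, so characters and bytes coincide here.

```python
TRACE_ID_BYTES = 16   # OpenTelemetry trace id length
SPAN_ID_BYTES = 8     # OpenTelemetry span id length
ID_LIMIT_BYTES = 512  # document id limit mentioned in this issue

# Hex uses 2 characters per byte, plus one separator ("-" or "/").
hex_len = 2 * TRACE_ID_BYTES + 1 + 2 * SPAN_ID_BYTES

# Base64 uses 4 characters per started group of 3 input bytes.
b64_len = 4 * ((TRACE_ID_BYTES + SPAN_ID_BYTES + 2) // 3)

print(hex_len, b64_len)  # 49 32
assert hex_len <= ID_LIMIT_BYTES and b64_len <= ID_LIMIT_BYTES
```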