Improve retryability of DLQ data #4295
Comments
@travisbenedict, can you share an example of the DLQ S3 data with multiple DLQ objects?
I do see such an example in my DLQ bucket; this one has multiple objects:
{
"dlqObjects": [
{
"pluginId": "opensearch",
"pluginName": "opensearch",
"pipelineName": "log-pipeline",
"failedData": {
"index": "raw-logs-alias",
"indexId": null,
"status": 400,
"message": "no write index is defined for alias [raw-logs-alias]. The write index may be explicitly disabled using is_write_index=false or the alias points to multiple indices without one being designated as a write index",
"document": {
"http_user_agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 13_5_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1.1 Mobile/15E148 Safari/604.1"
}
},
"timestamp": "2024-01-10T17:57:19.983Z"
},
{
"pluginId": "opensearch",
"pluginName": "opensearch",
"pipelineName": "log-pipeline",
"failedData": {
"index": "raw-logs-alias",
"indexId": null,
"status": 400,
"message": "no write index is defined for alias [raw-logs-alias]. The write index may be explicitly disabled using is_write_index=false or the alias points to multiple indices without one being designated as a write index",
"document": {
"http_user_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 12.6; rv:42.0) Gecko/20100101 Firefox/42.0"
}
},
"timestamp": "2024-01-10T17:57:19.984Z"
},
{
"pluginId": "opensearch",
"pluginName": "opensearch",
"pipelineName": "log-pipeline",
"failedData": {
"index": "raw-logs-alias",
"indexId": null,
"status": 400,
"message": "no write index is defined for alias [raw-logs-alias]. The write index may be explicitly disabled using is_write_index=false or the alias points to multiple indices without one being designated as a write index",
"document": {
"http_user_agent": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
}
},
"timestamp": "2024-01-10T17:57:19.984Z"
}
]
}
@travisbenedict, @oeyh, the existing json codec supports this. The following example should work.
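A minimal sketch of that kind of pipeline, assuming a placeholder bucket name, region, and role ARN, with a stdout sink for verification (these values are illustrative, not from the original comment):

dlq-redrive-pipeline:
  source:
    s3:
      codec:
        # With no extra options, the json codec parses the objects inside
        # the JSON array it finds (here, dlqObjects) into individual events.
        json:
      compression: none
      aws:
        region: us-east-1
        sts_role_arn: "arn:aws:iam::123456789012:role/data-prepper-role"
      scan:
        buckets:
          - bucket:
              name: my-dlq-bucket
  sink:
    # Print events to stdout to verify that the array is being split
    - stdout: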
I just tested this using a very simple pipeline on the latest version, with Docker Compose to help get started (sketched below). The logs confirm that each element of the dlqObjects array is read as its own event.
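A minimal Docker Compose sketch for running it, assuming the official image and its default pipeline directory (the exact mount paths may differ by Data Prepper version):

version: "3"
services:
  data-prepper:
    image: opensearchproject/data-prepper:latest
    volumes:
      # Mount the pipeline definition into the image's default pipelines directory
      - ./pipelines.yaml:/usr/share/data-prepper/pipelines/pipelines.yaml
      # Provide AWS credentials to the container (assumes local ~/.aws credentials)
      - ~/.aws:/root/.aws:ro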
Thanks @dlvenable, let me try this out. I think there might be some room to improve the documentation for the JSON codec. This is all it says now: https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/sources/s3/#json-codec
@travisbenedict, yes, one thought I have is to split out the codec documentation, and then we can put some examples in there as well.
Confirmed that S3 scan with the json codec works as expected.
Posting my full working DLQ redriving pipeline configuration here in case anyone else comes across this issue. I'll create a documentation PR if I get some time.
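The exact configuration was not captured here, but a sketch along these lines captures the idea. It assumes your Data Prepper version's opensearch sink supports document_root_key (to index only the original failed document rather than the whole DLQ wrapper) and dynamic index names from event fields; the hosts, bucket, region, and role ARN are placeholders:

dlq-redrive-pipeline:
  source:
    s3:
      codec:
        # Splits the dlqObjects array into one event per DLQ entry
        json:
      compression: none
      aws:
        region: us-east-1
        sts_role_arn: "arn:aws:iam::123456789012:role/data-prepper-role"
      scan:
        buckets:
          - bucket:
              name: my-dlq-bucket
  sink:
    - opensearch:
        hosts: ["https://my-opensearch-domain:9200"]
        # Route each event back to the index recorded in its DLQ entry
        index: "${/failedData/index}"
        # Index only the original document, dropping the DLQ metadata wrapper
        document_root_key: "failedData/document"
        aws:
          region: us-east-1
          sts_role_arn: "arn:aws:iam::123456789012:role/data-prepper-role"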
Hi! I am currently configuring a DLQ pipeline to redrive failed events from each of my pipelines (each of which uses a different index template, since they're processing from different DynamoDB tables). With the final pipeline config you've posted, would I need to define the template again under the sink and have two separate DLQ pipelines, one for each of the indexes? Or would it be enough to just set index to the already-existing index name, and the template would get linked that way? Thank you!
Is your feature request related to a problem? Please describe.
I have an OpenSearch cluster that intermittently has specific indexes write-blocked due to miscellaneous failures. To maintain ingestion throughput for the non-blocked indexes during this time, I set opensearch.max_retries to a low value. This way, data for the blocked index gets sent to the DLQ quickly and the rest of the data continues being written to my sink.

After I resolve the index write blocks, I want to redrive the data from my DLQ using a separate instance of Data Prepper with an S3 Scan source. As far as I can tell, the existing codecs/processors for Data Prepper do not allow for directly processing the JSON object that's written to the DLQ and splitting the dlqObjects array into multiple documents.

Here's an example of the data that was written to my DLQ bucket:
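A representative single-entry example, following the same structure as the object shown earlier in this thread:

{
  "dlqObjects": [
    {
      "pluginId": "opensearch",
      "pluginName": "opensearch",
      "pipelineName": "log-pipeline",
      "failedData": {
        "index": "raw-logs-alias",
        "indexId": null,
        "status": 400,
        "message": "no write index is defined for alias [raw-logs-alias]. The write index may be explicitly disabled using is_write_index=false or the alias points to multiple indices without one being designated as a write index",
        "document": {
          "http_user_agent": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
        }
      },
      "timestamp": "2024-01-10T17:57:19.983Z"
    }
  ]
}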
Describe the solution you'd like
I would like to be able to enable my source codec to parse the elements out of the dlqObjects array.

The S3 json codec configuration could add something like an array_source key. The JSON array at the array_source would then be processed by the codec, and the rest of the data in the original JSON object would be ignored/dropped.

An S3 scan pipeline for redriving DLQ data might look something like this:
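A sketch of what such a pipeline could look like; array_source is the proposed, hypothetical setting from this issue and does not exist in Data Prepper today, and the bucket and sink values are placeholders:

dlq-redrive-pipeline:
  source:
    s3:
      codec:
        json:
          # Hypothetical new setting: process only the array at this key
          # and drop the rest of the wrapping JSON object
          array_source: "dlqObjects"
      compression: none
      scan:
        buckets:
          - bucket:
              name: my-dlq-bucket
  sink:
    - opensearch:
        hosts: ["https://my-opensearch-domain:9200"]
        index: raw-logs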
Describe alternatives you've considered (Optional)
Add a new configuration for specifying the format of the data that's written to the DLQ. This might allow Data Prepper to write the DLQ as a JSON array rather than a JSON object.
This might look something like this:
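For illustration, a format setting on the existing dlq block; format: json_array is hypothetical, while dlq.s3.bucket follows the current configuration shape:

sink:
  - opensearch:
      hosts: ["https://my-opensearch-domain:9200"]
      index: raw-logs-alias
      dlq:
        s3:
          bucket: my-dlq-bucket
          # Hypothetical new setting: write DLQ data as a bare JSON array
          format: json_array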
or:
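Or sketched as a codec-style option instead (equally hypothetical):

sink:
  - opensearch:
      hosts: ["https://my-opensearch-domain:9200"]
      index: raw-logs-alias
      dlq:
        s3:
          bucket: my-dlq-bucket
          codec:
            # Hypothetical codec-style equivalent of the format setting above
            json_array: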
The data written to the DLQ would then be the same as the dlqObjects JSON array. In my case it might look like this:
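Using the data shown above, the DLQ file would then contain the bare array, for example:

[
  {
    "pluginId": "opensearch",
    "pluginName": "opensearch",
    "pipelineName": "log-pipeline",
    "failedData": {
      "index": "raw-logs-alias",
      "indexId": null,
      "status": 400,
      "message": "no write index is defined for alias [raw-logs-alias]. The write index may be explicitly disabled using is_write_index=false or the alias points to multiple indices without one being designated as a write index",
      "document": {
        "http_user_agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 13_5_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1.1 Mobile/15E148 Safari/604.1"
      }
    },
    "timestamp": "2024-01-10T17:57:19.983Z"
  }
]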
Additional context
None