Support for data normalization in arrays #4291
Comments
A few clarification questions:
[{"category": "a"}, {"category": "b"}, {"category": "c"}]
Curious to know how you use Data Prepper in this case? |
We do support converting from: [{"category": "a"}, {"category": "b"}, {"category": "c"}] to {"category": ["a", "b", "c"]} with We currently don't support the opposite. But if that is what you are looking for, we can probably enhance |
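For readers following along, the two directions discussed here can be sketched in plain Python. This is only an illustration of the data shapes, not Data Prepper code; the function names `collapse` and `expand` are hypothetical and do not correspond to any processor.

```python
from typing import Any


def collapse(items: list[dict[str, Any]], key: str) -> dict[str, list[Any]]:
    """Supported direction: a list of single-key objects becomes one key holding a list."""
    return {key: [item[key] for item in items]}


def expand(event: dict[str, list[Any]], key: str) -> list[dict[str, Any]]:
    """Opposite direction (the one asked about): one key holding a list becomes a list of objects."""
    return [{key: value} for value in event[key]]


objects = [{"category": "a"}, {"category": "b"}, {"category": "c"}]
collapsed = collapse(objects, "category")
expanded = expand(collapsed, "category")

print(collapsed)
print(expanded)
```

The two functions are inverses of each other for this simple shape, which is why the question of "which direction" matters when choosing a processor.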
Yes that's correct |
Yes that's exactly what I am looking for. This use case is more for streaming NoSQL arrays into row based SQL databases for data warehousing. In the end, once we got
we will likely be consuming them as separate rows in a SQL database, i.e.,
Is there a way in data prepper to perform the |
Thanks for clarifying! This makes me think that maybe what you need is to split the original event with processors like these:
- add_entries:
    entries:
      - key: category
        value_expression: join(/category)
        overwrite_if_key_exists: true
- split_event:
    field: category
    delimiter: ","
Still wondering how you plan to use Data Prepper for it. What's the pipeline source and sink here? |
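As a rough mental model of what the two suggested steps do to an event, here is a small Python simulation. It assumes that the join step turns the array into a comma-delimited string and that the split step emits one copy of the event per delimited value; `join_field` and `split_on_field` are hypothetical helpers written for illustration, not Data Prepper APIs, and the real processors may differ in details.

```python
import copy
from typing import Any


def join_field(event: dict[str, Any], key: str, delimiter: str = ",") -> dict[str, Any]:
    """Assumed behaviour of the join step: replace the array with one delimited string."""
    out = copy.deepcopy(event)
    out[key] = delimiter.join(str(value) for value in event[key])
    return out


def split_on_field(event: dict[str, Any], field: str, delimiter: str = ",") -> list[dict[str, Any]]:
    """Assumed behaviour of the split step: one event per delimited value, other fields copied."""
    events = []
    for part in event[field].split(delimiter):
        child = copy.deepcopy(event)
        child[field] = part
        events.append(child)
    return events


joined = join_field({"category": ["a", "b", "c"]}, "category")
split_events = split_on_field(joined, "category")

print(joined)
for split_result in split_events:
    print(split_result)
```

One caveat of this approach, at least in the sketch: values that themselves contain the delimiter would be split incorrectly, so the delimiter should be one that cannot occur in the data.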
The source will be DynamoDB and the sink will be an S3 bucket for a data lake. What is the output of your proposed processors in JSON format? Three rows in the S3 file eventually, like the following?
or a single row
|
Both are possible. That will depend on the codec used on the s3 sink ( It's probably clearer if I include another field in the event, for example, assume the input event has this data:
{"category": ["a", "b", "c"], "itemId": "A"}
With the potential
{"category_list": [{"category": "a"}, {"category": "b"}, {"category": "c"}], "itemId": "A"}
With the
{"category": "a", "itemId": "A"}
{"category": "b", "itemId": "A"}
{"category": "c", "itemId": "A"}
The latter seems to make more sense in your case. What do you think? |
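To make the codec point concrete, here is a minimal Python sketch of how the same three split events could end up as three rows or as a single row in the output file. It only illustrates the two serialization shapes (newline-delimited JSON versus one JSON array); it is not the S3 sink's implementation, and the helper names are hypothetical.

```python
import json
from typing import Any

# The three events from the example above, after splitting.
events: list[dict[str, Any]] = [
    {"category": c, "itemId": "A"} for c in ["a", "b", "c"]
]


def to_ndjson(items: list[dict[str, Any]]) -> str:
    """One JSON object per line: each event becomes its own row."""
    return "\n".join(json.dumps(item) for item in items)


def to_json_array(items: list[dict[str, Any]]) -> str:
    """All events inside one JSON array: a single row/document."""
    return json.dumps(items)


ndjson_text = to_ndjson(events)
array_text = to_json_array(events)

print(ndjson_text)
print(array_text)
```

For loading into a row-based SQL table, the line-per-event form is usually the easier one to consume, since each line maps directly to one row.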
Okay, split_event seems more applicable to my use case, in your example Will |
In that form it's ndjson. 2.7 release will be coming in 1-2 weeks. The |
Cool thanks, will give |
Is your feature request related to a problem? Please describe.
Hi, I have a question about ingestion pipelines: is there a standard way, recommended by Data Prepper, to process arrays of data in an ingestion processor? For example, if I have {"category": ["a", "b", "c"]}, how can I break the elements of the array down?
Describe the solution you'd like
In the example {"category": ["a", "b", "c"]}, ideally it could be broken down like {"category": "a", "category": "b", "category": "c"}?
Describe alternatives you've considered (Optional)
Additional context
A common use case is moving arrays from a NoSQL database into a SQL database to support table joins in data warehousing.
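As a small illustration of why one row per array element helps with joins, here is a sketch using Python's built-in `sqlite3` module. The table names, column names, and the `category_label` lookup data are all made up for this example and are not part of the original request.

```python
import sqlite3

# One row per array element, as produced by splitting {"category": ["a", "b", "c"], "itemId": "A"}.
item_rows = [("A", "a"), ("A", "b"), ("A", "c")]

# Hypothetical lookup table to join against.
label_rows = [("a", "Alpha"), ("b", "Beta"), ("c", "Gamma")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item_category (item_id TEXT, category TEXT)")
conn.execute("CREATE TABLE category_label (category TEXT PRIMARY KEY, label TEXT)")
conn.executemany("INSERT INTO item_category VALUES (?, ?)", item_rows)
conn.executemany("INSERT INTO category_label VALUES (?, ?)", label_rows)

# With one category per row, an ordinary equi-join works without any array handling.
joined_rows = conn.execute(
    """
    SELECT ic.item_id, ic.category, cl.label
    FROM item_category AS ic
    JOIN category_label AS cl ON cl.category = ic.category
    ORDER BY ic.category
    """
).fetchall()
conn.close()

for row in joined_rows:
    print(row)
```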