From 0f78e706493e39d6ab4556866da2a4362f796119 Mon Sep 17 00:00:00 2001 From: Archer Date: Fri, 16 May 2025 16:55:35 -0500 Subject: [PATCH 1/4] Add feedback on Parse JSON processor Signed-off-by: Archer --- .../configuration/processors/parse-json.md | 58 ++++++++++--------- 1 file changed, 31 insertions(+), 27 deletions(-) diff --git a/_data-prepper/pipelines/configuration/processors/parse-json.md b/_data-prepper/pipelines/configuration/processors/parse-json.md index 2b654451f2..ed09b5fe09 100644 --- a/_data-prepper/pipelines/configuration/processors/parse-json.md +++ b/_data-prepper/pipelines/configuration/processors/parse-json.md @@ -8,8 +8,7 @@ nav_order: 80 # parse_json -The `parse_json` processor parses JSON data for an event, including any nested fields. The processor extracts the JSON pointer data and adds the input event to the extracted fields. - +The `parse_json` processor parses JSON-formatted strings within an event, including nested fields. It can optionally use a JSON pointer to extract a specific part of the source JSON and add the extracted data to the event. ## Configuration @@ -24,65 +23,70 @@ This table is autogenerated. Do not edit it. | Option | Required | Type | Description | | :--- | :--- | :--- | :--- | -| `source` | No | String | The field in the `event` that will be parsed. Default value is `message`. | -| `destination` | No | String | The destination field of the parsed JSON. Defaults to the root of the `event`. Cannot be `""`, `/`, or any white-space-only `string` because these are not valid `event` fields. | -| `pointer` | No | String | A JSON pointer to the field to be parsed. There is no `pointer` by default, meaning the entire `source` is parsed. The `pointer` can access JSON array indexes as well. If the JSON pointer is invalid then the entire `source` data is parsed into the outgoing `event`. 
If the key that is pointed to already exists in the `event` and the `destination` is the root, then the pointer uses the entire path of the key. | -| `parse_when` | No | String | Specifies under which conditions the processor should perform parsing. Default is no condition. Accepts an OpenSearch Data Prepper expression string following the [expression syntax]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/expression-syntax/). | -| `overwrite_if_destination_exists` | No | Boolean | Overwrites the destination if set to `true`. Set to `false` to prevent changing a destination value that exists. Defaults to `true`. | -| `delete_source` | No | Boolean | If set to `true` then this will delete the source field. Defaults to `false`. | -| `tags_on_failure` | No | String | A list of strings specifying the tags to be set in the event that the processor fails or an unknown exception occurs during parsing. +| `source` | No | String | The field in the event that will be parsed. Default is `message`. | +| `destination` | No | String | The destination field for the parsed JSON. Defaults to the root of the event. Cannot be `""`, `/`, or any white-space-only string. | +| `pointer` | No | String | A JSON pointer (as defined by [RFC 6901](https://datatracker.ietf.org/doc/html/rfc6901)) to a specific field in the source JSON. If omitted, the entire `source` is parsed. If the pointer is invalid, the full `source` is parsed instead. When writing to the root destination, existing keys will be preserved unless overwritten. | +| `parse_when` | No | String | A condition expression that determines when to parse the field. Accepts a string following the [expression syntax]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/expression-syntax/). | +| `overwrite_if_destination_exists` | No | Boolean | Whether to overwrite the destination field if it already exists. Default is `true`. | +| `delete_source` | No | Boolean | Whether to delete the source field after parsing. Default is `false`. 
| +| `tags_on_failure` | No | String | A list of tags to apply if parsing fails or an unexpected exception occurs. | ## Usage -To get started, create the following `pipeline.yaml` file: +To use the `parse_json` processor, add it to your `pipeline.yaml` configuration file: ```yaml parse-json-pipeline: source: ... - .... + ... processor: - parse_json: ``` ### Basic example -To test the `parse_json` processor with the previous configuration, run the pipeline and paste the following line into your console, then enter `exit` on a new line: +This example parses a JSON message field and flattens the data into the event. -``` +For example, the following JSON message contains a key-value pair: + +```json {"outer_key": {"inner_key": "inner_value"}} ``` {% include copy.html %} -The `parse_json` processor parses the message into the following format: +In example event, the original `message` field remains, and the parsed content is added at the root level. Use the `delete_source` option if you want to remove the original field: -``` -{"message": {"outer_key": {"inner_key": "inner_value"}}", "outer_key":{"inner_key":"inner_value"}}} +```json +{ + "message": "{\"outer_key\": {\"inner_key\": \"inner_value\"}}", + "outer_key": { + "inner_key": "inner_value" + } +} ``` -### Example with a JSON pointer +### Example using a JSON pointer -You can use a JSON pointer to parse a selection of the JSON data by specifying the `pointer` option in the configuration. To get started, create the following `pipeline.yaml` file: +You can use the `pointer` option to extract a specific nested field from the JSON data. ```yaml parse-json-pipeline: source: ... - .... + ... 
processor: - parse_json: - pointer: "outer_key/inner_key" + pointer: "/outer_key/inner_key" ``` -To test the `parse_json` processor with the pointer option, run the pipeline, paste the following line into your console, and then enter `exit` on a new line: +Using the same JSON message as the previous example, only the value at the pointer path `/outer_key/inner_key` is extracted and added to the event. If you set `destination`, the extracted value will be added under that field instead: ``` -{"outer_key": {"inner_key": "inner_value"}} +{ + "message": "{\"outer_key\": {\"inner_key\": \"inner_value\"}}", + "inner_key": "inner_value" +} ``` -{% include copy.html %} -The processor parses the message into the following format: -``` -{"message": {"outer_key": {"inner_key": "inner_value"}}", "inner_key": "inner_value"} -``` \ No newline at end of file From ea6235b74916c3f5987f9d381e6938e8ffde7613 Mon Sep 17 00:00:00 2001 From: Archer Date: Mon, 19 May 2025 17:41:49 -0500 Subject: [PATCH 2/4] Small tweaks Signed-off-by: Archer --- _data-prepper/pipelines/configuration/processors/parse-json.md | 2 -- 1 file changed, 2 deletions(-) diff --git a/_data-prepper/pipelines/configuration/processors/parse-json.md b/_data-prepper/pipelines/configuration/processors/parse-json.md index ed09b5fe09..6604f86c69 100644 --- a/_data-prepper/pipelines/configuration/processors/parse-json.md +++ b/_data-prepper/pipelines/configuration/processors/parse-json.md @@ -88,5 +88,3 @@ Using the same JSON message as the previous example, only the value at the point "inner_key": "inner_value" } ``` - - From dec99924ca26e7c8c0c570e01133b5bfd7f1bacd Mon Sep 17 00:00:00 2001 From: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Date: Tue, 20 May 2025 06:24:15 -0500 Subject: [PATCH 3/4] Apply suggestions from code review Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> --- .../pipelines/configuration/processors/parse-json.md | 4 ++-- 1 file changed, 2 insertions(+), 2 
deletions(-) diff --git a/_data-prepper/pipelines/configuration/processors/parse-json.md b/_data-prepper/pipelines/configuration/processors/parse-json.md index 6604f86c69..285c18d667 100644 --- a/_data-prepper/pipelines/configuration/processors/parse-json.md +++ b/_data-prepper/pipelines/configuration/processors/parse-json.md @@ -24,7 +24,7 @@ This table is autogenerated. Do not edit it. | Option | Required | Type | Description | | :--- | :--- | :--- | :--- | | `source` | No | String | The field in the event that will be parsed. Default is `message`. | -| `destination` | No | String | The destination field for the parsed JSON. Defaults to the root of the event. Cannot be `""`, `/`, or any white-space-only string. | +| `destination` | No | String | The destination field for the parsed JSON. Default is the root of the event. Cannot be `""`, `/`, or any white-space-only string. | | `pointer` | No | String | A JSON pointer (as defined by [RFC 6901](https://datatracker.ietf.org/doc/html/rfc6901)) to a specific field in the source JSON. If omitted, the entire `source` is parsed. If the pointer is invalid, the full `source` is parsed instead. When writing to the root destination, existing keys will be preserved unless overwritten. | | `parse_when` | No | String | A condition expression that determines when to parse the field. Accepts a string following the [expression syntax]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/expression-syntax/). | | `overwrite_if_destination_exists` | No | Boolean | Whether to overwrite the destination field if it already exists. Default is `true`. | @@ -55,7 +55,7 @@ For example, the following JSON message contains a key-value pair: ``` {% include copy.html %} -In example event, the original `message` field remains, and the parsed content is added at the root level. 
Use the `delete_source` option if you want to remove the original field: +From the example event, the original `message` field remains, and the parsed content is added at the root level. ```json { From ca78fc2e37c3beb4b36650352e0a101feab7f115 Mon Sep 17 00:00:00 2001 From: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Date: Tue, 20 May 2025 06:34:58 -0500 Subject: [PATCH 4/4] Add technical feedback Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> --- .../configuration/processors/parse-json.md | 44 +++++++++++++++---- 1 file changed, 35 insertions(+), 9 deletions(-) diff --git a/_data-prepper/pipelines/configuration/processors/parse-json.md b/_data-prepper/pipelines/configuration/processors/parse-json.md index 285c18d667..8d7bc02fc2 100644 --- a/_data-prepper/pipelines/configuration/processors/parse-json.md +++ b/_data-prepper/pipelines/configuration/processors/parse-json.md @@ -43,19 +43,18 @@ parse-json-pipeline: processor: - parse_json: ``` +{% include copy.html %} -### Basic example - -This example parses a JSON message field and flattens the data into the event. - -For example, the following JSON message contains a key-value pair: +All examples use the following JSON message for the event output: ```json {"outer_key": {"inner_key": "inner_value"}} ``` {% include copy.html %} -From the example event, the original `message` field remains, and the parsed content is added at the root level. +### Basic example + +This example parses a JSON message field and flattens the data into the event. 
From the example event, the original `message` field remains, and the parsed content is added at the root level, as shown in the following output: ```json { @@ -66,9 +65,35 @@ From the example event, the original `message` field remains, and the parsed con } ``` +### Delete a source + +If you want to remove the original field from the originating JSON message, use the `delete_source` option, as shown in the following example pipeline: + +```yaml +parse-json-pipeline: + source: + ... + ... + processor: + - parse_json: + delete_source: true +``` +{% include copy.html %} + +In the following event, the `message` field is parsed and removed from the event, leaving only the structured output: + +```json +{ + "outer_key": { + "inner_key": "inner_value" + } +} +``` + + ### Example using a JSON pointer -You can use the `pointer` option to extract a specific nested field from the JSON data. +You can use the `pointer` option to extract a specific nested field from the JSON data, as shown in the following example pipeline: ```yaml parse-json-pipeline: @@ -79,10 +104,11 @@ parse-json-pipeline: - parse_json: pointer: "/outer_key/inner_key" ``` +{% include copy.html %} -Using the same JSON message as the previous example, only the value at the pointer path `/outer_key/inner_key` is extracted and added to the event. If you set `destination`, the extracted value will be added under that field instead: +Only the value at the pointer path `/outer_key/inner_key` is extracted and added to the event. If you set `destination`, the extracted value will be added under that field instead: -``` +```json { "message": "{\"outer_key\": {\"inner_key\": \"inner_value\"}}", "inner_key": "inner_value"