
Add feedback on Parse JSON processor #9917


Open · wants to merge 3 commits into base: main
60 changes: 31 additions & 29 deletions _data-prepper/pipelines/configuration/processors/parse-json.md
@@ -8,8 +8,7 @@ nav_order: 80

# parse_json

The `parse_json` processor parses JSON data for an event, including any nested fields. The processor extracts the JSON pointer data and adds the input event to the extracted fields.

The `parse_json` processor parses JSON-formatted strings within an event, including nested fields. It can optionally use a JSON pointer to extract a specific part of the source JSON and add the extracted data to the event.

## Configuration

@@ -24,65 +23,68 @@ This table is autogenerated. Do not edit it.

| Option | Required | Type | Description |
| :--- | :--- | :--- | :--- |
| `source` | No | String | The field in the `event` that will be parsed. Default value is `message`. |
| `destination` | No | String | The destination field of the parsed JSON. Defaults to the root of the `event`. Cannot be `""`, `/`, or any white-space-only `string` because these are not valid `event` fields. |
| `pointer` | No | String | A JSON pointer to the field to be parsed. There is no `pointer` by default, meaning the entire `source` is parsed. The `pointer` can access JSON array indexes as well. If the JSON pointer is invalid then the entire `source` data is parsed into the outgoing `event`. If the key that is pointed to already exists in the `event` and the `destination` is the root, then the pointer uses the entire path of the key. |
| `parse_when` | No | String | Specifies under which conditions the processor should perform parsing. Default is no condition. Accepts an OpenSearch Data Prepper expression string following the [expression syntax]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/expression-syntax/). |
| `overwrite_if_destination_exists` | No | Boolean | Overwrites the destination if set to `true`. Set to `false` to prevent changing a destination value that exists. Defaults to `true`. |
| `delete_source` | No | Boolean | If set to `true` then this will delete the source field. Defaults to `false`. |
| `tags_on_failure` | No | String | A list of strings specifying the tags to be set in the event that the processor fails or an unknown exception occurs during parsing. |
| `source` | No | String | The field in the event that will be parsed. Default is `message`. |
| `destination` | No | String | The destination field for the parsed JSON. Defaults to the root of the event. Cannot be `""`, `/`, or any white-space-only string. |
Review comment (Member):

The destination field for the parsed JSON.

Maybe it would be clearer to say:

The destination field for fields parsed from the JSON.

| `pointer` | No | String | A JSON pointer (as defined by [RFC 6901](https://datatracker.ietf.org/doc/html/rfc6901)) to a specific field in the source JSON. If omitted, the entire `source` is parsed. If the pointer is invalid, the full `source` is parsed instead. When writing to the root destination, existing keys will be preserved unless overwritten. |
| `parse_when` | No | String | A condition expression that determines when to parse the field. Accepts a string following the [expression syntax]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/expression-syntax/). |
| `overwrite_if_destination_exists` | No | Boolean | Whether to overwrite the destination field if it already exists. Default is `true`. |
| `delete_source` | No | Boolean | Whether to delete the source field after parsing. Default is `false`. |
| `tags_on_failure` | No | String | A list of tags to apply if parsing fails or an unexpected exception occurs. |

## Usage

To get started, create the following `pipeline.yaml` file:
To use the `parse_json` processor, add it to your `pipeline.yaml` configuration file:

```yaml
parse-json-pipeline:
  source:
    ...
  processor:
    - parse_json:
```
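
For reference, the following sketch combines several of the options from the table above in one configuration. The option values, the `parsed` destination field, and the `json_parse_failure` tag are illustrative assumptions, not documented defaults:

```yaml
parse-json-pipeline:
  source:
    ...
  processor:
    - parse_json:
        # Parse the "message" field only when it is present (assumed expression).
        source: "message"
        parse_when: '/message != null'
        # Write parsed fields under "parsed" instead of the event root (hypothetical field name).
        destination: "parsed"
        overwrite_if_destination_exists: true
        # Remove the original string field after a successful parse.
        delete_source: true
        # Tag the event if parsing fails (tag name is an example).
        tags_on_failure: ["json_parse_failure"]
```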

### Basic example

To test the `parse_json` processor with the previous configuration, run the pipeline and paste the following line into your console, then enter `exit` on a new line:
This example parses a JSON message field and flattens the data into the event.

For example, the following JSON message contains a key-value pair:

```json
{"outer_key": {"inner_key": "inner_value"}}
```
{% include copy.html %}

The `parse_json` processor parses the message into the following format:
In example event, the original `message` field remains, and the parsed content is added at the root level. Use the `delete_source` option if you want to remove the original field:
Review comment (Member):

In example event, ...

Should this be?

For the example event above, ...

Review comment (Member):

Use the delete_source option if you want to remove the original field:

The example below includes the original. So putting this sentence about delete_source is a little confusing in the context.

I recommend either of these changes:

- Drop the sentence about `delete_source`; or
- Create a new paragraph below that has this sentence and then show the result (which would not have `message`).


```json
{
  "message": "{\"outer_key\": {\"inner_key\": \"inner_value\"}}",
  "outer_key": {
    "inner_key": "inner_value"
  }
}
```
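
If `delete_source` were instead set to `true`, the original `message` field would be removed after parsing, so the same event would look roughly like the following. This is an illustrative sketch rather than output taken from the documentation:

```json
{
  "outer_key": {
    "inner_key": "inner_value"
  }
}
```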

### Example with a JSON pointer
### Example using a JSON pointer

You can use a JSON pointer to parse a selection of the JSON data by specifying the `pointer` option in the configuration. To get started, create the following `pipeline.yaml` file:
You can use the `pointer` option to extract a specific nested field from the JSON data.

```yaml
parse-json-pipeline:
  source:
    ...
  processor:
    - parse_json:
        pointer: "/outer_key/inner_key"
```

To test the `parse_json` processor with the pointer option, run the pipeline, paste the following line into your console, and then enter `exit` on a new line:
Using the same JSON message as the previous example, only the value at the pointer path `/outer_key/inner_key` is extracted and added to the event. If you set `destination`, the extracted value will be added under that field instead:

```
{"outer_key": {"inner_key": "inner_value"}}
```
{% include copy.html %}

The processor parses the message into the following format:

```json
{
  "message": "{\"outer_key\": {\"inner_key\": \"inner_value\"}}",
  "inner_key": "inner_value"
}
```
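
As a rough sketch, setting `destination` together with `pointer` should place the extracted data under that field rather than at the event root. The field name `parsed_value` below is hypothetical, and the exact nesting is an assumption based on the option descriptions above:

```json
{
  "message": "{\"outer_key\": {\"inner_key\": \"inner_value\"}}",
  "parsed_value": {
    "inner_key": "inner_value"
  }
}
```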