Skip to content

adding regex conditional ingest pipeline docs #9726

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
180 changes: 180 additions & 0 deletions _ingest-pipelines/regex-conditional.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,180 @@
---
layout: default
title: Regex conditionals
nav_order: 70
---

# Regex conditionals

Ingest pipelines support conditional logic using regular expressions with the Painless scripting language. This allows fine-grained control over which documents get processed based on the structure and contents of text fields. Regular expressions can be used within the `if` parameter to evaluate string patterns. This is especially useful for matching IP formats, validating email addresses, identifying UUIDs, or processing logs with specific keywords.

## Example: Email domain filtering

The following pipeline uses regex to identify users from `@example.com` email domain and tag those documents accordingly.

```json
PUT _ingest/pipeline/tag_example_com_users
{
"processors": [
{
"set": {
"field": "user_domain",
"value": "example.com",
"if": "ctx.email != null && ctx.email =~ /@example.com$/"
}
}
]
}
```
{% include copy-curl.html %}

You can simulate the pipeline using the following request:

```json
POST _ingest/pipeline/tag_example_com_users/_simulate
{
"docs": [
{ "_source": { "email": "[email protected]" } },
{ "_source": { "email": "[email protected]" } }
]
}
```
{% include copy-curl.html %}

Only the first document has `user_domain` added:

```json
{
"docs": [
{
"doc": {
"_source": {
"email": "[email protected]",
"user_domain": "example.com"
}
}
},
{
"doc": {
"_source": {
"email": "[email protected]"
}
}
}
]
}
```

## Example: Detect IPv6 addresses

The following pipeline uses regex to identify and flag IPv6-formatted addresses:

```json
PUT _ingest/pipeline/ipv6_flagger
{
"processors": [
{
"set": {
"field": "ip_type",
"value": "IPv6",
"if": "ctx.ip != null && ctx.ip =~ /^[a-fA-F0-9:]+$/ && ctx.ip.contains(':')"
}
}
]
}
```
{% include copy-curl.html %}

Simulate the pipeline using the following request:

```json
POST _ingest/pipeline/ipv6_flagger/_simulate
{
"docs": [
{ "_source": { "ip": "2001:0db8:85a3:0000:0000:8a2e:0370:7334" } },
{ "_source": { "ip": "192.168.0.1" } }
]
}
```
{% include copy-curl.html %}

The first document has added field `ip_type` set to `IPv6`:

```json
{
"docs": [
{
"doc": {
"_source": {
"ip": "2001:0db8:85a3:0000:0000:8a2e:0370:7334",
"ip_type": "IPv6"
}
}
},
{
"doc": {
"_source": {
"ip": "192.168.0.1"
}
}
}
]
}
```

## Example: Validate UUID strings

The following pipeline uses a regular expression to verify if a `session_id` field is a valid UUID:

```json
PUT _ingest/pipeline/uuid_checker
{
"processors": [
{
"set": {
"field": "valid_uuid",
"value": true,
"if": "ctx.session_id != null && ctx.session_id =~ /^[a-f0-9]{8}-[a-f0-9]{4}-4[a-f0-9]{3}-[89ab][a-f0-9]{3}-[a-f0-9]{12}$/"
}
}
]
}
```
{% include copy-curl.html %}

Use the following request to simulate the pipeline:

```json
POST _ingest/pipeline/uuid_checker/_simulate
{
"docs": [
{ "_source": { "session_id": "550e8400-e29b-41d4-a716-446655440000" } },
{ "_source": { "session_id": "invalid-uuid-1234" } }
]
}
```
{% include copy-curl.html %}

The first document is tagged with new field `valid_uuid`:

```json
{
"docs": [
{
"doc": {
"_source": {
"session_id": "550e8400-e29b-41d4-a716-446655440000",
"valid_uuid": true
}
}
},
{
"doc": {
"_source": {
"session_id": "invalid-uuid-1234"
}
}
}
]
}
```