
stage.regex Label Extraction from path Does Not Work #5504

@obie1somebody

Description

Component(s)

loki.process

What's wrong?

The loki.process component's stage.regex label extraction from the path field does not work in any version of Grafana Alloy we have tested. Labels extracted using named capture groups are never promoted to Loki labels, even when using stage.labels with empty string values as documented.

The behavior reproduces in every version we tested, from v1.8.1 through the current release (v1.13.0).

Use Case
In our production environment, we monitor Jenkins build logs with paths structured as:

/var/lib/alloy/logs/{job_name}/{build_number}/{platform}/{branch}/{hostname}/log
We need to extract these 5 dimensions as labels for:

Filtering logs by specific builds
Aggregating metrics by platform, branch, hostname, job_name, etc.

Because of the sheer volume of Jenkins logs (across multiple Jenkins servers), we need to parse out only the errors and send those to Grafana Cloud Loki; otherwise we would overwhelm its capacity.
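For reference, here is a minimal Go sketch of the extraction we expect, run outside Alloy. It is written in Go because Alloy is written in Go and, as we understand it, stage.regex expressions use RE2 syntax, which Go's regexp package implements (including `(?P<name>...)` groups). The pattern uses the production prefix from the path layout above with `^`/`$` anchors added; `extractDims` and the sample path are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// pathPattern has one named capture group per path segment under
// /var/lib/alloy/logs/, mirroring the five dimensions listed above.
var pathPattern = regexp.MustCompile(
	`^/var/lib/alloy/logs/(?P<job_name>[^/]+)/(?P<build_number>[^/]+)/(?P<platform>[^/]+)/(?P<branch>[^/]+)/(?P<hostname>[^/]+)/log$`)

// extractDims returns the named groups as a label map, or nil when the
// path does not follow the expected layout.
func extractDims(path string) map[string]string {
	m := pathPattern.FindStringSubmatch(path)
	if m == nil {
		return nil
	}
	names := pathPattern.SubexpNames() // index 0 is the unnamed whole match
	labels := make(map[string]string, len(names)-1)
	for i, name := range names {
		if name != "" {
			labels[name] = m[i]
		}
	}
	return labels
}

func main() {
	// Hypothetical path following the documented layout.
	labels := extractDims("/var/lib/alloy/logs/nightly/42/ubuntu/trunk/builder01/log")
	fmt.Println(labels) // fmt prints map keys in sorted order
}
```

Paths with fewer segments (for example a build directory without a hostname level) return nil instead of a partial label set.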

Steps to reproduce

  1. Create Test Directory Structure
    mkdir -p /tmp/alloy_test/logs/testjob/777/ubuntu/trunk/myhost
    echo -e "line one\nline two\nline three" > /tmp/alloy_test/logs/testjob/777/ubuntu/trunk/myhost/log
  2. Start Loki

loki.yaml

auth_enabled: false
server:
  http_listen_port: 3100
common:
  ring:
    instance_addr: 127.0.0.1
    kvstore:
      store: inmemory
  replication_factor: 1
  path_prefix: /tmp/loki_data
schema_config:
  configs:
    - from: 2020-10-24
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h

loki -config.file=loki.yaml &
3. Create Alloy Configuration

config.alloy

local.file_match "test" {
  path_targets = [{__path__ = "/tmp/alloy_test/logs/*/*/*/*/*/log", job = "test"}]
}

loki.source.file "test" {
  targets    = local.file_match.test.targets
  forward_to = [loki.process.extract.receiver]
}

loki.process "extract" {
  stage.regex {
    source     = "__path__"
    expression = "/tmp/alloy_test/logs/(?P<job_name>[^/]+)/(?P<build_number>[^/]+)/(?P<platform>[^/]+)/(?P<branch>[^/]+)/(?P<hostname>[^/]+)/log"
  }

  stage.labels {
    values = {
      job_name     = "",
      build_number = "",
      platform     = "",
      branch       = "",
      hostname     = "",
    }
  }

  forward_to = [loki.write.local.receiver]
}

loki.write "local" {
  endpoint {
    url = "http://localhost:3100/loki/api/v1/push"
  }
}
4. Run Alloy
alloy run config.alloy --server.http.listen-addr=127.0.0.1:12345 --disable-reporting
5. Query Loki
curl -s -G "http://localhost:3100/loki/api/v1/query_range" \
  --data-urlencode 'query={job="test"}' \
  --data-urlencode "start=$(date -d '1 minute ago' +%s)000000000" \
  --data-urlencode "end=$(date +%s)000000000" | jq '.data.result[0].stream'
6. Observe Results
Expected labels: job, job_name, build_number, platform, branch, hostname

Actual labels: job, filename (only the static job label and the automatically added filename label)

System information

Linux U22

Software version

Grafana Alloy Versions Tested: v1.8.1, v1.9.0, v1.9.2, v1.10.2, v1.11.3, v1.12.2, v1.13.0

Configuration

// Same as in Steps to reproduce above
// config.alloy
local.file_match "test" {
  path_targets = [{__path__ = "/tmp/alloy_test/logs/*/*/*/*/*/log", job = "test"}]
}

loki.source.file "test" {
  targets    = local.file_match.test.targets
  forward_to = [loki.process.extract.receiver]
}

loki.process "extract" {
  stage.regex {
    source     = "__path__"
    expression = "/tmp/alloy_test/logs/(?P<job_name>[^/]+)/(?P<build_number>[^/]+)/(?P<platform>[^/]+)/(?P<branch>[^/]+)/(?P<hostname>[^/]+)/log"
  }

  stage.labels {
    values = {
      job_name     = "",
      build_number = "",
      platform     = "",
      branch       = "",
      hostname     = "",
    }
  }

  forward_to = [loki.write.local.receiver]
}

loki.write "local" {
  endpoint {
    url = "http://localhost:3100/loki/api/v1/push"
  }
}

Logs

# Alloy Logs
No errors or warnings are logged. The loki.process.extract component evaluates successfully:

ts=2026-02-11T01:25:40.796571417Z level=info msg="finished node evaluation" node_id=loki.process.extract duration=326.9µs

Loki Query Response
{
  "status": "success",
  "data": {
    "resultType": "streams",
    "result": [
      {
        "stream": {
          "filename": "/tmp/alloy_test/logs/testjob/777/ubuntu/trunk/myhost/log",
          "job": "test"
        },
        "values": [
          ["1770773011469653698", "line three"],
          ["1770773011469611617", "line two"],
          ["1770773011469577464", "line one"]
        ]
      }
    ]
  }
}
Note: Only job (from path_targets) and filename (automatically added) appear. The 5 regex-extracted labels are missing.
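The expected-versus-actual comparison from step 6 can be automated against this response. A minimal Go sketch: it decodes only the fields it needs from a query_range response (the rest of the schema is ignored, not modelled) and lists which expected labels are absent from the first stream. `missingLabels` is a hypothetical helper, and the embedded body is the response above with `values` omitted.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// queryResponse models only the parts of a query_range response used here.
type queryResponse struct {
	Data struct {
		Result []struct {
			Stream map[string]string `json:"stream"`
		} `json:"result"`
	} `json:"data"`
}

// missingLabels returns the expected labels absent from the first stream,
// preserving the order of expected. With no streams, every label is missing.
func missingLabels(body []byte, expected []string) ([]string, error) {
	var resp queryResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, err
	}
	var stream map[string]string // a nil map reports every key as absent
	if len(resp.Data.Result) > 0 {
		stream = resp.Data.Result[0].Stream
	}
	var missing []string
	for _, name := range expected {
		if _, ok := stream[name]; !ok {
			missing = append(missing, name)
		}
	}
	return missing, nil
}

func main() {
	body := []byte(`{"status":"success","data":{"resultType":"streams","result":[{"stream":{"filename":"/tmp/alloy_test/logs/testjob/777/ubuntu/trunk/myhost/log","job":"test"}}]}}`)
	expected := []string{"job", "job_name", "build_number", "platform", "branch", "hostname"}

	missing, err := missingLabels(body, expected)
	if err != nil {
		panic(err)
	}
	fmt.Println("missing labels:", missing) // missing labels: [job_name build_number platform branch hostname]
}
```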
