Metrics Exporter #1301
I still need to clean things up (and marinate on Model/Metrics.pm; I'm still contemplating whether to do a MetricsFamily refactoring) and review the tests, but there are some things we might discuss at this point:
On Minion/Shinobu, I only chose some basic process metrics to collect (CPU/IO/disk), but if there are other process metrics (and there are dozens of them) that people find useful, I wouldn't mind adding them. I just don't want to add things for no reason. Also, I've left out the implementation of more specific, purpose-based metrics (e.g. plugin invocations, archive addition/deletion, archive size calculation, etc.) because these are scattered all over the place and it's not obvious which layer the metrics collection should live in, if we want them at all.
Issue: #1080
An implementation of the LANraragi metrics exporter, where library-, API- and process-level metrics are conditionally collected and served through the "/api/metrics" endpoint in the Prometheus exposition format, with Redis as the shared metrics state. Metrics data is stored in Redis at db4, but we can change to an existing db if we want.
Most of the code was written by AI, then reviewed/rewritten by me. Architectural decisions were made by me.
Already tested this on personal prod environments for about a week, will continue doing so 👌
Demo screenshots and pretty pictures
Things you can do in Prometheus/Grafana, with the metrics provided
Configuring the metrics exporter settings
3rd-party implementations and shared state
There are two Perl implementations of a metrics exporter (probably more): Net::Prometheus and mojolicious-plugin-prometheus. The main issue with just using them is that we need shared state: Net::Prometheus is lower-level but has no shared state, while mojolicious-plugin-prometheus has shared state but is higher-level and relies on IPC.
With LRR, on the other hand, I want to collect all the metrics, and we might as well use Redis too since shared state is exactly what it's good at anyway, so I just decided to rawdog it.
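To make "Redis as the shared metrics state" concrete, here is a rough sketch of the idea (the key layout and helper name are illustrative, not necessarily what this PR uses): any worker can bump a counter with a single Redis command, and the scrape handler just reads the values back.

```perl
# Rough sketch: counters shared across preforked workers through a Redis hash.
# The "metrics:counter:*" key layout is hypothetical, not the exact one in this PR.
use strict;
use warnings;
use Redis;

sub increment_counter {
    my ( $name, $labels, $value ) = @_;
    my $redis = Redis->new( server => '127.0.0.1:6379' );
    $redis->select(4);    # metrics live in their own DB (db4 in this PR)

    # Redis serializes concurrent increments, so no in-process shared memory
    # or IPC is needed between the Mojolicious workers.
    $redis->hincrbyfloat( "metrics:counter:$name", $labels // '', $value // 1 );
    $redis->quit;
}

# Example: a worker counting a handled request.
increment_counter( 'lanraragi_http_requests', 'method="GET",route="/api/metrics"' );
```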
Opt-in
Metrics collection and endpoint exposure are optional and opt-in (via the enablemetrics setting flag in the config), and instructions to enable metrics have been documented.
OS dependent
Each OS (e.g. macOS, Windows, Linux) needs its own implementation of the process-level metrics collection. Currently only Linux process-level metrics are supported, but people are welcome to contribute implementations for other OSes :)
OpenMetrics and general spec stuff
This implementation was written to be compliant with the OpenMetrics 1.0 specification, with minor adjustments to conform to what the Prometheus server actually supports (it turns out Prometheus doesn't fully support the OpenMetrics spec, despite the spec being the first thing that shows up when you research the Prometheus exporter format...!), so a couple of details deviate from the OpenMetrics spec. Still, OpenMetrics has some good practices, so most of its rules were followed. There's also OpenTelemetry, which is a separate thing entirely, but we're sticking mostly to the Prometheus exposition format.
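For reference, the Prometheus text exposition format served by the endpoint looks roughly like the sample below; the metric names and values are made up for illustration and aren't necessarily the ones this PR exports.

```
# HELP lanraragi_archive_count Number of archives in the library.
# TYPE lanraragi_archive_count gauge
lanraragi_archive_count 4521
# HELP lanraragi_http_requests_total Total HTTP requests handled, by route.
# TYPE lanraragi_http_requests_total counter
lanraragi_http_requests_total{method="GET",route="/api/archives/:id"} 1027
# HELP lanraragi_process_cpu_seconds_total CPU time consumed by a worker process.
# TYPE lanraragi_process_cpu_seconds_total counter
lanraragi_process_cpu_seconds_total{process="shinobu"} 184.2
```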
Metric collection types
There are 3 broad categories of metrics being collected: API/HTTP, library, and process.
API
API/HTTP refers to metrics collected by a mojolicious worker handling a single HTTP request/endpoint. Metrics collected include duration and bytes sent/received.
The natural way to handle this passive collection is via mojo hooks.
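A minimal sketch of what hook-based collection can look like (before_dispatch/after_dispatch are real Mojolicious hooks, but the record_request helper and the exact labels are assumptions for illustration, not necessarily what this PR does):

```perl
# Rough sketch of passive HTTP metrics via Mojolicious hooks.
# record_request() is a stand-in for whatever writes to the shared Redis state.
use Mojo::Base -strict;
use Time::HiRes ();

sub record_request {
    my %sample = @_;
    warn "metrics: $sample{method} $sample{route} -> $sample{code} in $sample{duration}s\n";
}

sub register_http_metrics {
    my ($app) = @_;

    # Stamp the start time before the request is dispatched.
    $app->hook( before_dispatch => sub {
        my ($c) = @_;
        $c->stash( 'metrics.start' => Time::HiRes::time() );
    } );

    # Once the response has gone out, record duration and sizes.
    $app->hook( after_dispatch => sub {
        my ($c) = @_;
        my $start = $c->stash('metrics.start') or return;

        # Label by route pattern ("/api/archives/:id"), not the raw path,
        # to keep cardinality bounded.
        my $route = eval { $c->match->endpoint->pattern->unparsed } // 'unknown';

        record_request(
            route     => $route,
            method    => $c->req->method,
            code      => $c->res->code,
            duration  => Time::HiRes::time() - $start,
            bytes_in  => $c->req->body_size,
            bytes_out => $c->res->body_size,
        );
    } );
}
```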
Also, API metrics group requests by route pattern: instead of the full "/api/archives/123456..." endpoint, we use the "/api/archives/:id" route from Routing.pm. This is to avoid cardinality explosion.
Library
Library/stats refers to the stats mentioned in the initial issue: how many archives, how many pages, etc. These are usually values already aggregated by another worker process during a file monitor event, so the metrics can get this data for free and we don't need a separate periodic hook. (Archive byte size, on the other hand...)
And since Prometheus servers periodically scrape metrics from LRR, it doesn't make sense for metric scraping to also trigger expensive calls that drag the whole server down, so it's best for the metrics API to be as lean as possible.
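As a sketch of the "for free" idea (the function and key names below are hypothetical): the worker that already computed the stats publishes them as gauges, so the scrape handler only ever does cheap reads.

```perl
# Rough sketch: the worker that already aggregated the stats publishes them as
# gauges in Redis. Key and metric names are made up for illustration.
use strict;
use warnings;
use Redis;

sub publish_library_stats {
    my (%stats) = @_;    # e.g. archive_count => 4521, page_count => 1_200_000
    my $redis = Redis->new( server => '127.0.0.1:6379' );
    $redis->select(4);
    $redis->set( 'metrics:gauge:lanraragi_archive_count', $stats{archive_count} );
    $redis->set( 'metrics:gauge:lanraragi_page_count',    $stats{page_count} );
    $redis->quit;
}

# Called from wherever the file monitor already has the totals in hand.
publish_library_stats( archive_count => 4521, page_count => 1_200_000 );
```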
Process (Minion/Shinobu)
These are the CPU/memory/FD/IO metrics that one may find in node exporter. Process metrics collection is a passive "process", so it's done as a 30s recurring task.
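On Linux this boils down to parsing /proc on a timer. A rough sketch of one collection pass, assuming a Mojo::IOLoop recurring timer (the PR may schedule this differently) and the field layout documented in proc(5):

```perl
# Rough sketch: sample CPU/IO/FD stats for the current process from /proc every
# 30 seconds. Linux-only, matching the PR's current scope; values would normally
# be written to the shared Redis state instead of warn()ed.
use Mojo::Base -strict;
use Mojo::IOLoop;
use POSIX qw(sysconf _SC_CLK_TCK);

sub sample_proc_metrics {
    my $pid = $$;

    # CPU time: utime + stime from /proc/<pid>/stat, in clock ticks.
    open my $statfh, '<', "/proc/$pid/stat" or return;
    my $line = <$statfh>;
    close $statfh;
    $line =~ s/^.*\)\s+//;     # strip "pid (comm) " since comm may contain spaces
    my @f = split ' ', $line;  # utime/stime are now at indices 11 and 12
    my $cpu_seconds = ( $f[11] + $f[12] ) / ( sysconf(_SC_CLK_TCK) || 100 );

    # IO: cumulative byte counters from /proc/<pid>/io.
    my %io;
    if ( open my $iofh, '<', "/proc/$pid/io" ) {
        while (<$iofh>) { $io{$1} = $2 if /^(\w+):\s+(\d+)/ }
        close $iofh;
    }

    # Open file descriptors: count entries under /proc/<pid>/fd.
    opendir my $fddir, "/proc/$pid/fd" or return;
    my $fd_count = grep { !/^\.\.?$/ } readdir $fddir;
    closedir $fddir;

    warn sprintf "cpu=%.2fs read=%d written=%d fds=%d\n",
        $cpu_seconds, $io{read_bytes} // 0, $io{write_bytes} // 0, $fd_count;
}

# 30s recurring collection inside a long-lived worker (e.g. Shinobu).
Mojo::IOLoop->recurring( 30 => \&sample_proc_metrics );
Mojo::IOLoop->start unless Mojo::IOLoop->is_running;
```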
DB Cleanups
There are 3 ways of doing cleanups for metrics: shutdown cleanup, startup cleanup, and TTL (continuous cleanup).
TTL isn't exactly a valid approach, because it violates the OpenMetrics guidance that metrics should generally exist for the lifetime of the process (and it causes metrics to disappear). That leaves startup and shutdown cleanup, but startup is generally more reliable, since a shutdown handler won't run if the process crashes or is killed.
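A startup cleanup is conceptually just "wipe stale metric keys before the first new sample is written". A rough sketch against the metrics DB, assuming a hypothetical "metrics:" key prefix:

```perl
# Rough sketch: drop metric keys left over from a previous run before the first
# new sample is written. Assumes a "metrics:" key prefix in db4, which may not
# match the PR exactly (if db4 is dedicated to metrics, a FLUSHDB would also do).
use strict;
use warnings;
use Redis;

sub cleanup_stale_metrics {
    my $redis = Redis->new( server => '127.0.0.1:6379' );
    $redis->select(4);

    my $cursor = 0;
    do {
        ( $cursor, my $keys ) = $redis->scan( $cursor, MATCH => 'metrics:*', COUNT => 500 );
        $redis->del(@$keys) if @$keys;
    } while ($cursor);

    $redis->quit;
}

cleanup_stale_metrics();    # run once at application startup
```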