Metrics Exporter #1301
I still need to clean things up (and marinate on Model/Metrics.pm; I'm still contemplating whether to do a MetricsFamily refactoring) and review the tests, but there are some things we might discuss at this point:
On Minion/Shinobu, I only chose some basic process metrics to collect (CPU/IO/disk), but if there are other process metrics (and there are dozens of them) that people find useful, I wouldn't mind adding them. I just don't want to add things for no reason. Also, I've left out the implementation of more specific, purpose-based metrics (e.g. plugin invocations, archive addition/deletion, archive size calculation, etc.) because these are scattered all over the place and it's not obvious which layer the metrics collection should live in, if we want them at all.
Issue: #1080
An implementation of the LANraragi metrics exporter, where library-, API- and process-level metrics are conditionally collected and served through the "/api/metrics" endpoint in the Prometheus exposition format, with Redis as the shared metrics state. Metrics data is stored in Redis at db4, but we can change to an existing db if we want.
Most of the code was written by AI, then reviewed/rewritten by me. Architectural decisions were made by me.
Already tested this on personal prod environments for about a week, will continue doing so 👌
Demo screenshots and pretty pictures
Things you can do in Prometheus/Grafana, with the metrics provided
Configuring the metrics exporter settings
3rd-party implementations and shared state
There are two Perl implementations of a metrics exporter (probably more): Net::Prometheus and mojolicious-plugin-prometheus. The main issue with just using them is that we need shared state: Net::Prometheus is lower-level but has no shared state, while mojolicious-plugin-prometheus has shared state but is higher-level and relies on IPC.
With LRR, on the other hand, I want to collect all the metrics, and we might as well use Redis too since shared state is exactly what it's good at anyway, so I just decided to rawdog it.
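To make "Redis as the shared metrics state" concrete, here is a rough sketch of the idea (the key layout and helper name are illustrative, not necessarily what this PR uses): any worker can bump a counter with a single Redis command, and the scrape handler just reads the values back.

```perl
# Rough sketch: counters shared across preforked workers through a Redis hash.
# The "metrics:counter:*" key layout is hypothetical, not the exact one in this PR.
use strict;
use warnings;
use Redis;

sub increment_counter {
    my ( $name, $labels, $value ) = @_;
    my $redis = Redis->new( server => '127.0.0.1:6379' );
    $redis->select(4);    # metrics live in their own DB (db4 in this PR)

    # Redis serializes concurrent increments, so no in-process shared memory
    # or IPC is needed between the Mojolicious workers.
    $redis->hincrbyfloat( "metrics:counter:$name", $labels // '', $value // 1 );
    $redis->quit;
}

# Example: a worker counting a handled request.
increment_counter( 'lanraragi_http_requests', 'method="GET",route="/api/metrics"' );
```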
Opt-in
Metrics collection and endpoint exposure are optional and opt-in (via the enablemetrics setting flag in the config), and instructions to enable metrics have been documented.
OS dependent
Each OS (e.g. macOS, Windows, Linux) needs its own implementation of the process-level metrics collection. Currently only Linux process-level metrics are supported, but people are welcome to contribute implementations for other OSes :)
OpenMetrics and general spec stuff
This implementation was written to be compliant with the OpenMetrics 1.0 specification, with minor adjustments to conform to what the Prometheus server actually supports (it turns out Prometheus doesn't fully support the OpenMetrics spec, despite the spec being the first thing that shows up when you research the Prometheus exporter format...!), so a couple of details deviate from the OpenMetrics spec. Still, OpenMetrics has some good practices, so most of its rules were followed. There's also OpenTelemetry, which is a separate thing entirely, but we're sticking mostly to the Prometheus exposition format.
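For reference, the Prometheus text exposition format served by the endpoint looks roughly like the sample below; the metric names and values are made up for illustration and aren't necessarily the ones this PR exports.

```
# HELP lanraragi_archive_count Number of archives in the library.
# TYPE lanraragi_archive_count gauge
lanraragi_archive_count 4521
# HELP lanraragi_http_requests_total Total HTTP requests handled, by route.
# TYPE lanraragi_http_requests_total counter
lanraragi_http_requests_total{method="GET",route="/api/archives/:id"} 1027
# HELP lanraragi_process_cpu_seconds_total CPU time consumed by a worker process.
# TYPE lanraragi_process_cpu_seconds_total counter
lanraragi_process_cpu_seconds_total{process="shinobu"} 184.2
```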
Metric collection types
There are 3 broad categories of metrics being collected: API/HTTP, library, and process.
API
API/HTTP refers to metrics collected by a mojolicious worker handling a single HTTP request/endpoint. Metrics collected include duration and bytes sent/received.
The natural way to handle this passive collection is via mojo hooks.
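A minimal sketch of what hook-based collection can look like (before_dispatch/after_dispatch are real Mojolicious hooks, but the record_request helper and the exact labels are assumptions for illustration, not necessarily what this PR does):

```perl
# Rough sketch of passive HTTP metrics via Mojolicious hooks.
# record_request() is a stand-in for whatever writes to the shared Redis state.
use Mojo::Base -strict;
use Time::HiRes ();

sub record_request {
    my %sample = @_;
    warn "metrics: $sample{method} $sample{route} -> $sample{code} in $sample{duration}s\n";
}

sub register_http_metrics {
    my ($app) = @_;

    # Stamp the start time before the request is dispatched.
    $app->hook( before_dispatch => sub {
        my ($c) = @_;
        $c->stash( 'metrics.start' => Time::HiRes::time() );
    } );

    # Once the response has gone out, record duration and sizes.
    $app->hook( after_dispatch => sub {
        my ($c) = @_;
        my $start = $c->stash('metrics.start') or return;

        # Label by route pattern ("/api/archives/:id"), not the raw path,
        # to keep cardinality bounded.
        my $route = eval { $c->match->endpoint->pattern->unparsed } // 'unknown';

        record_request(
            route     => $route,
            method    => $c->req->method,
            code      => $c->res->code,
            duration  => Time::HiRes::time() - $start,
            bytes_in  => $c->req->body_size,
            bytes_out => $c->res->body_size,
        );
    } );
}
```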
Also, API metrics group requests by route pattern: instead of the full "/api/archives/123456..." endpoint, we use the "/api/archives/:id" route from Routing.pm. This is to avoid cardinality explosion.
Library
Library/stats refers to the stats mentioned in the initial issue: how many archives, how many pages, etc. These are usually values already aggregated by another worker process during a file monitor event, so the metrics can get this data for free and we don't need a separate periodic hook. (Archive byte size, on the other hand...)
And since Prometheus servers periodically scrape metrics from LRR, it doesn't make sense for metric scraping to also trigger expensive calls that drag the whole server down, so it's best for the metrics API to be as lean as possible.
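As a sketch of the "for free" idea (the function and key names below are hypothetical): the worker that already computed the stats publishes them as gauges, so the scrape handler only ever does cheap reads.

```perl
# Rough sketch: the worker that already aggregated the stats publishes them as
# gauges in Redis. Key and metric names are made up for illustration.
use strict;
use warnings;
use Redis;

sub publish_library_stats {
    my (%stats) = @_;    # e.g. archive_count => 4521, page_count => 1_200_000
    my $redis = Redis->new( server => '127.0.0.1:6379' );
    $redis->select(4);
    $redis->set( 'metrics:gauge:lanraragi_archive_count', $stats{archive_count} );
    $redis->set( 'metrics:gauge:lanraragi_page_count',    $stats{page_count} );
    $redis->quit;
}

# Called from wherever the file monitor already has the totals in hand.
publish_library_stats( archive_count => 4521, page_count => 1_200_000 );
```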
Process (Minion/Shinobu)
These are the CPU/memory/FD/IO metrics that one may find in node exporter. Process metrics collection is a passive "process", so it's done as a 30s recurring task.
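On Linux this boils down to parsing /proc on a timer. A rough sketch of one collection pass, assuming a Mojo::IOLoop recurring timer (the PR may schedule this differently) and the field layout documented in proc(5):

```perl
# Rough sketch: sample CPU/IO/FD stats for the current process from /proc every
# 30 seconds. Linux-only, matching the PR's current scope; values would normally
# be written to the shared Redis state instead of warn()ed.
use Mojo::Base -strict;
use Mojo::IOLoop;
use POSIX qw(sysconf _SC_CLK_TCK);

sub sample_proc_metrics {
    my $pid = $$;

    # CPU time: utime + stime from /proc/<pid>/stat, in clock ticks.
    open my $statfh, '<', "/proc/$pid/stat" or return;
    my $line = <$statfh>;
    close $statfh;
    $line =~ s/^.*\)\s+//;     # strip "pid (comm) " since comm may contain spaces
    my @f = split ' ', $line;  # utime/stime are now at indices 11 and 12
    my $cpu_seconds = ( $f[11] + $f[12] ) / ( sysconf(_SC_CLK_TCK) || 100 );

    # IO: cumulative byte counters from /proc/<pid>/io.
    my %io;
    if ( open my $iofh, '<', "/proc/$pid/io" ) {
        while (<$iofh>) { $io{$1} = $2 if /^(\w+):\s+(\d+)/ }
        close $iofh;
    }

    # Open file descriptors: count entries under /proc/<pid>/fd.
    opendir my $fddir, "/proc/$pid/fd" or return;
    my $fd_count = grep { !/^\.\.?$/ } readdir $fddir;
    closedir $fddir;

    warn sprintf "cpu=%.2fs read=%d written=%d fds=%d\n",
        $cpu_seconds, $io{read_bytes} // 0, $io{write_bytes} // 0, $fd_count;
}

# 30s recurring collection inside a long-lived worker (e.g. Shinobu).
Mojo::IOLoop->recurring( 30 => \&sample_proc_metrics );
Mojo::IOLoop->start unless Mojo::IOLoop->is_running;
```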
DB Cleanups
There are 3 ways of doing cleanups for metrics: shutdown cleanup, startup cleanup, and TTL (continuous cleanup).
TTL isn't exactly a valid approach, because it violates the OpenMetrics guidance that metrics should generally exist for the lifetime of the process (and it causes metrics to disappear). That leaves startup and shutdown cleanup, but startup is generally more reliable, since a shutdown handler won't run if the process crashes or is killed.
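A startup cleanup is conceptually just "wipe stale metric keys before the first new sample is written". A rough sketch against the metrics DB, assuming a hypothetical "metrics:" key prefix:

```perl
# Rough sketch: drop metric keys left over from a previous run before the first
# new sample is written. Assumes a "metrics:" key prefix in db4, which may not
# match the PR exactly (if db4 is dedicated to metrics, a FLUSHDB would also do).
use strict;
use warnings;
use Redis;

sub cleanup_stale_metrics {
    my $redis = Redis->new( server => '127.0.0.1:6379' );
    $redis->select(4);

    my $cursor = 0;
    do {
        ( $cursor, my $keys ) = $redis->scan( $cursor, MATCH => 'metrics:*', COUNT => 500 );
        $redis->del(@$keys) if @$keys;
    } while ($cursor);

    $redis->quit;
}

cleanup_stale_metrics();    # run once at application startup
```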