[draft] add warmup and startup metrics #69
Conversation
👋 Hi! Thank you for contributing to vLLM support on Spyre.
Now you are good to go 🚀
Export the startup variable configuration and warmup metrics to the /metrics endpoint. The gauges are prefixed with vllm:plugin:spyre in hopes of avoiding namespace conflicts with other users.
For vllm-project#58
TODO: also port to the V1 plugin.
Signed-off-by: Paul Murphy <[email protected]>
The GitHub Actions don't seem to entirely work on ppc64le:
@tdoublep FYI. First stab at pulling startup metrics.
CI usage leads to recycling gauges. Attempt to clean them up before installing them.
```python
prometheus_client.REGISTRY.unregister(collector)
labels = {}
for k in envs_spyre.environment_variables.keys():
    labels[k] = str(envs_spyre.environment_variables[k]())
```
Interesting, I think this is kinda cool to dump out all the environment variables as labels, but I think this will be pretty hard to interpret since the values that we're interested in are in the labels and not the gauge values. If I understand this correctly, we're just gonna have a bunch of gauges all set to the value 1, with a lot of labels attached to them saying what the values for these environment variables are, and trying to visualize that info with a tool like Instana or Grafana sounds like it would be pretty clunky at best and confusing at worst. For instance, if I tried to filter on one of these config values, I might expect to see all the runtime metrics filtered by pods with that configuration, but instead I would just get this one gauge.
I'm not sure that prometheus is the best tool to use to report static deployment configuration like this. Shouldn't the logs already report configuration, and the user's CD solution should have their deployment configs checked in as well?
This is the only label gauge, so far. I should be consistent with other usage within vllm which exposes configuration options. It's much less fickle than parsing logs, which may not be available to testing clients.
The desired usage here isn't to fill out a dashboard, but to provide performance testing a means to query the data in a structured way. Parsing the contents of `curl localhost:8000/metrics` is preferable to finding and reconstructing logs.
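As a rough sketch of that structured consumption (the port and the metric prefix are taken from this PR; nothing below is an existing client), a perf-test script could scrape and filter the prefixed gauges with `prometheus_client`'s parser:

```python
# Hedged sketch: scrape /metrics and pull out the spyre-prefixed samples.
import urllib.request

from prometheus_client.parser import text_string_to_metric_families

text = urllib.request.urlopen("http://localhost:8000/metrics").read().decode()
for family in text_string_to_metric_families(text):
    for sample in family.samples:
        if sample.name.startswith("vllm:plugin:spyre"):
            # e.g. warmup timings and config values keyed by their labels
            print(sample.name, sample.labels, sample.value)
```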
Ah, gotcha. I've never seen config exposed this way for automated perf tests before, but that does sound easier than trying to inject new rest endpoints into the running app to expose the config.
Since this is designed specifically for use during performance testing, it would be really great to have that documented in a comment here. It'd also be great to have the ability to turn this off when we're not running performance tests, so that we can reduce the load on prometheus and avoid confusing users with any of these labels
```python
# Capture a oneshot metric (e.g. warmup)
def _spyre_metric_oneshot(name, labels, value):
    name = "vllm:plugin:spyre:" + name
    # Cleanup existing, for CI.
```
Why is this cleanup required?
The prometheus client is independent of the vllm objects being created and destroyed while inside the same python vm. The original PR didn't clean up, and the result was many failures due to already registered gauges. vllm handles this in a similar way.
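A minimal sketch of the cleanup pattern being described (it leans on `REGISTRY._names_to_collectors`, the same private mapping used in this diff, so treat it as best-effort rather than a stable API):

```python
import prometheus_client

def _unregister_if_present(name: str) -> None:
    # A collector with this name may survive from a previous vLLM instance
    # inside the same Python VM (e.g. across CI test cases); drop it so the
    # gauge can be registered again.
    collector = prometheus_client.REGISTRY._names_to_collectors.get(name)
    if collector is not None:
        prometheus_client.REGISTRY.unregister(collector)
```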
```python
prometheus_client.REGISTRY.unregister(collector)
gauge = prometheus_client.Gauge(name=name,
                                documentation="Oneshot spyre metric",
                                multiprocess_mode='mostrecent',
```
v0 uses multiprocess mode to collect metrics from processes that aren't the main server process, but my last understanding of the prometheus integration in v1 was that it didn't use multiprocessing. I thought it instead collected all the stats to be reported as part of the `EngineCoreOutputs` class and then reported those from the frontend process.
If that's still true, then we're gonna have to do something a lot different to capture any extra metrics for v1.
Using the default multiprocess mode adds a pid label to each gauge. Instead, rank is used to differentiate each sampling. It can be removed without harm if we don't mind the extra noise.
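For reference, a sketch of the gauge shape being discussed; the metric name is illustrative, and `multiprocess_mode='mostrecent'` assumes a prometheus_client release that accepts it (the mode only has an effect when Prometheus multiprocess collection is actually active):

```python
import prometheus_client

# 'mostrecent' keeps the last written value per label set instead of the
# default per-process aggregation, and a 'rank' label replaces the pid label.
gauge = prometheus_client.Gauge(
    name="vllm:plugin:spyre:warmup_total_seconds",  # illustrative name
    documentation="Oneshot spyre metric",
    labelnames=["rank"],
    multiprocess_mode="mostrecent",
)
gauge.labels(rank="0").set(12.3)
```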
Right, what I'm saying is that I know this works for v0 because multiprocessing mode is enabled, but I think that trying this same strategy for v1 might just fail because vllm doesn't set up prometheus multiprocessing at all. So your `TODO: also port to V1 plugin` might require a very different design.
Looking into the V1 changes, I don't think we can use prometheus unless the multiprocess support is turned back on. I am not sure if or when that would happen given it is turned off for performance reasons.
I am not sure what other options beyond finding and scraping logs exist when using V1. Any hints?
Dang, yeah that's what I suspected would be the case. I also doubt the multiprocess support will be turned back on, though we could ask.
We could write out the info to files? Then you could mount a shared remote storage when you deploy and run, and collect the data you need from there.
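A very rough sketch of that file-based option (the directory, filename scheme, and JSON layout are all made up here, not an agreed design):

```python
import json
import os
import time

def write_oneshot_metric(name: str, labels: dict, value: float,
                         out_dir: str = "/mnt/spyre-metrics") -> None:
    # Each process writes its own file to shared storage; a perf-test job
    # mounted on the same volume can collect the records afterwards.
    os.makedirs(out_dir, exist_ok=True)
    record = {"name": name, "labels": labels, "value": value,
              "timestamp": time.time()}
    path = os.path.join(out_dir, f"{os.getpid()}-{name.replace(':', '_')}.json")
    with open(path, "w") as f:
        json.dump(record, f)
```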
```python
collector = prometheus_client.REGISTRY._names_to_collectors[name]
prometheus_client.REGISTRY.unregister(collector)
gauge = prometheus_client.Gauge(name=name,
                                documentation="Oneshot spyre metric",
```
Documentation about the specific metric would be great for users here
"prompt_len": str(prompt_len), | ||
"num_decode_tokens": str(num_decode_tokens), | ||
"batch_size": str(batch_size) | ||
}, time) |
Instead of emitting a uniquely labeled gauge for each warmup pass, it might be sufficient to use a histogram metric here without all the labels. The system logs will still show the time spent in each warmup pass, so if the histogram reveals some warmup passes are too slow we'll still be able to identify which pass it was.
Having lots of labels is probably fine for smaller deployments, but this adds `3 * len(warmup_shapes)` new timeseries for each replica, instead of just one, which could cause problems with prometheus at scale.
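Something like the following is what that suggestion amounts to (bucket boundaries and the metric name are guesses, not code from this PR):

```python
import prometheus_client

# One histogram per replica instead of 3 * len(warmup_shapes) labeled gauges.
warmup_histogram = prometheus_client.Histogram(
    name="vllm:plugin:spyre:warmup_pass_seconds",
    documentation="Duration of individual Spyre warmup passes",
    buckets=(1, 5, 15, 30, 60, 120, 300, 600, float("inf")),
)

# Each warmup pass records only its duration; which shape was slow can still
# be recovered from the existing warmup logs.
warmup_histogram.observe(42.0)
```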
How many shapes do you think will be used at one time? Similarly, specific timing needs to be maintained to line up with performance logging.
What are the thoughts on adding another envvar to enable metrics, e.g. `VLLM_SPYRE_WARMUP_ENABLE_METRICS` or similar?
> How many shapes do you think will be used at one time?

I don't know, but once this is all actually ready for production I imagine there could be quite a lot of shapes for some models. If these are all meant just to enable performance testing then I'm in favor of putting all of them behind one single flag, maybe something like `VLLM_SPYRE_PERF_TESTING_METRICS` to be specific about it.
For regular deployment monitoring, I'd expect that all deployments would use cached graphs so this warmup will always be fast, and maybe we'd only need the single total warmup time metric to potentially alert on any invalid caches.
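If the single-flag route is taken, the gating could be as simple as the sketch below (the variable name comes from this thread and is not settled; the helper is hypothetical):

```python
import os

def _perf_metrics_enabled() -> bool:
    # Proposed opt-in flag for performance-testing metrics; off by default.
    return os.environ.get("VLLM_SPYRE_PERF_TESTING_METRICS", "0") == "1"

# At registration time, e.g. at the top of _spyre_metric_oneshot():
#     if not _perf_metrics_enabled():
#         return
```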
Back to the drawing board...