
[Roadmap] vLLM production stack roadmap for 2025 Q1 #26

Open
7 of 15 tasks
ApostaC opened this issue Jan 27, 2025 · 19 comments
Comments

@ApostaC
Collaborator

ApostaC commented Jan 27, 2025

This project's scope covers a set of production-related modules around vLLM, including the router, autoscaling, observability, KV cache offloading, and framework support (KServe, Ray, etc.).

This document lists the items on our Q1 roadmap. We will keep updating it with the related issues, pull requests, and discussions in the #production-stack channel in the vLLM Slack.

Core features

CI/CD and packaging

OSS-related support


If an item you want is not on the roadmap, your suggestions and contributions are very welcome! Please feel free to comment in this thread, open a feature request, or create an RFC.

Happy vLLMing!

@gaocegege
Collaborator

Since we have this refactor on the roadmap:

(P2) Rewrite the router in a more performant language (e.g., Rust or Go) for better QPS/throughput and lower latency

It would be best to tackle this as soon as possible. As the router grows, it’s only going to get trickier. For the language, I’d go with Go since it meshes well with Kubernetes and monitoring. Go’s k8s client library is way more mature, while Rust's support for Kubernetes isn’t great.

@hmellor
Contributor

hmellor commented Jan 28, 2025

(P0) format checker for the code

I can help bring over the new formatting from vLLM if you'd like. It's much simpler than format.sh, and (as long as it gets installed) you can't forget to run it!

@ApostaC
Collaborator Author

ApostaC commented Jan 28, 2025

It would be best to tackle this as soon as possible. As the router grows, it’s only going to get trickier. For the language, I’d go with Go since it meshes well with Kubernetes and monitoring. Go’s k8s client library is way more mature, while Rust's support for Kubernetes isn’t great.

@gaocegege I see your point. One thing that makes me a bit hesitant is that Python is friendlier to the LLM community.
The current plan is to first build a performance benchmark for the router and see how "bad" the current Python version is.

Another backup option in my mind is a Go backbone for the data plane with a Python interface for the routing logic, so that the community (both industry and academia) can contribute to it.
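To illustrate the idea, here is a minimal sketch of what such a Python routing-logic plugin interface might look like. All names below are hypothetical, not the actual router API: a Go data plane could invoke implementations of this interface (e.g. over gRPC) while routing policies stay contributable in Python.

```python
from abc import ABC, abstractmethod
from itertools import count


class RoutingLogic(ABC):
    """Hypothetical plugin interface: the Go data plane calls select_endpoint()
    and forwards the request to the returned backend."""

    @abstractmethod
    def select_endpoint(self, request_id: str, endpoints: list[str]) -> str:
        """Pick one vLLM backend URL for this request."""


class RoundRobinLogic(RoutingLogic):
    """Simplest possible policy, used here only to show the plugin shape."""

    def __init__(self) -> None:
        self._counter = count()

    def select_endpoint(self, request_id: str, endpoints: list[str]) -> str:
        return endpoints[next(self._counter) % len(endpoints)]
```

Keeping the interface this small is what would let academia and industry plug in smarter policies without touching the data plane.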

@ApostaC
Collaborator Author

ApostaC commented Jan 28, 2025

I can help bring over the new formatting from vllm if you'd like? It's much simpler than format.sh and (as long as it gets installed) you can't forget to run it!

@hmellor Thanks for chiming in! Would love to see your contribution! Feel free to create a new issue or PR for this!

@gaocegege
Collaborator

It would be best to tackle this as soon as possible. As the router grows, it’s only going to get trickier. For the language, I’d go with Go since it meshes well with Kubernetes and monitoring. Go’s k8s client library is way more mature, while Rust's support for Kubernetes isn’t great.

@gaocegege I see your point. One thing that makes me a bit hesitant is that Python is friendlier to the LLM community. The current plan is to first build a performance benchmark for the router and see how "bad" the current Python version is.

Another backup option in my mind is a Go backbone for the data plane with a Python interface for the routing logic, so that the community (both industry and academia) can contribute to it.

SGTM.

@hmellor
Contributor

hmellor commented Jan 29, 2025

@ApostaC great! Please take a look at #35

@spron-in

I'm curious whether you see the production stack evolving toward an operator. I see someone already suggested CRDs here: #7

As complexity grows, I could see this stack becoming more k8s-native to simplify operations even further.

ApostaC pinned this issue Jan 29, 2025
@ApostaC
Collaborator Author

ApostaC commented Jan 29, 2025

I'm curious whether you see the production stack evolving toward an operator.

Thanks for bringing up the question, @spron-in.
Right now, the stack is relatively simple, so we don't have immediate plans for this. Also, we hope the components in this stack can be directly reused for different purposes.

But we will definitely consider a more end-to-end solution to simplify operations once the stack grows more complex.

@nithin8702

  • Looks like we can't change the resources (CPU, memory) of the vllm-stack-deployment-router deployment
  • Examples of bringing your own model would be helpful (PyTorch preferred)
  • Shall we have image tags with a specific version instead of latest for every Helm release?

@ApostaC
Collaborator Author

ApostaC commented Feb 3, 2025

Hey @nithin8702 , thanks for your interest. Towards your questions:


  • Looks like we can't change the resources (CPU, memory) of the vllm-stack-deployment-router deployment

This should be done in #38


  • Examples of bringing your own model would be helpful (PyTorch preferred)

Do you mean loading models from local storage? Please refer to this tutorial.


  • Shall we have image tags with a specific version instead of latest for every Helm release?

Can you elaborate a bit more on this? I'm not sure I understand what you're asking.

@AlexXi19

AlexXi19 commented Feb 5, 2025

Also interested in seeing least request load balancing!

@sitloboi2012
Contributor

sitloboi2012 commented Feb 6, 2025

I'm currently looking at this item: (P1) Router observability (current QPS, router-side queueing delay, number of pending / prefilling / decoding requests, average prefill / decoding length, etc.).

Do we already have a list of default observability metrics we want (e.g., the list in the requirement), or should we do some research and pick a default set for now?

I'm planning to contribute to this one.

@ApostaC
Collaborator Author

ApostaC commented Feb 6, 2025

Also interested in seeing least request load balancing!

@AlexXi19 Hey Alex, we are discussing this in #59, and Kuntai (@KuntaiDu) is currently designing and implementing the functionality in the router.
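For readers following along, the core of least-request load balancing is small. Here is a hedged sketch with made-up names (not the design being discussed in #59): the router tracks in-flight requests per backend and always picks the least-loaded one.

```python
class LeastRequestBalancer:
    """Route each request to the backend with the fewest in-flight requests."""

    def __init__(self, endpoints: list[str]) -> None:
        self._in_flight = {ep: 0 for ep in endpoints}

    def acquire(self) -> str:
        # Pick the least-loaded endpoint and count the new request against it.
        endpoint = min(self._in_flight, key=self._in_flight.get)
        self._in_flight[endpoint] += 1
        return endpoint

    def release(self, endpoint: str) -> None:
        # Call when the request completes so the endpoint's load drops.
        self._in_flight[endpoint] -= 1
```

A production version would also need to handle endpoints joining/leaving and concurrent access, which is where the real design work lies.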

@ApostaC
Collaborator Author

ApostaC commented Feb 6, 2025

I'm currently looking at this item: (P1) Router observability (current QPS, router-side queueing delay, number of pending / prefilling / decoding requests, average prefill / decoding length, etc.).

Do we already have a list of default observability metrics we want (e.g., the list in the requirement), or should we do some research and pick a default set for now?

I'm planning to contribute to this one.

Thanks @sitloboi2012! Currently the router does maintain a list of stats internally. We can first expose that interface and export those metrics to Prometheus.

Would you like to open an issue for further discussion?
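As a sketch of the "export those metrics" step: the Prometheus text exposition format is simple enough to render by hand. The metric names below are made up for illustration (a real implementation would more likely use the prometheus_client library):

```python
def render_prometheus(stats: dict[str, float]) -> str:
    """Render router stats as Prometheus text exposition format (gauges)."""
    lines = []
    for name, value in sorted(stats.items()):
        metric = f"vllm_router_{name}"  # hypothetical metric name prefix
        lines.append(f"# TYPE {metric} gauge")
        lines.append(f"{metric} {value}")
    return "\n".join(lines) + "\n"
```

The router would serve this string at a /metrics endpoint for Prometheus to scrape.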

@sitloboi2012
Contributor

sitloboi2012 commented Feb 7, 2025

I'm currently looking at this item: (P1) Router observability (current QPS, router-side queueing delay, number of pending / prefilling / decoding requests, average prefill / decoding length, etc.).
Do we already have a list of default observability metrics we want (e.g., the list in the requirement), or should we do some research and pick a default set for now?
I'm planning to contribute to this one.

Thanks @sitloboi2012! Currently the router does maintain a list of stats internally. We can first expose that interface and export those metrics to Prometheus.

Would you like to open an issue for further discussion?

Yep, let's move this to this issue: Feat: Router Observability @ApostaC

@nithin8702

Hi Team

Shall we have release notes for every release?

@ApostaC
Collaborator Author

ApostaC commented Feb 13, 2025

Hi Team

Shall we have release notes for every release?

Good question! We will do that soon. @Shaoting-Feng @YuhanLiu11, please take a note.

@nithin8702

Does this support multi-tenancy or namespace isolation?

@ApostaC
Collaborator Author

ApostaC commented Feb 13, 2025

Does this support multi-tenancy or namespace isolation?

You can specify the namespace when running helm install.
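For example, one isolated install per tenant namespace might look like the following (the release and chart names here are illustrative; adjust them to your deployment):

```shell
# Each tenant gets its own namespace and its own Helm release.
kubectl create namespace team-a
helm install vllm-team-a vllm/vllm-stack --namespace team-a

kubectl create namespace team-b
helm install vllm-team-b vllm/vllm-stack --namespace team-b
```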
