[Roadmap] vLLM production stack roadmap for 2025 Q1 #26
Comments
I think if we have the router refactor on the roadmap, it would be best to tackle it as soon as possible. As the router grows, it's only going to get trickier. For the language, I'd go with Go, since it meshes well with Kubernetes and monitoring tooling: Go's Kubernetes client library is far more mature, while Rust's Kubernetes support isn't great.
I can help bring over the new formatting from
@gaocegege I see your point. One thing that makes me a bit hesitant is that Python is friendlier to the LLM community. A backup solution in my mind is to have a Go backbone for the data plane but a Python interface for the routing logic, so that the community (including both industry and academia) can contribute to it.
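For concreteness, here is a minimal sketch of what such a Python routing-logic interface could look like. All names here (`RoutingLogic`, `LeastLoadRouter`, `choose_backend`) are hypothetical illustrations, not existing production-stack APIs:

```python
# Hypothetical sketch: a Go data plane could load and call a small
# Python plugin satisfying a contract like this one.
from abc import ABC, abstractmethod


class RoutingLogic(ABC):
    """Contract the data plane calls into for each incoming request."""

    @abstractmethod
    def choose_backend(self, backends: list[str], stats: dict[str, float]) -> str:
        """Pick one backend URL given per-backend load statistics."""


class LeastLoadRouter(RoutingLogic):
    def choose_backend(self, backends: list[str], stats: dict[str, float]) -> str:
        # Route to the backend with the lowest reported load.
        return min(backends, key=lambda b: stats.get(b, 0.0))
```

Keeping the plugin surface this small would let contributors experiment with routing policies in Python without touching the data plane.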
@hmellor Thanks for chiming in! Would love to see your contribution! Feel free to create a new issue or PR for this!
SGTM.
I'm curious whether you see the production stack evolving towards an operator. I see someone has already suggested CRDs here: #7. As complexity grows, I can see this stack becoming more Kubernetes-native to simplify operations even further.
Thanks for bringing up the question, @spron-in. We will definitely consider a more end-to-end solution to simplify operations as the stack grows more complicated.
Hey @nithin8702, thanks for your interest. Regarding your questions:
This should be done in #38.
Do you mean loading models from some local storage? Please refer to this tutorial.
Can you elaborate a bit more on this question? I'm not sure I understand what you want.
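For anyone landing here before finding the tutorial: vLLM itself accepts a local path in place of a Hugging Face model ID. A minimal sketch, assuming `/models/my-model` holds a HuggingFace-format checkpoint:

```python
# Serve a model from local storage with vLLM by pointing the model
# argument at a directory instead of a Hugging Face Hub ID.
from vllm import LLM, SamplingParams

llm = LLM(model="/models/my-model")  # local path, no Hub download
params = SamplingParams(max_tokens=64)
print(llm.generate(["Hello, world!"], params)[0].outputs[0].text)
```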
Also interested in seeing least-request load balancing!
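For concreteness, a rough sketch of the least-request idea: route each request to the backend with the fewest in-flight requests. The class and method names below are illustrative, not the router's actual implementation:

```python
# Illustrative least-request balancer: acquire() picks the backend with
# the fewest in-flight requests; release() is called when one finishes.
from collections import defaultdict


class LeastRequestBalancer:
    def __init__(self, backends: list[str]):
        self.in_flight = defaultdict(int, {b: 0 for b in backends})

    def acquire(self) -> str:
        backend = min(self.in_flight, key=self.in_flight.get)
        self.in_flight[backend] += 1
        return backend

    def release(self, backend: str) -> None:
        self.in_flight[backend] -= 1
```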
I'm currently looking at this. Do we have a list of default observability metrics that we want yet (e.g., the list in the requirements), or should we do some research and pick some of them to use as defaults for now? I'm planning to contribute to this one.
Thanks @sitloboi2012! Currently the router does maintain a list of stats internally. We can first open the interface and dump those metrics to Prometheus. Would you like to open an issue for further discussion?
Yep, let's move this to this issue: Feat: Router Observability @ApostaC
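As a starting point for that discussion, a minimal sketch of dumping router stats to Prometheus with the `prometheus_client` library. The metric name and the `get_router_stats()` helper are hypothetical placeholders, not existing router APIs:

```python
# Periodically export per-backend router stats as Prometheus gauges.
import time

from prometheus_client import Gauge, start_http_server

qps_gauge = Gauge("vllm_router_qps", "Requests per second per backend", ["backend"])


def get_router_stats() -> dict[str, float]:
    # Placeholder: in the real router this would read the internally
    # maintained per-backend statistics mentioned above.
    return {"http://backend-1:8000": 12.5, "http://backend-2:8000": 7.0}


if __name__ == "__main__":
    start_http_server(9090)  # expose /metrics on port 9090
    while True:
        for backend, qps in get_router_stats().items():
            qps_gauge.labels(backend=backend).set(qps)
        time.sleep(5)
```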
Hi team, shall we have release notes for every release?
Good question! Will do that soon. @Shaoting-Feng @YuhanLiu11, please take a note.
Does this support multi-tenancy or namespace isolation?
You can specify the namespace when doing
Roadmap
This project's scope covers a set of production-related modules around vLLM, including the router, autoscaling, observability, KV cache offloading, and framework support (KServe, Ray, etc.).
This document lists the items on our Q1 roadmap. We will keep updating it to include the related issues, pull requests, and discussions in the #production-stack channel in the vLLM Slack.
Core features

CI/CD and packaging
- vllm-router (chore: Make router a python package #17)

OSS-related support
- pre-commit based linting and formatting (#35)

If any items you want are not on the roadmap, your suggestions and contributions are strongly welcomed! Please feel free to comment in this thread, open a feature request, or create an RFC.
Happy vLLMing!