Development Roadmap (2024 Q4) #1487
Comments
Are there any plans to optimize long-context latency?
Hi, can I help with the multi-layer radix cache (GPU/CPU/Disk)? I'm really interested in that.
I am interested in contributing to the P-D (prefill-decode) split inference architecture, and I have machines I can use to develop it. If you have any related development plans, please let me know. Thank you @Ying1123 @zhyncs @fengyang95
@lumiere-ml @tanzelin430 Are you in the Slack channel? We can follow up on that there.
@lumiere-ml @tanzelin430 You're welcome to join our Slack channel: https://join.slack.com/t/sgl-fru7574/shared_invite/zt-2ngly9muu-t37XiH87qvD~6rVBTkTEHw
Thanks for the invitation, I'm in Slack now. Looking forward to collaborating with you!
Thanks for your invitation!
@lumiere-ml @zhyncs I'm also very interested. Could you share which channel you're using for the discussion?
If no one is actively working on pipeline parallelism support, I'm down to help.
@mfdj2002 I think @CalvinXKY has expressed interest on Slack; you can chat with him there.
No one is working on pipeline parallelism. Feel free to contribute it.
I recently completed a reward model implementation for RMs trained with LlamaFactory. Everything works well, but I've noticed a relatively small difference in the last hidden states between my SGLang implementation and its counterpart in TRL (resulting in a ROC loss of ~0.3%). Regardless, I think I can help with the task "Support generalized reward API (adding linear layers to any Causal LM to get the reward)".
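One common way to realize "adding linear layers to any Causal LM to get the reward" is to take the model's last-layer hidden states, select the position of the final real (non-padding) token, and apply a linear layer that maps that hidden vector to a single scalar. The sketch below illustrates that idea in pure Python under those assumptions. `LinearRewardHead`, `last_token_index`, and `sequence_reward` are hypothetical names, not SGLang, TRL, or LlamaFactory APIs; real implementations work on tensors and may pool differently (for example, per-token values).

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class LinearRewardHead:
    """Hypothetical scalar reward head: reward = w . h + b (one linear layer)."""
    weight: List[float]
    bias: float = 0.0

    def __call__(self, hidden: Sequence[float]) -> float:
        # The head's weight vector must match the model's hidden size.
        if len(hidden) != len(self.weight):
            raise ValueError("hidden size does not match head weight size")
        return sum(w * x for w, x in zip(self.weight, hidden)) + self.bias


def last_token_index(attention_mask: Sequence[int]) -> int:
    """Position of the last real (non-padding) token; assumes right padding."""
    for i in range(len(attention_mask) - 1, -1, -1):
        if attention_mask[i]:
            return i
    raise ValueError("sequence contains only padding")


def sequence_reward(
    hidden_states: Sequence[Sequence[float]],
    attention_mask: Sequence[int],
    head: LinearRewardHead,
) -> float:
    """Apply the reward head to the final hidden state of the last real token."""
    return head(hidden_states[last_token_index(attention_mask)])


if __name__ == "__main__":
    head = LinearRewardHead(weight=[2.0, 1.0], bias=0.5)
    # Three positions with hidden size 2; the last position is padding.
    hs = [[0.1, 0.2], [0.5, -1.0], [0.0, 0.0]]
    print(sequence_reward(hs, [1, 1, 0], head))  # prints 0.5
```

Because the head is linear, a small difference Δh in the last hidden state (like the SGLang vs. TRL discrepancy mentioned above) shifts the reward by exactly w · Δh, so hidden-state mismatches translate directly into reward mismatches.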
I am interested in sequence parallelism. Will the sequence parallelism support use the method from "Context Parallelism for Scalable Million-Token Inference"? Thanks.
Amazing! Could you please send an email with your WeChat or other contact info to [email protected]? We can also discuss this on our Slack; please find [email protected] on the SGLang Slack!
I am also very interested in the PD disaggregation scenario, and I hope to combine the radix tree with PD disaggregation for some experiments. I saw that someone mentioned this in October. May I ask how the current development plan is progressing?
@trh11111 Yes. New members have joined our team to work on this, and PD disaggregation is the top priority in our development roadmap for next quarter.
Hi, I have just finished my graduate recruitment season and am working on my ATC paper. I'll be looking into the development soon.
@trh11111 If you're interested in this part, feel free to reach out to us on Slack.
How do I join this Slack channel?
Here is the development roadmap for 2024 Q4. Contributions and feedback are welcome (join the Bi-weekly Development Meeting). The previous 2024 Q3 roadmap can be found in #634.
Performance
Parallelism
Hardware Coverage
Model Coverage
New features
Reference: sglang/docs/references/faq.md, line 3 (commit 8912b76)
Quantization
@HaiShaw @zhyncs @ispobock
Server API
Observability
Others