Discussion: Pipeline parallelism support #101
Is pipeline parallelism necessary for single-node deployment? I believe tensor parallelism is more suitable in this situation (single-node, multi-GPU).
I think Yuhan (@YuhanLiu11) is already working on tensor parallelism (issue #97). We are also discussing how to do the multi-node part. We'll try to create an RFC for that soon.
I think TP and PP can work together, especially for users with a single node containing multiple GPUs, such as an 8-GPU setup. I agree TP is more suitable for a single node, but PP will be needed for multi-node setups in the future.
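For context, here is a minimal sketch of how the two compose on a single 8-GPU node. The image tag, model name, and surrounding pod spec are placeholder assumptions; only `--tensor-parallel-size` and `--pipeline-parallel-size` are actual vLLM engine arguments, and their product must equal the number of GPUs given to the engine:

```yaml
# Hypothetical pod spec fragment: one vLLM engine using TP x PP = 4 x 2 = 8 GPUs.
containers:
  - name: vllm
    image: vllm/vllm-openai:latest             # placeholder image tag
    args:
      - "--model"
      - "meta-llama/Llama-3.1-70B-Instruct"    # placeholder model
      - "--tensor-parallel-size"
      - "4"                                    # shard each layer across 4 GPUs
      - "--pipeline-parallel-size"
      - "2"                                    # split the layer stack into 2 stages
    resources:
      limits:
        nvidia.com/gpu: 8                      # must equal TP size * PP size
```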
Totally agree that we should have pipeline parallelism for multi-node setups! I'm just not sure we really need it for single-node in the Helm chart. But it might not be a big deal either way.
@gaocegege Multi-node support with multiple GPUs is a highly valuable feature, and it would be fantastic to see it included in the chart. If I'm not mistaken, implementing this would require deploying a Ray cluster.
Yes, unless vllm-project/vllm#3902 or vllm-project/vllm#12511 is supported.
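For reference, a sketch of what the engine arguments could look like once a Ray cluster is in place. The parallelism sizes below are placeholders, and the pods are assumed to have already joined a shared Ray cluster before the engine starts; `--distributed-executor-backend` is an existing vLLM flag:

```yaml
# Hypothetical engine args for a Ray-backed multi-node deployment.
# Assumes the pods have already joined a common Ray cluster.
args:
  - "--distributed-executor-backend"
  - "ray"
  - "--tensor-parallel-size"
  - "8"      # GPUs per node (placeholder)
  - "--pipeline-parallel-size"
  - "2"      # one pipeline stage per node (placeholder)
```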
We aim to support pipeline parallelism for vLLM engines. Pipeline parallelism partitions a model's layers into sequential stages that run on different GPUs, which lets us serve models too large for a single device and can improve inference throughput by keeping all stages busy.
In vLLM, pipeline parallelism can be enabled on a single node through the `--pipeline-parallel-size` command-line argument. We should expose this option in our Helm chart to provide seamless support.
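As a starting point, here is a minimal sketch of how the chart could expose this. The value keys and template fragment below are hypothetical, not the chart's current schema; only the vLLM flags themselves are real:

```yaml
# Hypothetical values.yaml fragment -- key names are illustrative.
vllm:
  model: "meta-llama/Llama-3.1-8B-Instruct"   # placeholder model
  tensorParallelSize: 2     # rendered as --tensor-parallel-size 2
  pipelineParallelSize: 2   # rendered as --pipeline-parallel-size 2
  gpuCount: 4               # should equal tensorParallelSize * pipelineParallelSize

# Hypothetical deployment template fragment rendering the flags.
args:
  - "--model"
  - {{ .Values.vllm.model | quote }}
  - "--tensor-parallel-size"
  - {{ .Values.vllm.tensorParallelSize | quote }}
  - "--pipeline-parallel-size"
  - {{ .Values.vllm.pipelineParallelSize | quote }}
```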