
Discussion: Pipeline parallelism support #101

Open
Shaoting-Feng opened this issue Feb 10, 2025 · 6 comments
@Shaoting-Feng (Collaborator)
We aim to support pipeline parallelism for vLLM engines, which will enable us to efficiently handle large-scale models by dividing the workload into manageable portions. By utilizing pipeline parallelism, we can significantly enhance inference throughput.

In vLLM, pipeline parallelism on a single node can be enabled through a command-line argument. We should incorporate this functionality into our Helm chart to provide seamless support.
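For reference, a rough sketch of what this looks like today (the `--pipeline-parallel-size` flag is from vLLM's OpenAI-compatible server CLI; the model name is just a placeholder, and how the Helm chart would forward the flag is an open design question):

```shell
# Single-node pipeline parallelism: split the model's layers into 2 stages.
# Placeholder model; any supported HF model id works.
vllm serve meta-llama/Llama-2-13b-hf --pipeline-parallel-size 2
```

The chart would presumably just need a values entry that appends this flag to the engine's launch arguments.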

@gaocegege (Collaborator)

Is pipeline parallelism necessary for single-node deployment? I believe tensor parallelism is more suitable in this situation (Single-Node Multi-GPU).

@ApostaC (Collaborator) commented Feb 10, 2025

I think Yuhan @YuhanLiu11 is already working on tensor parallelism (issue #97). We are also having some discussion about how to do the multi-node stuff; will try creating an RFC for that soon.

@Shaoting-Feng (Collaborator, Author)

I think TP and PP can work together, especially for users with a single node containing multiple GPUs, such as an 8-GPU setup. I agree TP is more suitable for a single node, but PP will be needed for multi-node setups in the future.
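As a sanity check for that 8-GPU scenario: the engine needs one GPU per (TP rank, PP stage) pair, so the chart should validate that the product of the two sizes matches the GPUs allocated to the pod. A minimal sketch (the helper name is made up for illustration):

```python
def gpus_required(tensor_parallel_size: int, pipeline_parallel_size: int) -> int:
    """Total GPUs one engine replica needs: one per (TP rank, PP stage) pair."""
    return tensor_parallel_size * pipeline_parallel_size

# An 8-GPU node can run TP=4 inside the node's fast interconnect
# while PP=2 splits the layers into two pipeline stages: 4 * 2 = 8.
print(gpus_required(4, 2))
```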

@gaocegege (Collaborator) commented Feb 10, 2025

Totally agree that we should have pipeline parallelism for multi-node setups! I'm just not sure if we really need it for single-node in the Helm chart.

> pipeline parallelism on a single node can be enabled through a command-line argument. We should incorporate this functionality into our Helm chart to provide seamless support.

But, it might not be a big deal either way.

@moriabs88

@gaocegege Multi-node support with multiple GPUs is a highly valuable feature, and it would be fantastic to see this included in the chart.

If I’m not mistaken, implementing this would require deploying a Ray cluster to enable multi-node functionality.
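For context, the Ray-based flow that vLLM documents for multi-node serving looks roughly like this (all IPs, ports, and the model id below are placeholders; in Kubernetes the chart would have to automate the `ray start` steps, e.g. via an operator or init containers):

```shell
# On the head node: start the Ray head process.
ray start --head --port=6379

# On each worker node: join the cluster (placeholder head IP).
ray start --address=10.0.0.1:6379

# On the head node: launch vLLM across both nodes, e.g. 2 pipeline
# stages with tensor parallelism inside each node.
vllm serve meta-llama/Llama-2-70b-hf \
  --tensor-parallel-size 4 \
  --pipeline-parallel-size 2
```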

@gaocegege (Collaborator)

> If I’m not mistaken, implementing this would require deploying a Ray cluster to enable multi-node functionality.

Yes, unless vllm-project/vllm#3902 vllm-project/vllm#12511 is supported.

4 participants