
Discussion: Pipeline parallelism support #101

Open
Shaoting-Feng opened this issue Feb 10, 2025 · 6 comments
@Shaoting-Feng (Collaborator)
We aim to support pipeline parallelism for vLLM engines, which will enable us to efficiently handle large-scale models by dividing the workload into manageable portions. By utilizing pipeline parallelism, we can significantly enhance inference throughput.

In vLLM, pipeline parallelism on a single node can be enabled through a command-line argument. We should incorporate this functionality into our Helm chart to provide seamless support.
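For reference, a rough sketch of what this looks like today (the `--pipeline-parallel-size` flag is from vLLM's OpenAI-compatible server CLI; the model name is just a placeholder, and how the Helm chart would forward the flag is an open design question):

```shell
# Single-node pipeline parallelism: split the model's layers into 2 stages.
# Placeholder model; any supported HF model id works.
vllm serve meta-llama/Llama-2-13b-hf --pipeline-parallel-size 2
```

The chart would presumably just need a values entry that appends this flag to the engine's launch arguments.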

@gaocegege (Collaborator)

Is pipeline parallelism necessary for single-node deployment? I believe tensor parallelism is more suitable in this situation (Single-Node Multi-GPU).

@ApostaC (Collaborator) commented Feb 10, 2025

I think Yuhan @YuhanLiu11 is already working on tensor parallelism (issue #97). We are also having some discussion about how to do the multi-node stuff; will try creating an RFC for that soon.

@Shaoting-Feng (Collaborator, Author)

I think TP and PP can work together, especially for users with a single node containing multiple GPUs, such as an 8-GPU setup. I agree TP is more suitable for a single node, but PP will be needed for multi-node setups in the future.
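As a sanity check for that 8-GPU scenario: the engine needs one GPU per (TP rank, PP stage) pair, so the chart should validate that the product of the two sizes matches the GPUs allocated to the pod. A minimal sketch (the helper name is made up for illustration):

```python
def gpus_required(tensor_parallel_size: int, pipeline_parallel_size: int) -> int:
    """Total GPUs one engine replica needs: one per (TP rank, PP stage) pair."""
    return tensor_parallel_size * pipeline_parallel_size

# An 8-GPU node can run TP=4 inside the node's fast interconnect
# while PP=2 splits the layers into two pipeline stages: 4 * 2 = 8.
print(gpus_required(4, 2))
```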

@gaocegege (Collaborator) commented Feb 10, 2025

Totally agree that we should have pipeline parallelism for multi-node setups! I'm just not sure if we really need it for single-node in the Helm chart.

> pipeline parallelism on a single node can be enabled through a command-line argument. We should incorporate this functionality into our Helm chart to provide seamless support.

But, it might not be a big deal either way.

@moriabs88

@gaocegege Multi-node support with multiple GPUs is a highly valuable feature, and it would be fantastic to see this included in the chart.

If I’m not mistaken, implementing this would require deploying a Ray cluster to enable multi-node functionality.
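For context, the Ray-based flow that vLLM documents for multi-node serving looks roughly like this (all IPs, ports, and the model id below are placeholders; in Kubernetes the chart would have to automate the `ray start` steps, e.g. via an operator or init containers):

```shell
# On the head node: start the Ray head process.
ray start --head --port=6379

# On each worker node: join the cluster (placeholder head IP).
ray start --address=10.0.0.1:6379

# On the head node: launch vLLM across both nodes, e.g. 2 pipeline
# stages with tensor parallelism inside each node.
vllm serve meta-llama/Llama-2-70b-hf \
  --tensor-parallel-size 4 \
  --pipeline-parallel-size 2
```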

@gaocegege (Collaborator)

> If I’m not mistaken, implementing this would require deploying a Ray cluster to enable multi-node functionality.

Yes, unless vllm-project/vllm#3902 vllm-project/vllm#12511 is supported.

4 participants