[V1] Support DP with Ray #18779
Conversation
Overall, looks good. There are some comments that need to be addressed and some follow-ups that are not blockers.
vllm/v1/utils.py (outdated)
assert nodes[0].node_ip == head_node_ip, (
    "The first node must be the head node")
This check is not quite right and actually blocks: -dpa can be localhost or 127.0.0.1, in which case it will violate this assertion and block the serve run.
If there is a single head node, 127.0.0.1 should work fine here. Regarding localhost, we should clearly define the expected IP or hostname.
If there are multiple IPs, 127.0.0.1 should not be passed as -dpa.
Even with 127.0.0.1, node_ip would return the physical IP and not 127.0.0.1 (think of using this on Anyscale: node.node_ip would return the actual IP of the head node).
Hmm, why should the user pass in 127.0.0.1? If they want to specify a public IP, they can pass it in. If they want to use the current node IP, they can leave it out. Current code:
if self.data_parallel_address is None:
    if self.data_parallel_backend == "ray":
        host_ip = get_ip()
        logger.info(
            "Using host IP %s as ray-based data parallel address",
            host_ip)
        data_parallel_address = host_ip
> hmm, why should the user pass in 127.0.0.1? If they want to specify a public IP

Because it's a simple IP to pass in when you mean local :) without having to do a get_ip yourself. I know they can leave it out, but explicit should also work, I think. I don't think you are really gaining anything by checking if node.ip == specified_ip.
Still don't feel passing in 127.0.0.1 is necessary :) but it might be convenient. In that case, we may just resolve to the host IP, just like when None is passed in, if we want to support it.
Checking node.ip == dp_master_ip is just to respect dp_size_local, or more accurately, the DP ranks to allocate on the DP master node. If we don't need that functionality, we can remove it. Currently it gives a knob to decide how to allocate on the DP master node.
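For illustration, a hedged sketch of passing an explicit DP address using the -dpa shorthand used in this thread ($MODEL and the IP are placeholders, not taken from this PR):
vllm serve $MODEL -dp 4 -dpb ray -dpa 10.0.0.1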
vllm/v1/utils.py (outdated)
    key=lambda node: node.node_ip != head_node_ip)
assert nodes[0].node_ip == head_node_ip, (
    "The first node must be the head node")
assert len(nodes) == 1 or nodes[1].node_ip != head_node_ip, (
I don't understand the second condition in the or?
If it is multi-node, the second node (after sorting) cannot have the same IP as the head node.
how can this even happen? I feel like we can remove this assertion and it will never get violated? Also there would be no reason to sort.
Yeah, this should not happen; using an assertion is just defensive programming, in case things get messed up. We can remove this as well; it should be pretty safe, as you mentioned.
See the other comments regarding why the sorting is preferred. I originally did not use sorting, but the code is cleaner with it.
vllm/v1/utils.py (outdated)
else:
    logger.info("Creating placement groups")
    nodes = list_nodes()
    nodes = sorted(list_nodes(),
why do you need to sort the nodes by node ip?
Sorting makes the head node always the first node, and worker nodes follow. This makes it easier to create and manage placement_groups/local_dp_ranks
Why would sorting put the head node first? What is the comparison basis? Is this a property of node objects returned by the Ray API?
Oh, because of the lambda we use:
nodes = sorted(list_nodes(),
               key=lambda node: node.node_ip != head_node_ip)
Oh, I see. You are doing all this sorting for the check below, right? Or do you need the sorting for another reason?
Basically to be able to respect dp_size_local. With sorting, the code is simpler.
@@ -618,6 +619,11 @@ def add_cli_args(parser: FlexibleArgumentParser) -> FlexibleArgumentParser:
                                type=int,
                                help='Port for data parallel RPC '
                                'communication.')
    parallel_group.add_argument('--data-parallel-backend',
add default as EngineArgs.data_parallel_backend
In the class definition of EngineArgs, there is already
data_parallel_backend: str = ParallelConfig.data_parallel_backend
and ParallelConfig.data_parallel_backend is by default mp. This follows the current convention of defining defaults.
When you don't pass --data-parallel-backend through the CLI, it will be None and will barf later where you check that it should be either ray or mp. Try it.
Hmm, tried it. I think it is correctly statically initialized at https://github.com/vllm-project/vllm/pull/18779/files#diff-ea8b8ff63961713ccb62d78e53e96404b587b7828cb9fee08a9e5576bf563673R293, no?
vllm/v1/utils.py (outdated)
assert available_engine_count >= local_engine_count, (
    "Not enough resources to allocate DP ranks "
    f"on DP master node {node_ip}")
for i in range(local_engine_count):
So if I don't pass data_parallel_local when using Ray, I think the current config path chooses dp_size for the data_parallel_local size, but it does not make sense to make this loop the default one. This loop should only be on the critical path if you really want granular control over guaranteeing that a certain number of DP workers run locally.
So to summarize,
vllm serve $MODEL -dp 4 -dpb ray
should pass through bundles = [{"GPU": 1.0}] * world_size + [{"CPU": 1.0}] as the critical path, not the other one.
Yeah, you are right. If we don't specify dp_size_local, then dp_size will be used. This behavior is the same as MP.
To achieve the behavior you mentioned (and respect dp_size_local), we could do one of the following:
1. Use a special value (None or -1) for dp_size_local to mean "not specified". However, MP probably does not like these special values.
2. Use another Ray-only flag (e.g., dp_size_local_ignore, dp_size_local_auto, or some better name) to override dp_size_local.
Out of these, 2) is probably better.
Or for Ray we simply do not respect dp_size_local? Is this your preference?
You can do 2 without adding a new flag. Basically the default for dp_size_local is established at runtime, so you can set it to 0 if it's not specified and backend is ray.
dp_size_local=0 means "don't allocate on the DP master node" (in the current semantics, including MP). Did you mean to change it to "allocate on the DP master node (and other nodes) based on resource availability"?
I think you are suggesting 1) but with 0 as the special value. I'm not sure 0 is a good special value:
- The semantics would differ between MP and Ray, and
- 0 is a bit confusing when it means "unspecified". It's weird that dp_size_local is respected when the value is > 0, but when it is 0, it means "unspecified".
ray.util.remove_placement_group(pg)


def wait_for_engine_startup(
Please add a TODO to unify this with the other wait_for_engine_startup or something. There are a few redundancies between this and the other one.
This is MP only, and Ray doesn't actually call this method. Ray's wait for engine startup is trivial, basically:
refs = [actor.wait_for_init.remote() for actor in actors]
ray.get(refs)
And there is no need to introduce a method for Ray.
I think I missed that this was from Nick's PR.
f"died with exit code {proc.exitcode}") | ||
|
||
if actor_run_refs: | ||
import ray |
import at the top?
Not every vLLM installation has Ray. This avoids an import error for cases that don't need to use Ray.
@@ -164,9 +284,328 @@ def finished_procs(self) -> dict[str, int]:
    }


class CoreEngineActorManager:
Should this go to ray_utils.py?
Currently there is a ray_utils.py in vllm/executor; that is not a proper place.
I also tried moving this to a new file ray_dp.py in vllm/v1, and felt it does not actually improve the organization: 1) the imports will be circular unless we do a larger refactoring, e.g. reorganizing vllm/v1/utils.py; 2) placing CoreEngineProcManager and CoreEngineActorManager closer for now groups similar functionality.
I feel the refactoring is better done as a follow-up, where we reorganize vllm/v1/utils.py as well.
vllm/v1/utils.py (outdated)
placement_groups = []
local_dp_ranks = []

for node in nodes:
Can we reiterate why we need to be aware of nodes when using Ray? We could very much be in a cluster that has not scaled yet, in which case the nodes would just be a CPU head node. Why not do something simple and general like the following?
if placement_groups is not None:
    ...
else:
    placement_groups = []
    for _ in range(dp_size):
        bundles = [{"GPU": 1.0}] * world_size + [{"CPU": 1.0}]
        pg = ray.util.placement_group(
            name=f"dp_rank_{len(placement_groups)}",
            strategy="STRICT_PACK",
            bundles=bundles,
        )
        placement_groups.append(pg)
We should not have any diff between local / remote actors then.
I do think we should merge this and then re-evaluate the need to keep the local vs. remote concept at all.
Yeah, good point. We may end up getting rid of the node IP placement restriction. I wanted to use this in Ray Serve and clarify the exact requirements and preferences, so I thought it would be better to re-evaluate (this is an easy change, internal to Ray DP) rather than optimize too early.
This pull request has merge conflicts that must be resolved before it can be merged.
This PR adds support for DP with Ray, including multi-node deployment and API server scale-out.
We reuse the ZMQ communication mechanism between the frontend and engine cores, as in #15977, and the same API server scale-out mechanism as in #17546.
Main differences from those PRs:
Examples
This will run DP=4 on the head node.
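For example, a minimal sketch assuming the -dp/-dpb shorthands used elsewhere in this PR ($MODEL is a placeholder model name):
vllm serve $MODEL -dp 4 -dpb ray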
This will run DP=4 with DP ranks 0 and 1 on the head node and ranks 2 and 3 on other nodes.
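A hedged sketch, assuming dp_size_local is exposed as --data-parallel-size-local (as discussed in the review above) and that two ranks are kept on the head node:
vllm serve $MODEL -dp 4 -dpb ray --data-parallel-size-local 2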
This will run DP=4 with only the API server on the head node and all engines on other nodes:
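A hedged sketch, assuming --data-parallel-size-local 0 keeps all DP ranks off the DP master node, per the semantics discussed in the review above:
vllm serve $MODEL -dp 4 -dpb ray --data-parallel-size-local 0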
Design
See the following illustration. The DP Coordinator is omitted, but it is the same as in #17546.