Ray Worker Ops Optimization #136

noemotiovon · 2025-02-21T09:40:58Z

What this PR does / why we need it?

When backend = ray, only the main process calls forward_oot; the other worker processes fall back to forward_native. (The same bug likely also exists when backend = mp.)
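
To illustrate the failure mode, here is a minimal sketch that is not the vllm or vllm-ascend code. It assumes an op falls back to forward_native unless an out-of-tree implementation has been attached in the current process. The names ToyOp, toy_forward_oot and register_ops are hypothetical and exist only for this example.

```python
class ToyOp:
    """Hypothetical stand-in for an op with a native and an out-of-tree path."""

    def forward_native(self, x):
        return ("native", x)

    def forward(self, x):
        # Use forward_oot only if this process has attached one to the class.
        impl = getattr(self, "forward_oot", None)
        if impl is None:
            return self.forward_native(x)
        return impl(x)


def toy_forward_oot(self, x):
    # Log each call so it is easy to see which path a process takes.
    print("forward_oot run.")
    return ("oot", x)


def register_ops():
    """Attach the out-of-tree implementation in the current process only."""
    ToyOp.forward_oot = toy_forward_oot


# A process that never ran register_ops() takes the native path.
before = ToyOp().forward(1)

# Once registration has run in this process, the same call takes forward_oot.
register_ops()
after = ToyOp().forward(1)
```

Class attributes set this way live in one process's memory. Under that assumption, a worker started as a separate process needs to run the registration itself during its own initialization, otherwise it stays on the forward_native path.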

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Environment:

CANN: 8.0.0
PyTorch: 2.5.1
Torch: 2.5.1rc1
python: 3.10
vllm: branch main
vllm-ascend: branch main

With this PR applied, the Ray worker initialization issue no longer occurs. To verify this, a log line is printed on every forward_oot call.

Script:

python examples/offline_distributed_inference_npu.py

Result:

(NPURayWorkerWrapper pid=3984223) forward_oot run. #############################################
(NPURayWorkerWrapper pid=3984223) forward_oot run. #############################################
(NPURayWorkerWrapper pid=3984223) forward_oot run. #############################################
(NPURayWorkerWrapper pid=3984223) forward_oot run. #############################################
(NPURayWorkerWrapper pid=3984223) forward_oot run. #############################################
forward_oot run. #############################################
forward_oot run. #############################################
Processed prompts: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:07<00:00,  1.96s/it, est. speed input: 2.80 toks/s, output: 51.00 toks/s]
Prompt: 'Hello, my name is', Generated text: ' Alex and I am a 16 year old male. I have been diagnosed with a rare genetic disorder called X-linked recessive. I have been told that I will not be able to have children. I have been told that I will not be able to have children because of the X-linked recessive disorder. I have been told that I will not be able to have children because of the X-linked recessive disorder. I have been told that I will not be able to have children because of'
Prompt: 'The president of the United States is', Generated text: ' Statesman. He is the leader of the country. He is the one who makes the decisions. He is the one who makes the laws. He is the one who makes the rules. He is the one who makes the country strong. He is the one who makes the country happy. He is the one who makes the country safe. He is the one who makes the country free. He is the one who makes the country beautiful. He is the one who makes the country great. He is'
Prompt: 'The capital of France is', Generated text: ' the city of Paris. It is the largest city in France and the second largest city in Europe. It is located in the center of the country, in the south of the country. It is situated on the banks of the Seine River, which flows through the city. The city is surrounded by the Alps and the Pyrenees mountains. The city is also surrounded by the Mediterranean Sea. The city is known for its beautiful architecture, its museums, its parks, and its food. Paris is'
Prompt: 'The future of AI is', Generated text: ' following the path of the internet, and the internet is following the path of the web. The web is a network of interconnected web pages, and the internet is a network of interconnected computers. The web is a network of interconnected computers, and the internet is a network of interconnected computers. The web is a network of interconnected computers, and the internet is a network of interconnected computers. The web is a network of interconnected computers, and the internet is a network of interconnected computers. The web is a network'

Signed-off-by: Chenguang Li <[email protected]>

noemotiovon · 2025-02-21T09:41:24Z

@MengqingCao cc.

Signed-off-by: Chenguang Li <[email protected]>

MengqingCao · 2025-02-21T09:45:34Z

Thanks for this fix! Let's remove the op registration in check_and_update_config to avoid registering the ops twice.

noemotiovon · 2025-02-21T09:47:50Z

Thanks for this fix! Let's remove the op registration in check_and_update_config to avoid registering the ops twice.

Thanks for reviewing my code. This is a good suggestion!

Signed-off-by: Chenguang Li <[email protected]>

Ray Worker Ops Optimization

b2f09ab

Signed-off-by: Chenguang Li <[email protected]>

Ray Worker Ops Optimization

4f2d18a

Signed-off-by: Chenguang Li <[email protected]>

Ray Worker Ops Optimization

f29a0e2

Signed-off-by: Chenguang Li <[email protected]>

wuhuikx approved these changes Feb 21, 2025

wangxiyuan approved these changes Feb 21, 2025

wangxiyuan merged commit 202b39a into vllm-project:main Feb 21, 2025
10 checks passed
