[Don't merge] Deploying DeepSeek-R1 on H20-96G with SGLang: Best Practices #11854
Deploying DeepSeek-R1 on H20-96G with SGLang: Best Practices
Introduction
We published an article on LMSYS titled "Together with SGLang: Best Practices for Serving DeepSeek-R1 on H20-96G", sharing our best practices for deploying the DeepSeek-R1 model on H20-96G hardware.
To facilitate reproduction of our experimental results and provide access to our code, we have released this pull request in the DeepSeek-R1 repository.
Reproduction Steps
Pulling the Docker Image
To obtain the Docker image, use the following command:
The image is hosted at: https://github.com/orgs/antgroup/packages/container/package/sglang
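As a rough sketch of pulling from that package page: the registry path below is inferred from the package URL (GitHub container packages are served from `ghcr.io`), and the tag is deliberately left as a placeholder because it is not stated here. The helper names `image_ref` and `pull_image` are hypothetical.

```shell
# Assumption: registry path inferred from the package URL above.
IMAGE_REPO="ghcr.io/antgroup/sglang"
# Assumption: the tag is NOT given in this text; pick one from the package page.
IMAGE_TAG="${IMAGE_TAG:-REPLACE_WITH_TAG}"

# Print the full image reference (repo:tag).
image_ref() {
  printf '%s:%s\n' "$IMAGE_REPO" "$IMAGE_TAG"
}

# Pull the image, refusing to run while the tag is still the placeholder.
pull_image() {
  if [ "$IMAGE_TAG" = "REPLACE_WITH_TAG" ]; then
    echo "Set IMAGE_TAG to a tag listed on the package page first." >&2
    return 1
  fi
  docker pull "$(image_ref)"
}
```

After setting `IMAGE_TAG` to a real tag, calling `pull_image` runs the pull.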
Checking Environment Variables
All environment variables are stored in the `/root/env.sh` file, configured for our H20 environment. Before launching SGLang, verify that these variables are suitable for your environment.
Launching SGLang
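Before the first launch, it can help to see exactly which variables `/root/env.sh` sets and what values they take. The following is a minimal sketch, assuming the file consists of plain `NAME=value` or `export NAME=value` lines; the helper names `list_env_vars` and `show_env_values` are hypothetical and not part of the image.

```shell
# List the variable names assigned in an env file without executing it.
# Assumption: assignments are plain `NAME=value` or `export NAME=value` lines.
list_env_vars() {
  sed -n -E 's/^[[:space:]]*(export[[:space:]]+)?([A-Za-z_][A-Za-z0-9_]*)=.*/\2/p' "$1"
}

# Print NAME=value for each variable as it would be set after sourcing the file.
# Runs in a subshell so the calling shell's environment is left untouched.
show_env_values() {
  (
    . "$1"
    for name in $(list_env_vars "$1"); do
      eval "printf '%s=%s\n' \"\$name\" \"\${$name}\""
    done
  )
}
```

Running `show_env_values /root/env.sh` inside the container prints the effective values for review against your own hardware and network setup.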
We recommend running four containers: two for Prefill nodes and two for Decode nodes.
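The launch steps use placeholders such as `{node_rank}`, `{decode_master_ip}`, `{prefill_node_0_ip}`, and `{prefill_node_1_ip}` that differ per container. One way to avoid hand-editing each command is a small substitution helper. This is only a sketch: `fill_placeholders` is a hypothetical name, and the template in the usage comment is a stand-in, not one of the actual launch commands.

```shell
# Substitute {name} placeholders in a command template.
# Usage: fill_placeholders TEMPLATE name=value [name=value ...]
# Assumption: values contain no '|' or '&' characters (true for IPs and ranks).
fill_placeholders() {
  result=$1
  shift
  for pair in "$@"; do
    key=${pair%%=*}
    value=${pair#*=}
    result=$(printf '%s' "$result" | sed "s|{$key}|$value|g")
  done
  # Fail loudly if any {placeholder} was left unfilled.
  case $result in
    *\{*\}*) echo "unfilled placeholder in: $result" >&2; return 1 ;;
  esac
  printf '%s\n' "$result"
}

# Example with a stand-in template (hypothetical command and addresses):
#   fill_placeholders 'launch --master {decode_master_ip} --rank {node_rank}' \
#     decode_master_ip=10.0.0.1 node_rank=0
```

The check for leftover braces is a design choice: starting a node with a literal `{decode_master_ip}` in its arguments tends to fail in less obvious ways than an early error.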
1. Launching Prefill Nodes (Identical Configuration for Both Nodes)
Note:
2. Launching Decode Nodes
Note:
- Replace `{node_rank}` with `0` or `1` for the respective node.
- Replace `{decode_master_ip}` with the IP address of Node 0.
Node-0
3. Launching SGLang Router
Note:
- Replace `{decode_master_ip}`, `{prefill_node_0_ip}`, and `{prefill_node_1_ip}` with the respective IP addresses.
Testing
1. Running the Benchmark
Note:
- When `--request-rate` is set to `inf`, all requests are sent at once, making TTFT and TPOT data less meaningful.
- Replace `{path-to-shareGPT}` with the path to the ShareGPT dataset.
2. Observing Logs
To monitor peak performance, filter the logs for entries with `running-req: 32`:
grep -E 'Decode batch.*running-req: 32' /home/admin/logs/sglang.log
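If you benchmark at other batch sizes, the same filter can be parameterized. The sketch below reuses the grep pattern and log path from the command above; the helper name `filter_decode_batches` is hypothetical, and the extra boundary check (so that `32` does not also match `320`) is an addition, not part of the original command.

```shell
# Filter decode-batch log lines for a given running-request count.
# Usage: filter_decode_batches BATCH_SIZE [LOG_FILE]
filter_decode_batches() {
  batch_size=$1
  log_file=${2:-/home/admin/logs/sglang.log}
  # The trailing group requires a non-digit (or end of line) after the number,
  # so asking for 32 does not match 320.
  grep -E "Decode batch.*running-req: ${batch_size}(\$|[^0-9])" "$log_file"
}
```

For instance, `filter_decode_batches 32` applies the filter to the default log path with the boundary check added.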
Example Output (for batch size = 32):
Related PRs