Description
System Info
- GPU: B200
- TensorRT-LLM v0.20.0, commit: 82d918b
- Model id: nvidia/DeepSeek-R1-0528-FP4
Who can help?
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
Server command:
trtllm-serve nvidia/DeepSeek-R1-0528-FP4 --max_batch_size 80 --max_num_tokens 163840 --max_seq_len 163840 --kv_cache_free_gpu_memory_fraction 0.6 --port 8000 --trust_remote_code true --backend pytorch --tp_size 8 --pp_size 1 --ep_size 8 --extra_llm_api_options /workspace1/extra-llm-api-config.yaml
extra-llm-api-config.yaml:
pytorch_backend_config:
  use_cuda_graph: true
  cuda_graph_batch_sizes: [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,100,128,256]
  cuda_graph_padding_enabled: true
  kv_cache_dtype: fp8
enable_attention_dp: false
enable_chunked_prefill: true
speculative_config:
  decoding_type: MTP
  num_nextn_predict_layers: 3
  use_relaxed_acceptance_for_thinking: true
  relaxed_topk: 10
  relaxed_delta: 0.6
Then send requests at 50+ RPM (requests per minute); a sketch of the load generator is below.
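A minimal load-generation sketch of the kind used, assuming the OpenAI-compatible /v1/chat/completions endpoint that trtllm-serve exposes on port 8000; the prompt text, request count, and pacing are illustrative, not the exact workload:

```python
# Hypothetical load generator: ~50 requests/minute against the
# OpenAI-compatible endpoint exposed by trtllm-serve on port 8000.
import time
from concurrent.futures import ThreadPoolExecutor

import requests

URL = "http://localhost:8000/v1/chat/completions"
MODEL = "nvidia/DeepSeek-R1-0528-FP4"

def send_one(i: int) -> None:
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": f"Request {i}: summarize the attention mechanism."}],
        "max_tokens": 1024,
        "stream": False,
    }
    t0 = time.time()
    try:
        r = requests.post(URL, json=payload, timeout=600)
        print(f"req {i}: status={r.status_code} latency={time.time() - t0:.1f}s")
    except requests.RequestException as e:
        print(f"req {i}: failed after {time.time() - t0:.1f}s: {e}")

# One new request every ~1.2 s (~50 RPM), with enough workers to keep
# many requests in flight at the same time.
with ThreadPoolExecutor(max_workers=80) as pool:
    for i in range(500):
        pool.submit(send_one, i)
        time.sleep(1.2)
```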
Expected behavior
The server keeps serving requests continuously.
actual behavior
The server hangs after serving some requests.
No error is reported. The main process still accepts new requests, but no tokens are generated.
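For reference, this is how the hang shows up from the client side; a minimal sketch, assuming the OpenAI-compatible streaming endpoint on port 8000 (prompt and timeout values are illustrative): the request is accepted and the connection stays open, but no token chunks ever arrive.

```python
# Hypothetical client used to observe the hang via a streaming request
# to the OpenAI-compatible /v1/chat/completions endpoint.
import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "nvidia/DeepSeek-R1-0528-FP4",
        "messages": [{"role": "user", "content": "Hello"}],
        "max_tokens": 64,
        "stream": True,
    },
    stream=True,
    timeout=300,
)
print("HTTP status:", resp.status_code)  # the request is still accepted
for line in resp.iter_lines():
    # When the server is hung, nothing is ever printed here: no SSE
    # chunks are streamed back, and the read eventually times out.
    if line:
        print(line.decode())
```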
additional notes
nvidia-smi output while the server is hung:
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 575.51.03 Driver Version: 575.51.03 CUDA Version: 12.9 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA B200 On | 00000000:03:00.0 Off | Off |
| N/A 36C P0 246W / 1000W | 155110MiB / 183359MiB | 100% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 1 NVIDIA B200 On | 00000000:13:00.0 Off | Off |
| N/A 42C P0 259W / 1000W | 154480MiB / 183359MiB | 100% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 2 NVIDIA B200 On | 00000000:63:00.0 Off | Off |
| N/A 34C P0 244W / 1000W | 154480MiB / 183359MiB | 100% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 3 NVIDIA B200 On | 00000000:73:00.0 Off | Off |
| N/A 43C P0 248W / 1000W | 154480MiB / 183359MiB | 100% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 4 NVIDIA B200 On | 00000000:83:00.0 Off | Off |
| N/A 36C P0 241W / 1000W | 154480MiB / 183359MiB | 100% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 5 NVIDIA B200 On | 00000000:93:00.0 Off | Off |
| N/A 42C P0 252W / 1000W | 154480MiB / 183359MiB | 100% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 6 NVIDIA B200 On | 00000000:E3:00.0 Off | Off |
| N/A 35C P0 244W / 1000W | 154480MiB / 183359MiB | 100% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 7 NVIDIA B200 On | 00000000:F3:00.0 Off | Off |
| N/A 41C P0 245W / 1000W | 154160MiB / 183359MiB | 100% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 1431234 C /usr/bin/python 684MiB |
| 0 N/A N/A 1432070 C /usr/bin/python 15440... |
| 1 N/A N/A 1432071 C /usr/bin/python 15446... |
| 2 N/A N/A 1432072 C /usr/bin/python 15446... |
| 3 N/A N/A 1432073 C /usr/bin/python 15446... |
| 4 N/A N/A 1432074 C /usr/bin/python 15446... |
| 5 N/A N/A 1432075 C /usr/bin/python 15446... |
| 6 N/A N/A 1432076 C /usr/bin/python 15446... |
| 7 N/A N/A 1432077 C /usr/bin/python 15414... |
+-----------------------------------------------------------------------------------------+