Skip to content

Server stuck with high load #5391

Open
Open
@k-l-lambda

Description

@k-l-lambda

System Info

  • GPU B200
  • TensorRT-LLM v0.20.0, commit: 82d918b
  • Model id: nvidia/DeepSeek-R1-0528-FP4

Who can help?

@kaiyux

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Server command:

trtllm-serve nvidia/DeepSeek-R1-0528-FP4 --max_batch_size 80 --max_num_tokens 163840 --max_seq_len 163840 --kv_cache_free_gpu_memory_fraction 0.6 --port 8000 --trust_remote_code true --backend pytorch --tp_size 8 --pp_size 1 --ep_size 8 --extra_llm_api_options /workspace1/extra-llm-api-config.yaml

extra-llm-api-config.yaml:

pytorch_backend_config:
  use_cuda_graph: true
  cuda_graph_batch_sizes: [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,100,128,256]
  cuda_graph_padding_enabled: true
  kv_cache_dtype: fp8
enable_attention_dp: false
enable_chunked_prefill: true

speculative_config:
  decoding_type: MTP
  num_nextn_predict_layers: 3

  use_relaxed_acceptance_for_thinking: true
  relaxed_topk: 10
  relaxed_delta: 0.6

Then send requests with 50+ RPM.

Expected behavior

Serve continuously.

actual behavior

The server hangs after some requests.

No any error. The main process can continue to response new request, but no token generated.

additional notes

nvidia-smi result when hanging:

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 575.51.03              Driver Version: 575.51.03      CUDA Version: 12.9     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA B200                    On  |   00000000:03:00.0 Off |                  Off |
| N/A   36C    P0            246W / 1000W |  155110MiB / 183359MiB |    100%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA B200                    On  |   00000000:13:00.0 Off |                  Off |
| N/A   42C    P0            259W / 1000W |  154480MiB / 183359MiB |    100%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA B200                    On  |   00000000:63:00.0 Off |                  Off |
| N/A   34C    P0            244W / 1000W |  154480MiB / 183359MiB |    100%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA B200                    On  |   00000000:73:00.0 Off |                  Off |
| N/A   43C    P0            248W / 1000W |  154480MiB / 183359MiB |    100%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   4  NVIDIA B200                    On  |   00000000:83:00.0 Off |                  Off |
| N/A   36C    P0            241W / 1000W |  154480MiB / 183359MiB |    100%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   5  NVIDIA B200                    On  |   00000000:93:00.0 Off |                  Off |
| N/A   42C    P0            252W / 1000W |  154480MiB / 183359MiB |    100%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   6  NVIDIA B200                    On  |   00000000:E3:00.0 Off |                  Off |
| N/A   35C    P0            244W / 1000W |  154480MiB / 183359MiB |    100%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   7  NVIDIA B200                    On  |   00000000:F3:00.0 Off |                  Off |
| N/A   41C    P0            245W / 1000W |  154160MiB / 183359MiB |    100%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
                                                                                        
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A         1431234      C   /usr/bin/python                         684MiB |
|    0   N/A  N/A         1432070      C   /usr/bin/python                       15440... |
|    1   N/A  N/A         1432071      C   /usr/bin/python                       15446... |
|    2   N/A  N/A         1432072      C   /usr/bin/python                       15446... |
|    3   N/A  N/A         1432073      C   /usr/bin/python                       15446... |
|    4   N/A  N/A         1432074      C   /usr/bin/python                       15446... |
|    5   N/A  N/A         1432075      C   /usr/bin/python                       15446... |
|    6   N/A  N/A         1432076      C   /usr/bin/python                       15446... |
|    7   N/A  N/A         1432077      C   /usr/bin/python                       15414... |
+-----------------------------------------------------------------------------------------+

Metadata

Metadata

Labels

bugSomething isn't working

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions