
trtllm-serve produces no output with Qwen2.5-7B #2667

Open

Description

@Justin-12138

System Info

- CPU: Intel Xeon Platinum 8352V (144) @ 3.500 GHz, x86
- Memory: 1031689 MiB
- GPU: 8 × RTX 4090
- Libraries:
  tensorrt 10.7.0
  tensorrt_cu12 10.7.0
  tensorrt-cu12-bindings 10.7.0
  tensorrt-cu12-libs 10.7.0
  tensorrt-llm 0.16.0
- NVIDIA driver: 550.135 (CUDA Version: 12.4)
- OS: Ubuntu 22.04.5 LTS x86_64

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

I ran the trtllm-serve command like this:
trtllm-serve /home/lz/tensorrt/build/Qwen2.5-7B-Instructtrt_engines/weight_only/1-gpu
--tokenizer /home/lz/tensorrt/models/Qwen2.5-7B-Instruct
--max_batch_size 128 --max_num_tokens 4096 --max_seq_len 4096
--kv_cache_free_gpu_memory_fraction 0.95
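
To check whether the server ever starts listening, here is a minimal probe sketch in Python. It assumes trtllm-serve's defaults of localhost:8000 and the OpenAI-compatible /v1/models route; adjust these if you pass --host/--port.

# probe_server.py: poll the (assumed) default endpoint until it answers.
import time
import requests

URL = "http://localhost:8000/v1/models"  # assumed default host/port

for attempt in range(30):
    try:
        resp = requests.get(URL, timeout=2)
        print(f"server is up: {resp.status_code} {resp.text[:200]}")
        break
    except requests.exceptions.ConnectionError:
        print(f"attempt {attempt + 1}: not listening yet, retrying in 10 s")
        time.sleep(10)
else:
    print("server never came up; it may be stuck loading the engine")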

But there is no output except:
[TensorRT-LLM] TensorRT-LLM version: 0.16.0

No errors, no warnings, and no port is occupied.
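
One way to see whether the process is hanging or dying silently is to relaunch it with verbose logging and stream everything it prints. The sketch below assumes the TLLM_LOG_LEVEL environment variable controls TensorRT-LLM's log verbosity; the paths and flags are the ones from the command above.

# launch_debug.py: relaunch trtllm-serve with verbose logging and echo its output.
import os
import subprocess

env = dict(os.environ, TLLM_LOG_LEVEL="DEBUG")  # assumed log-level knob
cmd = [
    "trtllm-serve",
    "/home/lz/tensorrt/build/Qwen2.5-7B-Instructtrt_engines/weight_only/1-gpu",
    "--tokenizer", "/home/lz/tensorrt/models/Qwen2.5-7B-Instruct",
    "--max_batch_size", "128",
    "--max_num_tokens", "4096",
    "--max_seq_len", "4096",
    "--kv_cache_free_gpu_memory_fraction", "0.95",
]
proc = subprocess.Popen(cmd, env=env, stdout=subprocess.PIPE,
                        stderr=subprocess.STDOUT, text=True)
for line in proc.stdout:  # echo server output line by line
    print(line, end="")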

But it ran well with the test:
python3 /home/lz/TensorRT-LLM/examples/run.py --input_text "你好,请问你叫什么?"
--max_output_len=50
--tokenizer_dir /home/lz/tensorrt/models/Qwen2.5-7B-Instruct
--engine_dir=/home/lz/tensorrt/build/Qwen2.5-7B-Instructtrt_engines/weight_only/1-gpu

What can I do to run an OpenAI-API-compatible server?
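
Once the server is actually listening, a chat request against the OpenAI-compatible API would look roughly like this. The URL and the model id are assumptions (check GET /v1/models for the name your server actually advertises):

# chat_request.py: minimal OpenAI-compatible chat request sketch.
import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",  # assumed default endpoint
    json={
        "model": "Qwen2.5-7B-Instruct",           # assumed model id
        "messages": [{"role": "user", "content": "你好,请问你叫什么?"}],
        "max_tokens": 50,
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])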

Expected behavior

Shouldn't it output some more info?

Actual behavior

Nothing but the version line.

Additional notes

Is this a problem with Qwen2.5-7B?
I'd appreciate it if you could give me some help.
