Description
System Info
- CPU: x86
- GPU: 2xL40S
- Memory: 256GB
- System: Ubuntu 22.04
- Docker Image: nvcr.io/nvidia/tritonserver:24.12-trtllm-python-py3
- TensorRT-LLM version: 0.16.0
Who can help?
I have tested the examples under examples/multimodal. But when I try to convert Qwen2-VL-7B to a TensorRT-LLM checkpoint via python3 ../qwen/convert_checkpoint.py --model_dir Qwen2-VL-7B-Instruct --output_dir trt_models/Qwen2-VL-7B-Instruct/fp16/1-gpu --dtype float16, I get the error `Unrecognized keys in rope_scaling for 'rope_type'='default': {'mrope_section'}`. It seems Qwen2-VL is not supported. Is this due to the Docker image I used, or do I have to build TensorRT-LLM from source?
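For context, the rope_scaling block that triggers this warning can be printed straight from the Hugging Face config. This is a minimal sketch, assuming the Qwen2-VL-7B-Instruct weights sit in the current working directory and transformers is installed; the exact keys and values depend on the model revision:

```python
# Minimal sketch: inspect the rope_scaling entry of the local Qwen2-VL checkpoint.
# Assumes the Qwen2-VL-7B-Instruct directory is in the current working directory;
# the exact contents depend on the model revision and the transformers version.
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("Qwen2-VL-7B-Instruct")
print(cfg.rope_scaling)  # contains the 'mrope_section' key reported in the warning
```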
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
- cd to examples/multimodal
- Run python3 ../qwen/convert_checkpoint.py --model_dir Qwen2-VL-7B-Instruct --output_dir trt_models/Qwen2-VL-7B-Instruct/fp16/1-gpu --dtype float16
Expected behavior
The converted checkpoint is written to trt_models/Qwen2-VL-7B-Instruct/fp16/1-gpu without any errors.
Actual behavior
Got the following error log:

root@04292e29d243:/workspace/TensorRT-LLM/examples/multimodal# python3 ../qwen/convert_checkpoint.py --model_dir Qwen2-VL-7B-Instruct --output_dir trt_models/Qwen2-VL-7B-Instruct/fp16/1-gpu --dtype float16
2025-01-03 11:20:24.426668: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2025-01-03 11:20:24.441389: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1735903224.456763 2272 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1735903224.461320 2272 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-01-03 11:20:24.477010: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. To enable the following instructions: AVX2 AVX512F AVX512_VNNI AVX512_BF16 AVX512_FP16 AVX_VNNI AMX_TILE AMX_INT8 AMX_BF16 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
[TensorRT-LLM] TensorRT-LLM version: 0.16.0
0.16.0
Unrecognized keys in `rope_scaling` for 'rope_type'='default': {'mrope_section'}
Unrecognized keys in `rope_scaling` for 'rope_type'='default': {'mrope_section'}
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/functional.py", line 656, in from_string
    return RotaryScalingType[s]
           ~~~~~~~~~~~~~~~~~^^^
  File "/usr/lib/python3.12/enum.py", line 814, in __getitem__
    return cls._member_map_[name]
           ~~~~~~~~~~~~~~~~^^^^^^
KeyError: 'default'
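From the traceback, convert_checkpoint.py passes the Hugging Face rope_type string 'default' to tensorrt_llm.functional.RotaryScalingType, and that enum has no member with this name in the installed build. A small diagnostic sketch (an assumption about where to look, not an official check) to see which rotary scaling types the installed wheel actually knows:

```python
# Diagnostic sketch: list the rotary scaling types known to the installed TensorRT-LLM.
# If there is no 'default' (or mrope-style) member, the Qwen2-VL rope_scaling from the
# HF config cannot be translated, which matches the KeyError above.
import tensorrt_llm
from tensorrt_llm.functional import RotaryScalingType

print(tensorrt_llm.__version__)
print([member.name for member in RotaryScalingType])
```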
Additional notes
I have tried Phi-3-vision and Qwen2-7B-Instruct as well; both of them work.