Description
System Info
NVIDIA Driver Version: 550.54.15
CUDA Version: 12.4
GPU: NVIDIA A100-SXM4-40GB
System: Linux (Ubuntu)
Who can help?
When I try to build the TRT engine for the InternVL2 multimodal example, I get the following error:
No protocol specified
[TensorRT-LLM] TensorRT-LLM version: 0.16.0
[01/10/2025-10:16:12] [TRT-LLM] [I] Set bert_attention_plugin to auto.
[01/10/2025-10:16:12] [TRT-LLM] [I] Set gpt_attention_plugin to auto.
[01/10/2025-10:16:12] [TRT-LLM] [I] Set gemm_plugin to auto.
[01/10/2025-10:16:12] [TRT-LLM] [I] Set gemm_swiglu_plugin to None.
[01/10/2025-10:16:12] [TRT-LLM] [I] Set fp8_rowwise_gemm_plugin to None.
[01/10/2025-10:16:12] [TRT-LLM] [I] Set nccl_plugin to auto.
[01/10/2025-10:16:12] [TRT-LLM] [I] Set lora_plugin to None.
[01/10/2025-10:16:12] [TRT-LLM] [I] Set moe_plugin to auto.
[01/10/2025-10:16:12] [TRT-LLM] [I] Set mamba_conv1d_plugin to auto.
[01/10/2025-10:16:12] [TRT-LLM] [I] Set low_latency_gemm_plugin to None.
[01/10/2025-10:16:12] [TRT-LLM] [I] Set low_latency_gemm_swiglu_plugin to None.
[01/10/2025-10:16:12] [TRT-LLM] [I] Set context_fmha to True.
[01/10/2025-10:16:12] [TRT-LLM] [I] Set bert_context_fmha_fp32_acc to False.
[01/10/2025-10:16:12] [TRT-LLM] [I] Set remove_input_padding to True.
[01/10/2025-10:16:12] [TRT-LLM] [I] Set reduce_fusion to False.
[01/10/2025-10:16:12] [TRT-LLM] [I] Set user_buffer to False.
[01/10/2025-10:16:12] [TRT-LLM] [I] Set tokens_per_block to 64.
[01/10/2025-10:16:12] [TRT-LLM] [I] Set use_paged_context_fmha to False.
[01/10/2025-10:16:12] [TRT-LLM] [I] Set use_fp8_context_fmha to False.
[01/10/2025-10:16:12] [TRT-LLM] [I] Set multiple_profiles to False.
[01/10/2025-10:16:12] [TRT-LLM] [I] Set paged_state to True.
[01/10/2025-10:16:12] [TRT-LLM] [I] Set streamingllm to False.
[01/10/2025-10:16:12] [TRT-LLM] [I] Set use_fused_mlp to True.
[01/10/2025-10:16:12] [TRT-LLM] [I] Set pp_reduce_scatter to False.
[01/10/2025-10:16:12] [TRT-LLM] [W] Implicitly setting Phi3Config.original_max_position_embeddings = 4096
[01/10/2025-10:16:12] [TRT-LLM] [W] Implicitly setting Phi3Config.longrope_scaling_short_factors = [1.05, 1.05, 1.05, 1.1, 1.1, 1.1500000000000001, 1.2000000000000002, 1.2500000000000002, 1.3000000000000003, 1.3500000000000003, 1.5000000000000004, 2.000000000000001, 2.000000000000001, 2.000000000000001, 2.000000000000001, 2.000000000000001, 2.000000000000001, 2.000000000000001, 2.000000000000001, 2.000000000000001, 2.000000000000001, 2.000000000000001, 2.000000000000001, 2.000000000000001, 2.000000000000001, 2.000000000000001, 2.000000000000001, 2.000000000000001, 2.000000000000001, 2.000000000000001, 2.000000000000001, 2.000000000000001, 2.0500000000000007, 2.0500000000000007, 2.0500000000000007, 2.1000000000000005, 2.1000000000000005, 2.1000000000000005, 2.1500000000000004, 2.1500000000000004, 2.3499999999999996, 2.549999999999999, 2.5999999999999988, 2.5999999999999988, 2.7499999999999982, 2.849999999999998, 2.849999999999998, 2.9499999999999975]
[01/10/2025-10:16:12] [TRT-LLM] [W] Implicitly setting Phi3Config.longrope_scaling_long_factors = [1.0299999713897705, 1.0499999523162842, 1.0499999523162842, 1.0799999237060547, 1.2299998998641968, 1.2299998998641968, 1.2999999523162842, 1.4499999284744263, 1.5999999046325684, 1.6499998569488525, 1.8999998569488525, 2.859999895095825, 3.68999981880188, 5.419999599456787, 5.489999771118164, 5.489999771118164, 9.09000015258789, 11.579999923706055, 15.65999984741211, 15.769999504089355, 15.789999961853027, 18.360000610351562, 21.989999771118164, 23.079999923706055, 30.009998321533203, 32.35000228881836, 32.590003967285156, 35.56000518798828, 39.95000457763672, 53.840003967285156, 56.20000457763672, 57.95000457763672, 59.29000473022461, 59.77000427246094, 59.920005798339844, 61.190006256103516, 61.96000671386719, 62.50000762939453, 63.3700065612793, 63.48000717163086, 63.48000717163086, 63.66000747680664, 63.850006103515625, 64.08000946044922, 64.760009765625, 64.80001068115234, 64.81001281738281, 64.81001281738281]
[01/10/2025-10:16:13] [TRT-LLM] [W] Provided but not required tensors: {'long_rope_rotary_inv_freq', 'embed_positions', 'rotary_inv_freq', 'embed_positions_for_gpt_attention', 'long_rope_embed_positions', 'long_rope_embed_positions_for_gpt_attention'}
[01/10/2025-10:16:13] [TRT-LLM] [I] Set dtype to float16.
[01/10/2025-10:16:13] [TRT-LLM] [I] Set paged_kv_cache to True.
[01/10/2025-10:16:13] [TRT-LLM] [W] Overriding paged_state to False
[01/10/2025-10:16:13] [TRT-LLM] [I] Set paged_state to False.
[01/10/2025-10:16:13] [TRT-LLM] [W] remove_input_padding is enabled, while opt_num_tokens is not set, setting to max_batch_size*max_beam_width.
[01/10/2025-10:16:13] [TRT-LLM] [W] max_num_tokens (4608) shouldn't be greater than max_seq_len * max_batch_size (4608), specifying to max_seq_len * max_batch_size (4608).
[01/10/2025-10:16:13] [TRT-LLM] [W] padding removal and fMHA are both enabled, max_input_len is not required and will be ignored
[01/10/2025-10:16:13] [TRT] [I] [MemUsageChange] Init CUDA: CPU -17, GPU +0, now: CPU 207, GPU 561 (MiB)
[01/10/2025-10:16:16] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +2039, GPU +374, now: CPU 2351, GPU 935 (MiB)
[01/10/2025-10:16:16] [TRT-LLM] [I] Set nccl_plugin to None.
[01/10/2025-10:16:17] [TRT-LLM] [I] Total time of constructing network from module object 4.069404602050781 seconds
[01/10/2025-10:16:17] [TRT-LLM] [I] Total optimization profiles added: 1
Traceback (most recent call last):
File "/home/ext_s/.cache/pypoetry/virtualenvs/tensorrt-exp-TTS4ub23-py3.10/bin/trtllm-build", line 8, in <module>
sys.exit(main())
File "/home/ext_s/.cache/pypoetry/virtualenvs/tensorrt-exp-TTS4ub23-py3.10/lib/python3.10/site-packages/tensorrt_llm/commands/build.py", line 627, in main
parallel_build(model_config, ckpt_dir, build_config, args.output_dir,
File "/home/ext_s/.cache/pypoetry/virtualenvs/tensorrt-exp-TTS4ub23-py3.10/lib/python3.10/site-packages/tensorrt_llm/commands/build.py", line 425, in parallel_build
passed = build_and_save(rank, rank % workers, ckpt_dir,
File "/home/ext_s/.cache/pypoetry/virtualenvs/tensorrt-exp-TTS4ub23-py3.10/lib/python3.10/site-packages/tensorrt_llm/commands/build.py", line 390, in build_and_save
engine = build_model(build_config,
File "/home/ext_s/.cache/pypoetry/virtualenvs/tensorrt-exp-TTS4ub23-py3.10/lib/python3.10/site-packages/tensorrt_llm/commands/build.py", line 383, in build_model
return build(model, build_config)
File "/home/ext_s/.cache/pypoetry/virtualenvs/tensorrt-exp-TTS4ub23-py3.10/lib/python3.10/site-packages/tensorrt_llm/builder.py", line 1264, in build
engine = None if build_config.dry_run else builder.build_engine(
File "/home/ext_s/.cache/pypoetry/virtualenvs/tensorrt-exp-TTS4ub23-py3.10/lib/python3.10/site-packages/tensorrt_llm/_common.py", line 220, in decorated
return f(*args, **kwargs)
File "/home/ext_s/.cache/pypoetry/virtualenvs/tensorrt-exp-TTS4ub23-py3.10/lib/python3.10/site-packages/tensorrt_llm/builder.py", line 412, in build_engine
if not param.set_name(name, network):
File "/home/ext_s/.cache/pypoetry/virtualenvs/tensorrt-exp-TTS4ub23-py3.10/lib/python3.10/site-packages/tensorrt_llm/parameter.py", line 227, in set_name
return network.trt_network.set_weights_name(
TypeError: set_weights_name(): incompatible function arguments. The following argument types are supported:
1. (self: tensorrt_bindings.tensorrt.INetworkDefinition, weights: tensorrt_bindings.tensorrt.Weights, name: str) -> bool
Invoked with: <tensorrt_bindings.tensorrt.INetworkDefinition object at 0x7f6a29b30f30>, array([0., 0., 0., ..., 0., 0., 0.], shape=(12582912,), dtype=float32), 'embed_positions'
It seems to occur for both the 4B and 8B versions.
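For reference, the failure looks like an argument-type mismatch: `INetworkDefinition.set_weights_name()` declares a `tensorrt.Weights` parameter, but the call from `tensorrt_llm/parameter.py` passes a raw numpy array. Below is a minimal sketch (assuming a local `tensorrt` Python install; names and shapes are purely illustrative, and this is not a proposed fix) showing the type the binding expects:

```python
# Illustrative sketch of the argument-type mismatch seen in the traceback.
# Assumes a local TensorRT Python install; shapes/names are for demonstration only.
import numpy as np
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network()

arr = np.zeros(16, dtype=np.float32)

# Passing the raw numpy array mirrors the failing call in the build:
# network.set_weights_name(arr, "embed_positions")  # -> TypeError: incompatible function arguments

# The binding's declared signature takes a tensorrt.Weights object instead:
weights = trt.Weights(arr)
network.add_constant(shape=(16,), weights=weights)
print(network.set_weights_name(weights, "embed_positions"))
```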
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
cd examples
pip install transformers==4.37.2
export MODEL_NAME="InternVL2-4B"
git lfs clone https://huggingface.co/OpenGVLab/${MODEL_NAME} tmp/hf_models/${MODEL_NAME}
export LLM_MODEL_NAME="phi"
python ${LLM_MODEL_NAME}/convert_checkpoint.py --model_dir tmp/hf_models/${MODEL_NAME} --output_dir tmp/trt_models/${MODEL_NAME}/fp16/1-gpu --dtype float16
trtllm-build \
--checkpoint_dir tmp/trt_models/${MODEL_NAME}/fp16/1-gpu \
--output_dir tmp/trt_engines/${MODEL_NAME}/fp16/1-gpu \
--gemm_plugin auto \
--max_batch_size 1 \
--max_input_len 4096 \
--max_seq_len 4608 \
--max_multimodal_len 3328
Expected behavior
The engine should build.
actual behavior
A TypeError arises that has to do with embed_positions.
additional notes
I am also using Poetry, in case that matters.