Description
System Info
NVIDIA Driver Version: 550.54.15
CUDA Version: 12.4
GPU: NVIDIA A100-SXM4-40GB
System: Linux (Ubuntu)
Who can help?
When I try to build the TRT engine for the InternVL2 multimodal example, I get the following error:
No protocol specified
[TensorRT-LLM] TensorRT-LLM version: 0.16.0
[01/10/2025-10:16:12] [TRT-LLM] [I] Set bert_attention_plugin to auto.
[01/10/2025-10:16:12] [TRT-LLM] [I] Set gpt_attention_plugin to auto.
[01/10/2025-10:16:12] [TRT-LLM] [I] Set gemm_plugin to auto.
[01/10/2025-10:16:12] [TRT-LLM] [I] Set gemm_swiglu_plugin to None.
[01/10/2025-10:16:12] [TRT-LLM] [I] Set fp8_rowwise_gemm_plugin to None.
[01/10/2025-10:16:12] [TRT-LLM] [I] Set nccl_plugin to auto.
[01/10/2025-10:16:12] [TRT-LLM] [I] Set lora_plugin to None.
[01/10/2025-10:16:12] [TRT-LLM] [I] Set moe_plugin to auto.
[01/10/2025-10:16:12] [TRT-LLM] [I] Set mamba_conv1d_plugin to auto.
[01/10/2025-10:16:12] [TRT-LLM] [I] Set low_latency_gemm_plugin to None.
[01/10/2025-10:16:12] [TRT-LLM] [I] Set low_latency_gemm_swiglu_plugin to None.
[01/10/2025-10:16:12] [TRT-LLM] [I] Set context_fmha to True.
[01/10/2025-10:16:12] [TRT-LLM] [I] Set bert_context_fmha_fp32_acc to False.
[01/10/2025-10:16:12] [TRT-LLM] [I] Set remove_input_padding to True.
[01/10/2025-10:16:12] [TRT-LLM] [I] Set reduce_fusion to False.
[01/10/2025-10:16:12] [TRT-LLM] [I] Set user_buffer to False.
[01/10/2025-10:16:12] [TRT-LLM] [I] Set tokens_per_block to 64.
[01/10/2025-10:16:12] [TRT-LLM] [I] Set use_paged_context_fmha to False.
[01/10/2025-10:16:12] [TRT-LLM] [I] Set use_fp8_context_fmha to False.
[01/10/2025-10:16:12] [TRT-LLM] [I] Set multiple_profiles to False.
[01/10/2025-10:16:12] [TRT-LLM] [I] Set paged_state to True.
[01/10/2025-10:16:12] [TRT-LLM] [I] Set streamingllm to False.
[01/10/2025-10:16:12] [TRT-LLM] [I] Set use_fused_mlp to True.
[01/10/2025-10:16:12] [TRT-LLM] [I] Set pp_reduce_scatter to False.
[01/10/2025-10:16:12] [TRT-LLM] [W] Implicitly setting Phi3Config.original_max_position_embeddings = 4096
[01/10/2025-10:16:12] [TRT-LLM] [W] Implicitly setting Phi3Config.longrope_scaling_short_factors = [1.05, 1.05, 1.05, 1.1, 1.1, 1.1500000000000001, 1.2000000000000002, 1.2500000000000002, 1.3000000000000003, 1.3500000000000003, 1.5000000000000004, 2.000000000000001, 2.000000000000001, 2.000000000000001, 2.000000000000001, 2.000000000000001, 2.000000000000001, 2.000000000000001, 2.000000000000001, 2.000000000000001, 2.000000000000001, 2.000000000000001, 2.000000000000001, 2.000000000000001, 2.000000000000001, 2.000000000000001, 2.000000000000001, 2.000000000000001, 2.000000000000001, 2.000000000000001, 2.000000000000001, 2.000000000000001, 2.0500000000000007, 2.0500000000000007, 2.0500000000000007, 2.1000000000000005, 2.1000000000000005, 2.1000000000000005, 2.1500000000000004, 2.1500000000000004, 2.3499999999999996, 2.549999999999999, 2.5999999999999988, 2.5999999999999988, 2.7499999999999982, 2.849999999999998, 2.849999999999998, 2.9499999999999975]
[01/10/2025-10:16:12] [TRT-LLM] [W] Implicitly setting Phi3Config.longrope_scaling_long_factors = [1.0299999713897705, 1.0499999523162842, 1.0499999523162842, 1.0799999237060547, 1.2299998998641968, 1.2299998998641968, 1.2999999523162842, 1.4499999284744263, 1.5999999046325684, 1.6499998569488525, 1.8999998569488525, 2.859999895095825, 3.68999981880188, 5.419999599456787, 5.489999771118164, 5.489999771118164, 9.09000015258789, 11.579999923706055, 15.65999984741211, 15.769999504089355, 15.789999961853027, 18.360000610351562, 21.989999771118164, 23.079999923706055, 30.009998321533203, 32.35000228881836, 32.590003967285156, 35.56000518798828, 39.95000457763672, 53.840003967285156, 56.20000457763672, 57.95000457763672, 59.29000473022461, 59.77000427246094, 59.920005798339844, 61.190006256103516, 61.96000671386719, 62.50000762939453, 63.3700065612793, 63.48000717163086, 63.48000717163086, 63.66000747680664, 63.850006103515625, 64.08000946044922, 64.760009765625, 64.80001068115234, 64.81001281738281, 64.81001281738281]
[01/10/2025-10:16:13] [TRT-LLM] [W] Provided but not required tensors: {'long_rope_rotary_inv_freq', 'embed_positions', 'rotary_inv_freq', 'embed_positions_for_gpt_attention', 'long_rope_embed_positions', 'long_rope_embed_positions_for_gpt_attention'}
[01/10/2025-10:16:13] [TRT-LLM] [I] Set dtype to float16.
[01/10/2025-10:16:13] [TRT-LLM] [I] Set paged_kv_cache to True.
[01/10/2025-10:16:13] [TRT-LLM] [W] Overriding paged_state to False
[01/10/2025-10:16:13] [TRT-LLM] [I] Set paged_state to False.
[01/10/2025-10:16:13] [TRT-LLM] [W] remove_input_padding is enabled, while opt_num_tokens is not set, setting to max_batch_size*max_beam_width.
[01/10/2025-10:16:13] [TRT-LLM] [W] max_num_tokens (4608) shouldn't be greater than max_seq_len * max_batch_size (4608), specifying to max_seq_len * max_batch_size (4608).
[01/10/2025-10:16:13] [TRT-LLM] [W] padding removal and fMHA are both enabled, max_input_len is not required and will be ignored
[01/10/2025-10:16:13] [TRT] [I] [MemUsageChange] Init CUDA: CPU -17, GPU +0, now: CPU 207, GPU 561 (MiB)
[01/10/2025-10:16:16] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +2039, GPU +374, now: CPU 2351, GPU 935 (MiB)
[01/10/2025-10:16:16] [TRT-LLM] [I] Set nccl_plugin to None.
[01/10/2025-10:16:17] [TRT-LLM] [I] Total time of constructing network from module object 4.069404602050781 seconds
[01/10/2025-10:16:17] [TRT-LLM] [I] Total optimization profiles added: 1
Traceback (most recent call last):
File "/home/ext_s/.cache/pypoetry/virtualenvs/tensorrt-exp-TTS4ub23-py3.10/bin/trtllm-build", line 8, in <module>
sys.exit(main())
File "/home/ext_s/.cache/pypoetry/virtualenvs/tensorrt-exp-TTS4ub23-py3.10/lib/python3.10/site-packages/tensorrt_llm/commands/build.py", line 627, in main
parallel_build(model_config, ckpt_dir, build_config, args.output_dir,
File "/home/ext_s/.cache/pypoetry/virtualenvs/tensorrt-exp-TTS4ub23-py3.10/lib/python3.10/site-packages/tensorrt_llm/commands/build.py", line 425, in parallel_build
passed = build_and_save(rank, rank % workers, ckpt_dir,
File "/home/ext_s/.cache/pypoetry/virtualenvs/tensorrt-exp-TTS4ub23-py3.10/lib/python3.10/site-packages/tensorrt_llm/commands/build.py", line 390, in build_and_save
engine = build_model(build_config,
File "/home/ext_s/.cache/pypoetry/virtualenvs/tensorrt-exp-TTS4ub23-py3.10/lib/python3.10/site-packages/tensorrt_llm/commands/build.py", line 383, in build_model
return build(model, build_config)
File "/home/ext_s/.cache/pypoetry/virtualenvs/tensorrt-exp-TTS4ub23-py3.10/lib/python3.10/site-packages/tensorrt_llm/builder.py", line 1264, in build
engine = None if build_config.dry_run else builder.build_engine(
File "/home/ext_s/.cache/pypoetry/virtualenvs/tensorrt-exp-TTS4ub23-py3.10/lib/python3.10/site-packages/tensorrt_llm/_common.py", line 220, in decorated
return f(*args, **kwargs)
File "/home/ext_s/.cache/pypoetry/virtualenvs/tensorrt-exp-TTS4ub23-py3.10/lib/python3.10/site-packages/tensorrt_llm/builder.py", line 412, in build_engine
if not param.set_name(name, network):
File "/home/ext_s/.cache/pypoetry/virtualenvs/tensorrt-exp-TTS4ub23-py3.10/lib/python3.10/site-packages/tensorrt_llm/parameter.py", line 227, in set_name
return network.trt_network.set_weights_name(
TypeError: set_weights_name(): incompatible function arguments. The following argument types are supported:
1. (self: tensorrt_bindings.tensorrt.INetworkDefinition, weights: tensorrt_bindings.tensorrt.Weights, name: str) -> bool
Invoked with: <tensorrt_bindings.tensorrt.INetworkDefinition object at 0x7f6a29b30f30>, array([0., 0., 0., ..., 0., 0., 0.], shape=(12582912,), dtype=float32), 'embed_positions'
It seems to occur for both the 4B and 8B versions.
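For reference, the failure looks like an argument-type mismatch: `INetworkDefinition.set_weights_name()` declares a `tensorrt.Weights` parameter, but the call from `tensorrt_llm/parameter.py` passes a raw numpy array. Below is a minimal sketch (assuming a local `tensorrt` Python install; names and shapes are purely illustrative, and this is not a proposed fix) showing the type the binding expects:

```python
# Illustrative sketch of the argument-type mismatch seen in the traceback.
# Assumes a local TensorRT Python install; shapes/names are for demonstration only.
import numpy as np
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network()

arr = np.zeros(16, dtype=np.float32)

# Passing the raw numpy array mirrors the failing call in the build:
# network.set_weights_name(arr, "embed_positions")  # -> TypeError: incompatible function arguments

# The binding's declared signature takes a tensorrt.Weights object instead:
weights = trt.Weights(arr)
network.add_constant(shape=(16,), weights=weights)
print(network.set_weights_name(weights, "embed_positions"))
```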
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
cd examples
pip install transformers==4.37.2
export MODEL_NAME="InternVL2-4B"
git lfs clone https://huggingface.co/OpenGVLab/${MODEL_NAME} tmp/hf_models/${MODEL_NAME}
export LLM_MODEL_NAME="phi"
python ${LLM_MODEL_NAME}/convert_checkpoint.py --model_dir tmp/hf_models/${MODEL_NAME} --output_dir tmp/trt_models/${MODEL_NAME}/fp16/1-gpu --dtype float16
trtllm-build \
--checkpoint_dir tmp/trt_models/${MODEL_NAME}/fp16/1-gpu \
--output_dir tmp/trt_engines/${MODEL_NAME}/fp16/1-gpu \
--gemm_plugin auto \
--max_batch_size 1 \
--max_input_len 4096 \
--max_seq_len 4608 \
--max_multimodal_len 3328
Expected behavior
The engine should build.
actual behavior
A TypeError arises that has to do with embed_positions.
additional notes
I am also using Poetry, in case that matters.