Description
I built the engine with two separate LoRA adapters on top of the base Llama 3.1 model. The build output is rank0.engine, config.json, and a lora folder with the following structure:
lora
├── 0
│   ├── adapter_config.json
│   └── adapter_model.safetensors
└── 1
    ├── adapter_config.json
    └── adapter_model.safetensors
Is this expected? I figured there would be separate rank engines for the LoRA adapters as well. I passed the LoRA directories during the engine build like this:
trtllm-build --checkpoint_dir ./tllm_checkpoint_1gpu_tp1 \
  --output_dir /opt/tensorrt_llm_engine \
  --gemm_plugin auto \
  --lora_plugin auto \
  --max_batch_size 8 \
  --max_input_len 512 \
  --max_seq_len 562 \
  --lora_dir "/opt/lora_1" "/opt/lora_2" \
  --max_lora_rank 8 \
  --lora_target_modules attn_q attn_k attn_v
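For reference, this is how I planned to select between the two adapters at inference time. It is just a sketch based on the flags I see in the examples/run.py script in the TensorRT-LLM repo (the tokenizer path is a placeholder for my base model directory, and flag names may vary by version):

# Sketch only: select adapter 0 (the first --lora_dir entry from the build)
# via --lora_task_uids; -1 would disable LoRA for that input.
# ./llama3.1_hf is a placeholder for the base model's tokenizer directory.
python examples/run.py \
  --engine_dir /opt/tensorrt_llm_engine \
  --tokenizer_dir ./llama3.1_hf \
  --max_output_len 50 \
  --lora_task_uids 0 \
  --input_text "Hello"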
Any advice is appreciated.