
trtllm-build llama3.1-8b failed #2688

Open
@765500005

Description


```shell
trtllm-build --checkpoint_dir ./tllm_checkpoint_2gpu_tp2 \
    --output_dir ./tmp/llama/7B/trt_engines/fp16/2-gpu/ \
    --context_fmha enable \
    --remove_input_padding enable \
    --gpus_per_node 8 \
    --gemm_plugin auto
```

```text
[TRT] [E] IBuilder::buildSerializedNetwork: Error Code 4: Internal Error (Internal error: plugin node LLaMAForCausalLM/transformer/layers/0/attention/wrapper_L562/gpt_attention_L5483/PLUGIN_V2_GPTAttention_0 requires 210571452800 bytes of scratch space, but only 47697362944 is available. Try increasing the workspace size with IBuilderConfig::setMemoryPoolLimit().)
```

I have 8 GPUs with 46 GB of memory each, but the build still fails with this error. Is the issue the workspace size, and if so, how can I increase it when building with trtllm-build?
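For context, converting the two byte counts in the error message to GiB makes the problem concrete. This is a minimal sketch; `bytes_to_gib` is a hypothetical helper, and the interpretation in the trailing comments is an assumption, not something stated in the issue:

```python
# Sanity check on the numbers in the build error (plain arithmetic, not the
# TensorRT API; the error itself points at IBuilderConfig::setMemoryPoolLimit).

def bytes_to_gib(n: int) -> float:
    """Convert a byte count to GiB (2**30 bytes)."""
    return n / 2**30

required = 210_571_452_800   # scratch space the GPT attention plugin requested
available = 47_697_362_944   # workspace TensorRT had, roughly one 46 GB GPU

print(f"required:  {bytes_to_gib(required):.1f} GiB")   # prints "required:  196.1 GiB"
print(f"available: {bytes_to_gib(available):.1f} GiB")  # prints "available: 44.4 GiB"

# The request far exceeds a single GPU's memory, so raising the workspace
# limit alone cannot satisfy it; shrinking the engine's build-time shape
# limits (e.g. max tokens / max batch size), which drive the plugin's
# scratch requirement, would be the usual lever.
```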

Metadata

Labels

- Investigating
- LLM API/Workflow: High-level LLM Python API & tools (e.g., trtllm-llmapi-launch) for TRTLLM inference/workflows.
- triaged: Issue has been triaged by maintainers
