Status: Open
Labels: bug, stale, triaged
Description
Following the instructions at https://github.com/NVIDIA/TensorRT-LLM/blob/main/examples/llama/README.md, I tried a bunch of different models, and they all fail in run.py with:
Traceback (most recent call last):
  File "/data/stas/faster-ttft/core/dawn/exp/infer/faster-ttft/trt-llm/TensorRT-LLM/examples/llama/../run.py", line 874, in <module>
    main(args)
  File "/data/stas/faster-ttft/core/dawn/exp/infer/faster-ttft/trt-llm/TensorRT-LLM/examples/llama/../run.py", line 681, in main
    runner = runner_cls.from_dir(**runner_kwargs)
  File "/env/lib/conda/ctx-shared-trt-stable/lib/python3.10/site-packages/tensorrt_llm/runtime/model_runner_cpp.py", line 311, in from_dir
    assert max_batch_size <= model_config.max_batch_size
AssertionError
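The assertion compares the batch size the runner was asked for against the limit baked into the engine at build time. One way to see what a built engine actually supports is to read the `config.json` that trtllm-build writes into the engine directory; the sketch below parses such a file, assuming the current layout where the limits live under a `build_config` key (the sample JSON and values are illustrative, not from my engine):

```python
import json

def engine_limits(config_text):
    """Return (max_batch_size, max_input_len) parsed from an engine
    config.json string; None for any key that is absent."""
    cfg = json.loads(config_text)
    build = cfg.get("build_config", {})
    return build.get("max_batch_size"), build.get("max_input_len")

# Hypothetical config.json contents, for illustration only.
sample = '{"build_config": {"max_batch_size": 8, "max_input_len": 1024}}'
print(engine_limits(sample))
```

In a real engine directory you would pass `open("trt_engines/.../config.json").read()` instead of the sample string and compare the result against whatever batch size run.py ends up requesting.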
Here is a full repro:
git clone https://github.com/NVIDIA/TensorRT-LLM/
cd TensorRT-LLM/examples/llama
pip install -r requirements.txt
git clone https://huggingface.co/meta-llama/Llama-2-7b-chat-hf
python convert_checkpoint.py --model_dir Llama-2-7b-chat-hf \
--output_dir Llama-2-7b-chat-hf_2gpu_tp2 \
--dtype float16 \
--tp_size 2
trtllm-build --checkpoint_dir Llama-2-7b-chat-hf_2gpu_tp2 \
--output_dir trt_engines/Llama-2-7b-chat-hf_2gpu_tp2 \
--gemm_plugin auto
wget https://www.gutenberg.org/cache/epub/64317/pg64317.txt
awk '{printf "%s\\n", $0} END {printf "\\nSummarize this story:"}' pg64317.txt > pg64317_sanitized.txt
mpirun -n 2 --allow-run-as-root --oversubscribe \
python ../run.py \
--max_output_len 128 \
--max_input_length 32768 \
--input_file pg64317_sanitized.txt \
--engine_dir trt_engines/Llama-2-7b-chat-hf_2gpu_tp2 \
--tokenizer_dir Llama-2-7b-chat-hf
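If the failure really is the engine's build-time limit, rebuilding with explicit limits may sidestep it. This is a sketch, not a confirmed fix: `--max_batch_size` and `--max_input_len` are trtllm-build flags, but the values below are assumptions sized for this single-prompt, 32k-token repro:

```shell
# Rebuild the engine with limits large enough for one 32k-token prompt
# (values are examples; size them for your actual workload).
trtllm-build --checkpoint_dir Llama-2-7b-chat-hf_2gpu_tp2 \
    --output_dir trt_engines/Llama-2-7b-chat-hf_2gpu_tp2 \
    --gemm_plugin auto \
    --max_batch_size 1 \
    --max_input_len 32768
```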