
llama examples fail to run #2303

@stas00

Description

Following the instructions at https://github.com/NVIDIA/TensorRT-LLM/blob/main/examples/llama/README.md, I tried several different models, and they all fail in run.py with:

Traceback (most recent call last):
  File "/data/stas/faster-ttft/core/dawn/exp/infer/faster-ttft/trt-llm/TensorRT-LLM/examples/llama/../run.py", line 874, in <module>
    main(args)
  File "/data/stas/faster-ttft/core/dawn/exp/infer/faster-ttft/trt-llm/TensorRT-LLM/examples/llama/../run.py", line 681, in main
    runner = runner_cls.from_dir(**runner_kwargs)
  File "/env/lib/conda/ctx-shared-trt-stable/lib/python3.10/site-packages/tensorrt_llm/runtime/model_runner_cpp.py", line 311, in from_dir
    assert max_batch_size <= model_config.max_batch_size
AssertionError
(the second mpirun rank prints the same traceback)
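For context, the assertion that fires is a simple sanity check: the max_batch_size requested by the runner must not exceed the max_batch_size the engine was built with. A minimal sketch of that guard (a hypothetical simplification, not the actual model_runner_cpp.py code):

```python
def check_runner_limits(requested_batch_size: int, engine_max_batch_size: int) -> None:
    """Mirror of the failing guard: the runtime request must fit the build-time limit."""
    assert requested_batch_size <= engine_max_batch_size, (
        f"requested max_batch_size={requested_batch_size} exceeds the "
        f"engine's build-time max_batch_size={engine_max_batch_size}"
    )

# Within the build-time limit: passes silently.
check_runner_limits(1, 8)
```

In other words, the runner ends up asking for a larger batch size than the one recorded in the engine's config at build time.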

Here is a full repro:

git clone https://github.com/NVIDIA/TensorRT-LLM/
cd TensorRT-LLM/examples/llama
pip install -r requirements.txt
git clone https://huggingface.co/meta-llama/Llama-2-7b-chat-hf
python convert_checkpoint.py --model_dir Llama-2-7b-chat-hf \
                            --output_dir Llama-2-7b-chat-hf_2gpu_tp2 \
                            --dtype float16 \
                            --tp_size 2
trtllm-build --checkpoint_dir Llama-2-7b-chat-hf_2gpu_tp2 \
            --output_dir trt_engines/Llama-2-7b-chat-hf_2gpu_tp2 \
            --gemm_plugin auto

wget https://www.gutenberg.org/cache/epub/64317/pg64317.txt
awk '{printf "%s\\n", $0} END {printf "\\nSummarize this story:"}' pg64317.txt > pg64317_sanitized.txt

mpirun -n 2 --allow-run-as-root --oversubscribe \
    python ../run.py \
    --max_output_len 128 \
    --max_input_length 32768 \
    --input_file pg64317_sanitized.txt \
    --engine_dir trt_engines/Llama-2-7b-chat-hf_2gpu_tp2 \
    --tokenizer_dir Llama-2-7b-chat-hf
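One way to debug this is to inspect the config.json that trtllm-build writes into the engine directory and compare its build-time limits against the flags passed to run.py. The snippet below is a sketch under the assumption that the limits live under a build_config section with max_batch_size / max_input_len keys (verify against your actual config.json, the schema may differ); it writes an illustrative config to a temp dir and reads it back:

```python
import json
import os
import tempfile

# Illustrative engine config; the build_config/max_batch_size/max_input_len
# layout is an assumption about what trtllm-build writes -- verify against
# the real trt_engines/.../config.json.
sample_config = {
    "build_config": {
        "max_batch_size": 8,    # illustrative build-time batch limit
        "max_input_len": 1024,  # illustrative; run.py above asks for 32768
    }
}

engine_dir = tempfile.mkdtemp()
with open(os.path.join(engine_dir, "config.json"), "w") as f:
    json.dump(sample_config, f)

# Read back the limits the engine was (in this sketch) built with.
with open(os.path.join(engine_dir, "config.json")) as f:
    build = json.load(f)["build_config"]

print("max_batch_size:", build["max_batch_size"])
print("max_input_len:", build["max_input_len"])
```

If the real engine's max_input_len is far below the 32768 passed to --max_input_length, the engine simply was not built for inputs that long.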


    Labels

    bug (Something isn't working), stale, triaged (Issue has been triaged by maintainers)
