This repository is intended as a minimal, hackable and readable example to load LLaMA (arXiv) models and run inference. In order to download the checkpoints and tokenizer, fill this google form
conda install -r requirements.txt
Then, in this repository
pip install -e .
To run:
torchrun --nproc_per_node $MP example.py --ckpt_dir $TARGET_FOLDER/model_size --tokenizer_path $TARGET_FOLDER/tokenizer.model
Different models require different MP values:
Model | MP |
---|---|
7B | 1 |
13B | 2 |
33B | 4 |
65B | 8 |
See MODEL_CARD.md
See the LICENSE file.