LLaMA is a Large Language Model developed by Meta AI.
It was trained on more tokens than previous models. As a result, the smallest version, with 7 billion parameters, performs similarly to GPT-3, which has 175 billion parameters.
This guide covers usage through the official `transformers` implementation. For 4-bit mode, see GPTQ models (4 bit mode).
- Torrent: oobabooga#530 (comment)
- Direct download: https://huggingface.co/Neko-Institute-of-Science
python download-model.py oobabooga/llama-tokenizer
Once downloaded, the tokenizer is applied automatically to every LlamaForCausalLM model that you try to load.
- Install the `protobuf` library:
pip install protobuf==3.20.1
- Use the script below to convert the model in `.pth` format that you, a fellow academic, downloaded using Meta's official link:
python convert_llama_weights_to_hf.py --input_dir /path/to/LLaMA --model_size 7B --output_dir /tmp/outputs/llama-7b
- Move the `llama-7b` folder inside your `text-generation-webui/models` folder.
python server.py --model llama-7b
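
The move-and-launch steps above can also be scripted. What follows is a minimal sketch, not part of the project itself: the helper names `install_converted_model` and `launch_command` are hypothetical, and the check for a `config.json` file assumes that the conversion script writes a standard Hugging Face model folder.

```python
import shutil
from pathlib import Path


def install_converted_model(converted_dir, webui_dir):
    """Move a converted model folder into <webui_dir>/models and return its new path."""
    converted_dir = Path(converted_dir)
    models_dir = Path(webui_dir) / "models"

    # Assumption: a Hugging Face-format model folder contains a config.json.
    if not (converted_dir / "config.json").is_file():
        raise FileNotFoundError(
            f"no config.json found in {converted_dir}; did the conversion finish?"
        )

    models_dir.mkdir(parents=True, exist_ok=True)
    target = models_dir / converted_dir.name

    # Refuse to overwrite a model folder that is already installed.
    if target.exists():
        raise FileExistsError(f"{target} already exists")

    shutil.move(str(converted_dir), str(target))
    return target


def launch_command(model_name):
    """Build the server command from this guide as an argument list."""
    return ["python", "server.py", "--model", model_name]
```

With the paths used in this guide, `install_converted_model("/tmp/outputs/llama-7b", "text-generation-webui")` would place the folder at `text-generation-webui/models/llama-7b`. The list returned by `launch_command("llama-7b")` can be passed to `subprocess.run` from inside the `text-generation-webui` folder.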