This option allows you to deploy any Hugging Face-hosted LLM on the Inference Cluster using its model ID.
To deploy:

1. Run the deployment script:

   bash ~/core/inference-stack-deploy.sh

2. Choose the following options from the menu:

   3 – Update Deployed Inference Cluster
   2 – Manage LLM Models
   4 – Deploy Model from Hugging Face

3. When prompted, provide:

   - Hugging Face Model ID (e.g., meta-llama/Meta-Llama-3-8B)
   - Model Deployment Name (e.g., metallama-8b)
   - Tensor Parallel Size (based on the available Intel® AI Accelerator cards)
Note: This deploys a model that has not been pre-validated. Make sure the tensor parallel size is configured correctly. An incorrect value can result in the model being stuck in a "not ready" state.
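Since a wrong tensor parallel size is the most common cause of a stuck deployment, it can help to sanity-check the value before entering it. The sketch below is a hypothetical helper (not part of the deployment script): it assumes the usual serving-stack convention that the tensor parallel size must not exceed the number of accelerator cards on the node, with powers of two being the typical choices.

```python
# Hypothetical pre-deployment check for the Tensor Parallel Size prompt.
# Assumption: the size must be between 1 and the number of available
# accelerator cards; powers of two are the common candidates.

def valid_tensor_parallel_sizes(num_cards: int) -> list[int]:
    """Return candidate tensor parallel sizes: powers of two up to num_cards."""
    sizes = []
    size = 1
    while size <= num_cards:
        sizes.append(size)
        size *= 2
    return sizes


def check_tensor_parallel(size: int, num_cards: int) -> bool:
    """A size is usable only if it does not exceed the available cards."""
    return 1 <= size <= num_cards


if __name__ == "__main__":
    # Example: a node with 8 Intel AI Accelerator cards
    print(valid_tensor_parallel_sizes(8))  # [1, 2, 4, 8]
    print(check_tensor_parallel(16, 8))    # False: model would stay "not ready"
```

Running such a check before answering the prompt avoids the "not ready" state described in the note above; adapt the card count to what your node actually reports.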