An illustration of how to set up a local LLM server with an Open WebUI interface. Run 30B, 235B, and 480B parameter models locally on your workstation!
Hardware requirements:
- Intel/AMD system with at least 24 cores
- 256 GB DDR5
- NVIDIA 5090 w/ 32GB VRAM
- 1.5TB free storage space for models
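Before installing anything, you can sanity-check the hardware against these requirements with standard Linux tools (a minimal sketch; the storage path is an assumption, and nvidia-smi only works once the driver from the next section is installed):
$ nproc                 # CPU core count (want at least 24)
$ free -h               # installed RAM (want ~256 GB)
$ nvidia-smi --query-gpu=name,memory.total --format=csv   # GPU model and VRAM
$ df -h /srv/models     # free space on the disk that will hold the models (path is an assumption)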
System setup with:
- kernel drivers for NVIDIA GPU
- Docker with NVIDIA runtime support (NVIDIA Container Toolkit)
- Python environment
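Once those pieces are installed, a quick way to verify them (the CUDA image tag below is only an example, not something this repo requires):
$ nvidia-smi                                                                  # driver loaded, GPU visible
$ docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi   # Docker can reach the GPU
$ python3 --version                                                           # Python available for the scripts below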
- Download ~1.5TB of LLM models:
$ pip install -r requirements.txt
$ python ./download_models.py
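The download is roughly 1.5 TB and can take many hours; one option (an assumption, not something download_models.py requires) is to run it detached and follow the log:
$ nohup python ./download_models.py > download.log 2>&1 &
$ tail -f download.log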
- Run the LLM server:
$ docker compose up -d
- Connect to the web UI at http://localhost:3000 (a control panel for llama-swap will be at http://localhost:8080)
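One quick way to confirm the stack is up before opening the browser:
$ docker compose ps              # all services should show as running
$ docker compose logs -f         # follow startup logs; Ctrl-C to stop following
$ curl -I http://localhost:3000  # Open WebUI should respond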