Text Generation Inference (TGI) is a Rust, Python, and gRPC server for serving text generation models. It lets you run Hugging Face Hub models and other LLMs on your own infrastructure.
- Set up the Text Generation Inference server.
- Download the aifile and load it with ownAI (click the logo in the upper left corner to open the menu, then select "AI Workshop", then "New AI" and "Load Aifile").
- Set the `inference_server_url` setting in the aifile to the URL of your server.
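As a rough illustration, the relevant part of an aifile might look like the fragment below. This is a hypothetical sketch: the exact aifile schema, surrounding keys, and the URL are placeholders, and only the `inference_server_url` key is taken from the instructions above.

```yaml
# Hypothetical aifile fragment — adjust the URL to point at your own server.
# The exact schema of an aifile may differ; only inference_server_url is
# the setting referenced in this guide.
inference_server_url: "http://localhost:8080"
```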
These AIs run on your own machine or on a server where you have installed the inference server.
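For the server-setup step, TGI's documented Docker image is a common way to get started. A minimal sketch, assuming you have Docker and (optionally) NVIDIA GPU support installed; the model ID and host port here are examples, not requirements:

```shell
# Launch Text Generation Inference via its official Docker image.
# --model-id picks the Hugging Face Hub model to serve (example value).
# The container listens on port 80; here it is mapped to localhost:8080,
# which would then be the value of inference_server_url in your aifile.
docker run --gpus all --shm-size 1g -p 8080:80 \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id mistralai/Mistral-7B-Instruct-v0.2
```

Omit `--gpus all` to run on CPU (much slower), and see the TGI documentation for further launch options such as quantization.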