This example presents a simple `Linear` model implemented in PyTorch.
The example consists of the following scripts:

- `server.py` - starts the model with Triton Inference Server
- `client.py` - executes HTTP/gRPC requests to the deployed model
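The served model is just a linear layer, i.e. it computes `y = x @ W.T + b` on each batch. A minimal numpy sketch of that computation follows; the `linear` helper and the weight values are hypothetical, for illustration only (`server.py` defines the actual model):

```python
import numpy as np

def linear(x: np.ndarray, weight: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """Compute y = x @ W.T + b, matching torch.nn.Linear semantics."""
    return x @ weight.T + bias

# Hypothetical 2-in / 3-out layer; not the weights used by server.py.
weight = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], dtype=np.float32)
bias = np.array([0.5, -0.5, 0.0], dtype=np.float32)

batch = np.array([[2.0, 3.0]], dtype=np.float32)  # one request in the batch
out = linear(batch, weight, bias)  # shape (1, 3): [[2.5, 2.5, 5.0]]
```

Triton batches incoming requests along the first dimension, so the inference callback always sees a 2D `(batch, features)` array.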
The example requires the `torch` package. It can be installed in your current environment using pip:

```shell
pip install torch
```

Or you can use the NVIDIA PyTorch container:

```shell
docker run -it --gpus 1 --shm-size 8gb -v {repository_path}:{repository_path} -w {repository_path} nvcr.io/nvidia/pytorch:24.08-py3 bash
```

If you choose to use the container, we recommend installing the NVIDIA Container Toolkit.
Step-by-step guide:

- Install PyTriton following the installation instructions
- In the current terminal, start the model on Triton:

  ```shell
  ./server.py
  ```

- Open a new terminal tab (e.g. `Ctrl + T` on Ubuntu) or window
- Go to the example directory
- Run `client.py` to perform queries on the model:

  ```shell
  ./client.py
  ```
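The client talks to Triton over its standard HTTP/gRPC inference protocol (KServe v2). As a sketch of what the HTTP request body looks like, here is a hypothetical helper that builds the JSON payload for a single FP32 input tensor; the input name `INPUT_1` is an assumption, check `server.py` for the actual input and model names:

```python
import json
import numpy as np

def build_infer_payload(input_name: str, batch: np.ndarray) -> str:
    """Build a KServe v2 inference request body for one FP32 input tensor."""
    body = {
        "inputs": [
            {
                "name": input_name,
                "shape": list(batch.shape),
                "datatype": "FP32",
                "data": batch.flatten().tolist(),
            }
        ]
    }
    return json.dumps(body)

# Hypothetical input name; such a body would be POSTed to
# http://localhost:8000/v2/models/<model_name>/infer
payload = build_infer_payload("INPUT_1", np.array([[2.0, 3.0]], dtype=np.float32))
```

In practice the example's client relies on PyTriton's `ModelClient`, which hides this protocol detail; the sketch only shows what travels over the wire.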