Assuming you have a trained model, first convert it to TorchScript format.
python 01_model_to_tscript.py
This will generate a traced_model.pt file.
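For reference, the conversion script boils down to loading the trained weights and tracing the model with an example input. The sketch below is only illustrative: the architecture (a torchvision ResNet34), the number of classes, the checkpoint name model.pth, and the 150x150 input size are assumptions and may differ from the actual 01_model_to_tscript.py.
# Illustrative sketch of the TorchScript conversion (not the exact 01_model_to_tscript.py).
# Assumptions: torchvision ResNet34, 6 output classes, checkpoint "model.pth",
# and 150x150 RGB inputs -- adjust these to match your trained model.
import torch
from torchvision import models

model = models.resnet34(num_classes=6)
model.load_state_dict(torch.load("model.pth", map_location="cpu"))
model.eval()

example_input = torch.rand(1, 3, 150, 150)   # dummy input used only for tracing
traced_model = torch.jit.trace(model, example_input)
traced_model.save("traced_model.pt")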
Then, convert it to MAR format
# pip install torch-model-archiver
# git clone https://github.com/pytorch/serve
# cp cifar34_handler.py serve/ts/torch_handler/
torch-model-archiver --model-name cifar34 --version 1.0 --serialized-file ./traced_model.pt --handler ./serve/ts/torch_handler/cifar34_handler.py --extra-files ./index_to_name.json
This will generate a cifar34.mar file.
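For reference, a minimal custom handler can simply subclass TorchServe's built-in ImageClassifier and override the preprocessing. The sketch below is illustrative; the class name, transforms, and image size are assumptions and may differ from the actual cifar34_handler.py.
# Illustrative sketch of a custom image-classification handler (not the exact cifar34_handler.py).
# It reuses TorchServe's ImageClassifier and only swaps in assumed preprocessing.
from torchvision import transforms
from ts.torch_handler.image_classifier import ImageClassifier

class Cifar34Handler(ImageClassifier):
    # Assumed transforms -- make these match what was used at training time.
    image_processing = transforms.Compose([
        transforms.Resize((150, 150)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ])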
We will then upload the model files to S3 so they can be retrieved during inference. The bucket should be laid out as follows (an upload sketch follows the layout):
├── config
│   ├── config.properties
├── model-store
│   └── cifar34.mar
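One way to upload the files with that layout is via boto3; the sketch below is illustrative, and the bucket name is a placeholder (use the bucket referenced in intel-service.yaml).
# Illustrative upload of the model artifacts to S3 with boto3.
# "my-model-bucket" is a placeholder -- replace it with your bucket name.
import boto3

s3 = boto3.client("s3")
bucket = "my-model-bucket"

s3.upload_file("config/config.properties", bucket, "config/config.properties")
s3.upload_file("cifar34.mar", bucket, "model-store/cifar34.mar")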
Replace the S3 bucket name in intel-service.yaml
Copy the cifar34.mar file into the docker folder:
cp cifar34.mar docker/
docker build -t torchserve:01 .
docker run -it --rm --net=host torchserve:01
# see available models
curl http://localhost:8081/models
# download sample image
curl -LO https://raw.githubusercontent.com/kshitijzutshi/INFO6105-CNN-Assignment/main/Intel-image-classification-dataset/seg_train/seg_train/mountain/153.jpg
curl http://127.0.0.1:8080/predictions/cifar34 -T 153.jpg
This returns the top 5 predictions, confirming that the MAR file was not only generated successfully but is also functional.
Note: TorchServe uses different ports in standalone (non-KServe) and KServe deployments.
# kubectl
curl -LO https://dl.k8s.io/release/v1.25.1/bin/linux/amd64/kubectl
sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl
# kind
curl -Lo ./kind https://kind.sigs.k8s.io/dl/v0.17.0/kind-linux-amd64
chmod +x ./kind
sudo mv ./kind /usr/local/bin/kind
# create cluster
kind create cluster --image kindest/node:v1.24.1 --name kind
kind get clusters
kubectl cluster-info --context kind-kind
kubectl config use-context kind-kind
# verify
kubectl get service
kubectl get pod
kubectl get deployment
# kserve
curl -s "https://raw.githubusercontent.com/kserve/kserve/release-0.9/hack/quick_install.sh" | bash
kubectl get namespaces
kubectl get pods -n kserve
kubectl apply -f intel-service.yaml
kubectl get pods
kubectl get isvc # once all are running
If you have a pre-processor (aka transformer), port-forward to the istio ingress gateway and not to the predictor pod, since the data first needs to pass through the transformer:
kubectl port-forward -n istio-system svc/istio-ingressgateway 8080:80
Port-forwarding directly to the predictor pod bypasses the transformer and will give incorrect results.
Otherwise, if you don't have a transformer, you can port-forward either to istio or directly to the predictor pod:
kubectl port-forward -n istio-system svc/istio-ingressgateway 8080:80
# OR
kubectl port-forward intel-predictor-default-00001-deployment-5bf7f9b9f6-mlgcv 8080:8080
You can now fetch predictions using a python script.
python test.py
or using a curl command
curl http://localhost:8080/v1/models/cifar34:predict -d @./input.json
where the input.json file can be generated via
python3 img2bytearray.py 153.jpg
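For reference, img2bytearray.py essentially packs the image into a JSON payload. The sketch below is illustrative: it assumes the common KServe v1 layout with a base64-encoded image under an "instances" key, which is what the TorchServe image handlers typically expect; the actual script and payload format may differ.
# Illustrative sketch of generating input.json from an image (not the exact img2bytearray.py).
# Assumes the KServe v1 "instances" payload with a base64-encoded image.
import base64
import json
import sys

image_path = sys.argv[1]                      # e.g. 153.jpg
with open(image_path, "rb") as f:
    encoded = base64.b64encode(f.read()).decode("utf-8")

with open("input.json", "w") as out:
    json.dump({"instances": [{"data": encoded}]}, out)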