diff --git a/README.md b/README.md
index 43da7e9fe..f5d1dec83 100644
--- a/README.md
+++ b/README.md
@@ -206,6 +206,7 @@ Practical deployment and model usage guides for Nemotron models.
 |-------|----------|--------------|-----------|
 | [**Nemotron 3 Super 120B A12B**](https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16) | Production deployments needing strong reasoning | 1M context, in NVFP4 single B200, RAG & tool calling | [Cookbooks](./usage-cookbook/Nemotron-3-Super) |
 | [**Nemotron 3 Nano 30B A3B**](https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16) | Resource-constrained environments | 1M context, sparse MoE hybrid Mamba-2, controllable reasoning | [Cookbooks](./usage-cookbook/Nemotron-3-Nano) |
+| [**Llama-3.1-Nemotron-Nano-8B-v1**](https://huggingface.co/nvidia/Llama-3.1-Nemotron-Nano-8B-v1) | Small-footprint OCI deployments | Validated on private OKE in Phoenix with `vLLM`, OCI Bastion service, tool calling, and OpenAI-compatible `/v1` inference; provides a reproducible OCI path comparable to common AWS GPU/Kubernetes deployment patterns | [Cookbooks](./usage-cookbook/Llama-3.1-Nemotron-Nano-8B-v1) |
 | [**NVIDIA-Nemotron-Nano-12B-v2-VL**](https://huggingface.co/nvidia/NVIDIA-Nemotron-Nano-12B-v2-VL) | Document intelligence and video understanding | 12B VLM, video reasoning, Efficient Video Sampling | [Cookbooks](./usage-cookbook/Nemotron-Nano2-VL/) |
 | [**Llama-3.1-Nemotron-Safety-Guard-8B-v3**](https://huggingface.co/nvidia/Llama-3.1-Nemotron-Safety-Guard-8B-v3) | Multilingual content moderation | 9 languages, 23 safety categories | [Cookbooks](./usage-cookbook/Llama-3.1-Nemotron-Safety-Guard-V3/) |
 | **Nemotron-Parse** | Document parsing for RAG and AI agents | Table extraction, semantic segmentation | [Cookbooks](./usage-cookbook/Nemotron-Parse-v1.1/) |
diff --git a/usage-cookbook/Llama-3.1-Nemotron-Nano-8B-v1/README.md b/usage-cookbook/Llama-3.1-Nemotron-Nano-8B-v1/README.md
new file mode 100644
index 000000000..9f669737b
--- /dev/null
+++ b/usage-cookbook/Llama-3.1-Nemotron-Nano-8B-v1/README.md
@@ -0,0 +1,874 @@
+# Llama-3.1-Nemotron-Nano-8B-v1 on OCI OKE (Private Deployment)
+
+This cookbook documents a validated private deployment of
+`nvidia/Llama-3.1-Nemotron-Nano-8B-v1` on **Oracle Cloud Infrastructure (OCI)**
+using a private OKE cluster, a single `VM.GPU.A10.1` worker, and `vLLM` with an
+OpenAI-compatible `/v1` endpoint.
+
+This cookbook is based on the [Deploy OpenAI vLLM Production Stack on OKE](https://docs.oracle.com/en/learn/deploy-vllm-production-stack-oke/index.html)
+guide, customized for the Nemotron model with tool calling support.
+
+## Tested environment
+
+- Region: `us-phoenix-1`
+- Kubernetes: OKE v1.31.10, enhanced cluster
+- GPU shape: `VM.GPU.A10.1` (NVIDIA A10, 24 GB)
+- CPU shape: `VM.Standard.E5.Flex`
+- Model: `nvidia/Llama-3.1-Nemotron-Nano-8B-v1`
+- Serving stack: `vLLM v0.19.0`
+- Helm chart: `vllm/vllm-stack` 0.1.10
+- Inference API: OpenAI-compatible `/v1`
+
+## Validated capabilities
+
+- Chat completion
+- Tool / function calling
+- Streaming
+- Async / concurrent requests
+- OpenAI-compatible model discovery via `/v1/models`
+
+## Prerequisites
+
+- OCI tenancy with GPU capacity (`VM.GPU.A10.1`)
+- `oci` CLI configured with a valid profile
+- `kubectl`, `helm`, `ssh`, `jq`
+- An SSH key pair (e.g., `~/.ssh/id_ed25519`)
+
+**Note:** The NVIDIA device plugin is pre-installed on OKE enhanced clusters.
+No manual installation is required.
+
+## Architecture
+
+```
+                          ┌─────────────────────────────────────────────────┐
+                          │                  VCN 10.0.0.0/16               │
+  You ──SSH tunnel──►     │                                                │
+  (localhost:6443)        │  ┌──────────┐     ┌──────────────────────────┐  │
+          │               │  │ Bastion  │     │   API subnet (private)   │  │
+          │               │  │ subnet   │────►│   OKE control plane      │  │
+          │               │  │ (public) │     │   :6443                  │  │
+          ▼               │  └──────────┘     └──────────────────────────┘  │
+  kubectl / curl          │                                                │
+                          │  ┌──────────────────────────────────────────┐  │
+                          │  │         Worker subnet (private)          │  │
+                          │  │                                          │  │
+                          │  │  ┌─────────────┐  ┌──────────────────┐  │  │
+                          │  │  │ CPU node    │  │ GPU node (A10)   │  │  │
+                          │  │  │ router pod  │  │ Nemotron engine  │  │  │
+                          │  │  └─────────────┘  └──────────────────┘  │  │
+                          │  └──────────────────────────────────────────┘  │
+                          └─────────────────────────────────────────────────┘
+```
+
+## Step 1: Set environment variables
+
+```bash
+export OCI_COMPARTMENT_ID="<your-compartment-ocid>"
+export OCI_REGION="us-phoenix-1"
+export OCI_PROFILE="DEFAULT"          # adjust to your OCI CLI profile
+export CLUSTER_NAME="nemotron-phx"
+export KUBERNETES_VERSION="v1.31.10"
+```
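+
+Later steps reuse these variables, and the IDs captured in Step 2, from the
+same shell session, so opening a new terminal loses them. A minimal sketch of
+a guard that fails early when a variable is unset or empty; `require_vars` is
+a hypothetical helper name, not part of the OCI CLI:
+
+```bash
+# Succeed only if every named variable is set and non-empty.
+require_vars() {
+    local name value missing=0
+    for name in "$@"; do
+        # Indirect lookup: read the value of the variable whose name is in $name.
+        eval "value=\${${name}:-}"
+        if [ -z "${value}" ]; then
+            echo "missing: ${name}" >&2
+            missing=1
+        fi
+    done
+    return "${missing}"
+}
+```
+
+For example, run
+`require_vars OCI_COMPARTMENT_ID OCI_REGION OCI_PROFILE CLUSTER_NAME KUBERNETES_VERSION`
+before starting Step 2.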
+
+## Step 2: Create VCN and networking
+
+```bash
+VCN_ID=$(oci network vcn create \
+    --compartment-id "${OCI_COMPARTMENT_ID}" \
+    --display-name "${CLUSTER_NAME}-vcn" \
+    --cidr-blocks '["10.0.0.0/16"]' \
+    --dns-label "nemotron" \
+    --profile "${OCI_PROFILE}" --region "${OCI_REGION}" \
+    --query "data.id" --raw-output)
+
+IGW_ID=$(oci network internet-gateway create \
+    --compartment-id "${OCI_COMPARTMENT_ID}" \
+    --vcn-id "${VCN_ID}" \
+    --display-name "${CLUSTER_NAME}-igw" \
+    --is-enabled true \
+    --profile "${OCI_PROFILE}" --region "${OCI_REGION}" \
+    --query "data.id" --raw-output)
+
+NAT_ID=$(oci network nat-gateway create \
+    --compartment-id "${OCI_COMPARTMENT_ID}" \
+    --vcn-id "${VCN_ID}" \
+    --display-name "${CLUSTER_NAME}-nat" \
+    --profile "${OCI_PROFILE}" --region "${OCI_REGION}" \
+    --query "data.id" --raw-output)
+
+SGW_SERVICE_ID=$(oci network service list \
+    --profile "${OCI_PROFILE}" --region "${OCI_REGION}" \
+    --query "data[?contains(name, 'All') && contains(name, 'Services')].id | [0]" \
+    --raw-output)
+
+SGW_SERVICE_NAME=$(oci network service list \
+    --profile "${OCI_PROFILE}" --region "${OCI_REGION}" \
+    --query "data[?contains(name, 'All') && contains(name, 'Services')].\"cidr-block\" | [0]" \
+    --raw-output)
+
+SGW_ID=$(oci network service-gateway create \
+    --compartment-id "${OCI_COMPARTMENT_ID}" \
+    --vcn-id "${VCN_ID}" \
+    --display-name "${CLUSTER_NAME}-sgw" \
+    --services "[{\"serviceId\": \"${SGW_SERVICE_ID}\"}]" \
+    --profile "${OCI_PROFILE}" --region "${OCI_REGION}" \
+    --query "data.id" --raw-output)
+
+PRIVATE_RT_ID=$(oci network route-table create \
+    --compartment-id "${OCI_COMPARTMENT_ID}" \
+    --vcn-id "${VCN_ID}" \
+    --display-name "${CLUSTER_NAME}-private-rt" \
+    --route-rules "[
+        {\"cidrBlock\": \"0.0.0.0/0\", \"networkEntityId\": \"${NAT_ID}\"},
+        {\"destination\": \"${SGW_SERVICE_NAME}\", \"destinationType\": \"SERVICE_CIDR_BLOCK\", \"networkEntityId\": \"${SGW_ID}\"}
+    ]" \
+    --profile "${OCI_PROFILE}" --region "${OCI_REGION}" \
+    --query "data.id" --raw-output)
+
+PUBLIC_RT_ID=$(oci network route-table create \
+    --compartment-id "${OCI_COMPARTMENT_ID}" \
+    --vcn-id "${VCN_ID}" \
+    --display-name "${CLUSTER_NAME}-public-rt" \
+    --route-rules "[{\"cidrBlock\": \"0.0.0.0/0\", \"networkEntityId\": \"${IGW_ID}\"}]" \
+    --profile "${OCI_PROFILE}" --region "${OCI_REGION}" \
+    --query "data.id" --raw-output)
+
+SL_ID=$(oci network security-list create \
+    --compartment-id "${OCI_COMPARTMENT_ID}" \
+    --vcn-id "${VCN_ID}" \
+    --display-name "${CLUSTER_NAME}-sl" \
+    --egress-security-rules '[{"destination": "0.0.0.0/0", "protocol": "all", "isStateless": false}]' \
+    --ingress-security-rules '[
+        {"source": "0.0.0.0/0", "protocol": "6", "isStateless": false, "tcpOptions": {"destinationPortRange": {"min": 22, "max": 22}}},
+        {"source": "10.0.0.0/16", "protocol": "all", "isStateless": false},
+        {"source": "10.244.0.0/16", "protocol": "all", "isStateless": false},
+        {"source": "10.96.0.0/16", "protocol": "all", "isStateless": false},
+        {"source": "0.0.0.0/0", "protocol": "1", "isStateless": false, "icmpOptions": {"type": 3, "code": 4}}
+    ]' \
+    --profile "${OCI_PROFILE}" --region "${OCI_REGION}" \
+    --query "data.id" --raw-output)
+```
+
+Create four subnets:
+
+```bash
+API_SUBNET_ID=$(oci network subnet create \
+    --compartment-id "${OCI_COMPARTMENT_ID}" \
+    --vcn-id "${VCN_ID}" \
+    --display-name "${CLUSTER_NAME}-api-subnet" \
+    --cidr-block "10.0.0.0/28" \
+    --route-table-id "${PRIVATE_RT_ID}" \
+    --security-list-ids "[\"${SL_ID}\"]" \
+    --dns-label "kubeapi" \
+    --prohibit-public-ip-on-vnic true \
+    --profile "${OCI_PROFILE}" --region "${OCI_REGION}" \
+    --query "data.id" --raw-output)
+
+WORKER_SUBNET_ID=$(oci network subnet create \
+    --compartment-id "${OCI_COMPARTMENT_ID}" \
+    --vcn-id "${VCN_ID}" \
+    --display-name "${CLUSTER_NAME}-worker-subnet" \
+    --cidr-block "10.0.10.0/24" \
+    --route-table-id "${PRIVATE_RT_ID}" \
+    --security-list-ids "[\"${SL_ID}\"]" \
+    --dns-label "workers" \
+    --prohibit-public-ip-on-vnic true \
+    --profile "${OCI_PROFILE}" --region "${OCI_REGION}" \
+    --query "data.id" --raw-output)
+
+LB_SUBNET_ID=$(oci network subnet create \
+    --compartment-id "${OCI_COMPARTMENT_ID}" \
+    --vcn-id "${VCN_ID}" \
+    --display-name "${CLUSTER_NAME}-lb-subnet" \
+    --cidr-block "10.0.20.0/24" \
+    --route-table-id "${PUBLIC_RT_ID}" \
+    --security-list-ids "[\"${SL_ID}\"]" \
+    --dns-label "loadbalancers" \
+    --profile "${OCI_PROFILE}" --region "${OCI_REGION}" \
+    --query "data.id" --raw-output)
+
+BASTION_SUBNET_ID=$(oci network subnet create \
+    --compartment-id "${OCI_COMPARTMENT_ID}" \
+    --vcn-id "${VCN_ID}" \
+    --display-name "${CLUSTER_NAME}-bastion-subnet" \
+    --cidr-block "10.0.30.0/24" \
+    --route-table-id "${PUBLIC_RT_ID}" \
+    --security-list-ids "[\"${SL_ID}\"]" \
+    --dns-label "bastion" \
+    --profile "${OCI_PROFILE}" --region "${OCI_REGION}" \
+    --query "data.id" --raw-output)
+```
+
+## Step 3: Create private OKE cluster
+
+```bash
+oci ce cluster create \
+    --compartment-id "${OCI_COMPARTMENT_ID}" \
+    --name "${CLUSTER_NAME}" \
+    --vcn-id "${VCN_ID}" \
+    --kubernetes-version "${KUBERNETES_VERSION}" \
+    --endpoint-subnet-id "${API_SUBNET_ID}" \
+    --service-lb-subnet-ids "[\"${LB_SUBNET_ID}\"]" \
+    --endpoint-public-ip-enabled false \
+    --profile "${OCI_PROFILE}" --region "${OCI_REGION}"
+```
+
+Wait for the cluster to become ACTIVE (~10 minutes):
+
+```bash
+# Poll until ACTIVE
+CLUSTER_ID=$(oci ce cluster list \
+    --compartment-id "${OCI_COMPARTMENT_ID}" \
+    --name "${CLUSTER_NAME}" \
+    --profile "${OCI_PROFILE}" --region "${OCI_REGION}" \
+    --query 'data[0].id' --raw-output)
+
+watch -n 30 "oci ce cluster get --cluster-id ${CLUSTER_ID} \
+    --profile ${OCI_PROFILE} --region ${OCI_REGION} \
+    --query 'data.\"lifecycle-state\"' --raw-output"
+```
+
+**Do not proceed to Step 5 until the cluster is ACTIVE.**
+
+## Step 4: Create OCI Bastion
+
+The bastion is placed on the public bastion subnet so the OCI Bastion managed
+service can accept inbound SSH connections. The port-forwarding session then
+tunnels traffic to the private API endpoint over VCN-internal routing.
+
+> Known issue on OpenSSH 10.x (macOS 15+, some recent Linux): port-forwarding
+> sessions close immediately after auth. If `ssh -V` reports 10.x, use
+> [Appendix A](#appendix-a-jump-host-vm-alternative-openssh-10x) instead.
+
+```bash
+BASTION_ID=$(oci bastion bastion create \
+    --compartment-id "${OCI_COMPARTMENT_ID}" \
+    --bastion-type STANDARD \
+    --target-subnet-id "${BASTION_SUBNET_ID}" \
+    --name "${CLUSTER_NAME}-bastion" \
+    --client-cidr-list "[\"$(curl -s https://ifconfig.me)/32\"]" \
+    --profile "${OCI_PROFILE}" --region "${OCI_REGION}" \
+    --query "data.id" --raw-output)
+```
+
+## Step 5: Create node pools
+
+Find the GPU-compatible node image and create both pools:
+
+```bash
+GPU_IMAGE_ID=$(oci ce node-pool-options get \
+    --node-pool-option-id all \
+    --compartment-id "${OCI_COMPARTMENT_ID}" \
+    --profile "${OCI_PROFILE}" --region "${OCI_REGION}" \
+    --query "data.sources[?contains(\"source-name\", 'GPU') && \
+             contains(\"source-name\", 'OKE-${KUBERNETES_VERSION#v}')].\"image-id\" | [0]" \
+    --raw-output)
+
+CPU_IMAGE_ID=$(oci ce node-pool-options get \
+    --node-pool-option-id all \
+    --compartment-id "${OCI_COMPARTMENT_ID}" \
+    --profile "${OCI_PROFILE}" --region "${OCI_REGION}" \
+    --query "data.sources[?contains(\"source-name\", 'OKE-${KUBERNETES_VERSION#v}') && \
+             !contains(\"source-name\", 'GPU') && \
+             contains(\"source-name\", 'aarch64')==\`false\`].\"image-id\" | [0]" \
+    --raw-output)
+
+# Verify both image IDs were found
+echo "GPU image: ${GPU_IMAGE_ID}"
+echo "CPU image: ${CPU_IMAGE_ID}"
+# If either is empty, list available images and pick manually:
+# oci ce node-pool-options get --node-pool-option-id all \
+#     --compartment-id "${OCI_COMPARTMENT_ID}" \
+#     --profile "${OCI_PROFILE}" --region "${OCI_REGION}" \
+#     --query "data.sources[?contains(\"source-name\", 'OKE-${KUBERNETES_VERSION#v}')].{name:\"source-name\",id:\"image-id\"}" \
+#     --output table
+
+# Pick an availability domain with A10 capacity.
+# Iterate through ADs and use the first one with capacity available.
+AD=""
+for CANDIDATE in $(oci iam availability-domain list \
+        --compartment-id "${OCI_COMPARTMENT_ID}" \
+        --profile "${OCI_PROFILE}" --region "${OCI_REGION}" \
+        --query 'data[].name' --raw-output | jq -r '.[]'); do
+    AVAIL=$(oci limits resource-availability get \
+        --compartment-id "${OCI_COMPARTMENT_ID}" \
+        --service-name compute --limit-name gpu-a10-count \
+        --availability-domain "${CANDIDATE}" \
+        --profile "${OCI_PROFILE}" --region "${OCI_REGION}" \
+        --query 'data.available' --raw-output 2>/dev/null)
+    if [[ "${AVAIL}" =~ ^[0-9]+$ ]] && (( AVAIL > 0 )); then
+        AD="${CANDIDATE}"
+        echo "Selected AD with ${AVAIL} A10s available: ${AD}"
+        break
+    fi
+done
+[[ -z "${AD}" ]] && { echo "No AD with A10 capacity in ${OCI_REGION}"; exit 1; }
+
+# CPU node pool (boot volume >= 100 GB for the router image)
+oci ce node-pool create \
+    --compartment-id "${OCI_COMPARTMENT_ID}" \
+    --cluster-id "${CLUSTER_ID}" \
+    --name "cpu-pool" \
+    --kubernetes-version "${KUBERNETES_VERSION}" \
+    --node-shape "VM.Standard.E5.Flex" \
+    --node-shape-config '{"ocpus": 2, "memoryInGBs": 16}' \
+    --node-image-id "${CPU_IMAGE_ID}" \
+    --node-boot-volume-size-in-gbs 100 \
+    --size 1 \
+    --placement-configs "[{\"availabilityDomain\": \"${AD}\", \"subnetId\": \"${WORKER_SUBNET_ID}\"}]" \
+    --profile "${OCI_PROFILE}" --region "${OCI_REGION}"
+
+# GPU node pool (boot volume 200 GB)
+oci ce node-pool create \
+    --compartment-id "${OCI_COMPARTMENT_ID}" \
+    --cluster-id "${CLUSTER_ID}" \
+    --name "gpu-pool" \
+    --kubernetes-version "${KUBERNETES_VERSION}" \
+    --node-shape "VM.GPU.A10.1" \
+    --node-image-id "${GPU_IMAGE_ID}" \
+    --node-boot-volume-size-in-gbs 200 \
+    --size 1 \
+    --placement-configs "[{\"availabilityDomain\": \"${AD}\", \"subnetId\": \"${WORKER_SUBNET_ID}\"}]" \
+    --initial-node-labels '[{"key": "app", "value": "gpu"}, {"key": "nvidia.com/gpu", "value": "true"}]' \
+    --profile "${OCI_PROFILE}" --region "${OCI_REGION}"
+```
+
+Wait for both node pools to show nodes as ACTIVE (~10 minutes):
+
+```bash
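+# `node-pool list` omits the nested `nodes` array, so `get` each pool; rerun until all nodes are ACTIVE.
+for POOL_ID in $(oci ce node-pool list \
+        --compartment-id "${OCI_COMPARTMENT_ID}" \
+        --cluster-id "${CLUSTER_ID}" \
+        --profile "${OCI_PROFILE}" --region "${OCI_REGION}" \
+        --query 'data[].id' --raw-output | jq -r '.[]'); do
+    oci ce node-pool get --node-pool-id "${POOL_ID}" \
+        --profile "${OCI_PROFILE}" --region "${OCI_REGION}" \
+        --query 'data.{name:name,nodes:nodes[].{ip:"private-ip",state:"lifecycle-state"}}'
+done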
+```
+
+**Do not proceed to Step 6 until both node pools show nodes as ACTIVE.**
+
+**Important:** The CPU boot volume must be at least **100 GB**. The vLLM router
+image is ~10.5 GB and the default 47 GB boot volume causes pod eviction.
+
+## Step 6: Connect to the private cluster
+
+Download kubeconfig and configure for tunnel access:
+
+```bash
+oci ce cluster create-kubeconfig \
+    --cluster-id "${CLUSTER_ID}" \
+    --file ~/.kube/config-nemotron \
+    --region "${OCI_REGION}" \
+    --token-version 2.0.0 \
+    --kube-endpoint PRIVATE_ENDPOINT \
+    --profile "${OCI_PROFILE}" --overwrite
+
+export KUBECONFIG=~/.kube/config-nemotron
+
+# Get the private endpoint IP
+PRIVATE_IP=$(oci ce cluster get --cluster-id "${CLUSTER_ID}" \
+    --profile "${OCI_PROFILE}" --region "${OCI_REGION}" \
+    --query 'data.endpoints."private-endpoint"' --raw-output | cut -d: -f1)
+
+# Update kubeconfig to use localhost tunnel
+CLUSTER_CTX=$(kubectl config view --minify -o jsonpath='{.clusters[0].name}')
+kubectl config set-cluster "${CLUSTER_CTX}" \
+    --server=https://127.0.0.1:6443 \
+    --insecure-skip-tls-verify=true
+```
+
+If your OCI CLI profile is not `DEFAULT`, add it to the kubeconfig:
+
+```yaml
+# In the users[].user.exec section, replace env: [] with:
+env:
+  - name: OCI_CLI_PROFILE
+    value: YOUR_PROFILE
+```
+
+Create a Bastion session and start the SSH tunnel:
+
+```bash
+SESSION_ID=$(oci bastion session create-port-forwarding \
+    --bastion-id "${BASTION_ID}" \
+    --target-private-ip "${PRIVATE_IP}" \
+    --target-port 6443 \
+    --session-ttl 10800 \
+    --display-name "nemotron-kubectl" \
+    --ssh-public-key-file ~/.ssh/id_ed25519.pub \
+    --profile "${OCI_PROFILE}" --region "${OCI_REGION}" \
+    --query "data.id" --raw-output)
+
+# Wait for session to become ACTIVE, then start tunnel
+ssh -i ~/.ssh/id_ed25519 -N -L 6443:${PRIVATE_IP}:6443 \
+    -p 22 -o StrictHostKeyChecking=no -o ServerAliveInterval=30 \
+    ${SESSION_ID}@host.bastion.${OCI_REGION}.oci.oraclecloud.com &
+
+# Verify
+kubectl get nodes
+```
+
+**Note:** Bastion sessions expire after the TTL (set to 10800 seconds, i.e.
+3 hours, above). Create a new session and restart the tunnel when access drops.
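+
+The "wait for session to become ACTIVE" step above is left manual. A minimal
+sketch of a polling helper follows; `wait_for_state` is a hypothetical name,
+and the retry count and delay in the example are assumptions, not validated
+values:
+
+```bash
+# Poll a command until it prints the wanted state; give up after max_tries attempts.
+# Usage: wait_for_state <wanted> <max_tries> <delay_seconds> <command...>
+wait_for_state() {
+    local want="$1" tries="$2" delay="$3" i=0 state=""
+    shift 3
+    while [ "${i}" -lt "${tries}" ]; do
+        # Run the remaining arguments as the status command; ignore its errors.
+        state="$("$@" 2>/dev/null || true)"
+        [ "${state}" = "${want}" ] && return 0
+        i=$((i + 1))
+        sleep "${delay}"
+    done
+    echo "timed out waiting for ${want} (last state: ${state:-unknown})" >&2
+    return 1
+}
+
+# Example (requires SESSION_ID from the step above):
+# wait_for_state ACTIVE 40 15 oci bastion session get \
+#     --session-id "${SESSION_ID}" \
+#     --profile "${OCI_PROFILE}" --region "${OCI_REGION}" \
+#     --query 'data."lifecycle-state"' --raw-output
+```
+
+The same helper can wrap the `oci ce cluster get` query from Step 3 when a
+scripted wait is preferable to `watch`.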
+
+## Step 7: Expand boot volume filesystems
+
+The node image sizes the root filesystem for the default ~47 GB boot volume
+(about 36 GB usable, as the results below show), regardless of the requested
+boot volume size. Both nodes must be expanded.
+
+**Why this matters:** The vLLM engine image is ~10 GB, the router image is
+~10.5 GB, and the model weights are ~16 GB. Without expansion, pods get
+evicted for low ephemeral storage.
+
+For each node, run the following (use a unique pod name per node):
+
+```bash
+NODE_IP=<node-internal-ip>
+POD_NAME=expand-$(echo $NODE_IP | tr '.' '-')
+
+kubectl run ${POD_NAME} --restart=Never \
+  --image=busybox:latest \
+  --overrides="{
+    \"spec\":{
+      \"nodeName\":\"${NODE_IP}\",
+      \"tolerations\":[{\"operator\":\"Exists\"}],
+      \"containers\":[{
+        \"name\":\"expand\",
+        \"image\":\"busybox:latest\",
+        \"command\":[\"sleep\",\"600\"],
+        \"securityContext\":{\"privileged\":true},
+        \"volumeMounts\":[{\"name\":\"host\",\"mountPath\":\"/host\"}]
+      }],
+      \"volumes\":[{\"name\":\"host\",\"hostPath\":{\"path\":\"/\"}}]
+    }
+  }"
+
+kubectl wait --for=condition=Ready pod/${POD_NAME} --timeout=60s
+
+kubectl exec ${POD_NAME} -- chroot /host bash -c '
+  growpart /dev/sda 3
+  sleep 3
+  pvresize /dev/sda3
+  lvextend -l +100%FREE /dev/ocivolume/root
+  xfs_growfs /
+  df -h /
+'
+
+kubectl delete pod ${POD_NAME} --force
+```
+
+Repeat for each node. Expected results:
+
+- GPU node (200 GB boot volume): 36 GB → ~189 GB usable
+- CPU node (100 GB boot volume): 36 GB → ~89 GB usable
+
+Kubelet caches capacity at startup — in-place `systemctl restart kubelet`
+does not refresh it. See Step 7b.
+
+## Step 7b: Soft-reset each node so kubelet re-reads disk capacity
+
+Drain each node, soft-reset the VM, wait for Ready, uncordon:
+
+```bash
+for NODE_IP in <cpu-node-ip> <gpu-node-ip>; do
+    # Resolve the OCI instance OCID via the node's providerID, which OKE
+    # sets to oci://<instance-ocid>. (`oci ce node-pool list` does not
+    # populate the nested `nodes` array, so a list-based lookup returns
+    # null; `get` per pool also works but is noisier.)
+    INSTANCE_ID=$(kubectl get node "${NODE_IP}" \
+        -o jsonpath='{.spec.providerID}' | sed 's|^oci://||')
+
+    kubectl cordon "${NODE_IP}"
+    kubectl drain "${NODE_IP}" --ignore-daemonsets --delete-emptydir-data \
+        --force --grace-period=30 --timeout=120s || true
+
+    oci compute instance action \
+        --instance-id "${INSTANCE_ID}" --action SOFTRESET \
+        --profile "${OCI_PROFILE}" --region "${OCI_REGION}"
+
+    # Wait for VM RUNNING, then for node Ready
+    until [[ "$(oci compute instance get --instance-id "${INSTANCE_ID}" \
+            --profile "${OCI_PROFILE}" --region "${OCI_REGION}" \
+            --query 'data."lifecycle-state"' --raw-output)" == "RUNNING" ]]; do
+        sleep 15
+    done
+    until kubectl get node "${NODE_IP}" \
+            -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}' \
+            | grep -q True; do
+        sleep 15
+    done
+
+    kubectl uncordon "${NODE_IP}"
+done
+```
+
+Verify kubelet picked up the expanded capacity before continuing:
+
+```bash
+for NODE in $(kubectl get nodes -o jsonpath='{.items[*].metadata.name}'); do
+    CAP=$(kubectl get node "${NODE}" \
+        -o jsonpath='{.status.capacity.ephemeral-storage}')
+    echo "${NODE}: ${CAP}"
+done
+```
+
+Expected: CPU node ~`93476416Ki` (~89 GiB), GPU node ~`198056192Ki` (~189 GiB).
+If either still shows ~`37206272Ki`, rerun the soft-reset for that node.
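+
+To turn the comparison above into a scripted check, a minimal sketch follows.
+It assumes kubelet reports the quantity with a `Ki` suffix, as in the values
+above; `ki_to_gib` and `capacity_at_least` are hypothetical helper names, and
+the 80 GiB threshold in the usage note is an illustrative choice:
+
+```bash
+# Convert a Kubernetes quantity such as "93476416Ki" to whole GiB (rounded down).
+ki_to_gib() {
+    echo $(( ${1%Ki} / 1048576 ))
+}
+
+# Succeed only when the reported capacity is at least min_gib.
+# Usage: capacity_at_least <quantity-in-Ki> <min_gib>
+capacity_at_least() {
+    [ "$(ki_to_gib "$1")" -ge "$2" ]
+}
+```
+
+Inside the verification loop, `capacity_at_least "${CAP}" 80 || echo "${NODE}: not expanded"`
+flags any node that still reports the original size.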
+
+## Step 8: Create the StorageClass
+
+```bash
+kubectl apply -f - <<'EOF'
+apiVersion: storage.k8s.io/v1
+kind: StorageClass
+metadata:
+  name: oci-block-storage-enc
+provisioner: blockvolume.csi.oraclecloud.com
+parameters:
+  vpusPerGB: "10"
+reclaimPolicy: Delete
+volumeBindingMode: WaitForFirstConsumer
+allowVolumeExpansion: true
+EOF
+```
+
+## Step 9: Patch CoreDNS for GPU tolerations
+
+```bash
+kubectl patch deployment coredns -n kube-system --type='json' \
+  -p='[{"op":"add","path":"/spec/template/spec/tolerations/-",
+        "value":{"key":"nvidia.com/gpu","operator":"Exists","effect":"NoSchedule"}}]'
+
+kubectl patch deployment kube-dns-autoscaler -n kube-system --type='json' \
+  -p='[{"op":"add","path":"/spec/template/spec/tolerations/-",
+        "value":{"key":"nvidia.com/gpu","operator":"Exists","effect":"NoSchedule"}}]'
+```
+
+## Step 10: Create the templates PVC
+
+The `vllm-stack` chart (0.1.10) mounts a `vllm-templates-pvc` volume in every
+engine pod. This PVC must exist before deploying:
+
+```bash
+kubectl apply -f - <<'EOF'
+apiVersion: v1
+kind: PersistentVolumeClaim
+metadata:
+  name: vllm-templates-pvc
+  namespace: default
+spec:
+  accessModes:
+    - ReadWriteOnce
+  storageClassName: oci-block-storage-enc
+  resources:
+    requests:
+      storage: 1Gi
+EOF
+```
+
+## Step 11: Deploy vLLM
+
+The checked-in values file
+[`vllm_oke_phoenix_private_values.yaml`](./vllm_oke_phoenix_private_values.yaml)
+contains the validated configuration for this deployment.
+
+```bash
+helm repo add vllm https://vllm-project.github.io/production-stack
+helm repo update
+
+helm upgrade --install vllm vllm/vllm-stack \
+  -n default \
+  -f vllm_oke_phoenix_private_values.yaml
+```
+
+**Do not** pass `--wait` to Helm. The engine pod takes several minutes to pull
+the image (~10 GB) and download the model, which can exceed Helm's default
+5-minute timeout and mark the release as failed.
+
+Monitor progress:
+
+```bash
+kubectl get pods -n default -w
+```
+
+Wait for both pods to show `1/1 Running`:
+
+- `vllm-deployment-router-*` — request router (CPU node)
+- `vllm-llama31-nemotron-nano-8b-deployment-vllm-*` — model engine (GPU node)
+
+## Step 12: Validate
+
+```bash
+# Blocks while forwarding; run in a separate terminal (or append `&`).
+kubectl -n default port-forward svc/vllm-router-service 8080:80
+```
+
+Health check:
+
+```bash
+curl -s http://127.0.0.1:8080/health
+# {"status":"healthy"}
+```
+
+Model discovery:
+
+```bash
+curl -s http://127.0.0.1:8080/v1/models | jq .
+```
+
+Chat completion:
+
+```bash
+curl -s http://127.0.0.1:8080/v1/chat/completions \
+  -H 'Content-Type: application/json' \
+  -d '{
+    "model": "nvidia/Llama-3.1-Nemotron-Nano-8B-v1",
+    "messages": [{"role": "user", "content": "Reply with NEMOTRON_OK"}]
+  }'
+```
+
+Tool-calling smoke test:
+
+```bash
+curl -s http://127.0.0.1:8080/v1/chat/completions \
+  -H 'Content-Type: application/json' \
+  -d '{
+    "model": "nvidia/Llama-3.1-Nemotron-Nano-8B-v1",
+    "messages": [{"role": "user", "content": "What time is it in UTC?"}],
+    "tools": [{
+      "type": "function",
+      "function": {
+        "name": "get_utc_time",
+        "description": "Return the current UTC time",
+        "parameters": {"type": "object", "properties": {}, "required": []}
+      }
+    }]
+  }'
+```
+
+Expected: `finish_reason` set to `tool_calls`.
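+
+The expectation above can be checked without reading the JSON by eye. A
+minimal sketch, assuming the response is piped in on stdin;
+`is_tool_call_response` is a hypothetical helper name and the pattern match is
+deliberately loose:
+
+```bash
+# Read a chat-completions JSON response on stdin; succeed if it ended in a tool call.
+is_tool_call_response() {
+    grep -Eq '"finish_reason"[[:space:]]*:[[:space:]]*"tool_calls"'
+}
+```
+
+Pipe the tool-calling `curl` output into `is_tool_call_response && echo TOOL_CALL_OK`.
+Since `jq` is already a prerequisite, `jq -e '.choices[0].finish_reason == "tool_calls"'`
+is a stricter alternative.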
+
+## Key vLLM settings
+
+| Setting | Value | Why |
+|---------|-------|-----|
+| `tag` | `v0.19.0` | Pinned to validated vLLM version |
+| `maxModelLen` | `4096` | Conservative context to fit single A10 (24 GB) |
+| `gpuMemoryUtilization` | `0.95` | Maximize GPU memory for KV cache |
+| `enableTool` | `true` | Enable tool / function calling |
+| `toolCallParser` | `llama3_json` | Parser matching Nemotron's tool format |
+| `extraArgs` | `--chat-template=...` | Template passed as CLI arg (chart's `chatTemplate` field prepends `/templates/`) |
+| `storageClass` | `oci-block-storage-enc` | OCI Block Volume with balanced performance |
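+
+A rough back-of-the-envelope check of the `maxModelLen` and
+`gpuMemoryUtilization` choices. The layer count, KV head count, and head
+dimension below are assumptions taken from the Llama 3.1 8B architecture, not
+values stated in this cookbook. The estimate treats GB as GiB and ignores
+activations and runtime overhead, so read it as an upper bound:
+
+```bash
+# Assumed architecture (Llama 3.1 8B config; not stated in this cookbook).
+LAYERS=32
+KV_HEADS=8
+HEAD_DIM=128
+BYTES_PER_VALUE=2   # BF16
+MAX_MODEL_LEN=4096
+
+# Keys and values are both cached, hence the leading factor of 2.
+KV_BYTES_PER_TOKEN=$(( 2 * ${LAYERS} * ${KV_HEADS} * ${HEAD_DIM} * ${BYTES_PER_VALUE} ))
+KV_MIB_PER_SEQUENCE=$(( ${KV_BYTES_PER_TOKEN} * ${MAX_MODEL_LEN} / 1024 / 1024 ))
+
+# Rough budget in MiB: 95% of 24 GiB minus ~16 GiB of weights.
+BUDGET_MIB=$(( 24 * 1024 * 95 / 100 - 16 * 1024 ))
+
+echo "KV cache per token: ${KV_BYTES_PER_TOKEN} bytes"
+echo "KV cache per ${MAX_MODEL_LEN}-token sequence: ${KV_MIB_PER_SEQUENCE} MiB"
+echo "Approximate KV budget: ${BUDGET_MIB} MiB"
+```
+
+Under these assumptions a full-length sequence needs about 512 MiB of KV
+cache, which leaves room for only a handful of concurrent full-length
+requests on a single A10.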
+
+## Troubleshooting
+
+### Pods evicted for ephemeral storage
+
+The root filesystem uses only ~36 GB of the boot volume by default. Follow
+Step 7 and Step 7b to expand it. If the boot volume itself is too small
+(default 47 GB), resize it first via the OCI CLI, then rescan the block device
+on the node before running `growpart`:
+
+```bash
+echo 1 > /sys/class/block/sda/device/rescan
+```
+
+### Engine pod evicted mid image pull despite Step 7 reporting success
+
+Symptoms: engine pod reaches `ContainerCreating`, then kubelet evicts it with
+`The node was low on resource: ephemeral-storage` (or `inodes`), and
+`FreeDiskSpaceFailed: ... but only found 0 bytes eligible to free`.
+
+Cause: kubelet's `Node.Capacity.ephemeral-storage` is cached at startup. Even
+after Step 7 expands the filesystem to ~189 GiB, kubelet continues to report
+the original ~37 GiB and triggers eviction thresholds against the stale value.
+Confirm with:
+
+```bash
+kubectl describe node <node-ip> | grep "ephemeral-storage:"
+```
+
+If the value is ~`37206272Ki`, apply Step 7b (soft-reset the VM). An in-place
+`systemctl restart kubelet` does **not** refresh the capacity.
+
+### SSH tunnel to OCI Bastion closes immediately after authentication
+
+Symptoms: `ssh -N -L 6443:... <session-id>@host.bastion.<region>.oci.oraclecloud.com`
+completes publickey auth, reports
+`Local forwarding listening on 127.0.0.1 port 6443`, then:
+`Connection to host.bastion.<region>.oci.oraclecloud.com closed by remote host.`
+Port 6443 never stays open on the client.
+
+Cause: OpenSSH 10.x (shipped on macOS 15+ and recent Linux distros) is
+incompatible with OCI Bastion's Go SSH server implementation for
+port-forwarding sessions.
+
+Workaround: use the jump-host VM path in
+[Appendix A](#appendix-a-jump-host-vm-alternative-openssh-10x). Downgrading
+the client to OpenSSH 9.x also works but is typically impractical on macOS.
+
+### Engine pod stays Pending with PVC not found
+
+The `vllm-stack` chart (0.1.10) requires `vllm-templates-pvc` to exist
+before the engine pod can schedule. See Step 10.
+
+### Engine pod crashes with chat template error
+
+The chart's `chatTemplate` field prepends `/templates/` to the path. Pass
+the template via `vllmConfig.extraArgs` instead:
+
+```yaml
+vllmConfig:
+  extraArgs:
+    - "--chat-template=/vllm-workspace/examples/tool_chat_template_llama3.1_json.jinja"
+```
+
+### Tool calling does not work
+
+Ensure all of these are set in the values file:
+
+- `enableTool: true`
+- `toolCallParser: llama3_json`
+- `--chat-template=...` in `vllmConfig.extraArgs`
+
+### `kubectl` cannot reach the cluster
+
+Re-establish the Bastion tunnel. Sessions expire after the configured TTL.
+
+### Helm upgrade fails with field manager conflict
+
+Uninstall and reinstall:
+
+```bash
+helm uninstall vllm -n default
+helm install vllm vllm/vllm-stack -n default -f vllm_oke_phoenix_private_values.yaml
+```
+
+## Cleanup
+
+To tear down all resources:
+
+```bash
+# 1. Uninstall Helm release and PVCs
+helm uninstall vllm -n default
+kubectl delete pvc --all -n default
+
+# 2. List and delete node pools
+oci ce node-pool list --compartment-id "${OCI_COMPARTMENT_ID}" \
+    --cluster-id "${CLUSTER_ID}" \
+    --profile "${OCI_PROFILE}" --region "${OCI_REGION}" \
+    --query 'data[].{name:name,id:id}' --output table
+
+oci ce node-pool delete --node-pool-id <cpu-pool-id> --force \
+    --profile "${OCI_PROFILE}" --region "${OCI_REGION}"
+oci ce node-pool delete --node-pool-id <gpu-pool-id> --force \
+    --profile "${OCI_PROFILE}" --region "${OCI_REGION}"
+
+# 3. Wait for node pools, then delete cluster
+oci ce cluster delete --cluster-id "${CLUSTER_ID}" --force \
+    --profile "${OCI_PROFILE}" --region "${OCI_REGION}"
+
+# 4. Delete bastion
+oci bastion bastion delete --bastion-id "${BASTION_ID}" --force \
+    --profile "${OCI_PROFILE}" --region "${OCI_REGION}"
+
+# 5. Wait for cluster deletion, then delete networking
+#    Delete subnets first, then route tables, gateways, and VCN
+for SUBNET_ID in "${API_SUBNET_ID}" "${WORKER_SUBNET_ID}" \
+                  "${LB_SUBNET_ID}" "${BASTION_SUBNET_ID}"; do
+    oci network subnet delete --subnet-id "${SUBNET_ID}" --force \
+        --profile "${OCI_PROFILE}" --region "${OCI_REGION}"
+done
+
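+# Delete non-default route tables and the security list, then gateways, then the VCN
+for RT_ID in "${PRIVATE_RT_ID}" "${PUBLIC_RT_ID}"; do
+    oci network route-table delete --rt-id "${RT_ID}" --force \
+        --profile "${OCI_PROFILE}" --region "${OCI_REGION}"
+done
+oci network security-list delete --security-list-id "${SL_ID}" --force \
+    --profile "${OCI_PROFILE}" --region "${OCI_REGION}"
+oci network service-gateway delete --service-gateway-id "${SGW_ID}" --force \
+    --profile "${OCI_PROFILE}" --region "${OCI_REGION}"
+oci network nat-gateway delete --nat-gateway-id "${NAT_ID}" --force \
+    --profile "${OCI_PROFILE}" --region "${OCI_REGION}"
+oci network internet-gateway delete --ig-id "${IGW_ID}" --force \
+    --profile "${OCI_PROFILE}" --region "${OCI_REGION}"
+oci network vcn delete --vcn-id "${VCN_ID}" --force \
+    --profile "${OCI_PROFILE}" --region "${OCI_REGION}"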
+```
+
+## Alternative: Terraform
+
+A Terraform sample using the `oracle-terraform-modules/oke/oci` module is
+available in [`terraform/`](./terraform/) for reference. Note that the
+module's NSG configuration requires its built-in bastion compute host
+(`create_bastion = true`) for OCI Bastion port-forwarding to work. The
+manual CLI approach above is recommended for initial deployments.
+
+## Appendix A: Jump-host VM alternative (OpenSSH 10.x)
+
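+Use this when `ssh -V` reports OpenSSH 10.x. Replaces Step 4 and the
+bastion-session block in Step 6. A.1 uses `${AD}`, which is set in Step 5, so
+run it after the availability-domain selection there.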
+
+Trade-off: this is a public-IP VM, not OCI's managed bastion service.
+Terminate it during cleanup.
+
+### A.1 Launch the jump-host VM (replaces Step 4)
+
+```bash
+OL_IMAGE_ID=$(oci compute image list \
+    --compartment-id "${OCI_COMPARTMENT_ID}" \
+    --operating-system "Oracle Linux" --operating-system-version "9" \
+    --shape "VM.Standard.E5.Flex" \
+    --profile "${OCI_PROFILE}" --region "${OCI_REGION}" \
+    --query 'data[?"lifecycle-state"==`AVAILABLE`] | sort_by(@, &"time-created") | [-1].id' \
+    --raw-output)
+
+SSH_PUB=$(cat ~/.ssh/id_ed25519.pub)
+METADATA=$(jq -cn --arg k "${SSH_PUB}" '{"ssh_authorized_keys": $k}')
+
+oci compute instance launch \
+    --compartment-id "${OCI_COMPARTMENT_ID}" \
+    --availability-domain "${AD}" \
+    --display-name "${CLUSTER_NAME}-jumphost" \
+    --shape "VM.Standard.E5.Flex" \
+    --shape-config '{"ocpus":1,"memoryInGBs":8}' \
+    --image-id "${OL_IMAGE_ID}" \
+    --subnet-id "${BASTION_SUBNET_ID}" \
+    --assign-public-ip true \
+    --metadata "${METADATA}" \
+    --profile "${OCI_PROFILE}" --region "${OCI_REGION}" \
+    --wait-for-state RUNNING
+
+JUMP_HOST_ID=$(oci compute instance list \
+    --compartment-id "${OCI_COMPARTMENT_ID}" \
+    --display-name "${CLUSTER_NAME}-jumphost" \
+    --profile "${OCI_PROFILE}" --region "${OCI_REGION}" \
+    --query 'data[?"lifecycle-state"==`RUNNING`] | [0].id' --raw-output)
+
+VNIC_ID=$(oci compute vnic-attachment list \
+    --compartment-id "${OCI_COMPARTMENT_ID}" --instance-id "${JUMP_HOST_ID}" \
+    --profile "${OCI_PROFILE}" --region "${OCI_REGION}" \
+    --query 'data[0]."vnic-id"' --raw-output)
+
+JUMP_HOST_IP=$(oci network vnic get --vnic-id "${VNIC_ID}" \
+    --profile "${OCI_PROFILE}" --region "${OCI_REGION}" \
+    --query 'data."public-ip"' --raw-output)
+
+echo "Jump host public IP: ${JUMP_HOST_IP}"
+
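+# cloud-init may still be copying authorized_keys when the VM first reports
+# RUNNING, so wait for port 22 to accept connections before using ssh.
+# -w 3 bounds each attempt and works with both the macOS and OpenBSD/Linux
+# builds of nc; the -G timeout flag is not available everywhere.
+until nc -z -w 3 "${JUMP_HOST_IP}" 22 2>/dev/null; do sleep 2; done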
+```
+
+### A.2 Open the tunnel through the jump-host (replaces Step 6 bastion block)
+
+Run Step 6 up through the `kubectl config set-cluster` server-URL rewrite,
+then skip the `oci bastion session` block and tunnel directly:
+
+```bash
+nohup ssh -f -N -L 6443:${PRIVATE_IP}:6443 \
+    -i ~/.ssh/id_ed25519 \
+    -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null \
+    -o IdentitiesOnly=yes -o ServerAliveInterval=30 \
+    -o ExitOnForwardFailure=yes \
+    opc@${JUMP_HOST_IP} < /dev/null > /tmp/nemotron-ssh-tunnel.log 2>&1
+
+nc -z 127.0.0.1 6443 && echo "tunnel up" || echo "tunnel failed"
+kubectl get nodes
+```
+
+Unlike an OCI Bastion session, this tunnel has no session TTL, but it does not
+survive a laptop sleep or network change; rerun the `ssh` command above to
+restore it.
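+
+If the tunnel drops often, a small helper can test the local port and reconnect
+only when needed. This is a minimal sketch, not part of the validated run: the
+function names `tunnel_is_up` and `ensure_tunnel` are hypothetical, and it
+assumes `bash` (for `/dev/tcp`) with `PRIVATE_IP` and `JUMP_HOST_IP` still set
+from the earlier steps.
+
+```bash
+# Hypothetical helper: returns 0 when something accepts TCP connections on
+# the local forward port (6443 in the tunnel command above).
+tunnel_is_up() {
+    local port="${1:-6443}"
+    # bash's /dev/tcp pseudo-device avoids depending on a particular nc build.
+    (exec 3<>"/dev/tcp/127.0.0.1/${port}") 2>/dev/null
+}
+
+# Hypothetical helper: re-open the tunnel only when the port is closed.
+# The ssh options mirror the tunnel command shown above.
+ensure_tunnel() {
+    if tunnel_is_up 6443; then
+        echo "tunnel up"
+        return 0
+    fi
+    echo "tunnel down, reconnecting"
+    ssh -f -N -L "6443:${PRIVATE_IP}:6443" \
+        -i ~/.ssh/id_ed25519 \
+        -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null \
+        -o IdentitiesOnly=yes -o ServerAliveInterval=30 \
+        -o ExitOnForwardFailure=yes \
+        "opc@${JUMP_HOST_IP}"
+}
+```
+
+Call `ensure_tunnel` before a `kubectl` session; it does nothing while the port
+still accepts connections.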
+
+### A.3 Cleanup addition
+
+When running the cleanup steps, also terminate the jump-host:
+
+```bash
+oci compute instance terminate --instance-id "${JUMP_HOST_ID}" --force \
+    --preserve-boot-volume false \
+    --profile "${OCI_PROFILE}" --region "${OCI_REGION}"
+```
diff --git a/usage-cookbook/Llama-3.1-Nemotron-Nano-8B-v1/terraform/.gitignore b/usage-cookbook/Llama-3.1-Nemotron-Nano-8B-v1/terraform/.gitignore
new file mode 100644
index 000000000..1a22d40bb
--- /dev/null
+++ b/usage-cookbook/Llama-3.1-Nemotron-Nano-8B-v1/terraform/.gitignore
@@ -0,0 +1,5 @@
+.terraform/
+terraform.tfvars
+terraform.tfstate
+terraform.tfstate.*
+tfplan
diff --git a/usage-cookbook/Llama-3.1-Nemotron-Nano-8B-v1/terraform/.terraform.lock.hcl b/usage-cookbook/Llama-3.1-Nemotron-Nano-8B-v1/terraform/.terraform.lock.hcl
new file mode 100644
index 000000000..a539a5863
--- /dev/null
+++ b/usage-cookbook/Llama-3.1-Nemotron-Nano-8B-v1/terraform/.terraform.lock.hcl
@@ -0,0 +1,145 @@
+# This file is maintained automatically by "terraform init".
+# Manual edits may be lost in future updates.
+
+provider "registry.terraform.io/hashicorp/cloudinit" {
+  version     = "2.3.7"
+  constraints = ">= 2.2.0"
+  hashes = [
+    "h1:M9TpQxKAE/hyOwytdX9MUNZw30HoD/OXqYIug5fkqH8=",
+    "zh:06f1c54e919425c3139f8aeb8fcf9bceca7e560d48c9f0c1e3bb0a8ad9d9da1e",
+    "zh:0e1e4cf6fd98b019e764c28586a386dc136129fef50af8c7165a067e7e4a31d5",
+    "zh:1871f4337c7c57287d4d67396f633d224b8938708b772abfc664d1f80bd67edd",
+    "zh:2b9269d91b742a71b2248439d5e9824f0447e6d261bfb86a8a88528609b136d1",
+    "zh:3d8ae039af21426072c66d6a59a467d51f2d9189b8198616888c1b7fc42addc7",
+    "zh:3ef4e2db5bcf3e2d915921adced43929214e0946a6fb11793085d9a48995ae01",
+    "zh:42ae54381147437c83cbb8790cc68935d71b6357728a154109d3220b1beb4dc9",
+    "zh:4496b362605ae4cbc9ef7995d102351e2fe311897586ffc7a4a262ccca0c782a",
+    "zh:652a2401257a12706d32842f66dac05a735693abcb3e6517d6b5e2573729ba13",
+    "zh:7406c30806f5979eaed5f50c548eced2ea18ea121e01801d2f0d4d87a04f6a14",
+    "zh:7848429fd5a5bcf35f6fee8487df0fb64b09ec071330f3ff240c0343fe2a5224",
+    "zh:78d5eefdd9e494defcb3c68d282b8f96630502cac21d1ea161f53cfe9bb483b3",
+  ]
+}
+
+provider "registry.terraform.io/hashicorp/helm" {
+  version     = "3.1.1"
+  constraints = ">= 3.0.1"
+  hashes = [
+    "h1:47CqNwkxctJtL/N/JuEj+8QMg8mRNI/NWeKO5/ydfZU=",
+    "zh:1a6d5ce931708aec29d1f3d9e360c2a0c35ba5a54d03eeaff0ce3ca597cd0275",
+    "zh:3411919ba2a5941801e677f0fea08bdd0ae22ba3c9ce3309f55554699e06524a",
+    "zh:81b36138b8f2320dc7f877b50f9e38f4bc614affe68de885d322629dd0d16a29",
+    "zh:95a2a0a497a6082ee06f95b38bd0f0d6924a65722892a856cfd914c0d117f104",
+    "zh:9d3e78c2d1bb46508b972210ad706dd8c8b106f8b206ecf096cd211c54f46990",
+    "zh:a79139abf687387a6efdbbb04289a0a8e7eaca2bd91cdc0ce68ea4f3286c2c34",
+    "zh:aaa8784be125fbd50c48d84d6e171d3fb6ef84a221dbc5165c067ce05faab4c8",
+    "zh:afecd301f469975c9d8f350cc482fe656e082b6ab0f677d1a816c3c615837cc1",
+    "zh:c54c22b18d48ff9053d899d178d9ffef7d9d19785d9bf310a07d648b7aac075b",
+    "zh:db2eefd55aea48e73384a555c72bac3f7d428e24147bedb64e1a039398e5b903",
+    "zh:ee61666a233533fd2be971091cecc01650561f1585783c381b6f6e8a390198a4",
+    "zh:f569b65999264a9416862bca5cd2a6177d94ccb0424f3a4ef424428912b9cb3c",
+  ]
+}
+
+provider "registry.terraform.io/hashicorp/http" {
+  version     = "3.5.0"
+  constraints = ">= 3.2.1"
+  hashes = [
+    "h1:dl73+8wzQR++HFGoJgDqY3mj3pm14HUuH/CekVyOj5s=",
+    "zh:047c5b4920751b13425efe0d011b3a23a3be97d02d9c0e3c60985521c9c456b7",
+    "zh:157866f700470207561f6d032d344916b82268ecd0cf8174fb11c0674c8d0736",
+    "zh:1973eb9383b0d83dd4fd5e662f0f16de837d072b64a6b7cd703410d730499476",
+    "zh:212f833a4e6d020840672f6f88273d62a564f44acb0c857b5961cdb3bbc14c90",
+    "zh:2c8034bc039fffaa1d4965ca02a8c6d57301e5fa9fff4773e684b46e3f78e76a",
+    "zh:5df353fc5b2dd31577def9cc1a4ebf0c9a9c2699d223c6b02087a3089c74a1c6",
+    "zh:672083810d4185076c81b16ad13d1224b9e6ea7f4850951d2ab8d30fa6e41f08",
+    "zh:78d5eefdd9e494defcb3c68d282b8f96630502cac21d1ea161f53cfe9bb483b3",
+    "zh:7b4200f18abdbe39904b03537e1a78f21ebafe60f1c861a44387d314fda69da6",
+    "zh:843feacacd86baed820f81a6c9f7bd32cf302db3d7a0f39e87976ebc7a7cc2ee",
+    "zh:a9ea5096ab91aab260b22e4251c05f08dad2ed77e43e5e4fadcdfd87f2c78926",
+    "zh:d02b288922811739059e90184c7f76d45d07d3a77cc48d0b15fd3db14e928623",
+  ]
+}
+
+provider "registry.terraform.io/hashicorp/null" {
+  version     = "3.2.4"
+  constraints = ">= 3.2.1"
+  hashes = [
+    "h1:L5V05xwp/Gto1leRryuesxjMfgZwjb7oool4WS1UEFQ=",
+    "zh:59f6b52ab4ff35739647f9509ee6d93d7c032985d9f8c6237d1f8a59471bbbe2",
+    "zh:78d5eefdd9e494defcb3c68d282b8f96630502cac21d1ea161f53cfe9bb483b3",
+    "zh:795c897119ff082133150121d39ff26cb5f89a730a2c8c26f3a9c1abf81a9c43",
+    "zh:7b9c7b16f118fbc2b05a983817b8ce2f86df125857966ad356353baf4bff5c0a",
+    "zh:85e33ab43e0e1726e5f97a874b8e24820b6565ff8076523cc2922ba671492991",
+    "zh:9d32ac3619cfc93eb3c4f423492a8e0f79db05fec58e449dee9b2d5873d5f69f",
+    "zh:9e15c3c9dd8e0d1e3731841d44c34571b6c97f5b95e8296a45318b94e5287a6e",
+    "zh:b4c2ab35d1b7696c30b64bf2c0f3a62329107bd1a9121ce70683dec58af19615",
+    "zh:c43723e8cc65bcdf5e0c92581dcbbdcbdcf18b8d2037406a5f2033b1e22de442",
+    "zh:ceb5495d9c31bfb299d246ab333f08c7fb0d67a4f82681fbf47f2a21c3e11ab5",
+    "zh:e171026b3659305c558d9804062762d168f50ba02b88b231d20ec99578a6233f",
+    "zh:ed0fe2acdb61330b01841fa790be00ec6beaac91d41f311fb8254f74eb6a711f",
+  ]
+}
+
+provider "registry.terraform.io/hashicorp/random" {
+  version     = "3.8.1"
+  constraints = ">= 3.4.3"
+  hashes = [
+    "h1:u8AKlWVDTH5r9YLSeswoVEjiY72Rt4/ch7U+61ZDkiQ=",
+    "zh:08dd03b918c7b55713026037c5400c48af5b9f468f483463321bd18e17b907b4",
+    "zh:0eee654a5542dc1d41920bbf2419032d6f0d5625b03bd81339e5b33394a3e0ae",
+    "zh:229665ddf060aa0ed315597908483eee5b818a17d09b6417a0f52fd9405c4f57",
+    "zh:2469d2e48f28076254a2a3fc327f184914566d9e40c5780b8d96ebf7205f8bc0",
+    "zh:37d7eb334d9561f335e748280f5535a384a88675af9a9eac439d4cfd663bcb66",
+    "zh:741101426a2f2c52dee37122f0f4a2f2d6af6d852cb1db634480a86398fa3511",
+    "zh:78d5eefdd9e494defcb3c68d282b8f96630502cac21d1ea161f53cfe9bb483b3",
+    "zh:a902473f08ef8df62cfe6116bd6c157070a93f66622384300de235a533e9d4a9",
+    "zh:b85c511a23e57a2147355932b3b6dce2a11e856b941165793a0c3d7578d94d05",
+    "zh:c5172226d18eaac95b1daac80172287b69d4ce32750c82ad77fa0768be4ea4b8",
+    "zh:dab4434dba34aad569b0bc243c2d3f3ff86dd7740def373f2a49816bd2ff819b",
+    "zh:f49fd62aa8c5525a5c17abd51e27ca5e213881d58882fd42fec4a545b53c9699",
+  ]
+}
+
+provider "registry.terraform.io/hashicorp/time" {
+  version     = "0.13.1"
+  constraints = ">= 0.9.1"
+  hashes = [
+    "h1:ZT5ppCNIModqk3iOkVt5my8b8yBHmDpl663JtXAIRqM=",
+    "zh:02cb9aab1002f0f2a94a4f85acec8893297dc75915f7404c165983f720a54b74",
+    "zh:04429b2b31a492d19e5ecf999b116d396dac0b24bba0d0fb19ecaefe193fdb8f",
+    "zh:26f8e51bb7c275c404ba6028c1b530312066009194db721a8427a7bc5cdbc83a",
+    "zh:772ff8dbdbef968651ab3ae76d04afd355c32f8a868d03244db3f8496e462690",
+    "zh:78d5eefdd9e494defcb3c68d282b8f96630502cac21d1ea161f53cfe9bb483b3",
+    "zh:898db5d2b6bd6ca5457dccb52eedbc7c5b1a71e4a4658381bcbb38cedbbda328",
+    "zh:8de913bf09a3fa7bedc29fec18c47c571d0c7a3d0644322c46f3aa648cf30cd8",
+    "zh:9402102c86a87bdfe7e501ffbb9c685c32bbcefcfcf897fd7d53df414c36877b",
+    "zh:b18b9bb1726bb8cfbefc0a29cf3657c82578001f514bcf4c079839b6776c47f0",
+    "zh:b9d31fdc4faecb909d7c5ce41d2479dd0536862a963df434be4b16e8e4edc94d",
+    "zh:c951e9f39cca3446c060bd63933ebb89cedde9523904813973fbc3d11863ba75",
+    "zh:e5b773c0d07e962291be0e9b413c7a22c044b8c7b58c76e8aa91d1659990dfb5",
+  ]
+}
+
+provider "registry.terraform.io/oracle/oci" {
+  version     = "8.5.0"
+  constraints = ">= 4.67.3, >= 7.30.0"
+  hashes = [
+    "h1:YGSTTLRk0vpD4P0dJFt2lZ2XphT2skF9AxBGCkM04z4=",
+    "zh:0289ba575d3749068fc12fdbfa3f44b9780b21a23315eb2ca5bcf73065cc4fe7",
+    "zh:1152fd8451c2b74d87594fda1aa69e6a3f772189b902a592e91fcc57dfe3c48f",
+    "zh:3e4b1a2e345263e48d6be4d6d01fd5976b09af585e4a9314d318ab216304b8f1",
+    "zh:6b88ebb0ed7de80e324124511251561072c8a5f1ae222aa588063a1652ff72e8",
+    "zh:8ef61c735f19e1be9abeeb79debbeacd91e5996b4be5719d61323244e19ebe3d",
+    "zh:8fcdc6701173b59d78f076f8ce4ce01ef127bf5bf65323340e23c0b14da02f9d",
+    "zh:9b12af85486a96aedd8d7984b0ff811a4b42e3d88dad1a3fb4c0b580d04fa425",
+    "zh:a03e6f788876b7408d811eb21056986e15c46876983637e7e5e645fff28d0587",
+    "zh:b1149065247943c0937359e0f2ed5fdce9c2a588e32e90b9c13be64f709f8121",
+    "zh:b375612ef300e7f53797552521d3ec10f3d9465ccbe6d96519314e32d6611c93",
+    "zh:daf49947168641d170f59907b2592f020ab17f5443e8f5a96174219112d51fe2",
+    "zh:e9649887105493b311cbaf180ba635186e1a4c3b5fe7e26ea9bfd06a52aa76f3",
+    "zh:f593bb15d46c5c998401fea9cc3fdf7950b81a53632ecb1bea8d2cc41971ccca",
+    "zh:f7f1f4d0c5922bd0403b989ebed168577164dbfc45181b2e19dcb888e1fc9df7",
+    "zh:fafce2b47e3227dc8068db4f2bf223c4a4b8fefe39f50aeced467eed1bd901e3",
+  ]
+}
diff --git a/usage-cookbook/Llama-3.1-Nemotron-Nano-8B-v1/terraform/README.md b/usage-cookbook/Llama-3.1-Nemotron-Nano-8B-v1/terraform/README.md
new file mode 100644
index 000000000..9d0fe219f
--- /dev/null
+++ b/usage-cookbook/Llama-3.1-Nemotron-Nano-8B-v1/terraform/README.md
@@ -0,0 +1,97 @@
+# Terraform: Private OCI OKE for Llama-3.1-Nemotron-Nano-8B-v1
+
+This Terraform example provisions the **private-only** OCI infrastructure for
+the validated Phoenix deployment described in the parent cookbook.
+
+It gives Nemotron users a reproducible, infrastructure-as-code path for serving
+NVIDIA models on OCI, built on private OKE, managed Bastion access, and
+GPU-backed node pools.
+
+It creates:
+
+- a VCN
+- a **private** OKE cluster
+- a private CPU node pool
+- a private GPU node pool targeting `VM.GPU.A10.1`
+- an **OCI Bastion service** resource for private access
+
+It does **not** create:
+
+- a public Kubernetes API endpoint
+- public worker-node IPs
+- a public bastion host
+- a public inference endpoint
+
+## Bastion note
+
+This sample provisions the **OCI Bastion service** so that private-cluster
+access is reproducible from Terraform.
+
+That is intentionally different from creating a public bastion VM:
+
+- no public bastion compute instance is created
+- no worker node receives a public IP
+- the Kubernetes API remains private
+
+If your environment already manages private-cluster access through a separate
+operator workflow, you can remove the `oci_bastion_bastion` resource and keep
+the rest of the sample unchanged.
+
+## Module choice
+
+This wrapper intentionally uses Oracle's official OKE Terraform module:
+
+- `oracle-terraform-modules/oke/oci`
+
+The Nemotron-specific layer in this directory adds:
+
+- the Phoenix defaults
+- the no-public-IP constraints
+- the A10-focused worker pool defaults
+- the OCI Bastion service resource required for private access
+
+## Files
+
+- [`main.tf`](./main.tf) - private OKE cluster, worker pools, OCI Bastion
+- [`variables.tf`](./variables.tf) - deployment inputs
+- [`outputs.tf`](./outputs.tf) - useful IDs and private endpoint information
+- [`terraform.tfvars.example`](./terraform.tfvars.example) - starting point
+
+## Usage
+
+```bash
+cp terraform.tfvars.example terraform.tfvars
+terraform init
+terraform plan
+terraform apply
+```
+
+The validated live run completed successfully in `us-phoenix-1`, including:
+
+- private OKE cluster creation
+- OCI Bastion service creation
+- CPU node pool creation
+- GPU node pool creation on `VM.GPU.A10.1` in `PHX-AD-2`
+
+After the infrastructure is ready:
+
+1. create an OCI Bastion session to reach the private cluster
+2. deploy the model with:
+   - [`../vllm_oke_phoenix_private_values.yaml`](../vllm_oke_phoenix_private_values.yaml)
+3. validate:
+   - `/health`
+   - `/v1/models`
+   - chat completion
+   - tool calling
+   - streaming
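+
+The first three validation checks can be scripted as a short smoke test. This
+is a sketch under stated assumptions rather than the validated procedure:
+`BASE_URL`, `chat_payload`, and `smoke_test` are illustrative names, and it
+assumes the inference endpoint is already reachable locally (for example
+through a `kubectl port-forward` to the serving service) and that `curl` and
+`jq` are installed.
+
+```bash
+# Assumption: BASE_URL points at the OpenAI-compatible endpoint, reached
+# through whatever local forward you set up. The default is only a placeholder.
+BASE_URL="${BASE_URL:-http://127.0.0.1:8000}"
+MODEL="nvidia/Llama-3.1-Nemotron-Nano-8B-v1"
+
+# Build the JSON body for a single-message chat completion request.
+chat_payload() {
+    jq -cn --arg m "${MODEL}" --arg p "$1" \
+        '{model: $m, messages: [{role: "user", content: $p}], max_tokens: 64}'
+}
+
+# Run the health, model-list, and chat checks in order; stop at the first
+# failed request.
+smoke_test() {
+    curl -fsS "${BASE_URL}/health" > /dev/null || return 1
+    echo "health ok"
+    curl -fsS "${BASE_URL}/v1/models" | jq -r '.data[].id' || return 1
+    curl -fsS "${BASE_URL}/v1/chat/completions" \
+        -H "Content-Type: application/json" \
+        -d "$(chat_payload 'Reply with one word: ready')" \
+        | jq -r '.choices[0].message.content'
+}
+```
+
+Run `smoke_test` once the forward is up. Tool calling and streaming need
+request bodies with `tools` and `"stream": true` respectively and are not
+covered by this sketch.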
+
+## Notes
+
+- The validated live deployment used `us-phoenix-1`.
+- The validated GPU pool used Phoenix `AD-2`, exposed as `gpu_placement_ads`.
+- The Bastion resource here is the OCI managed Bastion service, not a public
+  bastion VM.
+- `ssh_public_key_path` must point to an actual OpenSSH public key file; the
+  wrapper reads the file contents with Terraform's `file()` function before
+  passing it to OKE.
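+
+Because `file()` only reads the file, a quick pre-flight check can catch a
+path that points at the private key by mistake. The helper below is a
+hypothetical sketch (the name `check_pubkey` is not part of this sample); it
+only inspects the key-type prefix and does not validate the key material.
+
+```bash
+# Hypothetical helper: succeed only if the first field of the file is a
+# public-key type prefix, which a private key or an empty file will not have.
+check_pubkey() {
+    local path="$1" keytype=""
+    [ -r "${path}" ] || { echo "not readable: ${path}" >&2; return 1; }
+    # read returns non-zero when the file lacks a trailing newline, so ignore
+    # its status and rely on the value it captured.
+    read -r keytype _ < "${path}" || true
+    case "${keytype}" in
+        ssh-ed25519|ssh-rsa|ecdsa-sha2-*|sk-ssh-ed25519@openssh.com|sk-ecdsa-sha2-*)
+            echo "ok: ${path} (${keytype})" ;;
+        *)
+            echo "not an OpenSSH public key: ${path}" >&2; return 1 ;;
+    esac
+}
+```
+
+For example, run `check_pubkey ~/.ssh/id_ed25519.pub` before `terraform plan`.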
diff --git a/usage-cookbook/Llama-3.1-Nemotron-Nano-8B-v1/terraform/main.tf b/usage-cookbook/Llama-3.1-Nemotron-Nano-8B-v1/terraform/main.tf
new file mode 100644
index 000000000..e9b070784
--- /dev/null
+++ b/usage-cookbook/Llama-3.1-Nemotron-Nano-8B-v1/terraform/main.tf
@@ -0,0 +1,112 @@
+provider "oci" {
+  config_file_profile = var.config_file_profile
+  tenancy_ocid        = var.tenancy_ocid
+  region              = var.region
+}
+
+locals {
+  common_tags = merge(var.freeform_tags, {
+    model      = "nvidia/Llama-3.1-Nemotron-Nano-8B-v1"
+    deployment = "private-oke"
+    region     = var.region
+  })
+}
+
+module "oke" {
+  source  = "oracle-terraform-modules/oke/oci"
+  version = "5.4.1"
+
+  providers = {
+    oci.home = oci
+  }
+
+  tenancy_id     = var.tenancy_ocid
+  compartment_id = var.compartment_ocid
+  region         = var.region
+
+  cluster_name                      = var.cluster_name
+  kubernetes_version                = var.kubernetes_version
+  cluster_type                      = "enhanced"
+  cni_type                          = "flannel"
+  pods_cidr                         = var.pods_cidr
+  services_cidr                     = var.services_cidr
+  vcn_cidrs                         = var.vcn_cidrs
+  ssh_public_key                    = file(var.ssh_public_key_path)
+  output_detail                     = true
+  create_vcn                        = true
+  create_bastion                    = false
+  create_operator                   = false
+  control_plane_is_public           = false
+  assign_public_ip_to_control_plane = false
+  worker_is_public                  = false
+  allow_worker_internet_access      = true
+  allow_pod_internet_access         = true
+  allow_worker_ssh_access           = false
+  preferred_load_balancer           = "internal"
+  load_balancers                    = "internal"
+  freeform_tags                     = { all = local.common_tags }
+
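+  # newbits/netnum are cidrsubnet() arguments applied to the VCN CIDR. With the
+  # default 10.0.0.0/16 they resolve to: cp 10.0.0.16/29, workers 10.0.64.0/18,
+  # pods 10.0.128.0/18, int_lb 10.0.2.0/27.
+  subnets = {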
+    cp = {
+      create  = "always"
+      newbits = 13
+      netnum  = 2
+    }
+    workers = {
+      create  = "always"
+      newbits = 2
+      netnum  = 1
+    }
+    pods = {
+      create  = "always"
+      newbits = 2
+      netnum  = 2
+    }
+    int_lb = {
+      create  = "always"
+      newbits = 11
+      netnum  = 16
+    }
+    pub_lb = {
+      create = "never"
+    }
+    bastion = {
+      create = "never"
+    }
+    operator = {
+      create = "never"
+    }
+  }
+
+  worker_pool_mode = "node-pool"
+  worker_pool_size = 1
+  worker_pools = {
+    cpu = {
+      size             = var.cpu_pool_size
+      shape            = var.cpu_shape
+      ocpus            = var.cpu_ocpus
+      memory           = var.cpu_memory_gbs
+      boot_volume_size = 100
+      assign_public_ip = false
+      create           = true
+    }
+    gpu = {
+      size             = var.gpu_pool_size
+      shape            = var.gpu_shape
+      boot_volume_size = var.gpu_boot_volume_size
+      assign_public_ip = false
+      create           = true
+      placement_ads    = var.gpu_placement_ads
+    }
+  }
+}
+
+resource "oci_bastion_bastion" "oci_bastion" {
+  compartment_id               = var.compartment_ocid
+  bastion_type                 = "STANDARD"
+  target_subnet_id             = module.oke.worker_subnet_id
+  client_cidr_block_allow_list = var.bastion_client_cidrs
+  max_session_ttl_in_seconds   = 10800 # 3 hours
+  name                         = "${var.cluster_name}-bastion"
+  freeform_tags                = local.common_tags
+}
diff --git a/usage-cookbook/Llama-3.1-Nemotron-Nano-8B-v1/terraform/outputs.tf b/usage-cookbook/Llama-3.1-Nemotron-Nano-8B-v1/terraform/outputs.tf
new file mode 100644
index 000000000..c39a82eed
--- /dev/null
+++ b/usage-cookbook/Llama-3.1-Nemotron-Nano-8B-v1/terraform/outputs.tf
@@ -0,0 +1,34 @@
+output "cluster_id" {
+  description = "OKE cluster OCID."
+  value       = module.oke.cluster_id
+}
+
+output "cluster_endpoints" {
+  description = "Cluster endpoints; private endpoint should be used."
+  value       = module.oke.cluster_endpoints
+}
+
+output "apiserver_private_host" {
+  description = "Private control-plane host."
+  value       = module.oke.apiserver_private_host
+}
+
+output "vcn_id" {
+  description = "VCN used by the Nemotron deployment."
+  value       = module.oke.vcn_id
+}
+
+output "control_plane_subnet_id" {
+  description = "Private control-plane subnet."
+  value       = module.oke.control_plane_subnet_id
+}
+
+output "worker_subnet_id" {
+  description = "Private worker subnet."
+  value       = module.oke.worker_subnet_id
+}
+
+output "oci_bastion_id" {
+  description = "OCI Bastion service OCID for creating private sessions."
+  value       = oci_bastion_bastion.oci_bastion.id
+}
diff --git a/usage-cookbook/Llama-3.1-Nemotron-Nano-8B-v1/terraform/terraform.tfvars.example b/usage-cookbook/Llama-3.1-Nemotron-Nano-8B-v1/terraform/terraform.tfvars.example
new file mode 100644
index 000000000..9a2bab0ce
--- /dev/null
+++ b/usage-cookbook/Llama-3.1-Nemotron-Nano-8B-v1/terraform/terraform.tfvars.example
@@ -0,0 +1,12 @@
+tenancy_ocid        = "ocid1.tenancy.oc1..exampleuniqueID"
+compartment_ocid    = "ocid1.compartment.oc1..exampleuniqueID"
+config_file_profile = "API_KEY_AUTH"
+region              = "us-phoenix-1"
+cluster_name        = "nemotron-phx-private"
+ssh_public_key_path = "~/.ssh/id_ed25519.pub"
+
+# Restrict Bastion session creation to your current client egress CIDR.
+bastion_client_cidrs = ["203.0.113.10/32"]
+
+# The validated deployment used Phoenix AD-2 for the A10 node pool.
+gpu_placement_ads = [2]
diff --git a/usage-cookbook/Llama-3.1-Nemotron-Nano-8B-v1/terraform/variables.tf b/usage-cookbook/Llama-3.1-Nemotron-Nano-8B-v1/terraform/variables.tf
new file mode 100644
index 000000000..165cabf57
--- /dev/null
+++ b/usage-cookbook/Llama-3.1-Nemotron-Nano-8B-v1/terraform/variables.tf
@@ -0,0 +1,115 @@
+variable "tenancy_ocid" {
+  description = "OCI tenancy OCID."
+  type        = string
+}
+
+variable "compartment_ocid" {
+  description = "Compartment where the OKE cluster and Bastion service will be created."
+  type        = string
+}
+
+variable "region" {
+  description = "OCI region for the deployment."
+  type        = string
+  default     = "us-phoenix-1"
+}
+
+variable "config_file_profile" {
+  description = "OCI CLI config profile name."
+  type        = string
+  default     = "DEFAULT"
+}
+
+variable "cluster_name" {
+  description = "Name prefix for the private Nemotron OKE deployment."
+  type        = string
+  default     = "nemotron-oci-phx"
+}
+
+variable "ssh_public_key_path" {
+  description = "Path to the OpenSSH public key file used for private worker access."
+  type        = string
+}
+
+variable "vcn_cidrs" {
+  description = "VCN CIDR blocks for the deployment."
+  type        = list(string)
+  default     = ["10.0.0.0/16"]
+}
+
+variable "pods_cidr" {
+  description = "Kubernetes pods CIDR."
+  type        = string
+  default     = "10.244.0.0/16"
+}
+
+variable "services_cidr" {
+  description = "Kubernetes services CIDR."
+  type        = string
+  default     = "10.96.0.0/16"
+}
+
+variable "kubernetes_version" {
+  description = "OKE Kubernetes version."
+  type        = string
+  default     = "v1.33.1"
+}
+
+variable "cpu_pool_size" {
+  description = "Number of CPU worker nodes."
+  type        = number
+  default     = 1
+}
+
+variable "cpu_shape" {
+  description = "Shape for the CPU worker pool."
+  type        = string
+  default     = "VM.Standard.E5.Flex"
+}
+
+variable "cpu_ocpus" {
+  description = "OCPUs for each CPU worker if using a flex shape."
+  type        = number
+  default     = 2
+}
+
+variable "cpu_memory_gbs" {
+  description = "Memory in GB for each CPU worker if using a flex shape."
+  type        = number
+  default     = 16
+}
+
+variable "gpu_pool_size" {
+  description = "Number of GPU worker nodes."
+  type        = number
+  default     = 1
+}
+
+variable "gpu_shape" {
+  description = "Shape for the GPU worker pool."
+  type        = string
+  default     = "VM.GPU.A10.1"
+}
+
+variable "gpu_boot_volume_size" {
+  description = "Boot volume size for GPU workers."
+  type        = number
+  default     = 200
+}
+
+variable "gpu_placement_ads" {
+  description = "Availability domains to target for the GPU node pool. Phoenix AD-2 is `[2]`."
+  type        = list(number)
+  default     = [2]
+}
+
+variable "bastion_client_cidrs" {
+  description = "CIDR blocks allowed to create OCI Bastion sessions."
+  type        = list(string)
+}
+
+variable "freeform_tags" {
+  description = "Optional freeform tags."
+  type        = map(string)
+  default     = {}
+}
diff --git a/usage-cookbook/Llama-3.1-Nemotron-Nano-8B-v1/terraform/versions.tf b/usage-cookbook/Llama-3.1-Nemotron-Nano-8B-v1/terraform/versions.tf
new file mode 100644
index 000000000..1c9c02641
--- /dev/null
+++ b/usage-cookbook/Llama-3.1-Nemotron-Nano-8B-v1/terraform/versions.tf
@@ -0,0 +1,10 @@
+terraform {
+  required_version = ">= 1.5.0"
+
+  required_providers {
+    oci = {
+      source  = "oracle/oci"
+      version = ">= 7.30.0"
+    }
+  }
+}
diff --git a/usage-cookbook/Llama-3.1-Nemotron-Nano-8B-v1/vllm_oke_phoenix_private_values.yaml b/usage-cookbook/Llama-3.1-Nemotron-Nano-8B-v1/vllm_oke_phoenix_private_values.yaml
new file mode 100644
index 000000000..5b4538c37
--- /dev/null
+++ b/usage-cookbook/Llama-3.1-Nemotron-Nano-8B-v1/vllm_oke_phoenix_private_values.yaml
@@ -0,0 +1,37 @@
+# Validated private OCI OKE deployment values for
+# nvidia/Llama-3.1-Nemotron-Nano-8B-v1 on a single VM.GPU.A10.1 node.
+#
+# Chart: vllm/vllm-stack 0.1.10
+# Validated: 2026-04-15 on OKE v1.31.10, Phoenix (us-phoenix-1)
+#
+# IMPORTANT: Before deploying, you must create the vllm-templates-pvc
+# (see prerequisites in README.md).
+
+servingEngineSpec:
+  runtimeClassName: ""
+  modelSpec:
+    - name: "llama31-nemotron-nano-8b"
+      repository: "vllm/vllm-openai"
+      tag: "v0.19.0"
+      modelURL: "nvidia/Llama-3.1-Nemotron-Nano-8B-v1"
+      enableTool: true
+      toolCallParser: "llama3_json"
+      replicaCount: 1
+      requestCPU: 4
+      requestMemory: "24Gi"
+      requestGPU: 1
+      pvcStorage: "120Gi"
+      pvcAccessMode:
+        - ReadWriteOnce
+      storageClass: "oci-block-storage-enc"
+      nodeSelector:
+        app: gpu
+      tolerations:
+        - key: "nvidia.com/gpu"
+          operator: "Exists"
+          effect: "NoSchedule"
+      vllmConfig:
+        maxModelLen: 4096
+        gpuMemoryUtilization: 0.95
+        extraArgs:
+          - "--chat-template=/vllm-workspace/examples/tool_chat_template_llama3.1_json.jinja"
diff --git a/usage-cookbook/README.md b/usage-cookbook/README.md
index f7d79b5ca..001121f60 100644
--- a/usage-cookbook/README.md
+++ b/usage-cookbook/README.md
@@ -13,5 +13,4 @@ This directory contains cookbook-style guides showing how to deploy and use the
 - **SGLang Deployment** - Tutorials on serving and interacting with Nemotron via SGLang
 - **NIM Microservice** - Guide to deploying Nemotron as scalable, production-ready endpoints using NVIDIA Inference Microservices (NIM).
 - **Hugging Face Transformers** - Direct loading and inference of Nemotron models with Hugging Face Transformers
-
-
+- **OCI OKE Private Deployment** - A private deployment guide for `nvidia/Llama-3.1-Nemotron-Nano-8B-v1`, validated in Phoenix (`us-phoenix-1`), using OKE, the OCI Bastion service, and `vLLM`; it provides a reproducible OCI path comparable to common AWS GPU/Kubernetes deployment patterns.