 <!--
-# Copyright 2020-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# Copyright 2020-2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 #
 # Redistribution and use in source and binary forms, with or without
 # modification, are permitted provided that the following conditions
@@ -81,8 +81,8 @@ Currently, Triton requires that a specially patched version of
 PyTorch be used with the PyTorch backend. The full source for
 these PyTorch versions is available as Docker images from
 [NGC](https://ngc.nvidia.com). For example, the PyTorch version
-compatible with the 22.12 release of Triton is available as
-nvcr.io/nvidia/pytorch:22.12-py3.
+compatible with the 25.09 release of Triton is available as
+nvcr.io/nvidia/pytorch:25.09-py3.
 
 Copy over the LibTorch and Torchvision headers and libraries from the
 [PyTorch NGC container](https://ngc.nvidia.com/catalog/containers/nvidia:pytorch)
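To confirm which PyTorch build a given container actually ships, a quick check from inside the container can help (a minimal sketch; the printed version strings depend on the release):

```python
# Run inside the NGC PyTorch container to see which build it ships.
import torch

print(torch.__version__)   # PyTorch build bundled with the container
print(torch.version.cuda)  # CUDA toolkit version it was compiled against
```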
@@ -306,50 +306,7 @@ instance in the
 to ensure that the model instance and the tensors used for inference are
 assigned to the same GPU device on which the model was traced.
 
-# PyTorch 2.0 Backend \[Experimental\]
-
-> [!WARNING]
-> *This feature is subject to change and removal.*
-
-Starting from 24.01, PyTorch models can be served directly via the
-[Python runtime](src/model.py). By default, Triton will use the
-[LibTorch runtime](#pytorch-libtorch-backend) for PyTorch models. To use the Python
-runtime, provide the following
-[runtime setting](https://github.com/triton-inference-server/backend/blob/main/README.md#backend-shared-library)
-in the model configuration:
-
-```
-runtime: "model.py"
-```
-
-## Dependencies
-
-### Python backend dependency
-
-This feature depends on the
-[Python backend](https://github.com/triton-inference-server/python_backend);
-see
-[Python-based Backends](https://github.com/triton-inference-server/backend/blob/main/docs/python_based_backends.md)
-for more details.
-
-### PyTorch dependency
-
-This feature will take advantage of the
-[`torch.compile`](https://pytorch.org/docs/stable/generated/torch.compile.html#torch-compile)
-optimization; make sure the
-[PyTorch 2.0+ pip package](https://pypi.org/project/torch) is available in the
-same Python environment.
-
-Alternatively, a [Python Execution Environment](#using-custom-python-execution-environments)
-with the PyTorch dependency may be used. It can be created with the
-[provided script](tools/gen_pb_exec_env.sh). The resulting
-`pb_exec_env_model.py.tar.gz` file should be placed in the same
-[backend shared library](https://github.com/triton-inference-server/backend/blob/main/README.md#backend-shared-library)
-directory as the [Python runtime](src/model.py).
-
-## Model Layout
-
-### PyTorch 2.0 models
+## PyTorch 2.0 models
 
 The model repository should look like:
 
@@ -369,7 +326,7 @@ The `model.pt` may be optionally provided which contains the saved
 [`state_dict`](https://pytorch.org/tutorials/beginner/saving_loading_models.html#saving-loading-model-for-inference)
 of the model.
 
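As a sketch of how the optional `model.pt` can be produced (the `MyModel` module and the `my_model` repository name here are hypothetical stand-ins, not part of the backend):

```python
import torch
import torch.nn as nn

# Hypothetical module; in practice this matches the model defined in model.py.
class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 2)

    def forward(self, x):
        return self.linear(x)

model = MyModel()
# Save only the parameters; the serving-side code is expected to
# instantiate the module and load this state_dict.
torch.save(model.state_dict(), "model_repository/my_model/1/model.pt")
```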
-### TorchScript models
+## TorchScript models
 
 The model repository should look like:
 
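As a sketch of how a TorchScript `model.pt` can be produced (again with a hypothetical `MyModel` and repository name), tracing on the GPU the deployed instances will use, per the device-placement note above:

```python
import torch
import torch.nn as nn

class MyModel(nn.Module):  # hypothetical stand-in
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 2)

    def forward(self, x):
        return self.linear(x)

# Trace on the device the model will be served from, so the saved
# program stays consistent with the instance's GPU assignment.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = MyModel().to(device).eval()
example_input = torch.randn(1, 4, device=device)

traced = torch.jit.trace(model, example_input)
traced.save("model_repository/my_model/1/model.pt")
```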