
[OV GPU] Set the inference precision or execution mode for GPU separately #159

Open
mingmingtasd opened this issue Feb 25, 2025 · 4 comments

@mingmingtasd (Collaborator)

https://docs.openvino.ai/2025/openvino-workflow/running-inference/optimize-inference/precision-control.html

The precisions supported by the OV EP per device are: CPU = FP32; GPU = FP32, FP16; NPU = FP16.

@mingmingtasd mingmingtasd added the enhancement New feature or request label Feb 25, 2025
@mingmingtasd mingmingtasd self-assigned this Feb 25, 2025
@mingmingtasd (Collaborator, Author)

Execution Mode
ov::hint::execution_mode is a high-level hint that controls whether the user wants to keep the best accuracy (ACCURACY mode) or allow the device to apply optimizations that may lower accuracy for performance reasons (PERFORMANCE mode).

In ACCURACY mode, the device cannot convert floating-point tensors to a smaller floating-point type, so devices try to keep the accuracy metrics as close as possible to the original values obtained after training, relative to the device's real capabilities. This means that most devices will infer with f32 precision if the device supports it. In this mode, Dynamic Quantization is disabled.

In PERFORMANCE mode, the device can convert to smaller data types and apply other optimizations that may have some impact on accuracy rates, although we still try to minimize accuracy loss and may use mixed precision execution in some cases.
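The two modes can be illustrated with a small Python sketch. This is a hypothetical helper, not OpenVINO API: given the floating-point types a device supports, ACCURACY keeps f32 when available, while PERFORMANCE may drop to a smaller type such as f16.

```python
def pick_precision(supported, mode):
    """Illustrative precision choice per execution mode (not real OV code).

    supported: floating-point types the device can infer with,
               e.g. {"f32", "f16"} for a GPU.
    mode: "ACCURACY" or "PERFORMANCE".
    """
    if mode == "ACCURACY":
        # Keep the widest floating-point type: no down-conversion allowed.
        return "f32" if "f32" in supported else sorted(supported)[-1]
    # PERFORMANCE: the device may convert to a smaller type
    # (possibly using mixed-precision execution).
    for candidate in ("f16", "bf16", "f32"):
        if candidate in supported:
            return candidate
    raise ValueError("no supported floating-point type")

# A GPU supporting f32 and f16:
print(pick_precision({"f32", "f16"}, "ACCURACY"))     # f32
print(pick_precision({"f32", "f16"}, "PERFORMANCE"))  # f16
```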

Inference Precision
ov::hint::inference_precision is a lower-level property that lets you specify the exact precision you want, but it is less portable. For example, the CPU supports f32 inference precision (and bf16 on some platforms) and the GPU supports f32 and f16, so if an application uses multiple devices, it has to handle all these combinations manually, or it can let OV do it automatically by using the higher-level execution_mode property.
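The portability cost of inference_precision can be made concrete with a sketch (illustrative only; the device table reflects the precisions quoted in this thread). An application that sets exact precisions must carry per-device knowledge like this itself, which is exactly what execution_mode delegates to OpenVINO:

```python
# Floating-point precisions per device, as quoted above (bf16 is only
# available on some CPU platforms; treated as available here for
# illustration).
SUPPORTED = {
    "CPU": ["f32", "bf16"],
    "GPU": ["f32", "f16"],
}

def precision_for(device, wanted):
    """Return `wanted` if the device supports it, else fall back to f32.

    An app driving ov::hint::inference_precision across devices must do
    checks like this manually; ov::hint::execution_mode avoids them.
    """
    if wanted in SUPPORTED.get(device, []):
        return wanted
    return "f32"

print(precision_for("GPU", "f16"))  # f16
print(precision_for("CPU", "f16"))  # CPU has no f16, so: f32
```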

@mingmingtasd mingmingtasd changed the title [OV GPU] Set the inference precision for GPU execution separately [OV GPU] Set the inference precision or execution mode for GPU separately Feb 25, 2025
@huningxin (Collaborator)

Does OV EP support setting execution mode?

@mingmingtasd (Collaborator, Author)

> Does OV EP support setting execution mode?

The SessionOptionsAppendExecutionProvider_OpenVINO_V2 API should be able to set all OV options, including the inference precision and execution mode, so we needn't hard-code them in ORT. I will double-check and send a PR to use it.


@mingmingtasd (Collaborator, Author)

Correcting my comment above: with SessionOptionsAppendExecutionProvider_OpenVINO_V2 we can currently set only a fixed set of option keys (not all OV options), because the set is restricted by the implementation of the ORT OpenVINO EP:
https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/providers/openvino/openvino_provider_factory.cc
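For illustration, a sketch of what this restriction means in practice: only keys the EP's provider factory parses are accepted, and anything else is rejected. The key list below is an assumed subset (device_type, precision, etc.) and should be verified against openvino_provider_factory.cc; the validation helper is hypothetical, not ORT code.

```python
# Assumed subset of option keys parsed by the ORT OpenVINO EP's provider
# factory; verify the full list against openvino_provider_factory.cc.
ACCEPTED_KEYS = {"device_type", "precision", "num_of_threads", "cache_dir"}

def validate_provider_options(options):
    """Reject option keys the EP implementation does not parse."""
    unknown = set(options) - ACCEPTED_KEYS
    if unknown:
        raise KeyError(f"unsupported OV EP options: {sorted(unknown)}")
    return options

# Accepted: precision can be set for a device type...
validate_provider_options({"device_type": "GPU", "precision": "FP16"})

# ...but an arbitrary OV property such as execution_mode is not a
# recognized key and is rejected, which is the limitation noted above:
try:
    validate_provider_options({"execution_mode": "ACCURACY"})
except KeyError as e:
    print(e)
```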

