
[OV GPU] Set the inference precision or execution mode for GPU separately #159

Open
mingmingtasd opened this issue Feb 25, 2025 · 4 comments

@mingmingtasd (Collaborator)

https://docs.openvino.ai/2025/openvino-workflow/running-inference/optimize-inference/precision-control.html

The precisions supported by the OV EP per device are: CPU = FP32; GPU = FP32, FP16; NPU = FP16.

@mingmingtasd mingmingtasd added the enhancement New feature or request label Feb 25, 2025
@mingmingtasd mingmingtasd self-assigned this Feb 25, 2025
@mingmingtasd (Collaborator, Author)

Execution Mode
ov::hint::execution_mode is a high-level hint that controls whether the user wants to keep the best accuracy (ACCURACY mode) or allow the device to apply optimizations that may lower accuracy for performance reasons (PERFORMANCE mode).

In ACCURACY mode, the device cannot convert floating-point tensors to a smaller floating-point type, so devices try to keep the accuracy metrics as close as possible to the original values obtained after training, relative to the device's real capabilities. This means that most devices will infer with f32 precision if the device supports it. In this mode, Dynamic Quantization is disabled.

In PERFORMANCE mode, the device can convert to smaller data types and apply other optimizations that may have some impact on accuracy rates, although we still try to minimize accuracy loss and may use mixed precision execution in some cases.
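The two modes can be illustrated with a small Python sketch. This is a hypothetical helper, not OpenVINO API: given the floating-point types a device supports, ACCURACY keeps f32 when available, while PERFORMANCE may drop to a smaller type such as f16.

```python
def pick_precision(supported, mode):
    """Illustrative precision choice per execution mode (not real OV code).

    supported: floating-point types the device can infer with,
               e.g. {"f32", "f16"} for a GPU.
    mode: "ACCURACY" or "PERFORMANCE".
    """
    if mode == "ACCURACY":
        # Keep the widest floating-point type: no down-conversion allowed.
        return "f32" if "f32" in supported else sorted(supported)[-1]
    # PERFORMANCE: the device may convert to a smaller type
    # (possibly using mixed-precision execution).
    for candidate in ("f16", "bf16", "f32"):
        if candidate in supported:
            return candidate
    raise ValueError("no supported floating-point type")

# A GPU supporting f32 and f16:
print(pick_precision({"f32", "f16"}, "ACCURACY"))     # f32
print(pick_precision({"f32", "f16"}, "PERFORMANCE"))  # f16
```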

Inference Precision
ov::hint::inference_precision is a lower-level property that lets you specify the exact precision you want, but it is less portable. For example, the CPU supports f32 inference precision (and bf16 on some platforms) and the GPU supports f32 and f16, so if an application uses multiple devices, it has to handle all these combinations manually, or it can let OV do it automatically by using the higher-level execution_mode property.
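The portability cost of inference_precision can be made concrete with a sketch (illustrative only; the device table reflects the precisions quoted in this thread). An application that sets exact precisions must carry per-device knowledge like this itself, which is exactly what execution_mode delegates to OpenVINO:

```python
# Floating-point precisions per device, as quoted above (bf16 is only
# available on some CPU platforms; treated as available here for
# illustration).
SUPPORTED = {
    "CPU": ["f32", "bf16"],
    "GPU": ["f32", "f16"],
}

def precision_for(device, wanted):
    """Return `wanted` if the device supports it, else fall back to f32.

    An app driving ov::hint::inference_precision across devices must do
    checks like this manually; ov::hint::execution_mode avoids them.
    """
    if wanted in SUPPORTED.get(device, []):
        return wanted
    return "f32"

print(precision_for("GPU", "f16"))  # f16
print(precision_for("CPU", "f16"))  # CPU has no f16, so: f32
```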

@mingmingtasd mingmingtasd changed the title [OV GPU] Set the inference precision for GPU execution separately [OV GPU] Set the inference precision or execution mode for GPU separately Feb 25, 2025
@huningxin (Collaborator)

Does OV EP support setting execution mode?

@mingmingtasd (Collaborator, Author)

> Does OV EP support setting execution mode?

The SessionOptionsAppendExecutionProvider_OpenVINO_V2 API should be able to set all OV options, including the inference precision and execution mode, so we needn't hard-code them in ORT. I will double-check and send a PR to use it.


@mingmingtasd (Collaborator, Author)

Correcting my comment above: with SessionOptionsAppendExecutionProvider_OpenVINO_V2 we can currently set only a fixed set of option keys (not all OV options), because the set is restricted by the implementation of the ORT OpenVINO EP:
https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/providers/openvino/openvino_provider_factory.cc
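For illustration, a sketch of what this restriction means in practice: only keys the EP's provider factory parses are accepted, and anything else is rejected. The key list below is an assumed subset (device_type, precision, etc.) and should be verified against openvino_provider_factory.cc; the validation helper is hypothetical, not ORT code.

```python
# Assumed subset of option keys parsed by the ORT OpenVINO EP's provider
# factory; verify the full list against openvino_provider_factory.cc.
ACCEPTED_KEYS = {"device_type", "precision", "num_of_threads", "cache_dir"}

def validate_provider_options(options):
    """Reject option keys the EP implementation does not parse."""
    unknown = set(options) - ACCEPTED_KEYS
    if unknown:
        raise KeyError(f"unsupported OV EP options: {sorted(unknown)}")
    return options

# Accepted: precision can be set for a device type...
validate_provider_options({"device_type": "GPU", "precision": "FP16"})

# ...but an arbitrary OV property such as execution_mode is not a
# recognized key and is rejected, which is the limitation noted above:
try:
    validate_provider_options({"execution_mode": "ACCURACY"})
except KeyError as e:
    print(e)
```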

