[OV GPU] Set the inference precision or execution mode for GPU separately #159
> Execution Mode: In ACCURACY mode, the device cannot convert floating-point tensors to a smaller floating-point type, so it tries to keep the accuracy metrics as close as possible to the original values obtained after training, relative to the device's real capabilities. This means most devices will infer with f32 precision if they support it. In this mode, Dynamic Quantization is disabled. In PERFORMANCE mode, the device can convert to smaller data types and apply other optimizations that may have some impact on accuracy, although accuracy loss is still minimized and mixed-precision execution may be used in some cases.
Does OV EP support setting the execution mode?
Correcting my comment above, per:
https://docs.openvino.ai/2025/openvino-workflow/running-inference/optimize-inference/precision-control.html
The precisions OV EP supports per device are:
CPU: FP32; GPU: FP32, FP16; NPU: FP16
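As a minimal sketch of how that table could be applied in practice: the OpenVINO Execution Provider for onnxruntime accepts `device_type` and `precision` provider options, and the helper below (illustrative only, not from this thread) validates a device/precision pair against the list above before passing it through.

```python
# Hypothetical helper: build OpenVINO EP provider options for onnxruntime,
# rejecting device/precision pairs outside the supported table above.
def ov_ep_options(device_type: str, precision: str) -> dict:
    """Return provider options after validating the precision for the device."""
    supported = {
        "CPU": {"FP32"},
        "GPU": {"FP32", "FP16"},
        "NPU": {"FP16"},
    }
    if precision not in supported.get(device_type, set()):
        raise ValueError(f"{precision} is not supported on {device_type}")
    return {"device_type": device_type, "precision": precision}

# Usage (requires an onnxruntime build with the OpenVINO EP):
# session = onnxruntime.InferenceSession(
#     "model.onnx",
#     providers=[("OpenVINOExecutionProvider", ov_ep_options("GPU", "FP16"))],
# )
```

Note this only pins the inference precision; whether OV EP also exposes the ACCURACY/PERFORMANCE execution-mode hint is exactly the open question in this issue.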