
Optimize CPU time spent in inference path #682


Open · wants to merge 2 commits into ovep-develop

Conversation

@ericcraw commented May 1, 2025

Move input/output name resolution for the ORT/OV input-output bindings to compile time.
Reduce tensor lookups by name in favor of index lookups.
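The idea behind the change can be sketched roughly as follows. This is a hypothetical illustration, not the PR's actual code: the `TensorBinding` struct and `BindInputs` helper are invented names, standing in for resolving each ONNX input name to its OpenVINO tensor index once at compile time, so the per-inference path only does vector indexing instead of repeated map lookups by name.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical binding of an ONNX-side input name to the index of the
// matching OpenVINO tensor, resolved once at compilation time.
struct TensorBinding {
  std::string onnx_name;
  size_t ov_index;
};

// Build the bindings once (initialization time); the inference loop then
// uses bindings[i].ov_index directly instead of a name lookup per run.
std::vector<TensorBinding> BindInputs(
    const std::vector<std::string>& onnx_names,
    const std::unordered_map<std::string, size_t>& ov_name_to_index) {
  std::vector<TensorBinding> bindings;
  bindings.reserve(onnx_names.size());
  for (const auto& name : onnx_names) {
    auto it = ov_name_to_index.find(name);
    assert(it != ov_name_to_index.end() &&
           "input name mismatch between ORT and OpenVINO");
    bindings.push_back({name, it->second});
  }
  return bindings;
}
```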

@ericcraw ericcraw changed the base branch from master to ovep-develop May 1, 2025 18:23
@@ -121,15 +121,15 @@ std::istream& operator>>(std::istream& stream, SharedContext::SharedWeights::Met
namespace backend_utils {

bool IsDebugEnabled() {
const std::string env_name = onnxruntime::GetEnvironmentVar("ORT_OPENVINO_ENABLE_DEBUG");


Suggestion: can we pull these into the context or subcontext instead of checking all over the place? It's fine if we don't do it in this change, just wondering.

@ericcraw (Author)

Yeah, I don't see a reason why it can't be moved. Though I'll defer that for now since making it static effectively removes the environment var checks anyway.
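The "making it static" point can be sketched like this. A minimal illustration only, assuming a plain `std::getenv` read rather than onnxruntime's `GetEnvironmentVar` wrapper: a function-local static caches the result of the environment check, so the variable is read once per process instead of on every call.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical sketch: cache the debug flag in a function-local static so
// the environment variable is consulted only on the first call.
bool IsDebugEnabled() {
  static const bool enabled = [] {
    const char* v = std::getenv("ORT_OPENVINO_ENABLE_DEBUG");
    return v != nullptr && std::string(v) == "1";
  }();
  return enabled;  // subsequent calls return the cached value
}
```

Since C++11, initialization of the static is thread-safe, so concurrent first calls are fine; the trade-off is that changing the variable after the first call has no effect.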

ORT_ENFORCE(!input_name.empty(), log_tag,
"Input names mismatch between OpenVINO and ONNX. ", onnx_input_name,
" doesn't exist in the list of OpenVINO input tensor names");
bool cpu_or_gpu = (session_context_.device_type.find("CPU") != std::string::npos ||


Since I'm spreading my musings here, I might as well mention that I'd like to avoid doing these string compares all over the place and instead have a predicate for the selected devices that can be tested easily and quickly.

@ericcraw (Author)

Absolutely. I was tempted to do that as well, but thinking about the meta devices (auto, multi, hetero, etc) complicated just enough that I didn't want to go down that rabbit hole (yet). 😄
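One way the "quick device test" could look, sketched here as an assumption rather than anything in the PR: parse the device string once at session creation into a bitmask, so the hot path tests a flag instead of doing `std::string::find` on every check. The flag names and the treatment of meta devices (AUTO, MULTI, HETERO, which can list several targets, e.g. "AUTO:GPU,CPU") are hypothetical, which is exactly the complication the comment above alludes to.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical device flags; a meta device string sets a bit for every
// target it mentions.
enum DeviceFlags : uint32_t {
  kDeviceCPU = 1u << 0,
  kDeviceGPU = 1u << 1,
  kDeviceNPU = 1u << 2,
};

// Parse once at session setup; the inference path then checks bits.
uint32_t ParseDeviceFlags(const std::string& device_type) {
  uint32_t flags = 0;
  if (device_type.find("CPU") != std::string::npos) flags |= kDeviceCPU;
  if (device_type.find("GPU") != std::string::npos) flags |= kDeviceGPU;
  if (device_type.find("NPU") != std::string::npos) flags |= kDeviceNPU;
  return flags;
}
```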

@ericcraw (Author)

@sfatimar would you mind helping this PR along? I have a follow-up I want to make on top of this to benefit other devices.

@sfatimar sfatimar requested a review from MayureshV1 May 13, 2025 07:23
@sfatimar

@MayureshV1 please follow up.

@MayureshV1

Spoke to Eric and reviewed the changes. These look good to me.

Unknowns: Impact on abstract devices like AUTO. This is not an artifact of Eric's changes but we might have some abstract devices that might not get the full benefit because of device checks in basic_backend. @preetha-intel , need to look at that aspect.

Eric has spot-tested, but we need to validate functionality and perf impact across the following flows:
inference with an ONNX model, with OV Cache_DIR, and with EPCtx.

@jatinwadhwa921

7 unit tests are failing; please fix them.

(screenshot of the failing unit tests attached)
