
Openvino backend for Executorch to enable inference on Intel CPUs, GPUs, NPUs #8573

Open

ynimmaga wants to merge 122 commits into main

Conversation


@ynimmaga commented on Feb 19, 2025

Summary

This PR introduces support for the OpenVINO backend in Executorch, enabling accelerated inference on Intel hardware, including CPU, GPU, and NPU devices. OpenVINO optimizes deep learning model performance by leveraging hardware-specific enhancements. The PR also introduces the OpenVINO quantizer with NNCF (Neural Network Compression Framework) for model optimization. The functionality has been tested on several torchvision and timm models, with plans to test and enable support for additional model types in the future.

Below is a description of the features:

  • OpenVINO Backend Integration: The backends/openvino directory includes build scripts, AOT components (partitioner, preprocessor), the OpenVINO quantizer, and runtime backend files that register the OpenVINO backend and manage interactions with OpenVINO’s inference engine, including model execution, device-specific optimizations, and backend initialization. It also contains layer and model tests. See backends/openvino/README.md for usage.

  • OpenVINO Examples: The examples/openvino directory provides scripts for AOT optimization, quantization, and C++ executor examples. It includes instructions for optimizing the models, quantizing them, and exporting Executorch programs with OpenVINO optimizations. Refer to examples/openvino/README.md for details.

  • E2E Tutorial: Added an end-to-end tutorial in docs/source/build-run-openvino.md.
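
For orientation, below is a minimal sketch of the AOT flow these components enable for a Torchvision model. The partitioner import path, class name, and the "device" compile-spec key are assumptions made for illustration, not the exact API added by this PR; see backends/openvino/README.md for the real usage.

```python
# Hypothetical sketch of the AOT export flow; OpenvinoPartitioner's import path,
# constructor arguments, and the "device" compile-spec key are assumed names.
import torch
import torchvision.models as models
from torch.export import export
from executorch.exir import to_edge_transform_and_lower
from executorch.exir.backend.compile_spec_schema import CompileSpec
from executorch.backends.openvino.partitioner import OpenvinoPartitioner  # assumed

model = models.resnet50(weights="DEFAULT").eval()
example_inputs = (torch.randn(1, 3, 256, 256),)

# Export to an ATen dialect program, then delegate supported subgraphs to OpenVINO.
exported_program = export(model, example_inputs)
edge_manager = to_edge_transform_and_lower(
    exported_program,
    partitioner=[OpenvinoPartitioner(compile_spec=[CompileSpec("device", b"CPU")])],
)

# Serialize the ExecuTorch program (.pte) for the runtime to load.
with open("resnet50_openvino.pte", "wb") as f:
    f.write(edge_manager.to_executorch().buffer)
```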

Test plan

This PR was tested with the OpenVINO backend on Intel Core Ultra 7 processors across CPU, GPU, and NPU devices. To run the layer and model tests, please refer to backends/openvino/tests/README.md

cc: @yury-gorbachev @alexsu52 @cavusmustafa @daniil-lyakhov @suryasidd @AlexKoff88 @MaximProshin @AlexanderDokuchaev

ynimmaga and others added 30 commits November 5, 2024 20:46
Handling multiple inputs/outputs with zero-copy
Added fallback with portable kernels
Enhancements to openvino example
else torch.per_tensor_affine
)
if is_weight:
observer = PerChannelMinMaxObserver
Contributor

Is there support for 4-bit?

def transform_for_annotation(
self, model: torch.fx.GraphModule
) -> torch.fx.GraphModule:
nncf_fx.transformations.fold_constant_except_qdq(model)
Contributor

Can you add a comment on what this does? Why do you need to constant fold here? It doesn't sound like the right thing for this API, which is really meant for decomposing ops when the decomposed op can be quantized but the undecomposed one cannot.

Contributor

This is done to avoid quantizing constant branches (which leads to overhead). A comment with a description will be added soon.

Contributor

Can you not avoid doing those in the quantizer?

@@ -0,0 +1,9 @@
datasets
huggingface-hub
Contributor

Why is hf hub a requirement?

Comment on lines +4 to +5
sentencepiece
tokenizers
Contributor

which of these requirements need to be in this folder?

@@ -0,0 +1,205 @@
/*
Contributor

@cccclai you added a new API to get backend names, right? Does this file need to account for it?

Contributor

No it should be fine. We are at beta now and need to account for BC/FC :)

ET_LOG(Error, "OpenVINO is not available: %s", e.what());
} catch (...) {
// Handle any unexpected errors
ET_LOG(
Contributor

Why are we just logging the error instead of actually raising it?

}

// Import the model
auto compiled_model = core.import_model(compiled_stream, device);
Contributor

Why does this interface require a stream?

ExecutionHandle* handle =
ET_ALLOCATE_INSTANCE_OR_RETURN_ERROR(allocator, ExecutionHandle);
handle->compiled_model = std::make_shared<ov::CompiledModel>(compiled_model);
handle->infer_request = infer_request;
Contributor

Do you not want to free `processed` once it is consumed?

@@ -0,0 +1,69 @@
/*
Contributor

Our naming convention for files is .h/.cpp

Comment on lines +22 to +34
using namespace std;
using executorch::aten::ScalarType;
using executorch::runtime::ArrayRef;
using executorch::runtime::Backend;
using executorch::runtime::BackendExecutionContext;
using executorch::runtime::BackendInitContext;
using executorch::runtime::CompileSpec;
using executorch::runtime::DelegateHandle;
using executorch::runtime::Error;
using executorch::runtime::EValue;
using executorch::runtime::FreeableBuffer;
using executorch::runtime::MemoryAllocator;
using executorch::runtime::Result;
Contributor

Typically I would prefer to alias the namespace if you want to shorten typing, but a blanket `using` of the entire namespace reduces readability.



# Build the project
cmake --build cmake-openvino-out --target install --config Release -j5
Contributor

Why are you building this in a separate output directory?

`test_runner.py` allows running op or model tests for the OpenVINO backend.

### **Arguments**
- **`--build_folder`** (required):
Contributor

Is this needed because you need to load a library? I would consider making this part of the pip package installed in site-packages.

@facebook-github-bot
Contributor

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!

@facebook-github-bot added the CLA Signed label on Feb 21, 2025
Comment on lines +21 to +22
atol = 1e-1
rtol = 1e-1
Contributor

These rtol/atol values seem extremely loose?

Comment on lines +82 to +83
with open(pte_fname, "wb") as file:
exec_prog.write_to_file(file)
Contributor

Python has temporary directory/file APIs. Plus, you don't need to write files at all; doing this under test is extremely flaky, so I advise against it. ET has load_from_buffer APIs.
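
As a rough illustration of this suggestion (the pybindings import path and method names are assumptions based on common ExecuTorch test patterns, not code from this PR):

```python
# Hedged sketch: avoid writing .pte files at fixed paths inside tests.
import tempfile

# Assumed pybindings entry point for in-memory loading.
from executorch.extension.pybindings.portable_lib import (
    _load_for_executorch_from_buffer,
)


def run_from_buffer(exec_prog, example_inputs):
    # Option 1: load the program straight from its serialized buffer, no file I/O.
    et_module = _load_for_executorch_from_buffer(exec_prog.buffer)
    return et_module.forward(example_inputs)


def write_to_tempdir(exec_prog):
    # Option 2: if a file on disk is unavoidable, scope it to a temporary
    # directory that is cleaned up automatically when the test exits.
    with tempfile.TemporaryDirectory() as tmp_dir:
        pte_path = f"{tmp_dir}/model.pte"
        with open(pte_path, "wb") as f:
            exec_prog.write_to_file(f)
        ...  # consume pte_path while the directory still exists
```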

Comment on lines +102 to +107
env = dict(os.environ)
proc = subprocess.run(
cmd,
stdout=subprocess.PIPE,
stderr=subprocess.STDOUT,
env=env,
Contributor

I also don't quite like doing this in unit tests. Doesn't sound right to me.

Contributor

Is this done as an executor runner sanity check?

Comment on lines +34 to +63
## Directory Structure

```
executorch
├── backends
│   └── openvino
│       ├── runtime
│       │   ├── OpenvinoBackend.cpp
│       │   └── OpenvinoBackend.hpp
│       ├── scripts
│       │   └── openvino_build.sh
│       ├── tests
│       ├── CMakeLists.txt
│       ├── README.md
│       ├── __init__.py
│       ├── openvino_functions.yaml
│       ├── partitioner.py
│       ├── preprocess.py
│       └── requirements.txt
└── examples
    └── openvino
        ├── aot
        │   ├── README.md
        │   └── aot_openvino_compiler.py
        ├── executor_runner
        │   └── openvino_executor_runner.cpp
        ├── CMakeLists.txt
        ├── README.md
        └── openvino_build_example.sh
```
Contributor

This does not need to be here in a tutorial doc.

## Build Instructions for Examples

### AOT step:
Refer to the [README.md](aot/README.md) in the `aot` folder for detailed instructions on exporting deep learning models from various model suites (TIMM, Torchvision, Hugging Face) to the OpenVINO backend using Executorch. Users can dynamically specify the model, input shape, and target device.
Contributor

is this aot folder under examples?

Below is an example of exporting a ResNet50 model from the Torchvision model suite for the CPU device with an input shape of `[1, 3, 256, 256]`

```bash
cd aot
Contributor

cd aot from where?

Comment on lines +42 to +45
generate_bindings_for_kernels(
LIB_NAME "openvino_portable_ops_lib" FUNCTIONS_YAML
${EXECUTORCH_ROOT}/backends/openvino/openvino_functions.yaml
)
Contributor

I don't quite follow why you need to build this, or have openvino_functions.yaml in the first place.

@@ -0,0 +1,115 @@
# **Model Export Script for Executorch**

This script allows users to export deep learning models from various model suites (TIMM, Torchvision, Hugging Face) to the OpenVINO backend using **Executorch**. Users can dynamically specify the model, input shape, and target device.
Contributor

Which script? This seems like a README file. Is this needed, btw, or does examples/openvino/README.md cover all that you need?

Supported values:
- `timm` (e.g., VGG16, ResNet50)
- `torchvision` (e.g., resnet18, mobilenet_v2)
- `huggingface` (e.g., bert-base-uncased). NB: Quantization and validation are not supported yet.
Contributor

Why is quantization not supported on HF models?

Comment on lines +44 to +45
- `nncf`: `nncf quantize_pt2e` API (default)
- `pt2e`: torch ao quantization pipeline.
Contributor

What is the difference between these two? Is the NNCF quantizer not compatible with the pt2e flow?
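
For context, the `pt2e` option presumably maps onto the standard torch.ao PT2E pipeline, which looks roughly like the sketch below; `quantizer` stands in for whatever quantizer object the backend provides, and the helper itself is illustrative rather than code from the PR.

```python
# Illustrative sketch of the torch.ao PT2E quantization pipeline; `quantizer`
# stands in for the backend-provided Quantizer object (name/API not from this PR).
from torch.export import export_for_training
from torch.ao.quantization.quantize_pt2e import convert_pt2e, prepare_pt2e


def quantize_with_pt2e(model, quantizer, calibration_samples, example_inputs):
    captured = export_for_training(model, example_inputs).module()
    prepared = prepare_pt2e(captured, quantizer)  # insert observers per quantizer config
    for sample in calibration_samples:            # run calibration data through observers
        prepared(sample)
    return convert_pt2e(prepared)                 # fold observers into quantize/dequantize ops
```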

# OpenVINO Backend for ExecuTorch
The OpenVINO backend enables optimized execution of deep learning models on Intel hardware, leveraging Intel's [OpenVINO toolkit](https://www.intel.com/content/www/us/en/developer/tools/openvino-toolkit/overview.html) for inference acceleration.

## Supported Hardware
Contributor

Might be worth listing examples of such hardware available in the market.

@@ -0,0 +1,325 @@
/*
Contributor

Do you need a separate runner? Can the available executor_runner not be used? cc @tarun292

Contributor

@kimishpatel left a comment

OK, I left a bunch of comments, mostly on the README files and whatever I randomly glanced at.

I would like someone from Intel to do a detailed review of the actual tests, and maybe the partitioner and quantizer code.

At high level:

  • Is there a way to add CI test?
  • Can we add performance numbers in the summary for the PR?
  • Optionally compare them with XNNPACK backend

I also want to see if we need to add these files to lint and have CODEOWNERS appropriately attribute the openvino stuff to the right owners. @mergennachin, can you help with these two?

@ynimmaga
Author

@kimishpatel, thank you for the review and comments. We will start addressing them soon.

Labels

CLA Signed, partner: intel