OpenVINO backend for ExecuTorch to enable inference on Intel CPUs, GPUs, NPUs #8573
base: main
Conversation
…orch into openvino_backend
Handling multiple inputs/outputs with zero-copy
Added fallback with portable kernels
Enhancements to openvino example
Initial unit tests for OpenVINO backend
    else torch.per_tensor_affine
)
if is_weight:
    observer = PerChannelMinMaxObserver
is there support for 4 bit?
def transform_for_annotation( | ||
self, model: torch.fx.GraphModule | ||
) -> torch.fx.GraphModule: | ||
nncf_fx.transformations.fold_constant_except_qdq(model) |
Can you add a comment on what this does? Why do you need to constant fold here? It doesn't sound like the right thing for this API, which is really meant for decomposing ops when the decomposed op can be quantized but the undecomposed one cannot.
This is done to avoid quantizing constant branches (which would lead to overhead). A comment with a description will be added soon.
Can you not avoid doing those in the quantizer?
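For readers following along, here is a sketch of what the requested inline comment might look like on the method quoted above. The wording, the stand-in class name, and the `nncf_fx` import path are assumptions for illustration, not the PR's final code:

```python
import torch
# Import path assumed for illustration; the PR may import NNCF's FX helpers differently.
import nncf.experimental.torch.fx as nncf_fx


class OpenVINOQuantizerSketch:  # stand-in for the PR's quantizer class
    def transform_for_annotation(
        self, model: torch.fx.GraphModule
    ) -> torch.fx.GraphModule:
        # Fold constant subgraphs (except quantize/dequantize nodes) before
        # annotation, so purely constant branches are not annotated and later
        # quantized, which would only add runtime overhead.
        nncf_fx.transformations.fold_constant_except_qdq(model)
        return model
```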
@@ -0,0 +1,9 @@
datasets
huggingface-hub
Why is hf hub a requirement?
sentencepiece
tokenizers
which of these requirements need to be in this folder?
@@ -0,0 +1,205 @@
/*
@cccclai you added a new API to get backend names, right? Does this file need to account for it?
No, it should be fine. We are at beta now and need to account for BC/FC :)
ET_LOG(Error, "OpenVINO is not available: %s", e.what());
} catch (...) {
  // Handle any unexpected errors
  ET_LOG(
Why are we just logging error instead of actually raising it?
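For reference, a minimal sketch of what propagating the failure could look like, assuming the surrounding function returns an ExecuTorch `Error`; the function name and the choice of `Error::Internal` are illustrative, not the PR's code:

```cpp
#include <executorch/runtime/core/error.h>
#include <executorch/runtime/platform/log.h>

using executorch::runtime::Error;

// Illustrative only: report the failure to the caller instead of only logging it.
Error init_openvino_or_fail() {
  try {
    // ... create ov::Core, compile or import the model ...
    return Error::Ok;
  } catch (const std::exception& e) {
    ET_LOG(Error, "OpenVINO is not available: %s", e.what());
    return Error::Internal;
  } catch (...) {
    ET_LOG(Error, "Unexpected error while initializing the OpenVINO backend");
    return Error::Internal;
  }
}
```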
}

// Import the model
auto compiled_model = core.import_model(compiled_stream, device);
Why does this interface require a stream?
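For context on the question above, OpenVINO's `ov::Core::import_model` consumes a `std::istream`, so a serialized compiled blob typically gets wrapped in an in-memory stream first. A sketch with assumed variable and function names, not the PR's code:

```cpp
#include <sstream>
#include <string>

#include <openvino/openvino.hpp>

// Sketch: wrap the serialized blob (e.g. the processed buffer's bytes) in an
// in-memory stream, because ov::Core::import_model takes a std::istream.
ov::CompiledModel import_blob(ov::Core& core,
                              const std::string& blob,
                              const std::string& device) {
  std::istringstream compiled_stream(blob);
  return core.import_model(compiled_stream, device);
}
```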
ExecutionHandle* handle =
    ET_ALLOCATE_INSTANCE_OR_RETURN_ERROR(allocator, ExecutionHandle);
handle->compiled_model = std::make_shared<ov::CompiledModel>(compiled_model);
handle->infer_request = infer_request;
Do you not want to free `processed` once consumed?
@@ -0,0 +1,69 @@
/*
Our naming convention for files is .h/.cpp.
using namespace std;
using executorch::aten::ScalarType;
using executorch::runtime::ArrayRef;
using executorch::runtime::Backend;
using executorch::runtime::BackendExecutionContext;
using executorch::runtime::BackendInitContext;
using executorch::runtime::CompileSpec;
using executorch::runtime::DelegateHandle;
using executorch::runtime::Error;
using executorch::runtime::EValue;
using executorch::runtime::FreeableBuffer;
using executorch::runtime::MemoryAllocator;
using executorch::runtime::Result;
Typically I would prefer to alias the namespace if you want to shorten typing, but blanket `using` of the entire namespace reduces readability.
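As an illustration of the alias approach being suggested (a sketch, not the PR's code; the alias name is arbitrary):

```cpp
#include <executorch/runtime/core/result.h>

// Alias the namespace once instead of pulling every name into scope.
namespace et_rt = executorch::runtime;

// Call sites stay short, but the origin of each type remains visible.
et_rt::Result<int> answer() {
  return 42;
}
```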
# Build the project
cmake --build cmake-openvino-out --target install --config Release -j5
Why are you building this in a separate output directory?
`test_runner.py` allows running op or model tests for the OpenVINO backend.

### **Arguments**
- **`--build_folder`** (required):
Is this needed because you need to load the library? I would consider making this part of the pip package installed in site-packages.
Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!
atol = 1e-1
rtol = 1e-1
This seems like an extremely loose rtol/atol?
with open(pte_fname, "wb") as file:
    exec_prog.write_to_file(file)
Python has temporary directory/file APIs. Plus you don't need to write files; doing this under test is actually extremely flaky, so I advise against it. ET has load_from_buffer APIs.
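For what it's worth, a minimal sketch of the temporary-directory variant mentioned above, reusing `exec_prog` from the test; the file name is chosen arbitrarily:

```python
import os
import tempfile

# Write the program into a directory that is deleted automatically, so the
# test never leaves a .pte file behind (exec_prog comes from the test setup).
with tempfile.TemporaryDirectory() as tmp_dir:
    pte_fname = os.path.join(tmp_dir, "model.pte")
    with open(pte_fname, "wb") as file:
        exec_prog.write_to_file(file)
    # ... load and run pte_fname here, before the directory disappears ...
```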
env = dict(os.environ)
proc = subprocess.run(
    cmd,
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,
    env=env,
I also don't quite like doing this in unit tests. Doesn't sound right to me.
Is this done as an executor runner sanity check?
## Directory Structure

```
executorch
├── backends
│   └── openvino
│       ├── runtime
│       │   ├── OpenvinoBackend.cpp
│       │   └── OpenvinoBackend.hpp
│       ├── scripts
│       │   └── openvino_build.sh
│       ├── tests
│       ├── CMakeLists.txt
│       ├── README.md
│       ├── __init__.py
│       ├── openvino_functions.yaml
│       ├── partitioner.py
│       ├── preprocess.py
│       └── requirements.txt
└── examples
    └── openvino
        ├── aot
        │   ├── README.md
        │   └── aot_openvino_compiler.py
        ├── executor_runner
        │   └── openvino_executor_runner.cpp
        ├── CMakeLists.txt
        ├── README.md
        └── openvino_build_example.sh
```
This does not need to be here in a tutorial doc.
## Build Instructions for Examples

### AOT step:
Refer to the [README.md](aot/README.md) in the `aot` folder for detailed instructions on exporting deep learning models from various model suites (TIMM, Torchvision, Hugging Face) to the OpenVINO backend using ExecuTorch. Users can dynamically specify the model, input shape, and target device.
is this aot folder under examples?
Below is an example of exporting a ResNet50 model from the Torchvision model suite for the CPU device with an input shape of `[1, 3, 256, 256]`:

```bash
cd aot
```
cd aot from where?
generate_bindings_for_kernels(
  LIB_NAME "openvino_portable_ops_lib" FUNCTIONS_YAML
  ${EXECUTORCH_ROOT}/backends/openvino/openvino_functions.yaml
)
I don't quite follow why you need to build this, or have openvino_functions.yaml in the first place.
@@ -0,0 +1,115 @@
# **Model Export Script for ExecuTorch**

This script allows users to export deep learning models from various model suites (TIMM, Torchvision, Hugging Face) to the OpenVINO backend using **ExecuTorch**. Users can dynamically specify the model, input shape, and target device.
Which script? This seems like a readme file. Is this needed, btw, or does examples/openvino/readme.md cover all that you need?
Supported values:
- `timm` (e.g., VGG16, ResNet50)
- `torchvision` (e.g., resnet18, mobilenet_v2)
- `huggingface` (e.g., bert-base-uncased). NB: Quantization and validation are not supported yet.
Why is quant not supported on hf models?
- `nncf`: `nncf quantize_pt2e` API (default)
- `pt2e`: torch ao quantization pipeline.
what is the difference between these two? is nncf quantizer not compatible with pt2e flow?
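For context, the stock torch.ao pt2e pipeline mentioned above looks roughly like this (a sketch; `model`, `example_inputs`, and `quantizer` stand in for whatever the example script selects, and the export call reflects current PyTorch APIs):

```python
import torch
from torch.ao.quantization.quantize_pt2e import convert_pt2e, prepare_pt2e

# `model`, `example_inputs`, and `quantizer` are assumed to come from the
# example script; this only illustrates the generic pt2e flow.
exported = torch.export.export(model, example_inputs).module()
prepared = prepare_pt2e(exported, quantizer)   # insert observers
prepared(*example_inputs)                      # calibrate on sample data
quantized = convert_pt2e(prepared)             # lower observers to q/dq ops
```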
# OpenVINO Backend for ExecuTorch
The OpenVINO backend enables optimized execution of deep learning models on Intel hardware, leveraging Intel's [OpenVINO toolkit](https://www.intel.com/content/www/us/en/developer/tools/openvino-toolkit/overview.html) for inference acceleration.

## Supported Hardware
Might be worth listing examples of such hardware available on the market.
@@ -0,0 +1,325 @@
/*
Do you need a separate runner? Can the available executor_runner not be used? cc @tarun292
OK, I left a bunch of comments, mostly on the readme files and whatever I randomly glanced at.
I would like someone from Intel to do a detailed review of the actual tests, perhaps, and maybe the partitioner and quantizer code.
At a high level:
- Is there a way to add CI test?
- Can we add performance numbers in the summary for the PR?
- Optionally compare them with XNNPACK backend
I also want to see if we need to add these files to lint and have CODEOWNERS appropriately attribute the openvino stuff to the right owners. @mergennachin can you help with these 2?
@kimishpatel, thank you for the review and comments. We will start addressing them soon.
Co-authored-by: Kimish Patel <[email protected]>
Summary
This PR introduces support for the OpenVINO backend in ExecuTorch, enabling accelerated inference on Intel hardware, including CPU, GPU, and NPU devices. OpenVINO optimizes deep learning model performance by leveraging hardware-specific enhancements. The PR also introduces the OpenVINO quantizer with NNCF (Neural Network Compression Framework) for model optimization. The functionality has been tested on several torchvision and timm models, with plans to test and enable support for additional model types in the future.
Below is a description of the features:
OpenVINO Backend Integration: The backends/openvino directory includes build scripts, AOT components (partitioner, preprocessor), the OpenVINO quantizer, and runtime backend files that register the OpenVINO backend and manage interactions with OpenVINO's inference engine, including model execution, device-specific optimizations, and backend initialization. It also contains tests for layers and models. See backends/openvino/README.md for usage.
OpenVINO Examples: The examples/openvino directory provides scripts for AOT optimization, quantization, and C++ executor examples. It includes instructions for optimizing models, quantizing them, and exporting ExecuTorch programs with OpenVINO optimizations. Refer to examples/openvino/README.md for details.
E2E Tutorial: Added an end-to-end tutorial in docs/source/build-run-openvino.md.
Test plan
This PR has been tested with the OpenVINO backend on Intel Core Ultra 7 processors for CPU, GPU, and NPU devices. To run the layer tests and model tests, please refer to backends/openvino/tests/README.md.
cc: @yury-gorbachev @alexsu52 @cavusmustafa @daniil-lyakhov @suryasidd @AlexKoff88 @MaximProshin @AlexanderDokuchaev