
Support dumping model cache for OV EP #137

Open · wants to merge 60 commits into base: ort_backend
Conversation

shiyi9801 (Owner)

Fix #72

shiyi9801 and others added 30 commits February 11, 2025 16:40
* Pass ORT_API_VERSION to `OrtApiBase::GetApi()`

Also removes the inclusion of the onnx.pb.h header.
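For context, this is the standard ONNX Runtime C API bootstrap the commit refers to; a minimal sketch:

```
// Request the API table matching the headers this binary was
// compiled against. GetApi() returns nullptr if the loaded
// onnxruntime runtime is older than ORT_API_VERSION.
#include "onnxruntime_c_api.h"

const OrtApi* GetOrtApi() {
  const OrtApi* api = OrtGetApiBase()->GetApi(ORT_API_VERSION);
  // api may be nullptr when onnxruntime.dll is too old; callers
  // should check before use.
  return api;
}
```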

* Add third_party/onnxruntime_headers

Import https://github.com/microsoft/onnxruntime/tree/main/include

Commit is based on microsoft/onnxruntime#23223

* Use ORT Model Builder API

* Refactor scoped ORT type ptr

1. Rename to ScopedOrtTypePtr
2. Use macros
3. Introduce `operator T*()`
4. Introduce `Release()` method
5. Rename `get_ptr()` to `Get()`
6. Rename `get_pptr()` to `GetAddressOf()`
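A minimal sketch of what such a wrapper might look like, assembled from the method names listed above; the PR's actual implementation is macro-generated per ORT type and may differ:

```
// Scoped owner for a raw ORT object pointer. ORT's release functions
// live on the OrtApi table, so the PR presumably binds them per type
// via macros; a plain function pointer stands in here.
template <typename T, void (*Free)(T*)>
class ScopedOrtTypePtr {
 public:
  ScopedOrtTypePtr() = default;
  explicit ScopedOrtTypePtr(T* ptr) : ptr_(ptr) {}
  ~ScopedOrtTypePtr() {
    if (ptr_) Free(ptr_);
  }

  ScopedOrtTypePtr(const ScopedOrtTypePtr&) = delete;
  ScopedOrtTypePtr& operator=(const ScopedOrtTypePtr&) = delete;

  operator T*() const { return ptr_; }  // Implicit conversion for API calls.
  T* Get() const { return ptr_; }
  T** GetAddressOf() { return &ptr_; }  // For out-parameters.

  // Relinquishes ownership without destroying the object (assumed
  // semantics; used when another object takes ownership, as with
  // node attributes in a later commit).
  T* Release() {
    T* p = ptr_;
    ptr_ = nullptr;
    return p;
  }

 private:
  T* ptr_ = nullptr;
};
```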

* Remove ONNX Runtime headers from third_party/microsoft_dxheaders
* Introduce webnn_use_ort build flag and enable it for Windows

* Introduce --webnn-use-ort switch

When enabled, it overrides the DirectML backend.

* Remove the non-working DML EP code path for GPU and NPU

All context options use the CPU EP for now; the OpenVINO EP will be
used for GPU and NPU devices.

* Allow loading onnxruntime.dll from the system folder
* Add reduce and instance_norm ops, and refactor some code, including:
  - rename `uint64_t NewInitializerAsRawData` to `std::string CreateInitializerAsRawData`
  - remove unused `ORT_ABORT_ON_ERROR`
lisa0314 and others added 26 commits February 11, 2025 16:40
Update ORT headers to the latest Model Builder API:
microsoft/onnxruntime@4e2d061

According to the latest API, the node owns its attributes. This PR
releases attributes after calling `AddNode()` (see the sketch below).

This PR also changes `CreateAttribute()` to return a
`ScopedOrtOpAttrPtr` to simplify the code.
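A sketch of the ownership handoff described above, assuming the PR's wrapper names (`ScopedOrtOpAttrPtr`, `CreateAttribute`, `AddNode`); the exact Model Builder API signatures are illustrative, not authoritative:

```
// Illustrative only. After AddNode() the node owns the attribute, so
// the scoped pointer must give up its reference without freeing the
// underlying OrtOpAttr (assuming Release() relinquishes ownership).
ScopedOrtOpAttrPtr axis_attr = CreateAttribute("axis", /*value=*/1);
AddNode("Softmax", inputs, outputs, {axis_attr.Get()});
axis_attr.Release();  // Ownership has transferred to the node.
```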
When the `--webnn-ort-use-openvino` switch is used, the OpenVINO EP
will be used for all WebNN contexts. The WebNN device type maps to the
OpenVINO EP device type.

With this change, developers (like me) can test the OpenVINO EP on
CPU. (My dev machine doesn't have an Intel GPU or NPU.)

Usage:
1. Build the OpenVINO EP by following:
https://onnxruntime.ai/docs/build/eps.html#openvino
   **Note**: Please use OpenVINO version >= 2024.4 (tested on 2024.6).
2. Copy the following DLLs into the Chromium build folder or version folder:
```
onnxruntime.dll
onnxruntime_providers_shared.dll
onnxruntime_providers_openvino.dll
```
3. Ensure the OpenVINO environment variables are set, e.g.:
```
"C:\Program Files (x86)\Intel\openvino_2024\setupvars.bat"
```
4. Append --no-sandbox to load the necessary DLLs into the GPU process, e.g.:
```
chrome.exe --webnn-use-ort --use-redist-ort --webnn-ort-use-openvino --no-sandbox
```
1. Remove the unnecessary parameter `OperandDataType data_type` of the
`CreateInitializer` method and map the data type to the ONNX tensor type.
2. Add a helper method `CreateScalarInitializer` to create a scalar
with an empty shape (see the sketch below).
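A hypothetical sketch of the scalar helper: in ONNX, a scalar is a tensor with an empty dims list, so the helper only needs to forward an empty shape. `CreateInitializerAsRawData` is the PR's helper; its exact signature is assumed here.

```
// Assumed signature, for illustration only.
std::string OrtModelBuilder::CreateScalarInitializer(
    ONNXTensorElementDataType type, base::span<const uint8_t> value) {
  // {} (rank 0), not {1} (rank 1): an empty shape marks a scalar.
  return CreateInitializerAsRawData(/*shape=*/{}, type, value);
}
```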

This PR extracts the environment, allocator and memory info out of
`AllocatorOrt` and eliminates the need for it at the current stage.

The environment must be initialized before any other ORT API call
(e.g. using the logger) and must be released after all sessions are
released (otherwise #75). The environment is reference counted: the
first `CreateEnv()` creates the instance, and subsequent `CreateEnv()`
calls increase its reference count and return a reference to the same
instance. When the last reference is removed, the environment instance
is released. This PR puts a reference to `OrtEnv` inside
`GraphImplOrt::Session` ahead of `OrtSession` to ensure the release
order (see the sketch below).

At the current stage, we only use the CPU allocator, so we can just
get the pointer to the default CPU allocator, and the memory info can
be plain CPU memory info. It is unclear whether and how we need a
custom allocator for a particular device. #65

Other changes include:
1. Introduce `TensorImplOrt::Create()`, which allows reporting an
error for any ORT API failure rather than crashing.
2. Similarly, allow `GraphImplOrt::CreateAndBuild()` to report an
error for any ORT API failure.
3. Use scoped ORT types for `BufferContentOrt`, `OrtEnv`, `OrtSession`,
`OrtSessionOptions` and `OrtMemoryInfo`.

Fix #75
Fix #87 
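A sketch of the release-order trick referenced above, with assumed member and wrapper names: C++ destroys data members in reverse declaration order, so placing the environment reference before the session guarantees the session goes first.

```
// Member names are illustrative, not the PR's exact fields.
class Session {
 private:
  ScopedOrtEnvPtr env_;          // Declared first => destroyed last.
  ScopedOrtSessionPtr session_;  // Destroyed before env_, so the
                                 // OrtSession is released while the
                                 // environment is still alive.
};
```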

1. Refactor the code for inserting cast nodes.
2. Support logical not and fix bugs in all logical operators (a cast
sketch follows).
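A sketch of why cast insertion comes up for logical ops, under the assumption that WebNN represents logical values as uint8 while the ONNX logical operators consume and produce tensor(bool); the helper names are illustrative:

```
// uint8 -> bool before the op, bool -> uint8 after it.
std::string bool_input =
    InsertCastNode(input_name, ONNX_TENSOR_ELEMENT_DATA_TYPE_BOOL);
std::string bool_output = AddUnaryNode("Not", bool_input);
std::string output =
    InsertCastNode(bool_output, ONNX_TENSOR_ELEMENT_DATA_TYPE_UINT8);
```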
Fix #63

This PR renames the operands to make sure each name is unique, and
refactors `ComputeResources` to use the new operand names (a naming
sketch follows).
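A minimal sketch of deterministic, collision-free naming; the actual scheme in the PR may differ:

```
// Derive each ONNX value name from the WebNN operand id, which is
// already unique within a graph, so two operands can never collide.
std::string GetOperandName(uint64_t operand_id) {
  return "operand_" + base::NumberToString(operand_id);
}
```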
This PR refactors and simplifies the error-handling code (a sketch
follows the list):
1. Define `ScopedOrtStatusPtr`, which is responsible for releasing the
`OrtStatus*`.
2. Add some macro definitions; for example, `CALL_ORT_FUNC` converts
the original `OrtStatus*` to a `ScopedOrtStatusPtr`.
3. Let some methods return `ScopedOrtStatusPtr`, for example:
`ScopedOrtStatusPtr OrtModelBuilder::AddInitializer`.
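A sketch of the pattern described above; the class name matches the PR, but the macro body and the `ort_api` global are assumptions. `ReleaseStatus` is the real OrtApi call for freeing an `OrtStatus*`.

```
// Owns an OrtStatus* and frees it on destruction. A null status
// means the call succeeded.
class ScopedOrtStatusPtr {
 public:
  explicit ScopedOrtStatusPtr(OrtStatus* status) : status_(status) {}
  ~ScopedOrtStatusPtr() {
    if (status_) ort_api->ReleaseStatus(status_);
  }
  bool IsOk() const { return status_ == nullptr; }

 private:
  OrtStatus* status_ = nullptr;
};

// Wraps a raw OrtApi call so the returned OrtStatus* is always freed.
#define CALL_ORT_FUNC(fn, ...) ScopedOrtStatusPtr(ort_api->fn(__VA_ARGS__))
```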
* Replace base::ranges with std::ranges
* Add RankRange for some ops (a sketch follows):
  - softmax requires an axis, so the input can't be a scalar
  - split requires an axis, so the input can't be a scalar
  - triangular only supports input rank >= 2
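A sketch of the rank check: `RankRange` is named in the PR, but this exact shape and the `kMaxSupportedRank` constant are assumptions.

```
struct RankRange {
  size_t min_rank;
  size_t max_rank;
};

constexpr size_t kMaxSupportedRank = 8;  // Illustrative upper bound.

bool IsRankSupported(const RankRange& range, size_t rank) {
  return rank >= range.min_rank && rank <= range.max_rank;
}

// softmax/split need an axis:      {1, kMaxSupportedRank}
// triangular operates on matrices: {2, kMaxSupportedRank}
```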
shiyi9801 requested a review from huningxin February 13, 2025 05:59
shiyi9801 (Owner, Author)

@huningxin PTAL, thanks!

```
std::string cache_dir;
if (dump_directory.has_value()) {
  cache_dir = base::SysWideToUTF8(dump_directory->value());
  openvino_options.cache_dir = cache_dir.c_str();
}
```
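For context, a sketch of the code surrounding this excerpt, inferred from the struct field it sets: the legacy `OrtOpenVINOProviderOptions` struct only stores a `const char*`, so `cache_dir` must outlive the append call. The `device_type` value and variable names are illustrative.

```
OrtOpenVINOProviderOptions openvino_options = {};
openvino_options.device_type = "CPU";  // Illustrative; mapped from the
                                       // WebNN device type.

std::string cache_dir;
if (dump_directory.has_value()) {
  cache_dir = base::SysWideToUTF8(dump_directory->value());
  openvino_options.cache_dir = cache_dir.c_str();
}
ort_api->SessionOptionsAppendExecutionProvider_OpenVINO(
    session_options, &openvino_options);
```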
Collaborator

I suppose we should have a separate switch to enable OV model caching. I don't think it is equivalent to `SetOptimizedModelFilePath` of ORT, which is used for ONNX model inspection; OV model caching is intended to reduce graph compilation time.
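The two mechanisms the comment contrasts, as a sketch; both calls are real ORT C API entries, while the file paths and the `openvino_options` variable are illustrative:

```
// ORT-side: dump the optimized ONNX model to a file for inspection.
ort_api->SetOptimizedModelFilePath(session_options,
                                   ORT_TSTR("optimized_model.onnx"));

// OV-side: cache compiled blobs so later graph compilations are
// faster (what cache_dir in this PR enables).
openvino_options.cache_dir = "ov_cache";
```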

Development

Successfully merging this pull request may close these issues.

OpenVINO EP doesn't support dumping optimized models