
Support dumping model cache for OV EP #137

Open · wants to merge 60 commits into base: ort_backend
Conversation

shiyi9801 (Owner)

Fix #72

shiyi9801 and others added 30 commits February 11, 2025 16:40
* Pass ORT_API_VERSION to `OrtApiBase::GetApi()`

Also removes the inclusion of the onnx.pb.h header.
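For context, this is the standard ONNX Runtime C API bootstrap the commit refers to; a minimal sketch:

```
// Request the API table matching the headers this binary was
// compiled against. GetApi() returns nullptr if the loaded
// onnxruntime runtime is older than ORT_API_VERSION.
#include "onnxruntime_c_api.h"

const OrtApi* GetOrtApi() {
  const OrtApi* api = OrtGetApiBase()->GetApi(ORT_API_VERSION);
  // api may be nullptr when onnxruntime.dll is too old; callers
  // should check before use.
  return api;
}
```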

* Add third_party/onnxruntime_headers

Import https://github.com/microsoft/onnxruntime/tree/main/include

Commit is based on microsoft/onnxruntime#23223

* Use ORT Model Builder API

* Refactor scoped ORT type ptr

1. Rename to ScopedOrtTypePtr
2. Use macros
3. Introduce `operator T*()`
4. Introduce `Release()` method
5. Rename `get_ptr()` to `Get()`
6. Rename `get_pptr()` to `GetAddressOf()`
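A minimal sketch of what such a wrapper might look like, assembled from the method names listed above; the PR's actual implementation is macro-generated per ORT type and may differ:

```
// Scoped owner for a raw ORT object pointer. ORT's release functions
// live on the OrtApi table, so the PR presumably binds them per type
// via macros; a plain function pointer stands in here.
template <typename T, void (*Free)(T*)>
class ScopedOrtTypePtr {
 public:
  ScopedOrtTypePtr() = default;
  explicit ScopedOrtTypePtr(T* ptr) : ptr_(ptr) {}
  ~ScopedOrtTypePtr() {
    if (ptr_) Free(ptr_);
  }

  ScopedOrtTypePtr(const ScopedOrtTypePtr&) = delete;
  ScopedOrtTypePtr& operator=(const ScopedOrtTypePtr&) = delete;

  operator T*() const { return ptr_; }  // Implicit conversion for API calls.
  T* Get() const { return ptr_; }
  T** GetAddressOf() { return &ptr_; }  // For out-parameters.

  // Relinquishes ownership without destroying the object (assumed
  // semantics; used when another object takes ownership, as with
  // node attributes in a later commit).
  T* Release() {
    T* p = ptr_;
    ptr_ = nullptr;
    return p;
  }

 private:
  T* ptr_ = nullptr;
};
```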

* Remove ONNX Runtime headers from third_party/microsoft_dxheaders
* Introduce webnn_use_ort build flag and enable it for Windows

* Introduce --webnn-use-ort switch

When enabled, it overrides the DirectML backend.

* Remove the non-working DML EP code path for GPU and NPU

All context options use the CPU EP for now; the OpenVINO EP will be
used for GPU and NPU devices.

* Allow loading onnxruntime.dll from the system folder
* Add reduce and instance_norm ops, and refactor some code, including:
  - rename `uint64_t NewInitializerAsRawData` to `std::string CreateInitializerAsRawData`
  - remove unused `ORT_ABORT_ON_ERROR`
lisa0314 and others added 26 commits February 11, 2025 16:40
Update ORT headers to the latest Model Builder API:
microsoft/onnxruntime@4e2d061

According to the latest API, the node owns its attributes. This PR
releases attributes after calling `AddNode()` (see the sketch below).

This PR also changes `CreateAttribute()` to return a
`ScopedOrtOpAttrPtr` to simplify the code.
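A sketch of the ownership handoff described above, assuming the PR's wrapper names (`ScopedOrtOpAttrPtr`, `CreateAttribute`, `AddNode`); the exact Model Builder API signatures are illustrative, not authoritative:

```
// Illustrative only. After AddNode() the node owns the attribute, so
// the scoped pointer must give up its reference without freeing the
// underlying OrtOpAttr (assuming Release() relinquishes ownership).
ScopedOrtOpAttrPtr axis_attr = CreateAttribute("axis", /*value=*/1);
AddNode("Softmax", inputs, outputs, {axis_attr.Get()});
axis_attr.Release();  // Ownership has transferred to the node.
```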
When the `--webnn-ort-use-openvino` switch is used, the OpenVINO EP
will be used for all WebNN contexts. The WebNN device type maps to the
OpenVINO EP device type.

With this change, developers (like me) can test the OpenVINO EP on
CPU. (My dev machine doesn't have an Intel GPU or NPU.)

Usage:
1. Build the OpenVINO EP by following:
https://onnxruntime.ai/docs/build/eps.html#openvino
   **Note**: Please use OpenVINO version >= 2024.4 (tested on 2024.6).
2. Copy the following DLLs into the Chromium build folder or version folder:
```
onnxruntime.dll
onnxruntime_providers_shared.dll
onnxruntime_providers_openvino.dll
```
3. Ensure the OpenVINO environment variables are set, e.g.:
```
"C:\Program Files (x86)\Intel\openvino_2024\setupvars.bat"
```
4. Append --no-sandbox to load the necessary DLLs into the GPU process, e.g.:
```
chrome.exe --webnn-use-ort --use-redist-ort --webnn-ort-use-openvino --no-sandbox
```
1. Remove the unnecessary parameter `OperandDataType data_type` of the
`CreateInitializer` method and map the data type to the ONNX tensor type.
2. Add a helper method `CreateScalarInitializer` to create a scalar
with an empty shape (see the sketch below).
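A hypothetical sketch of the scalar helper: in ONNX, a scalar is a tensor with an empty dims list, so the helper only needs to forward an empty shape. `CreateInitializerAsRawData` is the PR's helper; its exact signature is assumed here.

```
// Assumed signature, for illustration only.
std::string OrtModelBuilder::CreateScalarInitializer(
    ONNXTensorElementDataType type, base::span<const uint8_t> value) {
  // {} (rank 0), not {1} (rank 1): an empty shape marks a scalar.
  return CreateInitializerAsRawData(/*shape=*/{}, type, value);
}
```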

This PR extracts the environment, allocator and memory info out of
`AllocatorOrt` and eliminates the need for it at the current stage.

The environment must be initialized before any other ORT API call
(e.g. using the logger) and must be released after all sessions are
released (otherwise #75). The environment is reference counted: the
first `CreateEnv()` creates the instance, and subsequent `CreateEnv()`
calls increase its reference count and return a reference to the same
instance. When the last reference is removed, the environment instance
is released. This PR puts a reference to `OrtEnv` inside
`GraphImplOrt::Session` ahead of `OrtSession` to ensure the release
order (see the sketch below).

At the current stage, we only use the CPU allocator, so we can just
get the pointer to the default CPU allocator, and the memory info can
be plain CPU memory info. It is unclear whether and how we need a
custom allocator for a particular device. #65

Other changes include:
1. Introduce `TensorImplOrt::Create()`, which allows reporting an
error for any ORT API failure rather than crashing.
2. Similarly, allow `GraphImplOrt::CreateAndBuild()` to report an
error for any ORT API failure.
3. Use scoped ORT types for `BufferContentOrt`, `OrtEnv`, `OrtSession`,
`OrtSessionOptions` and `OrtMemoryInfo`.

Fix #75
Fix #87 
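A sketch of the release-order trick referenced above, with assumed member and wrapper names: C++ destroys data members in reverse declaration order, so placing the environment reference before the session guarantees the session goes first.

```
// Member names are illustrative, not the PR's exact fields.
class Session {
 private:
  ScopedOrtEnvPtr env_;          // Declared first => destroyed last.
  ScopedOrtSessionPtr session_;  // Destroyed before env_, so the
                                 // OrtSession is released while the
                                 // environment is still alive.
};
```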

1. Refactor the code for inserting cast nodes.
2. Support logical not and fix bugs in all logical operators (a cast
sketch follows).
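A sketch of why cast insertion comes up for logical ops, under the assumption that WebNN represents logical values as uint8 while the ONNX logical operators consume and produce tensor(bool); the helper names are illustrative:

```
// uint8 -> bool before the op, bool -> uint8 after it.
std::string bool_input =
    InsertCastNode(input_name, ONNX_TENSOR_ELEMENT_DATA_TYPE_BOOL);
std::string bool_output = AddUnaryNode("Not", bool_input);
std::string output =
    InsertCastNode(bool_output, ONNX_TENSOR_ELEMENT_DATA_TYPE_UINT8);
```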
Fix #63

This PR renames the operands to make sure each name is unique, and
refactors `ComputeResources` to use the new operand names (a naming
sketch follows).
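A minimal sketch of deterministic, collision-free naming; the actual scheme in the PR may differ:

```
// Derive each ONNX value name from the WebNN operand id, which is
// already unique within a graph, so two operands can never collide.
std::string GetOperandName(uint64_t operand_id) {
  return "operand_" + base::NumberToString(operand_id);
}
```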
This PR refactors and simplifies the error-handling code (a sketch
follows the list):
1. Define `ScopedOrtStatusPtr`, which is responsible for releasing the
`OrtStatus*`.
2. Add some macro definitions; for example, `CALL_ORT_FUNC` converts
the original `OrtStatus*` to a `ScopedOrtStatusPtr`.
3. Let some methods return `ScopedOrtStatusPtr`, for example:
`ScopedOrtStatusPtr OrtModelBuilder::AddInitializer`.
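A sketch of the pattern described above; the class name matches the PR, but the macro body and the `ort_api` global are assumptions. `ReleaseStatus` is the real OrtApi call for freeing an `OrtStatus*`.

```
// Owns an OrtStatus* and frees it on destruction. A null status
// means the call succeeded.
class ScopedOrtStatusPtr {
 public:
  explicit ScopedOrtStatusPtr(OrtStatus* status) : status_(status) {}
  ~ScopedOrtStatusPtr() {
    if (status_) ort_api->ReleaseStatus(status_);
  }
  bool IsOk() const { return status_ == nullptr; }

 private:
  OrtStatus* status_ = nullptr;
};

// Wraps a raw OrtApi call so the returned OrtStatus* is always freed.
#define CALL_ORT_FUNC(fn, ...) ScopedOrtStatusPtr(ort_api->fn(__VA_ARGS__))
```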
* Replace base::ranges with std::ranges
* Add RankRange for some ops (a sketch follows):
  - softmax requires an axis, so the input can't be a scalar
  - split requires an axis, so the input can't be a scalar
  - triangular only supports input rank >= 2
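A sketch of the rank check: `RankRange` is named in the PR, but this exact shape and the `kMaxSupportedRank` constant are assumptions.

```
struct RankRange {
  size_t min_rank;
  size_t max_rank;
};

constexpr size_t kMaxSupportedRank = 8;  // Illustrative upper bound.

bool IsRankSupported(const RankRange& range, size_t rank) {
  return rank >= range.min_rank && rank <= range.max_rank;
}

// softmax/split need an axis:      {1, kMaxSupportedRank}
// triangular operates on matrices: {2, kMaxSupportedRank}
```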
shiyi9801 requested a review from huningxin February 13, 2025 05:59
shiyi9801 (Owner, Author)

@huningxin PTAL, thanks!

```
std::string cache_dir;
if (dump_directory.has_value()) {
  cache_dir = base::SysWideToUTF8(dump_directory->value());
  openvino_options.cache_dir = cache_dir.c_str();
}
```
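For context, a sketch of the code surrounding this excerpt, inferred from the struct field it sets: the legacy `OrtOpenVINOProviderOptions` struct only stores a `const char*`, so `cache_dir` must outlive the append call. The `device_type` value and variable names are illustrative.

```
OrtOpenVINOProviderOptions openvino_options = {};
openvino_options.device_type = "CPU";  // Illustrative; mapped from the
                                       // WebNN device type.

std::string cache_dir;
if (dump_directory.has_value()) {
  cache_dir = base::SysWideToUTF8(dump_directory->value());
  openvino_options.cache_dir = cache_dir.c_str();
}
ort_api->SessionOptionsAppendExecutionProvider_OpenVINO(
    session_options, &openvino_options);
```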
Collaborator

I suppose we should have a separate switch to enable OV model caching. I don't think it is equivalent to `SetOptimizedModelFilePath` of ORT, which is used for ONNX model inspection; OV model caching is intended to reduce graph compilation time.
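The two mechanisms the comment contrasts, as a sketch; both calls are real ORT C API entries, while the file paths and the `openvino_options` variable are illustrative:

```
// ORT-side: dump the optimized ONNX model to a file for inspection.
ort_api->SetOptimizedModelFilePath(session_options,
                                   ORT_TSTR("optimized_model.onnx"));

// OV-side: cache compiled blobs so later graph compilations are
// faster (what cache_dir in this PR enables).
openvino_options.cache_dir = "ov_cache";
```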

Development

Successfully merging this pull request may close these issues.

OpenVINO EP doesn't support dumping optimized models