
Sync with MSFT rel-1.22.0 #673


Open · wants to merge 351 commits into base: msb_release

Conversation

ankitm3k

Description

This PR syncs the target branch with MSFT rel-1.22.0.

fs-eire and others added 30 commits March 14, 2025 11:23
### Description

Remove a duplicated file in the Node.js package.

microsoft#23956
…rator (microsoft#23944)

### Description

- Added support for custom position ids and attention masks to the GQA
CPU operator (fp32 and fp16)
- Added MLAS eltwise add kernel for mask application for FP32 and FP16
- Added unit tests for the added eltwise add MLAS kernel
- Modified python tests to test the new GQA inputs


### Motivation and Context
Custom position ids and attention masks are required in order to
implement speculative decoding in PhiSilica.

### Benchmarks

All benchmarks were executed on the GQA op configuration that will be
used in the PhiSilica speculative decoding scenario:

- num_heads: 32
- kv_num_heads: 32
- do_rotary: 1
- local_window_size: -1
- head_size: 96
- sequence_length: 6
- packed_qkv: True

Benchmarks were executed on Cadmus with Snapdragon(R) X 12-core X1E80100
@ 3.40 GHz

In the tables below, the column headers are the total sequence length
values used for benchmarking, and the rows indicate whether the
attention bias was used. Values are average inference time in ms over
100,000 runs.

#### Fp16 results

| Total sequence length | 50 | 100 | 250 | 500 | 750 | 1000 | 1500 | 2000 | 2500 | 3000 | 3500 | 4000 |
|:-----------------|:---------|:---------|:---------|:---------|:---------|:---------|:---------|:--------|:--------|:--------|:--------|:--------|
| Without bias | 0.284054 | 0.257449 | 0.275806 | 0.334123 | 0.458324 | 0.614133 | 0.912791 | 1.38585 | 1.92186 | 2.39203 | 2.88808 | 3.46262 |
| With bias | 0.250926 | 0.253072 | 0.279724 | 0.337774 | 0.499058 | 0.585388 | 0.914316 | 1.40701 | 1.87311 | 2.47475 | 3.3906 | 3.47474 |
| Runtime increase | -11.66% | -1.7% | +1.42% | +1.09% | +8.89% | -4.68% | +0.17% | +1.53% | -2.54% | +3.46% | +17.4% | +0.35% |

#### Fp32 results

| Total sequence length | 50 | 100 | 250 | 500 | 750 | 1000 | 1500 | 2000 | 2500 | 3000 | 3500 | 4000 |
|:-----------------|:---------|:---------|:---------|:---------|:---------|:--------|:--------|:--------|:--------|:--------|:--------|:--------|
| Without bias | 0.259049 | 0.270541 | 0.304583 | 0.376708 | 0.554013 | 0.633217 | 1.20696 | 1.65985 | 1.95169 | 2.45807 | 3.05637 | 4.05169 |
| With bias | 0.261631 | 0.268002 | 0.300853 | 0.370452 | 0.529865 | 0.735216 | 1.43493 | 1.4385 | 1.99028 | 2.3858 | 2.99425 | 4.80197 |
| Runtime increase | +1.0% | -0.94% | -1.22% | -1.66% | -4.36% | +16.11% | +18.89% | -13.34% | +1.98% | -2.94% | -2.03% | +18.52% |

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
This PR adds some workarounds to enable int64 support for WebNN
backends that don't support the int64 data type.

- Do not fall back ops whose only blocker is the int64 limitation.
- Convert all int64 initializer and input values to int32 and handle
potential overflow errors (sketched below).
- Register all int64 model inputs and outputs as int32 ml-tensor.
- Handle ONNX ops that need input or output conversion between int64
and int32, e.g. ArgMax, ArgMin, Cast, etc.
- Convert int32 output data back to int64.
- Disallow int64 outputs as 'ml-tensor' preferredOutputLocation.
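
A minimal sketch of the overflow-checked narrowing described above (the helper name is hypothetical; the actual WebNN EP implementation differs):

```C++
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <vector>

// Hypothetical helper: narrow int64 tensor data to int32, surfacing the
// potential overflow errors mentioned above instead of silently wrapping.
std::vector<int32_t> NarrowInt64ToInt32(const std::vector<int64_t>& src) {
  std::vector<int32_t> dst;
  dst.reserve(src.size());
  for (int64_t v : src) {
    if (v < std::numeric_limits<int32_t>::min() ||
        v > std::numeric_limits<int32_t>::max()) {
      throw std::range_error("int64 value does not fit in int32");
    }
    dst.push_back(static_cast<int32_t>(v));
  }
  return dst;
}
```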

Fixed microsoft#21401
… Actions (microsoft#24029)

### Description
Convert the Windows GPU pipelines and the Windows OpenVINO pipeline to
GitHub Actions.
…t#23978)

### Description
Fix fp16 const initialization on no-fp16 platforms [such as Raspberry
Pi](microsoft#23957).



### Motivation and Context
Resolve microsoft#23957
…roupQueryAttention operator (microsoft#23386)

### Description
Add Packed QKV inputs and do_rotary attribute to GQA.



### Motivation and Context
Packed QKV inputs and do_rotary attribute are required for certain
models.
### Description

This PR re-designs how Whisper is created and supported in ONNX Runtime.
The new solution leverages [previous optimization
work](microsoft#15473), and it is
designed to be used in conjunction with [this
work](microsoft/onnxruntime-genai#1229) in ONNX
Runtime GenAI.

Some of the added changes include:
- Re-designed export that creates new ONNX models without needing a
`WhisperBeamSearch` op
- Creates one encoder model that also pre-computes the cross-attention
KV caches (since they only need to be calculated once)
- Creates one decoder model that can be used during pre-fill and token
generation
- Creates one jump-times model that can be used for word-level
timestamps
- Removes need for a `WhisperBeamSearch` op to chain the encoder and
decoder subgraphs
  - Removes need to duplicate decoder's weights in memory
- The previous solution with the `WhisperBeamSearch` op created an
encoder-decoder-init model and a decoder-with-past model. The decoder
weights were duplicated, once in each.
- Removes need for separate logic to export the PyTorch model coming
from OpenAI vs. the PyTorch model coming from Hugging Face
- Refactors common parameters and logic used in the CPU and CUDA attention
kernels
- Adds `DUMP_STRING` to enable easy logging of intermediate information
when running in debug mode (see the sketch after this list). This info is
not printed in release mode, so it will not impact performance.
- Integrates `DecoderMaskedMultiHeadAttention` into `MultiHeadAttention`
- Enables past-present buffer sharing in the `MultiHeadAttention` op for
improved performance
- Adds `cache_indirection` and `past_sequence_length` as new optional
inputs to `MultiHeadAttention`
  - Adds `output_qk` as new optional output to `MultiHeadAttention`
- Enables calculating `output_qk` tensor with FP16 or FP32 precision,
regardless of the model's precision
- CI tests that run end-to-end across various flag combinations that are
used by many customers internally and externally
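
A hedged sketch of how such a debug-only logging macro can be structured (illustrative only; ORT's actual `DUMP_STRING` definition may differ):

```C++
#include <iostream>
#include <sstream>

#ifndef NDEBUG
// Debug build: stream all arguments into one line and print it.
template <typename... Args>
void DumpString(const Args&... args) {
  std::ostringstream oss;
  (oss << ... << args);  // C++17 fold expression
  std::cout << oss.str() << std::endl;
}
#define DUMP_STRING(...) DumpString(__VA_ARGS__)
#else
// Release build: the macro expands to nothing, so there is no runtime cost.
#define DUMP_STRING(...)
#endif
```

Because the release-mode expansion is empty, the arguments are never even evaluated there, which is what keeps the logging free in release builds.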

The existing solutions are still available if desired.

### Known Issues

- The FP32 CPU model with the `WhisperBeamSearch` op and output QK is
currently disabled. This is because ONNX Runtime doesn't currently
support output QK kernels on CPU, only on CUDA.
- The `DecoderMaskedMultiHeadAttention` CPU kernel has a parity mismatch
with the `DecoderMaskedMultiHeadAttention` CUDA kernel.
- Using `DecoderMaskedMultiHeadAttention` for the FP32 CPU model is not
enabled. Currently, it uses `MultiHeadAttention` to avoid the parity
mismatch issue.

### Motivation and Context

Using the beam search op has made it more difficult to debug and fix
errors that are encountered. This new approach is more flexible and more
customizable for users (e.g. by running with ONNX Runtime GenAI). It
also helps [this
issue](microsoft#18216).

---------

Co-authored-by: mindest <[email protected]>
…missing (microsoft#24053)

### Description
When we fail to load a provider shared DLL in windows, the error is not
very specific. Users have to figure out if the onnxruntime file is
missing, a cuda file, or cudnn is not installed (and perhaps others).
And this is just the cuda provider. It would be far more useful if it
would say exactly what file is missing so the user can fix the actual
problem.

Plus, this will likely result in many fewer github issues regarding this
problem, but if they do, they will be much easier to fix.

This fix adds a function that tries to load a DLL and its dependencies
recursively to figure out which file is missing. It uses the OS dbghelp
library and is not very complex.
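
A simplified sketch of the approach (an assumed shape only, not the actual ORT function; real code must also handle API sets, delay-loads, and search paths):

```C++
#include <windows.h>
#include <dbghelp.h>  // ImageDirectoryEntryToData; link with dbghelp.lib
#include <string>
#include <vector>

// List the DLL names in the import table of an image-mapped module.
std::vector<std::string> GetImportedDllNames(HMODULE module) {
  std::vector<std::string> names;
  ULONG size = 0;
  auto* base = reinterpret_cast<BYTE*>(module);
  auto* imports = static_cast<IMAGE_IMPORT_DESCRIPTOR*>(ImageDirectoryEntryToData(
      base, TRUE, IMAGE_DIRECTORY_ENTRY_IMPORT, &size));
  for (; imports != nullptr && imports->Name != 0; ++imports) {
    names.emplace_back(reinterpret_cast<const char*>(base + imports->Name));
  }
  return names;
}

// Try to load `path`; on failure, report the first direct import that is missing.
std::string DiagnoseLoadFailure(const wchar_t* path) {
  if (HMODULE m = LoadLibraryW(path)) {
    FreeLibrary(m);
    return "loaded successfully";
  }
  // Map the image without resolving its imports or running DllMain.
  HMODULE image = LoadLibraryExW(path, nullptr, DONT_RESOLVE_DLL_REFERENCES);
  if (image == nullptr) return "the file itself is missing or unreadable";
  std::string culprit;
  for (const std::string& dep : GetImportedDllNames(image)) {
    if (HMODULE d = LoadLibraryA(dep.c_str())) {
      FreeLibrary(d);
    } else {
      culprit = dep + " is missing";  // recurse here to walk the full chain
      break;
    }
  }
  FreeLibrary(image);
  return culprit.empty() ? "all direct imports resolved" : culprit;
}
```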

This also fixes a years-old bug, introduced when env.cc was changed to
use FormatMessage, where the system error was always an empty string
(`error 126 ""`) because 0 was passed as the format buffer length. We
will now see the more useful `The specified module could not be found.`
style error messages.
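
For reference, a sketch of a correct `FormatMessage` call (here the size argument may be 0 only because `FORMAT_MESSAGE_ALLOCATE_BUFFER` is used; with a caller-supplied buffer it must be the buffer length):

```C++
#include <windows.h>
#include <string>

// Render a Win32 error code as text, e.g. 126 -> "The specified module
// could not be found."
std::string SystemErrorMessage(DWORD error) {
  char* buffer = nullptr;
  DWORD length = FormatMessageA(
      FORMAT_MESSAGE_ALLOCATE_BUFFER | FORMAT_MESSAGE_FROM_SYSTEM |
          FORMAT_MESSAGE_IGNORE_INSERTS,
      nullptr, error, 0, reinterpret_cast<char*>(&buffer), 0, nullptr);
  std::string message = (length != 0) ? std::string(buffer, length) : std::string();
  if (buffer != nullptr) LocalFree(buffer);
  return message;
}
```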

### Motivation and Context

Previously, if we failed to load the CUDA provider, the error looked
like this, which is of limited use:

`unknown file: error: C++ exception with description "
onnxruntime::ProviderLibrary::Get [ONNXRuntimeError] : 1 : FAIL :
LoadLibrary failed with error 126 "" when trying to load
"C:\Example\Path\To\Library\onnxruntime_providers_cuda.dll"`

Now it will look like this if cuDNN is not installed:

`unknown file: error: C++ exception with description
onnxruntime::ProviderLibrary::Get [ONNXRuntimeError] : 1 : FAIL : Error
loading "C:\Example\Path\To\Library\onnxruntime_providers_cuda.dll"
which depends on "cudnn64_9.dll" which is missing. (Error 126: "The
specified module could not be found.")`

If CUDA is not installed:

`unknown file: error: C++ exception with description
onnxruntime::ProviderLibrary::Get [ONNXRuntimeError] : 1 : FAIL : Error
loading "C:\Example\Path\To\Library\onnxruntime_providers_cuda.dll"
which depends on "cudart64_12.dll" which is missing. (Error 126: "The
specified module could not be found.")`

And if onnxruntime_providers_cuda.dll is not installed:

`unknown file: error: C++ exception with description
onnxruntime::ProviderLibrary::Get [ONNXRuntimeError] : 1 : FAIL : Error
loading "C:\Example\Path\To\Library\onnxruntime_providers_cuda.dll"
which is missing. (Error 126: "The specified module could not be
found.")
`
…t#23928)

### Description
* Update the range to build SASS for all architectures and PTX for the highest architecture
* When CUDA >= 12.8, build all architectures (including the latest Blackwell)

### Motivation and Context
https://cmake.org/cmake/help/latest/prop_tgt/CUDA_ARCHITECTURES.html

https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#gpu-feature-list
…23968)

This change reduces the number of staging buffers used for uploading
initializers to the GPU. First, we release the upload staging buffers
early. Second, we use Dawn's BufferMapExtendedUsages feature on UMA
GPUs, which lets us write directly into the destination GPU buffer
without a staging buffer. To achieve this, we ensure the UMA GPU buffers
are mapped at creation, and we make BufferManager aware of
OnSessionInitializationEnd() so that it can handle buffer Create() and
Upload() calls properly.
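
A minimal sketch of the map-at-creation path (assuming Dawn's C++ WebGPU bindings; the actual BufferManager logic is more involved):

```C++
#include <webgpu/webgpu_cpp.h>
#include <cstring>

// Write initializer data straight into the destination GPU buffer by
// creating it mapped, avoiding a separate staging buffer on UMA GPUs.
wgpu::Buffer CreateInitializerBuffer(const wgpu::Device& device,
                                     const void* data, size_t size) {
  wgpu::BufferDescriptor desc;
  desc.size = size;
  desc.usage = wgpu::BufferUsage::Storage | wgpu::BufferUsage::CopySrc;
  desc.mappedAtCreation = true;  // gives us a writable CPU pointer up front
  wgpu::Buffer buffer = device.CreateBuffer(&desc);
  std::memcpy(buffer.GetMappedRange(), data, size);
  buffer.Unmap();  // hand the buffer over to the GPU
  return buffer;
}
```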

Credit to @fs-eire for the overall design of the implementation.
### Description

Adds naive implementations of ReduceMin, ReduceProd, ReduceL1, ReduceL2,
ReduceLogSum, ReduceSumSquare, and ReduceLogSumExp. Will optimize to use
shared memory in a later PR.

### Motivation and Context

Increases WebGPU EP operator coverage.
…t#24065)

### Description
Add bool support to the EPContext schema to unblock some models.
### Error

```Traceback
/onnxruntime/onnxruntime/core/providers/webgpu/reduction/reduction_ops.cc:146 [allow_multi_axes = true] Axes values must be in the range [-rank, rank-1]. Got: 446098880
```
### Description
Upgrade the current macOS-13 pipelines to macOS-14.


### Motivation and Context

- [x] Update the RN to 0.73.x+ to get the newer version of Boost

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
### Description
Abs and Sign had bfloat16 kernels created but not registered with the
CUDA EP. Additionally, the Sign bfloat16 kernel didn't work.
* register bfloat16 kernels with CUDA EP
* fix incorrectly named macros by adding 'X', as they add bfloat16
registration
* add specialization for bfloat16 to _Sign
  * copied existing pattern. not sure if there's a better way
* update tests



### Motivation and Context
microsoft#23875
…soft#24086)

### Description

Improve the OrtValue interface typing and change `staticmethod` to
`classmethod` for constructors to follow Python conventions
(https://google.github.io/styleguide/pyguide.html#2174-decision).
…icrosoft#24078)

The DP4AMatMulQuantize shader needs to make sure that K is divisible by
128. Otherwise, we would need to align the scales to have shape
[M, ceil(K / 128)]. To simplify the shader, we require that K be
divisible by 128 to apply the DP4A matmul, as in the sketch below.
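
A trivial sketch of the guard and the scale-shape arithmetic this avoids (names are illustrative):

```C++
#include <cstdint>

// The simplified DP4A path requires K % 128 == 0, so the scales always
// have shape [M, K / 128].
bool CanUseDp4aMatMul(int64_t K) { return K % 128 == 0; }

// Without the restriction, the scales would need ceil(K / 128) columns.
int64_t ScaleColumns(int64_t K) { return (K + 127) / 128; }
```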
### Description

Add macOS ARM64 pipeline for webgpu.

This pipeline is a temporary one. I created it because the current code
already fails on macOS ARM64 for the WebGPU EP. Adding this pipeline
lets us check the status of the fix; eventually, when the build passes,
it will be merged with the existing macOS ARM64 pipeline.
…crosoft#23998)

- Renamed all conflicting WebNN methods from `jsep*` to `webnn*`.
- WebNN doesn't need flush(), therefore it doesn't need to set
`jsepBackend`.

This PR addresses issue microsoft/webnn-developer-preview#78
### Description
Enables multithreading on the FP16 to FP32 cast operator.
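
A hedged sketch of the idea (not the actual MLAS kernel; the scalar fp16 decode below skips inf/NaN and denormal handling for brevity):

```C++
#include <algorithm>
#include <cstdint>
#include <cstring>
#include <thread>
#include <vector>

// Minimal scalar fp16 -> fp32 decode, for illustration only.
float HalfToFloat(uint16_t h) {
  uint32_t sign = static_cast<uint32_t>(h & 0x8000u) << 16;
  uint32_t exponent = (h >> 10) & 0x1Fu;
  uint32_t mantissa = h & 0x3FFu;
  if (exponent == 0) return 0.0f;  // flush zeros/denormals for brevity
  uint32_t bits = sign | ((exponent + 112u) << 23) | (mantissa << 13);
  float f;
  std::memcpy(&f, &bits, sizeof(f));
  return f;
}

// Split the cast across worker threads, one contiguous chunk per thread.
void CastFp16ToFp32Parallel(const uint16_t* src, float* dst, size_t n,
                            size_t num_threads) {
  std::vector<std::thread> workers;
  const size_t chunk = (n + num_threads - 1) / num_threads;
  for (size_t t = 0; t < num_threads; ++t) {
    const size_t begin = t * chunk;
    const size_t end = std::min(n, begin + chunk);
    if (begin >= end) break;
    workers.emplace_back([src, dst, begin, end] {
      for (size_t i = begin; i < end; ++i) dst[i] = HalfToFloat(src[i]);
    });
  }
  for (auto& w : workers) w.join();
}
```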



### Motivation and Context
Improves CPU performance on FP16 models that require casting to FP32.
### Description
Move Android CI Pipeline to Github Actions
…#23490)

### Description
Clean up the CoreML EP's code to remove the COREML_ENABLE_MLPROGRAM macro.
Also, increase MINIMUM_COREML_VERSION (the first version we support) to 5.
…olve warning (microsoft#23847)

### Description
Removes the namespace from AndroidManifest.xml.



### Motivation and Context
- Resolves microsoft#21681
### Description

Use custom implementation for Pow to fix test failures.
…microsoft#24091)

### Description

There are still some timeouts in the pipeline. Further extend the
timeout to 90 minutes for ARM64-Xcode16-targeting-iphonesimulator.

It takes quite a while if the build cache is entirely missing.

### Motivation and Context

The pipeline sometimes fails because of the timeout. A previous PR,
microsoft#24030, increased the timeout from 60 to 75 minutes, but that
appears to be insufficient.
…ft#24108)

### Description

Fix test failures in Reduce operators on macOS ARM64.

```
[E:onnxruntime:ReduceL1, sequential_executor.cc:572 ExecuteKernel] Non-zero status code returned while running ReduceL1 node. Name:'node1' Status Message: webgpu_context.cc:259 Run Uniform variable[0] (output_size) data type mismatch in program "ReduceL1", Expected: u32, Actual: i32
```
This PR uses a 1D dispatch group size and uses workgroup_idx instead of
workgroup.x/workgroup.y in case they are normalized.
)

### Description

abs_error is slightly loosened from 0.02 to 0.03 to allow test cases on
macOS ARM64 to pass.
adrianlizarraga and others added 10 commits April 18, 2025 09:02
…icrosoft#24416)

### Description
Adds session config option (`"session.disable_model_compile"`) that
disables model compilation during session initialization.

If this option is set to "1", inference session creation will fail with
error code ORT_MODEL_REQUIRES_COMPILATION if compilation is required to
run the model on any Execution Provider added to the session. Only the
following kinds of models are valid when this option is set to "1":
- Pre-compiled models that have EPContext nodes for the compiling
Execution Providers in the session.
- Non-compiled models that run only on non-compiling Execution
Providers, like CPU EP.

### Example usage
The following example (taken from a unit test) tries to load a model
that requires compilation with a session that disables compilation. The
session creation fails with error code `ORT_MODEL_REQUIRES_COMPILATION`.
Then, the example compiles the model and loads the compiled model
successfully.

```C++
  // Taken from a unit test ...

  // Initialize session options with QNN EP
  Ort::SessionOptions session_options;
  ProviderOptions provider_options;
  provider_options["backend_type"] = "htp";
  provider_options["offload_graph_io_quantization"] = "0";

  session_options.AppendExecutionProvider("QNN", provider_options);
  session_options.AddConfigEntry(kOrtSessionOptionsDisableEpCompile, "1");  // Disable model compilation!

  // Create an inference session that fails with error ORT_MODEL_REQUIRES_COMPILATION
  try {
    Ort::Session session(*ort_env, input_model_file, session_options);
    FAIL() << "Expected Session creation to fail but it succeeded";  // Should not get here!
  } catch (const Ort::Exception& excpt) {
    OrtErrorCode error_code = excpt.GetOrtErrorCode();
    std::string_view error_msg = excpt.what();
    ASSERT_EQ(error_code, ORT_MODEL_REQUIRES_COMPILATION);
    ASSERT_THAT(error_msg, testing::HasSubstr(kQnnExecutionProvider));
  }

  // Session creation failed because the model was not pre-compiled.
  // Try to compile it now.

  // Create model compilation options from the session options.
  Ort::ModelCompilationOptions compile_options(*ort_env, session_options);
  compile_options.SetInputModelPath(input_model_file);
  compile_options.SetOutputModelPath(output_model_file);

  // Compile the model.
  Ort::Status status = Ort::CompileModel(*ort_env, compile_options);
  ASSERT_TRUE(status.IsOK()) << status.GetErrorMessage();

  // Should be able to create a session with the compiled model and the original session options.
  Ort::Session session(*ort_env, output_model_file, session_options);
```

### Motivation and Context
Compiling models can take a very long time. We want a session option
that requires input models that do not need to be compiled.
…microsoft#24463)

### Description
Re-enables (and fixes) generation of compiled EpContext models with
**both** input and output models stored in buffers.

### Motivation and Context
Previous PR microsoft#24176 inadvertently added a check that disabled storing
both input and output models in buffers. However, we need this
functionality. This was actually a fortunate scenario, as it led to the
discovery of a bug.
…oft#24472)

### Description

* Rename the file and class names since they support both 4 and 8 bits.
* Update HQQWeightOnlyQuantizer to support 8 bits.
* Update some comments.

### Motivation and Context
microsoft#24384 added 8 bits support
for the default weight only quantizer.
…icrosoft#24474)

### Description
Use a pimpl-esque approach so that the winml OrtModel type doesn't
conflict with the model editing API OrtModel.


### Motivation and Context
Fix crash due to linker calling the incorrect destructor when there are
two different OrtModel types in the global namespace.
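
An illustration of the pimpl idea (names here are hypothetical, not the actual winml declarations):

```C++
#include <memory>

namespace winml_detail {
// In real code the full definition lives in a single .cpp file, so the
// type never appears in the global namespace alongside the editor API's
// OrtModel.
class OrtModelImpl {};
}  // namespace winml_detail

class WinmlModel {
 public:
  WinmlModel() : impl_(std::make_unique<winml_detail::OrtModelImpl>()) {}
  ~WinmlModel() = default;  // OrtModelImpl is complete here, so this links cleanly
 private:
  std::unique_ptr<winml_detail::OrtModelImpl> impl_;  // opaque to callers
};
```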
…h to int32 (microsoft#24425)

Some WebNN backends support limited data types for the input and output
of a WebNN graph. However, they can support more data types for
intermediate nodes. To address this limitation, we implement a data type
fallback mechanism. (Note: Currently, we only support fallback to int32
for certain integer data types.)

If a data type is not supported for a graph's input or output but is
supported for intermediate nodes, we will:
1. Save the input MLTensor as the 'int32' data type,
2. Convert the input data from ORT to int32,
3. Insert a cast operation into the WebNN graph to convert the input
back to its original data type,
4. Insert a cast operation into the WebNN graph to convert the output to
'int32',
5. Convert the output data from int32 back to its original data type.
### Description
Add infrastructure to enable auto EP selection.

Device discovery for CPU/GPU/NPU on Windows.
Supports internal (CPU/DML/WebGPU) and provider bridge (CUDA) EPs
currently.
Infrastructure will be used with plugin EPs next.

A selection policy implementation will be added next; in the interim
there is a temporary function with manually specified selection so unit
tests can cover the end-to-end flow.

### Motivation and Context

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Adrian Lizarraga <[email protected]>
)

### Description
WebNN doesn't support AveragePool with count_include_pad == 1.



### Motivation and Context
Support it by adding an explicit pad and calling averagePool2D with pads
set to 0, as illustrated below.
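
A 1-D illustration of the equivalence being exploited (illustrative C++, not the WebNN EP code): zero-padding first and then pooling with pads of 0 divides every window sum by the full window size, which is exactly count_include_pad == 1.

```C++
#include <algorithm>
#include <vector>

// Average pool with count_include_pad == 1, expressed as: explicit zero
// pad, then pool with no pads (stride 1 for simplicity).
std::vector<float> AvgPoolIncludePad1D(const std::vector<float>& x,
                                       size_t window, size_t pad) {
  std::vector<float> padded(x.size() + 2 * pad, 0.0f);  // explicit zero pad
  std::copy(x.begin(), x.end(), padded.begin() + pad);
  std::vector<float> y;
  for (size_t i = 0; i + window <= padded.size(); ++i) {
    float sum = 0.0f;
    for (size_t k = 0; k < window; ++k) sum += padded[i + k];
    y.push_back(sum / static_cast<float>(window));  // pad zeros count toward the mean
  }
  return y;
}
```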
### Description
Fix some issues:
- Use the adapter number instead of the bus number; the bus number
doesn't work as expected on VMs.
- Disable for the XBOX build; it needs different handling for adapter
lookup.
- Use the adapter number as device_id when creating the DML OrtEpDevice.
- Fix some issues with the metadata.


### Motivation and Context
RyanMetcalfeInt8 and others added 15 commits April 29, 2025 19:44
### Description

Cherry pick the following into
[rel-1.22.0](https://github.com/microsoft/onnxruntime/tree/rel-1.22.0)


- (microsoft#24487)
- (microsoft#24466)
- (microsoft#24493)
- (microsoft#24484)
- (microsoft#24494)
- (microsoft#24489)
- (microsoft#24504)
- (microsoft#24510)
- (microsoft#24456)
- (microsoft#24537)
- (microsoft#24501)
- (microsoft#24519)
- (microsoft#24513)
- (microsoft#24539)
- (microsoft#24514)
- (microsoft#24542)
- (microsoft#24585)

Not added:

Planning to cherry-pick the CUDA MatMulNBits PRs once the fix for the
failing CUDA pipeline is ready:
- (microsoft#24491)
- (microsoft#24509)
- (microsoft#24564)

---------

Co-authored-by: vraspar <[email protected]>
Co-authored-by: Adrian Lizarraga <[email protected]>
Co-authored-by: minfhong-quic <[email protected]>
Co-authored-by: minfhong-quic <[email protected]>
Co-authored-by: Justin Chu <[email protected]>
Co-authored-by: Prathik Rao <[email protected]>
Co-authored-by: Edward Chen <[email protected]>
Co-authored-by: Ankan Banerjee <[email protected]>
Co-authored-by: Maximilian Müller <[email protected]>
Co-authored-by: Gaurav Garg <[email protected]>
Co-authored-by: iraut <[email protected]>
Co-authored-by: Hrishikesh Manohar <[email protected]>
Co-authored-by: Maximilian Müller <[email protected]>
Co-authored-by: Scott McKay <[email protected]>
Co-authored-by: Jiajia Qin <[email protected]>
Co-authored-by: kunal-vaishnavi <[email protected]>
Co-authored-by: xhcao <[email protected]>
### Description

Cherry pick the following into
[rel-1.22.0](https://github.com/microsoft/onnxruntime/tree/rel-1.22.0)

- (microsoft#24491)
- (microsoft#24509)
- (microsoft#24564)
- (microsoft#24574)
- (microsoft#24582)
- (microsoft#24584)
- (microsoft#24568)
- (microsoft#24587)
- (microsoft#24563)
- (microsoft#24592)
- (microsoft#24526)
- (microsoft#24552)
- (microsoft#24588)
- (microsoft#24605)
- (microsoft#24606)

---------

Co-authored-by: Jing Fang <[email protected]>
Co-authored-by: Tianlei Wu <[email protected]>
Co-authored-by: Baiju Meswani <[email protected]>
Co-authored-by: Scott McKay <[email protected]>
Co-authored-by: Mark Schofield <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Edward Chen <[email protected]>
Co-authored-by: Ashwath Shankarnarayan <[email protected]>
Co-authored-by: saurabh <[email protected]>
Co-authored-by: Adrian Lizarraga <[email protected]>
Co-authored-by: Hector Li <[email protected]>
### Description

Cherry pick the following into
[rel-1.22.0](https://github.com/microsoft/onnxruntime/tree/rel-1.22.0)

- (microsoft#24608)
- (microsoft#24545)

---------

Co-authored-by: Changming Sun <[email protected]>
Co-authored-by: Maximilian Müller <[email protected]>
### Description
Add microsoft#24625 

### Motivation and Context

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: George Wu <[email protected]>
…ft#24630)

### Description
Adds microsoft#24629 to the ORT
1.22.0 release branch

### Motivation and Context
… (microsoft#24638)

### Description
Adds support for selection policy delegate directly to the release
branch. This is necessary to avoid having to update C# bindings (which
are in main but not in the release branch)

Based on microsoft#24635

### Motivation and Context
…t#24651)" (microsoft#24668)

This reverts commit 8fbc5d7, which caused packaging pipeline failures.
### Description
Update the folder name from win-arm64x to win-arm64, since win-arm64x is
not a valid RID:
https://learn.microsoft.com/en-us/dotnet/core/rid-catalog#windows-rids

### Description
Cherry-pick from microsoft#24690.
Fix pipeline rename conflict to create the NuGet release package.

---------

Co-authored-by: Alex Marin <[email protected]>