
Sync with MSFT rel-1.22.0 #673


Open · wants to merge 351 commits into base: msb_release

Conversation

ankitm3k

Description

This PR syncs the target branch with MSFT rel-1.22.0.

fs-eire and others added 30 commits March 14, 2025 11:23
### Description

Remove a duplicated file in the Node.js package.

microsoft#23956
…rator (microsoft#23944)

### Description

- Added support for custom position ids and attention masks to the GQA
CPU operator (fp32 and fp16)
- Added MLAS eltwise add kernel for mask application for FP32 and FP16
- Added unit tests for the added eltwise add MLAS kernel
- Modified python tests to test the new GQA inputs


### Motivation and Context
Custom position ids and attention masks are required in order to
implement speculative decoding in PhiSilica.

### Benchmarks

All benchmarks were executed on the GQA op configuration that will be
used in the PhiSilica speculative decoding scenario:

- num_heads: 32
- kv_num_heads: 32
- do_rotary: 1
- local_window_size: -1
- head_size: 96
- sequence_length: 6
- packed_qkv: True

Benchmarks were executed on Cadmus with Snapdragon(R) X 12-core X1E80100
@ 3.40 GHz

In the tables below, the column headers are the total sequence length
values used for benchmarking, and the rows indicate whether the
attention bias was used. Values are average inference time in ms over
100,000 runs.

#### Fp16 results

| Total sequence length | 50 | 100 | 250 | 500 | 750 | 1000 | 1500 | 2000 | 2500 | 3000 | 3500 | 4000 |
|:-----------------|:---------|:---------|:---------|:---------|:---------|:---------|:---------|:--------|:--------|:--------|:--------|:--------|
| Without bias | 0.284054 | 0.257449 | 0.275806 | 0.334123 | 0.458324 | 0.614133 | 0.912791 | 1.38585 | 1.92186 | 2.39203 | 2.88808 | 3.46262 |
| With bias | 0.250926 | 0.253072 | 0.279724 | 0.337774 | 0.499058 | 0.585388 | 0.914316 | 1.40701 | 1.87311 | 2.47475 | 3.3906 | 3.47474 |
| Runtime increase | -11.66% | -1.7% | +1.42% | +1.09% | +8.89% | -4.68% | +0.17% | +1.53% | -2.54% | +3.46% | +17.4% | +0.35% |

#### Fp32 results

| Total sequence length | 50 | 100 | 250 | 500 | 750 | 1000 | 1500 | 2000 | 2500 | 3000 | 3500 | 4000 |
|:-----------------|:---------|:---------|:---------|:---------|:---------|:--------|:--------|:--------|:--------|:--------|:--------|:--------|
| Without bias | 0.259049 | 0.270541 | 0.304583 | 0.376708 | 0.554013 | 0.633217 | 1.20696 | 1.65985 | 1.95169 | 2.45807 | 3.05637 | 4.05169 |
| With bias | 0.261631 | 0.268002 | 0.300853 | 0.370452 | 0.529865 | 0.735216 | 1.43493 | 1.4385 | 1.99028 | 2.3858 | 2.99425 | 4.80197 |
| Runtime increase | +1.0% | -0.94% | -1.22% | -1.66% | -4.36% | +16.11% | +18.89% | -13.34% | +1.98% | -2.94% | -2.03% | +18.52% |

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
This PR adds some workarounds to enable int64 support for WebNN
backends that don't support the int64 data type.

- Do not fall back ops whose only blocker is the int64 limitation.
- Convert all int64 initializer and input values to int32 and handle
potential overflow errors (sketched below).
- Register all int64 model inputs and outputs as int32 ml-tensor.
- Handle ONNX ops that need input or output conversion between int64
and int32, e.g. ArgMax, ArgMin, Cast, etc.
- Convert int32 output data back to int64.
- Disallow int64 outputs as 'ml-tensor' preferredOutputLocation.
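
A minimal sketch of the overflow-checked narrowing described above (the helper name is hypothetical; the actual WebNN EP implementation differs):

```C++
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <vector>

// Hypothetical helper: narrow int64 tensor data to int32, surfacing the
// potential overflow errors mentioned above instead of silently wrapping.
std::vector<int32_t> NarrowInt64ToInt32(const std::vector<int64_t>& src) {
  std::vector<int32_t> dst;
  dst.reserve(src.size());
  for (int64_t v : src) {
    if (v < std::numeric_limits<int32_t>::min() ||
        v > std::numeric_limits<int32_t>::max()) {
      throw std::range_error("int64 value does not fit in int32");
    }
    dst.push_back(static_cast<int32_t>(v));
  }
  return dst;
}
```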

Fixed microsoft#21401
… Actions (microsoft#24029)

### Description
Convert the Windows GPU pipelines and the Windows OpenVINO pipeline to
GitHub Actions.
…t#23978)

### Description
Fix fp16 const initialization on no-fp16 platforms [such as Raspberry
Pi](microsoft#23957).



### Motivation and Context
Resolve microsoft#23957
…roupQueryAttention operator (microsoft#23386)

### Description
Add Packed QKV inputs and do_rotary attribute to GQA.



### Motivation and Context
Packed QKV inputs and do_rotary attribute are required for certain
models.
### Description

This PR re-designs how Whisper is created and supported in ONNX Runtime.
The new solution leverages [previous optimization
work](microsoft#15473), and it is
designed to be used in conjunction with [this
work](microsoft/onnxruntime-genai#1229) in ONNX
Runtime GenAI.

Some of the added changes include:
- Re-designed export that creates new ONNX models without needing a
`WhisperBeamSearch` op
- Creates one encoder model that also pre-computes the cross-attention
KV caches (since they only need to be calculated once)
- Creates one decoder model that can be used during pre-fill and token
generation
- Creates one jump-times model that can be used for word-level
timestamps
- Removes need for a `WhisperBeamSearch` op to chain the encoder and
decoder subgraphs
  - Removes need to duplicate decoder's weights in memory
- The previous solution with the `WhisperBeamSearch` op created an
encoder-decoder-init model and a decoder-with-past model. The decoder
weights were duplicated, once in each.
- Removes need for separate logic to export the PyTorch model coming
from OpenAI vs. the PyTorch model coming from Hugging Face
- Refactors common parameters and logic used in the CPU and CUDA attention
kernels
- Adds `DUMP_STRING` to enable easy logging of intermediate information
when running in debug mode (see the sketch after this list). This info is
not printed in release mode, so it will not impact performance.
- Integrates `DecoderMaskedMultiHeadAttention` into `MultiHeadAttention`
- Enables past-present buffer sharing in the `MultiHeadAttention` op for
improved performance
- Adds `cache_indirection` and `past_sequence_length` as new optional
inputs to `MultiHeadAttention`
  - Adds `output_qk` as new optional output to `MultiHeadAttention`
- Enables calculating `output_qk` tensor with FP16 or FP32 precision,
regardless of the model's precision
- CI tests that run end-to-end across various flag combinations that are
used by many customers internally and externally
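
A hedged sketch of how such a debug-only logging macro can be structured (illustrative only; ORT's actual `DUMP_STRING` definition may differ):

```C++
#include <iostream>
#include <sstream>

#ifndef NDEBUG
// Debug build: stream all arguments into one line and print it.
template <typename... Args>
void DumpString(const Args&... args) {
  std::ostringstream oss;
  (oss << ... << args);  // C++17 fold expression
  std::cout << oss.str() << std::endl;
}
#define DUMP_STRING(...) DumpString(__VA_ARGS__)
#else
// Release build: the macro expands to nothing, so there is no runtime cost.
#define DUMP_STRING(...)
#endif
```

Because the release-mode expansion is empty, the arguments are never even evaluated there, which is what keeps the logging free in release builds.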

The existing solutions are still available if desired.

### Known Issues

- The FP32 CPU model with the `WhisperBeamSearch` op and output QK is
currently disabled. This is because ONNX Runtime doesn't currently
support output QK kernels on CPU, only on CUDA.
- The `DecoderMaskedMultiHeadAttention` CPU kernel has a parity mismatch
with the `DecoderMaskedMultiHeadAttention` CUDA kernel.
- Using `DecoderMaskedMultiHeadAttention` for the FP32 CPU model is not
enabled. Currently, it uses `MultiHeadAttention` to avoid the parity
mismatch issue.

### Motivation and Context

Using the beam search op has made it more difficult to debug and fix
errors that are encountered. This new approach is more flexible and more
customizable for users (e.g. by running with ONNX Runtime GenAI). It
also helps [this
issue](microsoft#18216).

---------

Co-authored-by: mindest <[email protected]>
…missing (microsoft#24053)

### Description
When we fail to load a provider shared DLL in windows, the error is not
very specific. Users have to figure out if the onnxruntime file is
missing, a cuda file, or cudnn is not installed (and perhaps others).
And this is just the cuda provider. It would be far more useful if it
would say exactly what file is missing so the user can fix the actual
problem.

Plus, this will likely result in many fewer github issues regarding this
problem, but if they do, they will be much easier to fix.

This fix adds a function that tries to load a DLL and its dependencies
recursively to figure out which file is missing. It uses the OS dbghelp
library and is not very complex.
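
A simplified sketch of the approach (an assumed shape only, not the actual ORT function; real code must also handle API sets, delay-loads, and search paths):

```C++
#include <windows.h>
#include <dbghelp.h>  // ImageDirectoryEntryToData; link with dbghelp.lib
#include <string>
#include <vector>

// List the DLL names in the import table of an image-mapped module.
std::vector<std::string> GetImportedDllNames(HMODULE module) {
  std::vector<std::string> names;
  ULONG size = 0;
  auto* base = reinterpret_cast<BYTE*>(module);
  auto* imports = static_cast<IMAGE_IMPORT_DESCRIPTOR*>(ImageDirectoryEntryToData(
      base, TRUE, IMAGE_DIRECTORY_ENTRY_IMPORT, &size));
  for (; imports != nullptr && imports->Name != 0; ++imports) {
    names.emplace_back(reinterpret_cast<const char*>(base + imports->Name));
  }
  return names;
}

// Try to load `path`; on failure, report the first direct import that is missing.
std::string DiagnoseLoadFailure(const wchar_t* path) {
  if (HMODULE m = LoadLibraryW(path)) {
    FreeLibrary(m);
    return "loaded successfully";
  }
  // Map the image without resolving its imports or running DllMain.
  HMODULE image = LoadLibraryExW(path, nullptr, DONT_RESOLVE_DLL_REFERENCES);
  if (image == nullptr) return "the file itself is missing or unreadable";
  std::string culprit;
  for (const std::string& dep : GetImportedDllNames(image)) {
    if (HMODULE d = LoadLibraryA(dep.c_str())) {
      FreeLibrary(d);
    } else {
      culprit = dep + " is missing";  // recurse here to walk the full chain
      break;
    }
  }
  FreeLibrary(image);
  return culprit.empty() ? "all direct imports resolved" : culprit;
}
```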

This also fixes a years-old bug, introduced when env.cc was changed to
use FormatMessage, where the system error was always an empty string
(`error 126 ""`) because 0 was passed as the format buffer length. We
will now see the more useful `The specified module could not be found.`
style error messages.
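
For reference, a sketch of a correct `FormatMessage` call (here the size argument may be 0 only because `FORMAT_MESSAGE_ALLOCATE_BUFFER` is used; with a caller-supplied buffer it must be the buffer length):

```C++
#include <windows.h>
#include <string>

// Render a Win32 error code as text, e.g. 126 -> "The specified module
// could not be found."
std::string SystemErrorMessage(DWORD error) {
  char* buffer = nullptr;
  DWORD length = FormatMessageA(
      FORMAT_MESSAGE_ALLOCATE_BUFFER | FORMAT_MESSAGE_FROM_SYSTEM |
          FORMAT_MESSAGE_IGNORE_INSERTS,
      nullptr, error, 0, reinterpret_cast<char*>(&buffer), 0, nullptr);
  std::string message = (length != 0) ? std::string(buffer, length) : std::string();
  if (buffer != nullptr) LocalFree(buffer);
  return message;
}
```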

### Motivation and Context

Previously, if we failed to load the CUDA provider, the error looked
like this, which is of limited use:

`unknown file: error: C++ exception with description "
onnxruntime::ProviderLibrary::Get [ONNXRuntimeError] : 1 : FAIL :
LoadLibrary failed with error 126 "" when trying to load
"C:\Example\Path\To\Library\onnxruntime_providers_cuda.dll"`

Now it will look like this if cuDNN is not installed:

`unknown file: error: C++ exception with description
onnxruntime::ProviderLibrary::Get [ONNXRuntimeError] : 1 : FAIL : Error
loading "C:\Example\Path\To\Library\onnxruntime_providers_cuda.dll"
which depends on "cudnn64_9.dll" which is missing. (Error 126: "The
specified module could not be found.")`

If CUDA is not installed:

`unknown file: error: C++ exception with description
onnxruntime::ProviderLibrary::Get [ONNXRuntimeError] : 1 : FAIL : Error
loading "C:\Example\Path\To\Library\onnxruntime_providers_cuda.dll"
which depends on "cudart64_12.dll" which is missing. (Error 126: "The
specified module could not be found.")`

And if onnxruntime_providers_cuda.dll is not installed:

`unknown file: error: C++ exception with description
onnxruntime::ProviderLibrary::Get [ONNXRuntimeError] : 1 : FAIL : Error
loading "C:\Example\Path\To\Library\onnxruntime_providers_cuda.dll"
which is missing. (Error 126: "The specified module could not be
found.")
`
…t#23928)

### Description
* Update the range to build SASS for all architectures and PTX for the highest architecture
* When CUDA >= 12.8, build all architectures (including the latest Blackwell)

### Motivation and Context
https://cmake.org/cmake/help/latest/prop_tgt/CUDA_ARCHITECTURES.html

https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#gpu-feature-list
…23968)

This change reduces the number of staging buffers used for uploading
initializers to the GPU. First, we release the upload staging buffers
early. Second, we use Dawn's BufferMapExtendedUsages feature on UMA
GPUs, which lets us write directly into the destination GPU buffer
without a staging buffer. To achieve this, we ensure the UMA GPU buffers
are mapped at creation, and we make BufferManager aware of
OnSessionInitializationEnd() so that it can handle buffer Create() and
Upload() calls properly.
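
A minimal sketch of the map-at-creation path (assuming Dawn's C++ WebGPU bindings; the actual BufferManager logic is more involved):

```C++
#include <webgpu/webgpu_cpp.h>
#include <cstring>

// Write initializer data straight into the destination GPU buffer by
// creating it mapped, avoiding a separate staging buffer on UMA GPUs.
wgpu::Buffer CreateInitializerBuffer(const wgpu::Device& device,
                                     const void* data, size_t size) {
  wgpu::BufferDescriptor desc;
  desc.size = size;
  desc.usage = wgpu::BufferUsage::Storage | wgpu::BufferUsage::CopySrc;
  desc.mappedAtCreation = true;  // gives us a writable CPU pointer up front
  wgpu::Buffer buffer = device.CreateBuffer(&desc);
  std::memcpy(buffer.GetMappedRange(), data, size);
  buffer.Unmap();  // hand the buffer over to the GPU
  return buffer;
}
```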

Credit to @fs-eire for the overall design of the implementation.
### Description

Adds naive implementations of ReduceMin, ReduceProd, ReduceL1, ReduceL2,
ReduceLogSum, ReduceSumSquare, and ReduceLogSumExp. Will optimize to use
shared memory in a later PR.

### Motivation and Context

Increases WebGPU EP operator coverage.
…t#24065)

### Description
Add bool support to the EPContext schema to unblock some models.
### Error

```Traceback
/onnxruntime/onnxruntime/core/providers/webgpu/reduction/reduction_ops.cc:146 [allow_multi_axes = true] Axes values must be in the range [-rank, rank-1]. Got: 446098880
```
### Description
Upgrade the current macOS-13 pipelines to macOS-14.


### Motivation and Context

- [x] Update the RN to 0.73.x+ to get the newer version of Boost

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
### Description
Abs and Sign had bfloat16 kernels created but not registered with the
CUDA EP. Additionally, the Sign bfloat16 kernel didn't work.
* register bfloat16 kernels with CUDA EP
* fix incorrectly named macros by adding 'X', as they add bfloat16
registration
* add specialization for bfloat16 to _Sign
  * copied existing pattern. not sure if there's a better way
* update tests



### Motivation and Context
microsoft#23875
…soft#24086)

### Description

Improve the OrtValue interface typing and change `staticmethod` to
`classmethod` for constructors to follow Python conventions
(https://google.github.io/styleguide/pyguide.html#2174-decision).
…icrosoft#24078)

The DP4AMatMulQuantize shader needs to make sure that K is divisible by
128. Otherwise, we would need to align the scales to have shape
[M, ceil(K / 128)]. To simplify the shader, we require that K be
divisible by 128 to apply the DP4A matmul, as in the sketch below.
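
A trivial sketch of the guard and the scale-shape arithmetic this avoids (names are illustrative):

```C++
#include <cstdint>

// The simplified DP4A path requires K % 128 == 0, so the scales always
// have shape [M, K / 128].
bool CanUseDp4aMatMul(int64_t K) { return K % 128 == 0; }

// Without the restriction, the scales would need ceil(K / 128) columns.
int64_t ScaleColumns(int64_t K) { return (K + 127) / 128; }
```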
### Description

Add macOS ARM64 pipeline for webgpu.

This pipeline is a temporary one. I created it because the current code
already fails on macOS ARM64 for the WebGPU EP. Adding this pipeline
lets us check the status of the fix; eventually, when the build passes,
it will be merged with the existing macOS ARM64 pipeline.
…crosoft#23998)

- Renamed all conflicting WebNN methods from `jsep*` to `webnn*`.
- WebNN doesn't need flush(), therefore it doesn't need to set
`jsepBackend`.

This PR addresses issue microsoft/webnn-developer-preview#78
### Description
Enables multithreading on the FP16 to FP32 cast operator.
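
A hedged sketch of the idea (not the actual MLAS kernel; the scalar fp16 decode below skips inf/NaN and denormal handling for brevity):

```C++
#include <algorithm>
#include <cstdint>
#include <cstring>
#include <thread>
#include <vector>

// Minimal scalar fp16 -> fp32 decode, for illustration only.
float HalfToFloat(uint16_t h) {
  uint32_t sign = static_cast<uint32_t>(h & 0x8000u) << 16;
  uint32_t exponent = (h >> 10) & 0x1Fu;
  uint32_t mantissa = h & 0x3FFu;
  if (exponent == 0) return 0.0f;  // flush zeros/denormals for brevity
  uint32_t bits = sign | ((exponent + 112u) << 23) | (mantissa << 13);
  float f;
  std::memcpy(&f, &bits, sizeof(f));
  return f;
}

// Split the cast across worker threads, one contiguous chunk per thread.
void CastFp16ToFp32Parallel(const uint16_t* src, float* dst, size_t n,
                            size_t num_threads) {
  std::vector<std::thread> workers;
  const size_t chunk = (n + num_threads - 1) / num_threads;
  for (size_t t = 0; t < num_threads; ++t) {
    const size_t begin = t * chunk;
    const size_t end = std::min(n, begin + chunk);
    if (begin >= end) break;
    workers.emplace_back([src, dst, begin, end] {
      for (size_t i = begin; i < end; ++i) dst[i] = HalfToFloat(src[i]);
    });
  }
  for (auto& w : workers) w.join();
}
```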



### Motivation and Context
Improves CPU performance on FP16 models that require casting to FP32.
### Description
Move Android CI Pipeline to Github Actions
…#23490)

### Description
Clean up the CoreML EP's code to remove the COREML_ENABLE_MLPROGRAM macro.
Also, increase MINIMUM_COREML_VERSION (the first version we support) to 5.
…olve warning (microsoft#23847)

### Description
Removes the namespace from AndroidManifest.xml.



### Motivation and Context
- Resolves microsoft#21681
### Description

Use custom implementation for Pow to fix test failures.
…microsoft#24091)

### Description

There are still some timeouts in the pipeline. Further extend the
timeout to 90 minutes for ARM64-Xcode16-targeting-iphonesimulator.

It takes quite a while if the build cache is entirely missing.

### Motivation and Context

The pipeline sometimes fails because of the timeout. A previous PR,
microsoft#24030, increased the timeout from 60 to 75 minutes, but that
appears to be insufficient.
…ft#24108)

### Description

Fix test failures in Reduce operators on macOS ARM64.

```
[E:onnxruntime:ReduceL1, sequential_executor.cc:572 ExecuteKernel] Non-zero status code returned while running ReduceL1 node. Name:'node1' Status Message: webgpu_context.cc:259 Run Uniform variable[0] (output_size) data type mismatch in program "ReduceL1", Expected: u32, Actual: i32
```
This PR uses a 1D dispatch group size and uses workgroup_idx instead of
workgroup.x/workgroup.y in case they are normalized.
)

### Description

abs_error is slightly loosened from 0.02 to 0.03 to allow test cases on
macOS ARM64 to pass.
adrianlizarraga and others added 10 commits April 18, 2025 09:02
…icrosoft#24416)

### Description
Adds session config option (`"session.disable_model_compile"`) that
disables model compilation during session initialization.

If this option is set to "1", inference session creation will fail with
error code ORT_MODEL_REQUIRES_COMPILATION if compilation is required to
run the model on any Execution Provider added to the session. Only the
following kinds of models are valid when this option is set to "1":
- Pre-compiled models that have EPContext nodes for the compiling
Execution Providers in the session.
- Non-compiled models that run only on non-compiling Execution
Providers, like CPU EP.

### Example usage
The following example (taken from a unit test) tries to load a model
that requires compilation with a session that disables compilation. The
session creation fails with error code `ORT_MODEL_REQUIRES_COMPILATION`.
Then, the example compiles the model and loads the compiled model
successfully.

```C++
  // Taken from a unit test ...

  // Initialize session options with QNN EP
  Ort::SessionOptions session_options;
  ProviderOptions provider_options;
  provider_options["backend_type"] = "htp";
  provider_options["offload_graph_io_quantization"] = "0";

  session_options.AppendExecutionProvider("QNN", provider_options);
  session_options.AddConfigEntry(kOrtSessionOptionsDisableEpCompile, "1");  // Disable model compilation!

  // Create an inference session that fails with error ORT_MODEL_REQUIRES_COMPILATION
  try {
    Ort::Session session(*ort_env, input_model_file, session_options);
    FAIL() << "Expected Session creation to fail but it succeeded";  // Should not get here!
  } catch (const Ort::Exception& excpt) {
    OrtErrorCode error_code = excpt.GetOrtErrorCode();
    std::string_view error_msg = excpt.what();
    ASSERT_EQ(error_code, ORT_MODEL_REQUIRES_COMPILATION);
    ASSERT_THAT(error_msg, testing::HasSubstr(kQnnExecutionProvider));
  }

  // Session creation failed because the model was not pre-compiled.
  // Try to compile it now.

  // Create model compilation options from the session options.
  Ort::ModelCompilationOptions compile_options(*ort_env, session_options);
  compile_options.SetInputModelPath(input_model_file);
  compile_options.SetOutputModelPath(output_model_file);

  // Compile the model.
  Ort::Status status = Ort::CompileModel(*ort_env, compile_options);
  ASSERT_TRUE(status.IsOK()) << status.GetErrorMessage();

  // Should be able to create a session with the compiled model and the original session options.
  Ort::Session session(*ort_env, output_model_file, session_options);
```

### Motivation and Context
Compiling models can take a very long time. We want a session option
that requires input models that do not need to be compiled.
…microsoft#24463)

### Description
Re-enables (and fixes) generation of compiled EpContext models with
**both** input and output models stored in buffers.

### Motivation and Context
Previous PR microsoft#24176 inadvertently added a check that disabled storing
both input and output models in buffers. However, we need this
functionality. This was actually a fortunate scenario, as it led to the
discovery of a bug.
…oft#24472)

### Description

* Rename the file and class names since they support both 4 and 8 bits.
* Update HQQWeightOnlyQuantizer to support 8 bits.
* Update some comments.

### Motivation and Context
microsoft#24384 added 8 bits support
for the default weight only quantizer.
…icrosoft#24474)

### Description
Use a pimpl-esque approach so that the winml OrtModel type doesn't
conflict with the model editing API OrtModel.


### Motivation and Context
Fix crash due to linker calling the incorrect destructor when there are
two different OrtModel types in the global namespace.
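
An illustration of the pimpl idea (names here are hypothetical, not the actual winml declarations):

```C++
#include <memory>

namespace winml_detail {
// In real code the full definition lives in a single .cpp file, so the
// type never appears in the global namespace alongside the editor API's
// OrtModel.
class OrtModelImpl {};
}  // namespace winml_detail

class WinmlModel {
 public:
  WinmlModel() : impl_(std::make_unique<winml_detail::OrtModelImpl>()) {}
  ~WinmlModel() = default;  // OrtModelImpl is complete here, so this links cleanly
 private:
  std::unique_ptr<winml_detail::OrtModelImpl> impl_;  // opaque to callers
};
```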
…h to int32 (microsoft#24425)

Some WebNN backends support limited data types for the input and output
of a WebNN graph. However, they can support more data types for
intermediate nodes. To address this limitation, we implement a data type
fallback mechanism. (Note: Currently, we only support fallback to int32
for certain integer data types.)

If a data type is not supported for a graph's input or output but is
supported for intermediate nodes, we will:
1. Save the input MLTensor as the 'int32' data type,
2. Convert the input data from ORT to int32,
3. Insert a cast operation into the WebNN graph to convert the input
back to its original data type,
4. Insert a cast operation into the WebNN graph to convert the output to
'int32',
5. Convert the output data from int32 back to its original data type.
### Description
Add infrastructure to enable auto EP selection.

Device discovery for CPU/GPU/NPU on Windows.
Supports internal (CPU/DML/WebGPU) and provider bridge (CUDA) EPs
currently.
Infrastructure will be used with plugin EPs next.

A selection policy implementation will be added next; in the interim
there is a temporary function with manually specified selection so unit
tests can cover the end-to-end flow.

### Motivation and Context

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Adrian Lizarraga <[email protected]>
)

### Description
WebNN doesn't support AveragePool with count_include_pad == 1.



### Motivation and Context
Support it by adding an explicit pad and calling averagePool2D with pads
set to 0, as illustrated below.
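
A 1-D illustration of the equivalence being exploited (illustrative C++, not the WebNN EP code): zero-padding first and then pooling with pads of 0 divides every window sum by the full window size, which is exactly count_include_pad == 1.

```C++
#include <algorithm>
#include <vector>

// Average pool with count_include_pad == 1, expressed as: explicit zero
// pad, then pool with no pads (stride 1 for simplicity).
std::vector<float> AvgPoolIncludePad1D(const std::vector<float>& x,
                                       size_t window, size_t pad) {
  std::vector<float> padded(x.size() + 2 * pad, 0.0f);  // explicit zero pad
  std::copy(x.begin(), x.end(), padded.begin() + pad);
  std::vector<float> y;
  for (size_t i = 0; i + window <= padded.size(); ++i) {
    float sum = 0.0f;
    for (size_t k = 0; k < window; ++k) sum += padded[i + k];
    y.push_back(sum / static_cast<float>(window));  // pad zeros count toward the mean
  }
  return y;
}
```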
### Description
Fix some issues:
- Use the adapter number instead of the bus number; the bus number
doesn't work as expected on VMs.
- Disable for the XBOX build; it needs different handling for adapter
lookup.
- Use the adapter number as device_id when creating the DML OrtEpDevice.
- Fix some issues with the metadata.


### Motivation and Context
RyanMetcalfeInt8 and others added 15 commits April 29, 2025 19:44
### Description

Cherry pick the following into
[rel-1.22.0](https://github.com/microsoft/onnxruntime/tree/rel-1.22.0)


- (microsoft#24487)
- (microsoft#24466)
- (microsoft#24493)
- (microsoft#24484)
- (microsoft#24494)
- (microsoft#24489)
- (microsoft#24504)
- (microsoft#24510)
- (microsoft#24456)
- (microsoft#24537)
- (microsoft#24501)
- (microsoft#24519)
- (microsoft#24513)
- (microsoft#24539)
- (microsoft#24514)
- (microsoft#24542)
- (microsoft#24585)

Not added:

Planning to cherry-pick the CUDA MatMulNBits PRs once the fix for the
failing CUDA pipeline is ready:
- (microsoft#24491)
- (microsoft#24509)
- (microsoft#24564)

---------

Co-authored-by: vraspar <[email protected]>
Co-authored-by: Adrian Lizarraga <[email protected]>
Co-authored-by: minfhong-quic <[email protected]>
Co-authored-by: minfhong-quic <[email protected]>
Co-authored-by: Justin Chu <[email protected]>
Co-authored-by: Prathik Rao <[email protected]>
Co-authored-by: Edward Chen <[email protected]>
Co-authored-by: Ankan Banerjee <[email protected]>
Co-authored-by: Maximilian Müller <[email protected]>
Co-authored-by: Gaurav Garg <[email protected]>
Co-authored-by: iraut <[email protected]>
Co-authored-by: Hrishikesh Manohar <[email protected]>
Co-authored-by: Maximilian Müller <[email protected]>
Co-authored-by: Scott McKay <[email protected]>
Co-authored-by: Jiajia Qin <[email protected]>
Co-authored-by: kunal-vaishnavi <[email protected]>
Co-authored-by: xhcao <[email protected]>
### Description

Cherry pick the following into
[rel-1.22.0](https://github.com/microsoft/onnxruntime/tree/rel-1.22.0)

- (microsoft#24491)
- (microsoft#24509)
- (microsoft#24564)
- (microsoft#24574)
- (microsoft#24582)
- (microsoft#24584)
- (microsoft#24568)
- (microsoft#24587)
- (microsoft#24563)
- (microsoft#24592)
- (microsoft#24526)
- (microsoft#24552)
- (microsoft#24588)
- (microsoft#24605)
- (microsoft#24606)

---------

Co-authored-by: Jing Fang <[email protected]>
Co-authored-by: Tianlei Wu <[email protected]>
Co-authored-by: Baiju Meswani <[email protected]>
Co-authored-by: Scott McKay <[email protected]>
Co-authored-by: Mark Schofield <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Edward Chen <[email protected]>
Co-authored-by: Ashwath Shankarnarayan <[email protected]>
Co-authored-by: saurabh <[email protected]>
Co-authored-by: Adrian Lizarraga <[email protected]>
Co-authored-by: Hector Li <[email protected]>
### Description

Cherry pick the following into
[rel-1.22.0](https://github.com/microsoft/onnxruntime/tree/rel-1.22.0)

- (microsoft#24608)
- (microsoft#24545)

---------

Co-authored-by: Changming Sun <[email protected]>
Co-authored-by: Maximilian Müller <[email protected]>
### Description
Add microsoft#24625 

### Motivation and Context

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: George Wu <[email protected]>
…ft#24630)

### Description
Adds microsoft#24629 to the ORT
1.22.0 release branch

### Motivation and Context
… (microsoft#24638)

### Description
Adds support for selection policy delegate directly to the release
branch. This is necessary to avoid having to update C# bindings (which
are in main but not in the release branch)

Based on microsoft#24635

### Motivation and Context
…t#24651)" (microsoft#24668)

This reverts commit 8fbc5d7, which caused packaging pipeline failures.
### Description
Update the folder name from win-arm64x to win-arm64, since win-arm64x is
not a valid RID:
https://learn.microsoft.com/en-us/dotnet/core/rid-catalog#windows-rids

### Description
Cherry-pick from microsoft#24690.
Fix pipeline rename conflict to create the NuGet release package.

---------

Co-authored-by: Alex Marin <[email protected]>