Compile API: support for OrtModel input and write output to stream #24740

Open · wants to merge 14 commits into main
Conversation

adrianlizarraga (Contributor) commented May 13, 2025

Description

  • Adds an API function to compile an OrtModel instance that was created with the model editor API.
    • C and C++ support.
    • No Python bindings yet; the model editor API must first be added to the Python bindings.
  • Adds an API function to write the compiled model to a user-provided output stream.
    • C, C++, and Python support.

Examples

More examples can be found in the unit tests.

C++

Compiles an OrtModel to a stream that itself writes to a file:

#include <cassert>
#include <filesystem>
#include <fstream>

#include "onnxruntime_cxx_api.h"

// Stream write callback: ORT calls this with chunks of the compiled model bytes.
static OrtStatus* ORT_API_CALL MyWrite(void* stream_state, const void* buffer, size_t buffer_num_bytes) {
  std::ofstream* outfile = reinterpret_cast<std::ofstream*>(stream_state);
  outfile->write(reinterpret_cast<const char*>(buffer), static_cast<std::streamsize>(buffer_num_bytes));
  return nullptr;  // No error.
}

int main(int argc, char** argv) {
  // Not shown: build OrtModel using editor API ...
  Ort::Model model(opsets);
  model.AddGraph(graph);

  Ort::SessionOptions so;
  so.AppendExecutionProvider("QNN", QnnHTPOptionsWithoutQDQOffloading());

  const ORTCHAR_T* output_model_file = ORT_TSTR("compileapi_ortmodel_ctx.onnx");
  std::ofstream outfile(output_model_file, std::ios::binary);

  // Create model compilation options that load an OrtModel and write the compiled model bytes via the MyWrite function
  Ort::ModelCompilationOptions compile_options(*ort_env, so);
  compile_options.SetInputModel(model.GetConst());
  compile_options.SetOutputModelOutStream(MyWrite, reinterpret_cast<void*>(&outfile));  // Set output stream
  compile_options.SetEpContextEmbedMode(true);

  // Compile the model.
  Ort::Status status = Ort::CompileModel(*ort_env, compile_options);
  assert(status.IsOk());
  outfile.close();

  assert(std::filesystem::exists(output_model_file));  // assert model was created.
}

Motivation and Context

Provides flexibility to users of the Compile API, like those compiling for WebNN, who want to open/write files outside of ORT.

Next PR: allow loading input model from an existing file handle.

file_offset,
tensor_byte_size,
gsl::make_span(reinterpret_cast<char*>(unpacked_tensor.data()), tensor_byte_size)));
}
adrianlizarraga (Contributor Author) commented:
Note: fixes incomplete handling of external data that is stored in memory with the tag kTensorProtoMemoryAddressTag. I ran into this when serializing an OrtModel (created via editor api) to a model proto. There's a test in this PR that triggers this scenario.

Member commented:

This is re-worked in my OrtValue initializers PR

Contributor commented:
There's also an external PR to address. What's the preferred approach? Get something in now and replace with the re-worked OrtValue initializers PR?

#24894

@adrianlizarraga adrianlizarraga marked this pull request as ready for review May 14, 2025 21:45
@adrianlizarraga adrianlizarraga changed the title Compile API: add support for OrtModel input and output write stream Compile API: support for OrtModel input and write output to stream May 14, 2025
Comment on lines +5999 to +6000
ORT_API2_STATUS(ModelCompilationOptions_SetInputModel, _In_ OrtModelCompilationOptions* model_compile_options,
_In_ const OrtModel* input_model);
Contributor commented:
Does this work with both variants of OrtModel? One completely created with the model editor API and one that augments an existing model? I think it's fine to only support the former.

typedef OrtStatus*(ORT_API_CALL* OrtOutStreamWriteFunc)(_In_ void* stream_state,
_In_ const void* buffer,
_In_ size_t buffer_num_bytes,
_Out_ size_t* num_bytes_written);
Contributor commented:

Do we need num_bytes_written? Doesn't that complicate things vs. requiring the function implementer to handle all bytes they were given?

adrianlizarraga (Contributor Author) commented:

Good call. Removed.

model_saving_options);
size_t buffer_size = model_proto.ByteSizeLong();
ORT_RETURN_IF(buffer_size > static_cast<size_t>(std::numeric_limits<int>::max()),
"Cannot serialize ONNX ModelProto larger than 2GB");
Contributor commented:

Is this going to be a problem for the WebNN use case?

adrianlizarraga (Contributor Author) commented May 30, 2025:

I'm not sure. As far as I'm aware, ONNX (protobuf) does not support files larger than 2 GB, so weights would have to be external for large models. Generating external weight files would seem to be an issue for the web. Hmm.


size_t input_model_data_size_ = 0;

std::variant<std::monostate, // Initial state (no input model)
std::string, // input model path
Contributor commented:
Does this need to support wide chars? IIRC the paths in the C API do, and using std::filesystem is a simple way to do that.
