Compile API: support for OrtModel input and write output to stream #24740

Open · wants to merge 14 commits into main
Conversation

adrianlizarraga (Contributor) commented May 13, 2025

Description

  • Adds an API function to compile an OrtModel instance that was created with the model editor API.
    • C and C++ support.
    • No Python bindings yet; the model editor API must first be added to the Python bindings.
  • Adds an API function to write the compiled model to a user-provided output stream.
    • C, C++, and Python support.

Examples

More examples can be found in the unit tests.

C++

Compiles an OrtModel to a stream that itself writes to a file:

#include <cassert>
#include <filesystem>
#include <fstream>

#include "onnxruntime_cxx_api.h"

// Stream write callback: ORT calls this with chunks of the compiled model bytes.
static OrtStatus* ORT_API_CALL MyWrite(void* stream_state, const void* buffer, size_t buffer_num_bytes) {
  std::ofstream* outfile = reinterpret_cast<std::ofstream*>(stream_state);
  outfile->write(reinterpret_cast<const char*>(buffer), static_cast<std::streamsize>(buffer_num_bytes));
  return nullptr;  // No error.
}

int main(int argc, char** argv) {
  // Not shown: build OrtModel using editor API ...
  Ort::Model model(opsets);
  model.AddGraph(graph);

  Ort::SessionOptions so;
  so.AppendExecutionProvider("QNN", QnnHTPOptionsWithoutQDQOffloading());

  const ORTCHAR_T* output_model_file = ORT_TSTR("compileapi_ortmodel_ctx.onnx");
  std::ofstream outfile(output_model_file, std::ios::binary);

  // Create model compilation options that load an OrtModel and write the compiled model bytes via the MyWrite function
  Ort::ModelCompilationOptions compile_options(*ort_env, so);
  compile_options.SetInputModel(model.GetConst());
  compile_options.SetOutputModelOutStream(MyWrite, reinterpret_cast<void*>(&outfile));  // Set output stream
  compile_options.SetEpContextEmbedMode(true);

  // Compile the model.
  Ort::Status status = Ort::CompileModel(*ort_env, compile_options);
  assert(status.IsOk());
  outfile.close();

  assert(std::filesystem::exists(output_model_file));  // assert model was created.
}

Motivation and Context

Provides flexibility to users of the Compile API, like those compiling for WebNN, who want to open/write files outside of ORT.

Next PR: allow loading input model from an existing file handle.

file_offset,
tensor_byte_size,
gsl::make_span(reinterpret_cast<char*>(unpacked_tensor.data()), tensor_byte_size)));
}
adrianlizarraga (Contributor Author) commented:
Note: fixes incomplete handling of external data that is stored in memory with the tag kTensorProtoMemoryAddressTag. I ran into this when serializing an OrtModel (created via editor api) to a model proto. There's a test in this PR that triggers this scenario.

Member commented:

This is re-worked in my OrtValue initializers PR

Contributor commented:
There's also an external PR to address. What's the preferred approach? Get something in now and replace with the re-worked OrtValue initializers PR?

#24894

@adrianlizarraga adrianlizarraga marked this pull request as ready for review May 14, 2025 21:45
@adrianlizarraga adrianlizarraga changed the title Compile API: add support for OrtModel input and output write stream Compile API: support for OrtModel input and write output to stream May 14, 2025
Comment on lines +5999 to +6000
ORT_API2_STATUS(ModelCompilationOptions_SetInputModel, _In_ OrtModelCompilationOptions* model_compile_options,
_In_ const OrtModel* input_model);
Contributor commented:
Does this work with both variants of OrtModel? One completely created with the model editor API and one that augments an existing model? I think it's fine to only support the former.

typedef OrtStatus*(ORT_API_CALL* OrtOutStreamWriteFunc)(_In_ void* stream_state,
_In_ const void* buffer,
_In_ size_t buffer_num_bytes,
_Out_ size_t* num_bytes_written);
Contributor commented:

Do we need num_bytes_written? Doesn't that complicate things vs. requiring the function implementer to handle all bytes they were given?

adrianlizarraga (Contributor Author) commented:

Good call. Removed.

model_saving_options);
size_t buffer_size = model_proto.ByteSizeLong();
ORT_RETURN_IF(buffer_size > static_cast<size_t>(std::numeric_limits<int>::max()),
"Cannot serialize ONNX ModelProto larger than 2GB");
Contributor commented:

Is this going to be a problem for the WebNN use case?

adrianlizarraga (Contributor Author) commented May 30, 2025:

I'm not sure. As far as I'm aware, ONNX (protobuf) does not support files larger than 2 GB, so weights would have to be external for large models. Generating external weight files would seem to be an issue for the web. Hmm.


size_t input_model_data_size_ = 0;

std::variant<std::monostate, // Initial state (no input model)
std::string, // input model path
Contributor commented:
Does this need to support wide chars? IIRC the paths in the C API do, and using std::filesystem is a simple way to do that.
