Standard model format #28
I'm wondering if you could outline the key trade-offs and decisions to be made here, particularly with regard to the key points discussed in this issue? Would you say that any elements of that discussion have been resolved, or have become clearer? Some points raised in that thread:
> With my current (limited) understanding, it seems like, with the pace at which the ML ecosystem is moving/changing, it'd be better to develop a low-level spec that models the fundamental compute paradigm that ML leans on (similar to the goals of the WebGPU and Wasm specs for their own respective paradigms), rather than chasing higher-level instances/use-cases of that paradigm. It would be helpful to hear some thoughts on this perspective/framing from someone with more experience.
> One benefit of the Model Loader approach is that it somewhat dodges the issue of ops vs. instructions, MLIR, and XLA. Whatever format the community chooses is what's supported. If ML evolves enough, the group can add another format.
>
> For context, Microsoft in particular was pretty insistent that there needs to be a model format that's guaranteed to be supported everywhere, for developer sanity. It makes sense, and that requirement is written into the ML Working Group Charter. Given the pace of evolution in ML, browser vendors will want to add additional operations or options to optimize for the OS or device. As they do, the officially supported format will evolve too. Having a common supported format as a baseline will make it much easier for developers to provide a good experience across the Web.
>
> In the context of the WebNN proposal, the approach has been to decompose all of the higher-level operations into their lowest-level building blocks. The hope is that the building blocks allow composition and support new operations as they're invented. The implementation can use tricks like the ones you mention, such as recognizing when the low-level instructions are combined into a higher-level operation that has special acceleration. In theory, an MLIR or XLA format could be parsed with a JavaScript library and converted into calls to WebNN to build a graph of operations/instructions.
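To make the "lowest-level building blocks" point concrete, here is a rough NumPy sketch (illustrative only, not WebNN or Model Loader code) of how a higher-level operation such as softmax decomposes into primitive ops, and why an implementation that recognizes the primitive pattern can substitute a fused, accelerated kernel:

```python
import numpy as np

def softmax_decomposed(x):
    # Softmax expressed only through low-level primitives, the way a
    # building-block graph API would compose it:
    m = np.max(x, axis=-1, keepdims=True)   # reduceMax (numerical stability)
    e = np.exp(x - m)                       # sub, exp
    s = np.sum(e, axis=-1, keepdims=True)   # reduceSum
    return e / s                            # div

# An implementation is free to pattern-match this reduceMax -> sub -> exp ->
# reduceSum -> div subgraph and dispatch a single fused softmax kernel;
# the result is identical either way.
x = np.array([[1.0, 2.0, 3.0]])
print(softmax_decomposed(x))  # rows sum to 1
```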
The Web ML Working Group charter says "The Model Loader API needs a standard format supported across browsers and devices for broad interoperability."
I'm creating this issue to ask: what should the standard format be?
For now, the prototype uses TensorFlow Lite. All models start off in either the TensorFlow SavedModel format or the Keras format.
See the list of TF Lite operations that are supported.
The reason TF Lite doesn't just use SavedModel, a protocol buffer format, is mainly to reduce file size. In addition to some compression, the conversion process supports post-training quantization.
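As a point of reference, this is the standard TensorFlow Lite conversion step; a minimal sketch (paths are placeholders) looks like this:

```python
import tensorflow as tf

# From a SavedModel directory; tf.lite.TFLiteConverter.from_keras_model
# covers the Keras path.
converter = tf.lite.TFLiteConverter.from_saved_model("path/to/saved_model")

# Enable post-training quantization as part of the conversion.
converter.optimizations = [tf.lite.Optimize.DEFAULT]

tflite_model = converter.convert()
with open("model.tflite", "wb") as f:
    f.write(tflite_model)
```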
Some more thoughts:
- **Operator definitions:** The community group has spent a lot of time on the operator definitions for the WebNN spec. We should keep those for Model Loader.
- **Serialization format:** Apple's Core ML, TensorFlow's SavedModel, and Microsoft's ONNX all use protocol buffers as the serialization format. TensorFlow Lite converts the SavedModel to a flatbuffer (see the sketch after this list). I'm not sure whether Apple or Microsoft also have compression or conversion steps, or whether they use the protobuf directly.
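For comparison, part of the appeal of the flatbuffer is that it can be consumed in place (it can be memory-mapped), without a protobuf-style parse-and-copy step. A minimal sketch of inspecting a converted model (filename is a placeholder):

```python
import tensorflow as tf

# The .tflite flatbuffer is loaded as-is; no deserialization pass is needed.
interpreter = tf.lite.Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()

print(interpreter.get_input_details())
print(interpreter.get_output_details())
```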