Is this API likely to be a long-term solution? #7
In defining the specification for the WebNN operations, we looked into not only TensorFlow and NNAPI but also other popular frameworks, e.g. PyTorch, ONNX, and CoreML, to name a few, and found that most building block operations are reusable and share an unusually high degree of similarity across all the frameworks; many of them are in fact identical. When we looked at the current hardware design in the market, especially on the GPU, we found that the same building blocks are also implemented either in the hardware or at the system software level. This is not surprising, because the optimizations are there to satisfy the existing software use cases.

A note on TensorFlow's operators: TensorFlow indeed supports over a thousand operators (around 1200), but only about half of them are implemented in CUDA today; the rest are presumably less important. Of those, most are implemented as compositions of basic building block operations, which altogether number no more than 150-200. These building block operators are at the level of operations targeted in the WebNN design, because it's the level just high enough to be efficiently handled by the hardware today, but also low enough to be used to compose new reusable ones. The count is finite, small, and not open-ended.

As to whether this set of building block operations will stand the test of time, only time will tell. But in my experience working on various platform technologies, I have yet to see a new piece of software technology come in and render the existing ones completely obsolete, especially ones already supported by a healthy ecosystem. Will people stop using convolutions, gemm, or the various activations in their models? I highly doubt it.
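To make the composition point concrete, here is a minimal sketch of how a commonly used composite operator reduces to a handful of the building-block operations mentioned above. NumPy stands in for whatever backend actually executes the primitives; the decomposition shown is illustrative, not any particular framework's actual lowering.

```python
# Illustrative only: a "composite" operator expressed with primitive ops.
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Softmax written purely in terms of primitives:
    reduce-max, subtract, exp, reduce-sum, divide."""
    shifted = x - np.max(x, axis=axis, keepdims=True)      # reduce-max + sub (numerical stability)
    exps = np.exp(shifted)                                  # element-wise exp
    return exps / np.sum(exps, axis=axis, keepdims=True)    # reduce-sum + div

logits = np.array([[1.0, 2.0, 3.0]])
print(softmax(logits))  # ~[[0.09, 0.24, 0.67]]
```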
@jbingham do you have links to what the multiple candidates are so we can read about them and give feedback?
Some relevant links: XLA HLO
Thanks @jbingham for the pointers!
We did check XLA HLO compatibility when defining WebNN ops for the first-wave models. When defining a higher level WebNN op, we also define the lower level primitives into which it can be decomposed. So according to the table, most of the relevant XLA HLO operations are covered by corresponding WebNN operations. The exception is ReduceWindow, which is used to support pooling operations. However, in other references, e.g. the Tensor Operator Set Architecture, the pooling operations are defined as primitives. We may explore this further.
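For readers unfamiliar with ReduceWindow, the idea is that pooling falls out of a generic windowed reduction plus a choice of reducer. Below is a minimal sketch, simplified to a single 2-D channel; the function name and signature are made up for illustration and are not XLA's actual API.

```python
# Sketch of the ReduceWindow idea: max-pool and average-pool are the same
# windowed reduction with different reducers.
import numpy as np

def reduce_window_2d(x, window, strides, reducer):
    h, w = x.shape
    wh, ww = window
    sh, sw = strides
    out_h = (h - wh) // sh + 1
    out_w = (w - ww) // sw + 1
    out = np.empty((out_h, out_w), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = reducer(x[i*sh:i*sh+wh, j*sw:j*sw+ww])
    return out

x = np.arange(16, dtype=float).reshape(4, 4)
max_pool = reduce_window_2d(x, window=(2, 2), strides=(2, 2), reducer=np.max)
avg_pool = reduce_window_2d(x, window=(2, 2), strides=(2, 2), reducer=np.mean)
print(max_pool)  # [[ 5.  7.] [13. 15.]]
```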
I did a first round of cross-checking between TOSA and WebNN. It seems to me that TOSA operations map well to WebNN ops. The gaps include bitwise ops, logical ops, and control flow. I believe we can explore these ops with the use cases and models for WebNN (a sketch of how some of the logical gaps might be composed from existing element-wise ops follows the category list below). More details are in the following tables:
Tensor Operators
Activation Functions
Elementwise Binary Operators
Elementwise Unary Operators
Elementwise Ternary Operators
Comparison Operators
Reduction Operators
Data Layout
Scatter/Gather Operators
Image Operators
Type Conversion
Data Nodes
Custom Operators
Control Flow Operators
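On the logical-op gaps noted above, one way they could be bridged while the op set is explored is by composition from element-wise arithmetic, assuming boolean values are encoded as 0/1 tensors. This is purely illustrative; nothing here is part of WebNN or TOSA.

```python
# Illustrative only: logical ops from element-wise arithmetic on 0.0/1.0 tensors.
import numpy as np

a = np.array([1.0, 1.0, 0.0, 0.0])
b = np.array([1.0, 0.0, 1.0, 0.0])

logical_and = a * b              # mul
logical_or  = a + b - a * b      # add, sub, mul
logical_not = 1.0 - a            # sub from constant
logical_xor = (a - b) ** 2       # sub, mul

print(logical_and, logical_or, logical_not, logical_xor)
```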
Thanks for the detailed comparison, @huningxin . IIUC, what we're saying is this: A graph API can accommodate both the higher level ops, like what's defined in ONNX or TF Lite, and the lower level ops, like what's defined in XLA HLO and TOSA. If that's right, we could define a single operation set that includes the union of both sets of operations. Or we could define two operation sets, one for the higher level and one for the lower level. In either case, the graph construction, compilation, and prediction APIs could be the same. Is that an accurate summary?
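As a rough sketch of that idea (hypothetical Python data structures, not a proposed API): the same graph construction can carry both operator levels, with high-level nodes optionally lowered to primitives before compilation, so the surrounding build/compile/predict flow stays identical.

```python
# Toy sketch of "one graph API, two operator levels". Node, lower() and the
# "%..." tensor names are invented for illustration only.
from dataclasses import dataclass, field

@dataclass
class Node:
    op: str                 # e.g. "gemm" (high level) or "matmul"/"add" (primitive)
    inputs: list = field(default_factory=list)
    attrs: dict = field(default_factory=dict)

# Decompositions for ops a low-level backend does not implement natively.
DECOMPOSITIONS = {
    # gemm(A, B, C) -> add(matmul(A, B), C)
    "gemm": lambda n: [Node("matmul", n.inputs[:2]),
                       Node("add", ["%matmul_out", n.inputs[2]])],
}

def lower(graph):
    """Rewrite high-level nodes into primitives when a decomposition exists."""
    lowered = []
    for node in graph:
        expand = DECOMPOSITIONS.get(node.op, lambda n: [n])
        lowered.extend(expand(node))
    return lowered

high_level = [Node("gemm", ["A", "B", "C"]), Node("relu", ["%gemm_out"])]
print([n.op for n in lower(high_level)])  # ['matmul', 'add', 'relu']
```

Whether the union lives in one operation set or two, a lowering pass like this is all that separates the two views of the same graph.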
Thanks @jbingham, your summary looks accurate to me. Regarding operation selection, the TOSA spec defines a set of principles that may be a useful reference. I am pasting it here for convenience.
1.3. Operator Selection
TOSA defines a set of primitive operators to which higher level operators can be lowered in a consistent way. To remain effective and efficient to implement, the set of operators must be constrained to a reasonably small set of primitive operations out of which others can be constructed. The following principles govern the selection of operators within TOSA.
Table 2. Principles
Thanks for sharing those Principles, @huningxin. Do you have a sense of how many of the current NN API operators could be broken down into simpler tensor operations? Or how many of the ~120 ONNX/TF Lite operations could be?
Regarding the 47 ops of the current WebNN spec, there are 9 decomposable operations. ONNX also has a guideline, Proposing and submitting a new operator or function to ONNX, where a function can be composed from other ONNX operators. Ping @wchao1115 @gramalingam for more insights. Ping @pyu10055 @miaowang14 for insights on TF Lite / NNAPI. Thanks!
I've looked at TOSA before as well, and as @huningxin thoroughly enumerated here, from the conceptual standpoint they are not that much different from WebNN or even ONNX. They share a lot of overlap because these are all basic building blocks for deep learning neural networks. Even when you look at XLA-HLO, you will find element-wise operators, reductions, convolutions, tensor mutations, normalizations, activations, and recurrent networks. They are all mappable to one another. And while a bigger operation such as a normalization function like
However, from the conceptual standpoint, it is still useful to also define all of the smaller operations from which new functions in the future may be composed. One of our design principles for WebNN operations is to also define the lower level operations that, semantically, together compose the bigger operation. The most vivid example of this principle may be in the way we define
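The truncated reference at the end presumably points to a recurrent op such as gruCell (an assumption here, since the original link text was lost). Either way, a recurrent cell is a good illustration of a bigger operation specified alongside the primitives that compose it. Below is a minimal NumPy sketch of a standard GRU cell built only from matmul, add, sigmoid, tanh, and element-wise mul; the per-gate weight layout is simplified and is not WebNN's actual gruCell signature.

```python
# Standard GRU cell equations decomposed into primitive ops.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_cell(x, h, Wz, Uz, bz, Wr, Ur, br, Wn, Un, bn):
    z = sigmoid(x @ Wz + h @ Uz + bz)          # update gate
    r = sigmoid(x @ Wr + h @ Ur + br)          # reset gate
    n = np.tanh(x @ Wn + r * (h @ Un) + bn)    # candidate state
    return (1.0 - z) * n + z * h               # new hidden state

rng = np.random.default_rng(0)
I, H = 3, 4                                    # input size, hidden size
x, h = rng.normal(size=(1, I)), np.zeros((1, H))
params = [rng.normal(size=s) for s in
          [(I, H), (H, H), (H,), (I, H), (H, H), (H,), (I, H), (H, H), (H,)]]
print(gru_cell(x, h, *params).shape)           # (1, 4)
```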
Per discussion on the WebML CG Teleconference – 10 December 2020, this issue can be closed.
Google has already shared with the group that we believe the operations in Web NN (and ONNX and TF Lite) are too high level to be a good long-term solution for the Web, or for Android, or for ML practitioners more generally.
The number of operations in the Android NN API has grown from 30-40 in 2017 to 120 in 2020. The number of operations in TensorFlow has grown to over 1000. ML researchers are publishing new operations all the time, even daily. The potential number of operations is unbounded. Growth year over year has been 20-30%. That would be really hard to maintain for a web standard. Worse yet, operations fall into disuse, or are superseded, or undergo incompatible changes. The web could be stuck supporting them forever.
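For a sense of scale, a quick compound-growth projection (illustrative numbers only, using the 20-30% rate and the ~120-op NN API/TF Lite figure cited above) shows what sustaining that rate would mean for a standardized operator set:

```python
# Back-of-the-envelope projection of operator-set size at 20-30% annual growth.
start_ops = 120
for rate in (0.20, 0.30):
    ops = start_ops
    for _ in range(10):
        ops *= 1 + rate
    print(f"{rate:.0%}/yr -> ~{round(ops)} ops after 10 years")
# 20%/yr -> ~743 ops after 10 years
# 30%/yr -> ~1654 ops after 10 years
```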
Also, given that devices don’t get updated often due to the hardware release cycle and device upgrade cycle, a static set of operations is limited in its ability to meet developers’ and users’ needs.
That’s why the TensorFlow and Android NN API teams are actively working on replacements for the current TensorFlow, TF Lite, and NN API operation sets, with the goal of having something extensible that does not require defining and growing an operation set at such a rapid rate.
The plan on Android is to replace the current NN API with a lower level instruction set. There are multiple candidates, with no clear winner yet. We want to develop the instruction set in an open, vendor-neutral, standards-based way that would work for Android (an open-source project) as well as the Web -- including Windows, macOS, and iOS.
This is the plan for the TensorFlow ecosystem too.
In other words, at Google we expect that a graph API -- on Android -- will be obsolete around the time Web NN might ship in the major browsers. So what should we do?
IIUC, one possible argument is that it’s ok if web APIs are replaced. There’s precedent. It’s more important for the web to evolve and provide better solutions for web developers, even if those have a lifespan of just a few years.
And to be fair, the new solutions don’t exist yet, and there’s a risk they might not be available for a long time. Is it better to move ahead with a tried-and-true approach, modeled after the Android NN API? That was announced in 2017 and is still supported. Even after a replacement launches, the NN API in its present form will continue to be supported for some number of years. Why not give the web the same opportunity?
First, is this the argument that others have for moving forward?
Second, what do the web standards experts think? Is it ok to launch an API we expect to replace in a few years?
If we decide it’s worth moving ahead, even with the risk that we’re shipping a stop-gap solution with significant known limitations, we can talk about how to mitigate those risks, probably in a separate issue.