WebGPU fitness for ML frameworks #66
Cc @Kangz I'm afraid I don't have any insights on this for now.
WebGPU provides compute shaders; by themselves they allow using "shared workgroup memory", which is nice but not the best you can do in native GPU ML today. Next are subgroup operations, which could be a WebGPU extension and which some people are already looking at. And finally there's cooperative matrix multiply (marketed as "tensor cores" by Nvidia): it might become a WebGPU extension if it becomes supported by more than one HW vendor.
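To make the first of those tiers concrete, here is a minimal sketch of a shared-workgroup-memory reduction, written in present-day WGSL syntax (the bindings, the workgroup size of 64, and the tree-reduction strategy are illustrative choices, not anything from the comment above):

```wgsl
// Sums 64 floats per workgroup using shared workgroup memory,
// the baseline capability WebGPU compute shaders provide.
var<workgroup> tile : array<f32, 64>;

@group(0) @binding(0) var<storage, read> input : array<f32>;
@group(0) @binding(1) var<storage, read_write> partials : array<f32>;

@compute @workgroup_size(64)
fn main(@builtin(local_invocation_index) lid : u32,
        @builtin(global_invocation_id) gid : vec3<u32>,
        @builtin(workgroup_id) wid : vec3<u32>) {
  tile[lid] = input[gid.x];
  workgroupBarrier();
  // Tree reduction: halve the number of active invocations each step.
  for (var stride = 32u; stride > 0u; stride = stride / 2u) {
    if (lid < stride) {
      tile[lid] = tile[lid] + tile[lid + stride];
    }
    workgroupBarrier();
  }
  if (lid == 0u) {
    partials[wid.x] = tile[0];
  }
}
```

Subgroup operations and cooperative matrix multiply would let the hardware collapse the barrier-and-loop pattern above into far fewer instructions, which is why they matter for ML workloads.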
Thanks @Kangz, very useful! Here is the link to the current discussion on subgroup operations for others' benefit.
So one thing we wanted to find out is whether there is a way to have garbage collection much like JS currently has, but for GPU-related activities too. Right now we made TF.tidy() to somewhat deal with the release of memory when finished, but newer users take time to realise this exists, and it would be better if this were consistent with how JS generally functions: most JS devs do not even think about memory management, as they are used to the JS garbage collector doing its thing.
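For readers unfamiliar with the pattern @jasonmayes describes, a minimal TensorFlow.js sketch (the tensor values are made up for illustration):

```js
import * as tf from '@tensorflow/tfjs';

// Without tf.tidy(), every intermediate tensor below would keep holding
// GPU memory until disposed explicitly; there is no GC for GPU buffers.
const result = tf.tidy(() => {
  const x = tf.tensor1d([1, 2, 3, 4]); // allocated on the GPU backend
  const y = x.square();                // intermediate tensor
  return y.sum();                      // only the returned tensor survives
});
result.print();   // 30
result.dispose(); // the returned tensor still needs manual disposal
```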
There isn't really a way to do automatic GC of GPU resources, and this can be seen in WebGL's […]
Newly added is a SIMD operations in WebGPU for ML talk by @mehmetoguzderin discussing the proposed subgroup operations @Kangz mentioned. @mehmetoguzderin, feel free to provide your further perspectives in this issue for workshop discussions. Also please review the other workshop talks relevant to this issue, as well as the WebNN API spec and its open issues. In particular, see the WebNN API issue webmachinelearning/webnn#6 discussing custom ops using lower-level APIs such as WebGPU.
@anssiko WebNN is very interesting; I will have a look at it. And I will provide input in this repository for anything workshop related. Thanks for the mention.
Sample code that uses SIMD operations is now available in the repository of my talk. For the speed benchmark chart that compares SIMD to alternative methods, please check out the main README.md, and for the code itself, please check out the samples folder. (The code is written in Vulkan and GLSL, but it is structured enough to give a general idea.)
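For those who don't want to read through the Vulkan sample, the core idea translates to something like the following sketch, written against the subgroup functionality proposed for WGSL rather than the GLSL of the actual sample (names such as subgroupAdd follow the later-standardized extension and may not match the sample's code):

```wgsl
enable subgroups;

@group(0) @binding(0) var<storage, read> input : array<f32>;
@group(0) @binding(1) var<storage, read_write> partials : array<f32>;

@compute @workgroup_size(64)
fn main(@builtin(global_invocation_id) gid : vec3<u32>,
        @builtin(subgroup_invocation_id) sgid : u32,
        @builtin(workgroup_id) wid : vec3<u32>) {
  // One hardware reduction replaces the shared-memory loop and barriers:
  // every invocation receives the sum across its subgroup.
  let sum = subgroupAdd(input[gid.x]);
  if (sgid == 0u) {
    // Simplified: assumes one subgroup per workgroup; a real kernel
    // would combine per-subgroup partials through workgroup memory.
    partials[wid.x] = sum;
  }
}
```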
I'd like to offer a different take in response to @jasonmayes' question in his talk:
As we know, most meaningful ML acceleration is rooted in the underlying hardware, and the work to surface such capabilities has been concentrated in the OS layer, where the actual interaction between the platform technology and the hardware drivers takes place. This is true for Windows, Linux, Android and macOS. It is done this way because the hardware in the ecosystem is diverse, and hardware abstraction is a problem the OS is very good at.

WebNN is designed to provide an ML-specific path for the web platform to leverage native OS ML functionality that makes use of this hardware acceleration in a more consistent and manageable way. So instead of relying on low-level, general-purpose compute constructs such as WebGL or WebGPU shaders, an ML framework could leverage native ML constructs more directly through an ML-specific web API like WebNN, letting it carry out platform-specific acceleration in the OS layer under the hood.

In the case of DirectML, in addition to providing a very optimized version of the compute-based ML implementation, being an OS component it also leverages fine-grained interaction with the underlying compute drivers in the OS stack to maximize runtime performance and reduce latency; when appropriate, it provides shortcuts to operation-specific underlying capabilities based on the hardware's availability. As discussed in my talk, we've so far been reasonably successful with the integration of DirectML into both ONNX and TensorFlow. DirectML functionality can be mapped through WebNN.
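As a rough illustration of the programming-model difference @wchao1115 describes, a WebNN-style graph build looks like this (a sketch only: names such as navigator.ml, MLGraphBuilder, and the descriptor fields follow a later draft of the WebNN spec and have changed over time):

```js
// Sketch of the WebNN programming model; API names may differ from the
// shipped spec. The computation is described as a graph of ML ops,
// not as shader code, so the OS backend (e.g. DirectML on Windows)
// is free to pick hardware-specific kernels.
const context = await navigator.ml.createContext({ deviceType: 'gpu' });
const builder = new MLGraphBuilder(context);

const x = builder.input('x', { dataType: 'float32', shape: [1, 4] });
const w = builder.constant(
  { dataType: 'float32', shape: [4, 2] },
  new Float32Array([1, 0, 0, 1, 1, 0, 0, 1])); // placeholder weights
const y = builder.relu(builder.matmul(x, w));

const graph = await builder.build({ y }); // backend compiles the graph
```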
During the Zoom call, I asked whether subgroups were the right level at which to seek portability, and whether it might be better to target a DSL like Halide or MLIR as the portable abstraction layer. The challenges of making anything at the level of OpenCL subgroups portable are: […]

At least for some ML workloads, the second category is more useful, and a better target than vector operations.
Thanks a lot for the feedback, @jeffhammond. An essential aspect of the SIMD proposal for WebGPU is the restricted set of operations it exposes. For example, shuffle operations and indexed accesses don't exist at all; this stems from the concerns they bring and from the fact that not all target native APIs have those operations. As demonstrated in the sample I provided for this workshop, even with a safer subset that requires uniform control, the performance gain can approach 10x. As people said on the call, they want their GPU execution time to be as short as possible when considering embedded or mobile devices, and SIMD operations enable that for very realistic use cases such as exploratory data analysis. And the rougher terrain of these operations is not that extreme (some driver bugs exist), given that atomics and writable buffers are already available in WebGPU. I believe that if they are available in the MVP, people who work on fantastic higher-level abstractions similar to Halide will squeeze the benefit out of SIMD operations and pass it on to users who can't invest the time to work on SIMD reductions. But even for such users, SIMD operations bring a benefit, because when it comes to reductions, atomic operations only work on integers; in contrast, SIMD operations give access to more types, and they outperform atomics even on integers. I think exposing tensor-core-like functionality is independent of the SIMD operations discussion, because those units are much more recent and their API surface is a bit different.
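The type restriction mentioned above fits in a few lines of WGSL (again a sketch against current WGSL, not code from the proposal itself):

```wgsl
// WGSL atomics are only declared over 32-bit integers:
@group(0) @binding(0) var<storage, read_write> total_i : atomic<i32>; // valid
// var<storage, read_write> total_f : atomic<f32>;  // invalid: no float atomics

// The subgroup built-ins, by contrast, reduce floats directly, e.g.
//   let s : f32 = subgroupAdd(x);   // requires `enable subgroups;`
```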
For a structured capture of the WebGPU debate on subgroups, one can also have a look at the argdown-plain and argdown-component views.
@jasonmayes raises the question of whether WebGPU exposes the right API surface needed to support ML frameworks' interactions with GPUs.
@jasonmayes, do you have a list of specific asks from the TFJS experience?
@grorg @tidoust any insights on this?