diff --git a/explainer.md b/explainer.md index 4a456de2..4cd1d8b8 100644 --- a/explainer.md +++ b/explainer.md @@ -47,7 +47,7 @@ const bufferB = new Float32Array(4).fill(0.8); const bufferC = new Float32Array(4); const inputs = {'A': bufferA, 'B': bufferB}; const outputs = {'C': bufferC}; -graph.compute(inputs, outputs); +context.compute(graph, inputs, outputs); // The computed result of [[1, 1], [1, 1]] is in the buffer associated with // the output operand. console.log('Output value: ' + bufferC); @@ -99,12 +99,13 @@ There are many important [application use cases](https://webmachinelearning.gith export class NSNet2 { constructor() { this.graph = null; + this.context = null; this.frameSize = 161; this.hiddenSize = 400; } async build(baseUrl, batchSize, frames) { - const context = navigator.ml.createContext(); + this.context = navigator.ml.createContext(); - const builder = new MLGraphBuilder(context); + const builder = new MLGraphBuilder(this.context); // Create constants by loading pre-trained data from .npy files. const weight172 = await buildConstantByNpy(builder, baseUrl + '172.npy'); @@ -153,7 +154,7 @@ export class NSNet2 { 'gru94': gru94Buffer, 'gru157': gru157Buffer }; - return this.graph.compute(inputs, outputs); + return this.context.compute(this.graph, inputs, outputs); } } ``` diff --git a/index.bs b/index.bs index 7f59d9fb..89238762 100644 --- a/index.bs +++ b/index.bs @@ -30,6 +30,12 @@ urlPrefix: https://gpuweb.github.io/gpuweb/; spec: WEBGPU text: GPUDevice; url: gpu-device text: GPUBuffer; url: buffer-interface text: GPUTexture; url: texture-interface + text: GPUQueue; url: queues + text: GPUCommandBuffer; url: command-buffers + text: GPUCommandBufferDescriptor; url: dictdef-gpucommandbufferdescriptor +urlPrefix: https://webidl.spec.whatwg.org/; spec: WEBIDL + type: interface + text: Promise; url: idl-promise
{ @@ -395,7 +401,7 @@ In order to not allow an attacker to target a specific implementation that may c Issue: Hinting partially mitigates the concern. Investigate additional mitigations. -The API design minimizes the attack surface for the compiled computational graph. The {{MLGraphBuilder}} interface that hosts the various operations is a data definition API and as such doesn't execute anything, only constructs data. What follows, is that the potential for an attack is limited to when binding the data to the graph before executing it by invoking the {{MLGraph/compute()}} method. This enables implementers to focus on hardening the {{MLGraph/compute()}} method. For example, by making sure it honors the boundary of data and fails appropriately when the bounds are not respected. +The API design minimizes the attack surface for the compiled computational graph. The {{MLGraphBuilder}} interface that hosts the various operations is a data definition API and as such doesn't execute anything, only constructs data. It follows that the potential for an attack is limited to the binding of data to the graph before executing it by invoking the {{MLContext}}.{{MLContext/compute()}} method. This enables implementers to focus on hardening the {{MLContext}}.{{MLContext/compute()}} method, for example, by making sure it honors the boundary of data and fails appropriately when the bounds are not respected. Purpose-built Web APIs for measuring high-resolution time mitigate against timing attacks using techniques such as resolution reduction, adding jitter, detection of abuse and API call throttling [[hr-time-3]]. The practical deployment of WebNN implementations is likely to bring enough jitter to make timing attacks impractical (e.g. because they would use IPC) but implementers are advised to consider and test their implementations against timing attacks. @@ -444,8 +450,7 @@ computer vision, natural language processing, and robotics. 
The WebNN API is a specification for constructing, compiling, and executing computational graphs of neural networks. -The {{MLGraph}} interface represents a compiled computational graph (that is, a model) and exposes -a compute method to perform inference. +The {{MLGraph}} interface represents a compiled computational graph that is immutable (that is, a model). The {{MLGraphBuilder}} interface serves as a builder (factory) to create a {{MLGraph}}. An {{MLOperand}} is a representation of data that flows within the computational graph, @@ -456,21 +461,11 @@ At inference time, every {{MLOperand}} will be bound to a tensor (the actual dat The {{MLGraphBuilder}} interface enables the creation of {{MLOperand}}s. A key part of the {{MLGraphBuilder}} interface are the operations (such as -{{MLGraphBuilder/gemm()}} and {{MLGraphBuilder/softmax()}}). The operations have a functional +{{MLGraphBuilder}}.{{MLGraphBuilder/gemm()}} and {{MLGraphBuilder}}.{{MLGraphBuilder/softmax()}}). The operations have a functional semantics, with no side effects. Each operation invocation conceptually returns a distinct new value, without changing the value of any other {{MLOperand}}. -The {{MLGraphBuilder/build()}} method of the {{MLGraphBuilder}} interface is used to compile and optimize -the computation graph used to compute one or more specified outputs. The key -purpose of the compilation step is to enable optimizations that span two or -more operations, such as operation or loop fusion. - -The {{MLGraph/compute()}} method of the {{MLGraph}} interface is used to execute the -compiled computation graph (to perform inference). The caller supplies the input -values using {{MLNamedInputs}}, binding the input {{MLOperand}}s to their values. -The caller supplies pre-allocated buffers for output {{MLOperand}}s using {{MLNamedOutputs}}. - The runtime values (of {{MLOperand}}s) are tensors, which are essentially multidimensional arrays. 
The representation of the tensors is implementation dependent, but it typically includes the array data stored in some buffer (memory) and some metadata describing the @@ -483,24 +478,52 @@ that shares the same buffer as the input tensor. (In the case of reshape or squeeze, the entire data is shared, while in the case of slice, a part of the input data is shared.) The implementation may use views, as above, for intermediate values. +The {{MLGraphBuilder}}.{{MLGraphBuilder/build()}} method of the {{MLGraphBuilder}} interface is used to compile and optimize +the computation graph used to compute one or more specified outputs. The key +purpose of the compilation step is to enable optimizations that span two or +more operations, such as operation or loop fusion. + +Once the {{MLGraph}} is constructed, there are multiple ways in which the graph may be executed. The +{{MLContext}}.{{MLContext/compute()}} method executes the graph immediately +on the calling thread, which must also be a worker thread, either on a CPU or GPU device. The execution +produces the results of the computation from all the inputs bound to the graph. + +The {{MLContext}}.{{MLContext/computeAsync()}} method executes the graph asynchronously, +either on a parallel timeline in a separate worker thread for CPU execution or on a GPU timeline in a GPU +command queue. This method returns immediately without blocking the calling thread while the actual execution is +offloaded to a different timeline. This type of execution is appropriate when the responsiveness of the calling +thread is critical to good user experience. The computation results will be placed at the bound outputs once +the operation is successfully completed on the offloaded timeline, at which time the calling thread is +signaled. 
This type of execution supports both CPU and GPU devices, including when the context is created +from the {{WebGLRenderingContext}}. + +In both the {{MLContext}}.{{MLContext/compute()}} and {{MLContext}}.{{MLContext/computeAsync()}} execution methods, the caller supplies +the input values using {{MLNamedArrayInputs}}, binding the input {{MLOperand}}s to their values. The caller +then supplies pre-allocated buffers for output {{MLOperand}}s using {{MLNamedArrayOutputs}}. + +The {{MLCommandEncoder}} interface created by the {{MLContext}}.{{MLContext/createCommandEncoder()}} method supports +a graph execution method that provides the maximum flexibility to callers that also utilize WebGPU in their +application. It does this by placing the workload required to initialize and compute the results of the +operations in the graph onto a {{GPUCommandBuffer}}. The callers are responsible for the eventual submission +of this workload on the {{GPUQueue}} through the WebGPU queue submission mechanism. Once the submitted workload +is completely executed, the result is available in the bound output buffers. + ## Device Selection ## {#programming-model-device-selection} -An {{MLContext}} interface represents a global state of neural network execution. One of the important context states is the underlying execution device that manages the resources and facilitates the compilation and the eventual execution of the neural network graph. An {{MLContext}} could be created from a specific GPU device such as {{GPUDevice}} or {{WebGLRenderingContext}} that is already in use by the application, in which case the corresponding {{GPUBuffer}} or {{WebGLBuffer}} resources used as graph constants, as well as the {{GPUTexture}} and {{WebGLTexture}} as graph inputs must also be created from the same device. In a multi-adapter configuration, the device used for {{MLContext}} must be created from the same adapter as the device used to allocate the resources referenced in the graph. 
+An {{MLContext}} interface represents a global state of neural network execution. One of the important context states is the underlying execution device that manages the resources and facilitates the compilation and the eventual execution of the neural network graph. In addition to the default method of creation with {{MLContextOptions}}, an {{MLContext}} could also be created from a specific GPU device such as {{GPUDevice}} or {{WebGLRenderingContext}} that is already in use by the application, in which case the corresponding {{GPUBuffer}} or {{WebGLBuffer}} resources used as graph constants, as well as the {{GPUTexture}} and {{WebGLTexture}} used as graph inputs, must also be created from the same device. In a multi-adapter configuration, the device used for {{MLContext}} must be created from the same adapter as the device used to allocate the resources referenced in the graph. When a GPU context executes a graph with a constant or an input in the system memory as an {{ArrayBufferView}}, the input content is automatically uploaded from the system memory to the GPU memory, and downloaded back to the system memory of an {{ArrayBufferView}} output buffer at the end of the graph execution. These upload and download cycles occur only when the execution device requires the data to be copied out of and back into the system memory, such as in the case of the GPU; they don't occur when the device is a CPU device. Additionally, the result of the graph execution is in a known layout format. While the execution may be optimized for a native memory access pattern in an intermediate result within the graph, the output of the last operation of the graph must convert the content back to a known layout format in order to maintain the expected behavior from the caller's perspective. 
-When an {{MLContext}} is created with {{MLContextOptions}}, the user agent selects and creates the underlying execution device by taking into account the application's [=power preference=] and [=device preference=] specified in the {{MLPowerPreference}} and {{MLDevicePreference}} options. +When an {{MLContext}} is created with {{MLContextOptions}}, the user agent selects and creates the underlying execution device by taking into account the application's [=power preference=] and [=device type=] specified in the {{MLPowerPreference}} and {{MLDeviceType}} options. -The following table summarizes the types of resource supported by the device selected. +The following table summarizes the types of resources supported by the context created through different methods of creation:@@ -522,10 +545,9 @@ WorkerNavigator includes NavigatorML; ## ML ## {#api-ml} @@ -554,28 +576,21 @@ The {{ML/createContext()}} method steps are:-
Device Type ArrayBufferView GPUBuffer GPUTexture WebGLBuffer WebGLTexture + Creation method ArrayBufferView GPUBuffer GPUTexture WebGLBuffer WebGLTexture + MLContextOptions Yes No No No No GPUDevice Yes Yes Yes No No WebGLRenderingContext Yes No No Yes Yes - default Yes No No No No - gpu Yes No No No No - cpu Yes No No No No
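The resource-support rules in the table above can be read as a simple lookup keyed by the context-creation method. The following is a minimal sketch in plain JavaScript; the helper name and data structure are illustrative only and are not part of the WebNN API:

```javascript
// Resource kinds each context-creation method can bind, per the table above.
const supportedResources = {
  MLContextOptions: ['ArrayBufferView'],
  GPUDevice: ['ArrayBufferView', 'GPUBuffer', 'GPUTexture'],
  WebGLRenderingContext: ['ArrayBufferView', 'WebGLBuffer', 'WebGLTexture'],
};

// Hypothetical helper: check whether a resource kind may be bound to a
// context created through the given method.
function isResourceSupported(creationMethod, resourceKind) {
  const kinds = supportedResources[creationMethod];
  return kinds !== undefined && kinds.includes(resourceKind);
}
```

For example, `isResourceSupported('MLContextOptions', 'GPUBuffer')` is false: a default context created from options can only bind `ArrayBufferView` resources.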
"webnn". Its default allowlist is 'self'.
## MLContext ## {#api-mlcontext}
-The {{MLContext}} interface represents a global state of neural network compute workload and execution processes. Each {{MLContext}} object has associated [=context type=], [=device preference=] and [=power preference=].
+The {{MLContext}} interface represents a global state of neural network compute workload and execution processes. Each {{MLContext}} object has associated [=context type=], [=device type=] and [=power preference=].
The context type is the type of the execution context that manages the resources and facilitates the compilation and execution of the neural network graph:
default
"webgl
"webgpu
"default
"gpu
"cpu
"gpu
"cpu
"+function sizeOfShape(array) { + return array.reduce( + (accumulator, currentValue) => accumulator * currentValue); +} + +const context = navigator.ml.createContext(); + +// Create a graph with dynamic shaped inputs. +const builder = new MLGraphBuilder(context); +const descA = {type: 'float32', dimensions: [-1, 4]}; +const a = builder.input('a', descA); +const descB = {type: 'float32', dimensions: [4, -1]}; +const b = builder.input('b', descB); +const c = builder.matmul(a, b); +const graph = builder.build({'c': c}); + +function allocateAndCompute(shapeA, shapeB, shapeC) { + const bufferA = new Float32Array(sizeOfShape(shapeA)).fill(0.5); + const bufferB = new Float32Array(sizeOfShape(shapeB)).fill(0.5); + const bufferC = new Float32Array(sizeOfShape(shapeC)); + + // Specify the shape of inputs when computing. + const inputs = { + 'a': {resource: bufferA, dimensions: shapeA}, + 'b': {resource: bufferB, dimensions: shapeB}, + }; + const outputs = {'c': bufferC}; + context.compute(graph, inputs, outputs); + console.log(`values: ${bufferC}`); +} + +allocateAndCompute([3, 4], [4, 3], [3, 3]); +allocateAndCompute([4, 4], [4, 4], [4, 4]); +allocateAndCompute([5, 4], [4, 5], [5, 5]); ++
+const context = navigator.ml.createContext(); + +// Build a graph with two outputs. +const builder = new MLGraphBuilder(context); +const descA = {type: 'float32', dimensions: [3, 4]}; +const a = builder.input('a', descA); +const descB = {type: 'float32', dimensions: [4, 3]}; +const bufferB = new Float32Array(sizeOfShape(descB.dimensions)).fill(0.5); +const b = builder.constant(descB, bufferB); +const descC = {type: 'float32', dimensions: [3, 3]}; +const bufferC = new Float32Array(sizeOfShape(descC.dimensions)).fill(1); +const c = builder.constant(descC, bufferC); +const d = builder.matmul(a, b); +const e = builder.add(d, c); +const graph = builder.build({'d': d, 'e': e}); + +const bufferA = new Float32Array(sizeOfShape(descA.dimensions)).fill(0.5); +const inputs = {'a': bufferA}; + +// Compute d. +const bufferD = new Float32Array(sizeOfShape([3, 3])); +context.compute(graph, inputs, {'d': bufferD}); +console.log(`values: ${bufferD}`); + +// Compute e. +const bufferE = new Float32Array(sizeOfShape([3, 3])); +context.compute(graph, inputs, {'e': bufferE}); +console.log(`values: ${bufferE}`); ++
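The values the two-output example above is expected to produce can be checked with a plain JavaScript re-implementation of the same math. This sketch is for illustration only and does not use the WebNN API:

```javascript
// Plain-JS matmul and element-wise add, mirroring the example graph:
// a is 3x4 filled with 0.5, b is 4x3 filled with 0.5, c is 3x3 filled with 1.
function matmul(a, b) {
  return a.map(row => b[0].map((_, j) =>
      row.reduce((sum, v, k) => sum + v * b[k][j], 0)));
}
function add(x, y) {
  return x.map((row, i) => row.map((v, j) => v + y[i][j]));
}

const a = Array.from({length: 3}, () => Array(4).fill(0.5));
const b = Array.from({length: 4}, () => Array(3).fill(0.5));
const c = Array.from({length: 3}, () => Array(3).fill(1));

const d = matmul(a, b);  // every element is 4 * (0.5 * 0.5) = 1
const e = add(d, c);     // every element is 1 + 1 = 2
```

This is why `bufferD` holds all ones and `bufferE` holds all twos after the two `compute()` calls in the example.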
- |inputs|: an {{MLNamedInputs}}. The resources and optional dimensions of inputs for the compute. - |outputs|: an {{MLNamedOutputs}}. The pre-allocated resources of required outputs for the compute. -- - **Returns:** {{undefined}}. - - 1. If any of the following requirements are unmet, then throw a {{DataError}} {{DOMException}} and stop. -
-function sizeOfShape(array) { - return array.reduce( - (accumulator, currentValue) => accumulator * currentValue); -} + -const context = navigator.ml.createContext(); +-+ **Arguments:** + - *graph*: an {{MLGraph}}. The compiled graph to be initialized with graph constant inputs. -// Create a graph with dynamic shaped inputs. -const builder = new MLGraphBuilder(context); -const descA = {type: 'float32', dimensions: [-1, 4]}; -const a = builder.input('a', descA); -const descB = {type: 'float32', dimensions: [4, -1]}; -const b = builder.input('b', descB); -const c = builder.matmul(a, b); -const graph = builder.build({'c': c}); + **Returns:** {{undefined}}. +-function allocateAndCompute(shapeA, shapeB, shapeC) { - const bufferA = new Float32Array(sizeOfShape(shapeA)).fill(0.5); - const bufferB = new Float32Array(sizeOfShape(shapeB)).fill(0.5); - const bufferC = new Float32Array(sizeOfShape(shapeC)); ++The graph initialization stage typically involves a process known as "weight preprocessing" where all the constant inputs to the graph are preprocessed and cached at the operating system level for subsequent graph execution calls. The initializing inputs are typically the constant weight data specified through the {{MLGraphBuilder}}.{{MLGraphBuilder/constant(desc, bufferView)}} method as constant operands during graph construction time. +- // Specify the shape of inputs when computing. - const inputs = { - 'a': {resource: bufferA, dimensions: shapeA}, - 'b': {resource: bufferB, dimensions: shapeB}, - }; - const outputs = {'c': bufferC}; - graph.compute(inputs, outputs); - console.log(`values: ${bufferC}`); -} +### Dispatch Execution Commands ### {#api-mlcommandencoder-dispatch-commands} +Record the {{MLGraph}} execution with the inputs {{MLNamedGPUInputs}} and outputs {{MLNamedGPUOutputs}}. -allocateAndCompute([3, 4], [4, 3], [3, 3]); -allocateAndCompute([4, 4], [4, 4], [4, 4]); -allocateAndCompute([5, 4], [4, 5], [5, 5]); -
-const context = navigator.ml.createContext(); ++ **Arguments:** + - *graph*: an {{MLGraph}}. The compiled graph to be executed. + - *inputs*: an {{MLNamedGPUInputs}}. The resources and optional dimensions of inputs. + - *outputs*: an {{MLNamedGPUOutputs}}. The pre-allocated resources of required outputs. + + **Returns:** {{undefined}}. + + 1. If any of the following requirements are unmet, then throw a {{DataError}} {{DOMException}} and stop. +-// Build a graph with two outputs. -const builder = new MLGraphBuilder(context); -const descA = {type: 'float32', dimensions: [3, 4]}; -const a = builder.input('a', descA); -const descB = {type: 'float32', dimensions: [4, 3]}; -const bufferB = new Float32Array(sizeOfShape(descB.dimensions)).fill(0.5); -const b = builder.constant(descB, bufferB); -const descC = {type: 'float32', dimensions: [3, 3]}; -const bufferC = new Float32Array(sizeOfShape(descC.dimensions)).fill(1); -const c = builder.constant(descC, bufferC); -const d = builder.matmul(a, b); -const e = builder.add(d, c); -const graph = builder.build({'d': d, 'e': e}); +### Generate GPU Command Buffer ### {#api-mlcommandencoder-generate-gpu-command-buffer} +Complete the recording of ML workload and return a WebGPU-compatible {{GPUCommandBuffer}} containing the recorded workload. -const bufferA = new Float32Array(sizeOfShape(descA.dimensions)).fill(0.5); -const inputs = {'a': bufferA}; + -// Compute d. -const bufferD = new Float32Array(sizeOfShape([3, 3])); -graph.compute(inputs, {'d': bufferD}); -console.log(`values: ${bufferD}`); ++ 1. For each |key| -> |value| of |inputs|: + 1. |graph|.{{MLGraph/[[inputDescriptors]]}}[|key|] must exist. + 1. Let |inputDesc| be |graph|.{{MLGraph/[[inputDescriptors]]}}[|key|]. + 1. Let |inputSize| be 1. + 1. If |value| is an {{MLGPUInput}}, then: + 1. The length of |value|.{{MLGPUInput/dimensions}} must be the same as the length of |inputDesc|.{{MLOperandDescriptor/dimensions}}. + 1. Let |i| be 0. + 1. While true: + 1. 
Let |dimension| be |value|.{{MLGPUInput/dimensions}}[|i|]. + 1. |dimension| must be greater than 0. + 1. If |inputDesc|.{{MLOperandDescriptor/dimensions}}[|i|] is greater than 0, then |dimension| must be equal to |inputDesc|.{{MLOperandDescriptor/dimensions}}[|i|]. + 1. Set |inputSize| to the product of |inputSize| and |dimension|. + 1. Increment |i| by 1. + 1. If |i| is equal to the length of |value|.{{MLGPUInput/dimensions}}, then break. + 1. Else: + 1. For each |dimension| of |inputDesc|.{{MLOperandDescriptor/dimensions}}: + 1. The value of |dimension| must be greater than 0. + 1. Set |inputSize| to the product of |inputSize| and |dimension|. + 1. If |value| is an {{MLGPUInput}}, then let |resource| be |value|.{{MLGPUInput/resource}}. + 1. If |value| is an {{MLGPUResource}}, then let |resource| be |value|. + 1. For each |key| -> |value| of |outputs|: + 1. |graph|.{{MLGraph/[[outputNames]]}}[|key|] must exist. ++ + 1. For each |key| -> |value| of |inputs|: + 1. Let |inputDesc| be |graph|.{{MLGraph/[[inputDescriptors]]}}[|key|]. + 1. Let |inputTensor| be a new tensor for |graph|.{{MLGraph/[[implementation]]}} of a data type that is compatible with |inputDesc|.{{MLOperandDescriptor/type}}. + 1. If |value| is an {{MLGPUInput}}, then: + 1. Set the dimensions of |inputTensor| to |value|.{{MLGPUInput/dimensions}}. + 1. Else: + 1. Set the dimensions of |inputTensor| to |inputDesc|.{{MLOperandDescriptor/dimensions}}. + 1. If |value| is an {{MLGPUInput}}, then: + 1. Set the values of |inputTensor| to the values of |value|.{{MLGPUInput/resource}}. + 1. If |value| is an {{MLGPUResource}}, then: + 1. Set the values of |inputTensor| to the values of |value|. + 1. Set the input of |graph|.{{MLGraph/[[implementation]]}} that is associated with |key| to |inputTensor|. + 1. For each |key| -> |value| of |outputs|: + 1. Issue a compute request for the output of |graph|.{{MLGraph/[[implementation]]}} that is associated with |key|. + 1. Wait for the compute request to be completed. + 1. 
If there is an error returned by |graph|.{{MLGraph/[[implementation]]}}, then: + 1. Throw an {{OperationError}} {{DOMException}} and stop. + 1. Else: + 1. Let |outputTensor| be the output tensor returned by |graph|.{{MLGraph/[[implementation]]}}. + 1. If the kind of |value| is not compatible with the value type of |outputTensor|, then throw a {{DataError}} {{DOMException}} and stop. + 1. Let |outputSize| be 1. + 1. For each |dimension| of dimensions of |outputTensor|: + 1. Set |outputSize| to the product of |outputSize| and |dimension|. + 1. If |outputSize| is greater than the length of |value|, then: + 1. Throw a {{DataError}} {{DOMException}} and stop. + 1. Else: + 1. Set the values of |value| to the values of |outputTensor|. + 1. Return {{undefined}}. ++ **Arguments:** + - *descriptor*: an optional {{GPUCommandBufferDescriptor}}. Descriptor of the command buffer. -// Compute e. -const bufferE = new Float32Array(sizeOfShape([3, 3])); -graph.compute(inputs, {'e': bufferE}); -console.log(`values: ${bufferE}`); - + **Returns:** {{GPUCommandBuffer}}.Examples {#examples} @@ -2435,7 +2677,7 @@ const inputs = { 'input2': inputBuffer2, }; const outputs = {'output': outputBuffer}; -graph.compute(inputs, outputs); +context.compute(graph, inputs, outputs); console.log('Output value: ' + outputBuffer); // Output value: 2.25,2.25,2.25,2.25,2.25,2.25,2.25,2.25
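The output-size check in the execution steps above (the product of the output tensor's dimensions must not exceed the length of the pre-allocated output buffer) can be sketched as a small plain-JavaScript helper; the name is illustrative, not part of the API:

```javascript
// Hypothetical helper mirroring the output-validation steps: compute
// outputSize as the product of the output tensor's dimensions, and throw
// (modeling a DataError) when the pre-allocated buffer is too small.
function checkOutputBuffer(outputDims, outputBuffer) {
  let outputSize = 1;
  for (const dimension of outputDims) {
    outputSize *= dimension;
  }
  if (outputSize > outputBuffer.length) {
    throw new Error('DataError: output buffer is too small');
  }
  return outputSize;
}
```

For a [3, 3] output, a `Float32Array(9)` passes the check while a `Float32Array(8)` is rejected, which is the behavior the algorithm steps prescribe.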