Commit 779459e

committed
make urls relative
1 parent 1da3240 commit 779459e

11 files changed, +5616 -23 lines changed

webgpu/lessons/webgpu-optimizating.md

+201-3
@@ -35,9 +35,207 @@ we've done the following steps
3535

3636
Let's make an example we can optimize
3737

38-
* Pack your vertices
39-
* Use mappedOnCreation for initial data
40-
* Split uniform buffer (shared, material, per model)
38+
## Use mappedAtCreation for initial data
39+
40+
In the example above, and in most of the examples on this site we've
41+
used `writeBuffer` to copy data into a vertex or index buffer. As a very
42+
minor optimization, for this particular case, when you create a buffer
43+
you can pass in `mappedAtCreation: true`. This has 2 benefits.
44+
45+
1. It's slightly faster to put the data into the new buffer (2)
46+
47+
2. You don't have to add `GPUBufferUsage.COPY_DST` to the buffer's usage.
48+
49+
This assumes you're not going to change the data later.
50+
51+
```js
52+
function createBufferWithData(device, data, usage) {
53+
const buffer = device.createBuffer({
54+
size: data.byteLength,
55+
- usage: usage | GPUBufferUsage.COPY_DST,
56+
+ usage: usage,
57+
+ mappedAtCreation: true,
58+
});
59+
- device.queue.writeBuffer(buffer, 0, data);
60+
+ const dst = new Uint8Array(buffer.getMappedRange());
61+
+ dst.set(new Uint8Array(data.buffer, data.byteOffset, data.byteLength));
62+
+ buffer.unmap();
63+
return buffer;
64+
}
65+
```
66+
67+
Note that this optimization only helps at creation time so it will not
68+
affect our performance at render time.
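
One caveat about the helper above: WebGPU requires the size of a buffer created with `mappedAtCreation: true` to be a multiple of 4 bytes. The data in this article happens to satisfy that, but a `Uint16Array` with an odd number of elements would not. A minimal sketch of a variant that rounds the size up, assuming `data` is a TypedArray (`createPaddedBufferWithData` is a hypothetical name, not part of the article's code):

```js
// Hypothetical variant of createBufferWithData; the name is illustrative.
// `data` is assumed to be a TypedArray (Float32Array, Uint16Array, ...).
function createPaddedBufferWithData(device, data, usage) {
  // Round the byte length up to the next multiple of 4.
  const size = Math.ceil(data.byteLength / 4) * 4;
  const buffer = device.createBuffer({
    size,
    usage,
    mappedAtCreation: true,
  });
  const dst = new Uint8Array(buffer.getMappedRange());
  // Copy only the bytes this view covers, not the whole underlying ArrayBuffer.
  dst.set(new Uint8Array(data.buffer, data.byteOffset, data.byteLength));
  buffer.unmap();
  return buffer;
}
```

The padding bytes at the end are left as zeros and are never read as long as the draw call only references the original element count.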
69+
70+
## Pack and interleave your vertices
71+
72+
In the example above we have 3 buffers, one for position, one for normals,
73+
and one for texture coordinates. This is slower both on the CPU and GPU.
74+
On the CPU, in JavaScript, we need to call `setVertexBuffer` once for each
75+
buffer for each model we want to draw. On the GPU there are cache issues.
76+
So, if we interleave the vertex data into a single buffer we'll only need
77+
one call to `setVertexBuffer`, and we'll help the GPU as well, since all the
78+
data needed for a single vertex will be located together in memory.
79+
80+
```js
81+
- const positions = new Float32Array([1, 1, -1, 1, 1, 1, 1, -1, 1, 1, -1, -1, -1, 1, 1, -1, 1, -1, -1, -1, -1, -1, -1, 1, -1, 1, 1, 1, 1, 1, 1, 1, -1, -1, 1, -1, -1, -1, -1, 1, -1, -1, 1, -1, 1, -1, -1, 1, 1, 1, 1, -1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1, -1, 1, 1, -1, 1, -1, -1, -1, -1, -1]);
82+
- const normals = new Float32Array([1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, -1, 0, 0, -1, 0, 0, -1, 0, 0, -1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, -1, 0, 0, -1, 0, 0, -1, 0, 0, -1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, -1, 0, 0, -1, 0, 0, -1, 0, 0, -1]);
83+
- const texcoords = new Float32Array([1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1]);
84+
+ const vertexData = new Float32Array([
85+
+ // position normal texcoord
86+
+ 1, 1, -1, 1, 0, 0, 1, 0,
87+
+ 1, 1, 1, 1, 0, 0, 0, 0,
88+
+ 1, -1, 1, 1, 0, 0, 0, 1,
89+
+ 1, -1, -1, 1, 0, 0, 1, 1,
90+
+ -1, 1, 1, -1, 0, 0, 1, 0,
91+
+ -1, 1, -1, -1, 0, 0, 0, 0,
92+
+ -1, -1, -1, -1, 0, 0, 0, 1,
93+
+ -1, -1, 1, -1, 0, 0, 1, 1,
94+
+ -1, 1, 1, 0, 1, 0, 1, 0,
95+
+ 1, 1, 1, 0, 1, 0, 0, 0,
96+
+ 1, 1, -1, 0, 1, 0, 0, 1,
97+
+ -1, 1, -1, 0, 1, 0, 1, 1,
98+
+ -1, -1, -1, 0, -1, 0, 1, 0,
99+
+ 1, -1, -1, 0, -1, 0, 0, 0,
100+
+ 1, -1, 1, 0, -1, 0, 0, 1,
101+
+ -1, -1, 1, 0, -1, 0, 1, 1,
102+
+ 1, 1, 1, 0, 0, 1, 1, 0,
103+
+ -1, 1, 1, 0, 0, 1, 0, 0,
104+
+ -1, -1, 1, 0, 0, 1, 0, 1,
105+
+ 1, -1, 1, 0, 0, 1, 1, 1,
106+
+ -1, 1, -1, 0, 0, -1, 1, 0,
107+
+ 1, 1, -1, 0, 0, -1, 0, 0,
108+
+ 1, -1, -1, 0, 0, -1, 0, 1,
109+
+ -1, -1, -1, 0, 0, -1, 1, 1,
110+
+ ]);
111+
const indices = new Uint16Array([0, 1, 2, 0, 2, 3, 4, 5, 6, 4, 6, 7, 8, 9, 10, 8, 10, 11, 12, 13, 14, 12, 14, 15, 16, 17, 18, 16, 18, 19, 20, 21, 22, 20, 22, 23]);
112+
113+
- const positionBuffer = createBufferWithData(device, positions, GPUBufferUsage.VERTEX);
114+
- const normalBuffer = createBufferWithData(device, normals, GPUBufferUsage.VERTEX);
115+
- const texcoordBuffer = createBufferWithData(device, texcoords, GPUBufferUsage.VERTEX);
116+
+ const vertexBuffer = createBufferWithData(device, vertexData, GPUBufferUsage.VERTEX);
117+
const indicesBuffer = createBufferWithData(device, indices, GPUBufferUsage.INDEX);
118+
const numVertices = indices.length;
119+
120+
const pipeline = device.createRenderPipeline({
121+
label: 'textured model with point light w/specular highlight',
122+
layout: 'auto',
123+
vertex: {
124+
module,
125+
buffers: [
126+
- // position
127+
- {
128+
- arrayStride: 3 * 4, // 3 floats
129+
- attributes: [
130+
- {shaderLocation: 0, offset: 0, format: 'float32x3'},
131+
- ],
132+
- },
133+
- // normal
134+
- {
135+
- arrayStride: 3 * 4, // 3 floats
136+
- attributes: [
137+
- {shaderLocation: 1, offset: 0, format: 'float32x3'},
138+
- ],
139+
- },
140+
- // uvs
141+
- {
142+
- arrayStride: 2 * 4, // 2 floats
143+
- attributes: [
144+
- {shaderLocation: 2, offset: 0, format: 'float32x2'},
145+
- ],
146+
- },
147+
+ {
148+
+ arrayStride: (3 + 3 + 2) * 4, // 8 floats
149+
+ attributes: [
150+
+ {shaderLocation: 0, offset: 0 * 4, format: 'float32x3'}, // position
151+
+ {shaderLocation: 1, offset: 3 * 4, format: 'float32x3'}, // normal
152+
+ {shaderLocation: 2, offset: 6 * 4, format: 'float32x2'}, // texcoord
153+
+ ],
154+
+ },
155+
],
156+
},
157+
fragment: {
158+
module,
159+
targets: [{ format: presentationFormat }],
160+
},
161+
primitive: {
162+
cullMode: 'back',
163+
},
164+
depthStencil: {
165+
depthWriteEnabled: true,
166+
depthCompare: 'less',
167+
format: 'depth24plus',
168+
},
169+
});
170+
171+
...
172+
- pass.setVertexBuffer(0, positionBuffer);
173+
- pass.setVertexBuffer(1, normalBuffer);
174+
- pass.setVertexBuffer(2, texcoordBuffer);
175+
+ pass.setVertexBuffer(0, vertexBuffer);
176+
```
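
The interleaved array above is written out by hand. If the data starts out as separate arrays, it could also be interleaved at load time. A minimal sketch, assuming every attribute is float data with the same vertex count (`interleaveVertexData` is a hypothetical helper, not part of the article's code):

```js
// Hypothetical helper, not from the article: interleaves separate
// per-attribute arrays into a single Float32Array.
// `attributes` is an array of {data, numComponents} in the order the
// attributes should appear within each vertex.
function interleaveVertexData(attributes) {
  const numVertices = attributes[0].data.length / attributes[0].numComponents;
  // Number of floats per vertex, e.g. 3 + 3 + 2 = 8.
  const stride = attributes.reduce((sum, a) => sum + a.numComponents, 0);
  const out = new Float32Array(numVertices * stride);
  let offset = 0;
  for (const {data, numComponents} of attributes) {
    for (let v = 0; v < numVertices; ++v) {
      for (let c = 0; c < numComponents; ++c) {
        out[v * stride + offset + c] = data[v * numComponents + c];
      }
    }
    offset += numComponents;
  }
  return out;
}

// Usage with the first two vertices of the cube:
const firstTwoVertices = interleaveVertexData([
  {data: [1, 1, -1, 1, 1, 1], numComponents: 3},  // position
  {data: [1, 0, 0, 1, 0, 0], numComponents: 3},   // normal
  {data: [1, 0, 0, 0], numComponents: 2},         // texcoord
]);
```

The float offsets this produces (0, 3, and 6) correspond to the `offset` values in the pipeline's `attributes` once multiplied by 4 bytes.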
177+
178+
## Split uniform buffers (shared, material, per model)
179+
180+
Our example right now has one uniform buffer object.
181+
182+
```wgsl
183+
struct Uniforms {
184+
normalMatrix: mat3x3f,
185+
viewProjection: mat4x4f,
186+
world: mat4x4f,
187+
color: vec4f,
188+
lightWorldPosition: vec3f,
189+
viewWorldPosition: vec3f,
190+
shininess: f32,
191+
};
192+
```
193+
194+
Some of those uniform values like `viewProjection`, `lightWorldPosition`
195+
and `viewWorldPosition` can be shared.
196+
197+
We can split these into at least 2 uniform buffers. One for the shared
198+
values and one for *per object values*.
199+
200+
```wgsl
201+
struct SharedUniforms {
202+
viewProjection: mat4x4f,
203+
lightWorldPosition: vec3f,
204+
viewWorldPosition: vec3f,
205+
};
206+
struct PerObjectUniforms {
207+
normalMatrix: mat3x3f,
208+
world: mat4x4f,
209+
color: vec4f,
210+
shininess: f32,
211+
};
212+
```
213+
214+
With this change, we'll save having to copy the `viewProjection`, `lightWorldPosition` and `viewWorldPosition` to every uniform buffer.
215+
We'll also copy less data with `device.queue.writeBuffer`.
216+
217+
With that change, our math portion dropped ~30%.
218+
219+
A common organization in a 3D library is to have "models" (the vertex data),
220+
"materials" (the colors, shininess, and texture), "lights" (which lights to use),
221+
"viewInfo" (the view and projection matrix). In particular, in our example,
222+
`color` and `shininess` never change so it's a waste to keep copying them
223+
to the uniform buffer every frame.
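
To make the split concrete on the JavaScript side, here is a sketch of typed-array views for the two structs. The offsets are worked out from WGSL's layout rules (`vec3f` and the matrix types align to 16 bytes, `mat3x3f` occupies 48 bytes, struct sizes round up to a multiple of 16). The constant and variable names are assumptions for illustration, not the article's code:

```js
// SharedUniforms: viewProjection (64 bytes) + two vec3f, each padded to 16.
const kSharedUniformsSize = 96;   // bytes
const sharedValues = new Float32Array(kSharedUniformsSize / 4);
const shared = {
  viewProjection: sharedValues.subarray(0, 16),      // byte offset 0
  lightWorldPosition: sharedValues.subarray(16, 19), // byte offset 64
  viewWorldPosition: sharedValues.subarray(20, 23),  // byte offset 80
};

// PerObjectUniforms: normalMatrix (3 padded columns), world, color, shininess.
const kPerObjectUniformsSize = 144;  // bytes, 132 rounded up to 16
const perObjectValues = new Float32Array(kPerObjectUniformsSize / 4);
const perObject = {
  normalMatrix: perObjectValues.subarray(0, 12),  // byte offset 0
  world: perObjectValues.subarray(12, 28),        // byte offset 48
  color: perObjectValues.subarray(28, 32),        // byte offset 112
  shininess: perObjectValues.subarray(32, 33),    // byte offset 128
};
```

With a layout like this, `sharedValues` would be uploaded once per frame, while each object uploads only its own 144 bytes.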
224+
225+
## Double buffer uniform buffers that are updated every frame
226+
227+
WebGPU is required to make buffer access safe. That means
228+
when you submit a command buffer, WebGPU has to effectively check, "is this buffer
229+
being updated? If so wait until the update is finished". Or, going the other way,
230+
let's say you call `device.queue.writeBuffer`. WebGPU has to check "is this buffer currently being read by shaders? If so wait until that finishes".
231+
232+
Double buffering in this case means, instead of one uniform buffer for
233+
the "per object uniforms", the ones we're updating with the world and
234+
normal matrices, we'd have two. We'd ping-pong which one we're updating.
235+
This way, while WebGPU is drawing using one of those 2 buffers, we're updating
236+
the other. So, WebGPU never has to wait.
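
The ping-pong bookkeeping itself is small. A minimal sketch, assuming each entry pairs a uniform buffer with the bind group that references it (`BufferRing` and the entry shape are hypothetical, not part of the article's code):

```js
// Hypothetical helper, not from the article: cycles through a fixed set of
// entries so consecutive frames write to different buffers.
class BufferRing {
  constructor(entries) {
    this.entries = entries;
    this.index = 0;
  }
  // Returns the entry to use this frame and advances to the next one.
  next() {
    const entry = this.entries[this.index];
    this.index = (this.index + 1) % this.entries.length;
    return entry;
  }
}

// Assumed usage inside a render loop (names are illustrative):
//   const {buffer, bindGroup} = perObjectRing.next();
//   device.queue.writeBuffer(buffer, 0, uniformValues);
//   pass.setBindGroup(1, bindGroup);
```

Because a bind group refers to a specific buffer, each buffer in the ring needs its own bind group. Whether this actually removes a wait depends on the WebGPU implementation, so it is worth measuring.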
237+
238+
41239

42240
* Texture Atlas or 2D-array
43241
* GPU Occlusion culling
