Skip to content

Optimize pen line drawing using instanced rendering #11

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Aug 16, 2024

Conversation

Tacodiva
Copy link

This is an optimization for the pen line drawing which results in an increase in performance in some projects, particularly raycasters or other 3D projects which draw a lot of lines. For me, this results in a bigger performance boost in firefox then chrome, but is very measurable in both.

The main difference is instead of writing the attributes for each pen line 6 times (once for each vertex), I am using instance rendering to only need to write the data once. This also allows all pen lines to share the same short vertex position buffer instead of having to have identical data copied a lot of times.

I have also merged all attributes (other than position) into one buffer, which allows the data to be written faster by the CPU and read faster by the GPU. It also saved on opengl calls.

The tests for this don't pass on my computer and the output is fairly incomprehensible, I suspect it's something about my environment given tests seemingly unrelated to what I've changed are failing.

I'm not sure how to test this more rigorously, definitely need to do more before merging, but I'm happy with the state the code is in.

@GarboMuffin
Copy link
Member

GarboMuffin commented Aug 15, 2024

I guess there needs to be a webgl1 fallback

The equivalent (?) extension https://developer.mozilla.org/en-US/docs/Web/API/ANGLE_instanced_arrays seems to have very good support https://web3dsurvey.com/webgl/extensions/ANGLE_instanced_arrays, may not be worth maintaining another fallback for 0.05% of users

@Tacodiva
Copy link
Author

I have added a fallback to ANGLE_instanced_arrays and was able to test it in firefox by disabling webgl2.

@GarboMuffin
Copy link
Member

I basically copy+pasted the old code back as a second fallback if no webgl2 and no webgl1 extension so there should no downside at all now, going to test this on a whole bunch of different systems to make sure there's no surprises

@GarboMuffin
Copy link
Member

GarboMuffin commented Aug 16, 2024

Numbers from running https://scratch.mit.edu/projects/1056145815/ on a bunch of systems

  • webgl2-only forces the WebGL2 path
  • webgl1-angle forces the WebGL1 w/ ANGLE_instanced_arrays path
  • webgl1-fallback forces the WebGL1 w/o ANGLE_instanced_arrays path, which does only use one buffer now so it's a tiny bit faster than before
  • production is what's on the develop branch right now
  • vanilla uses the upstream scratchfoundation/scratch-render (still uses the compiler)

I don't have telemetry but I'm pretty sure >99% of our users just have WebGL2 natively so that's the most important column. Seems to vary between "a tiny bit faster" and "very significantly faster", so that's great

Unscientific, only one run, no averaging, no attempt to control for thermals or background tasks, expect big error bars...

System Browser webgl2-only webgl1-angle webgl1-fallback production vanilla Comments
Pixel 6 Vanadium (GrapheneOS Chromium fork, JIT enabled) 1.98 2.08 3.09 2.9 10.8  
Pixel 6 Firefox 2.5 2.36 3.85 4.21 10.15  
Early 2020 MacBook Air Safari 1.29 1.32 2.45 4.65 6.57  
Early 2020 MacBook Air Firefox 3.83 3.66 4.84 5.02 10.14  
Early 2020 MacBook Air Brave 3.32 3.36 4.37 3.87 9.95  
i7 4790k, RX 570, LMDE Firefox 1.69 1.79 2.79 2.95 6.23  
i7 4790k, RX 570, LMDE Chromium 1.59 1.62 1.97 2.02 6.76  
i7 4790k, RX 570, LMDE GNOME Web (WebKit) 1.14 1.1 1.58 1.47 11.25  
i7 4790k, RX 570, Windows 11 Edge 1.72 1.62 2.26 2.58 8.77  
i7 4790k, RX 570, Windows 11 Firefox 2.25 2.74 5.27 6.96 8.35  
A8-4500M Firefox -- -- -- -- 35.9 Trying to run anything except vanilla resulted in GPU instability, probably hardware issue not worth reading into
Samsung Note 3 Fennec 36.57 37.36 45.5 51.17 227.34 Needed warp timer, otherwise phone restarts itself. Tried to run Chrome too but the phone’s certificates are too old; Firefox uses its own certificates so it still works

@GarboMuffin
Copy link
Member

GarboMuffin commented Aug 16, 2024

anything else you want to change before merge?

@Tacodiva
Copy link
Author

I would be happy with a marge where this is at.

I have thought of one very minor speed improvement but it may not be worth retesting everything, it would only speed up GPU performance not CPU which I think is the main bottleneck.

Basically when instance rendering instead of drawing gl.TRIANGLES with 6 vertices, we could draw gl.TRIANGLE_STRIP with the 4 vertices (1, 0), (0, 0), (1, 1), (0, 1). If you can be bothered could be worth seeing if it helps at all on older hardware.

@Tacodiva
Copy link
Author

I've implemented that now if you wanna investigate, otherwise I'm happy to just revert that commit and merge.

@GarboMuffin
Copy link
Member

Didn't seem to help but also doesn't appear to hurt

Gonna check a couple more computers with weird or old graphics hardware

@Tacodiva
Copy link
Author

Didn't seem to help but also doesn't appear to hurt

Yeah that was my conclusion as well. I suspect most GPU drivers are smart enough to do this optimization anyways so it ends up being equivalent.

@GarboMuffin GarboMuffin merged commit ed45bcd into TurboWarp:develop Aug 16, 2024
1 check passed
@GarboMuffin
Copy link
Member

on staging, to go up whenever i remember

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants