Description
This week I learnt I can ask Claude to analyze Chrome DevTools performance traces.
I tried it with a 2-second trace of this example, and these are the findings:
▗ ▗ ▖ ▖ Claude Code v2.1.29
Opus 4.5 · Claude Max
▘▘ ▝▝ ~/Work/git/three.js
❯ Can you investigate Trace-20260201T095112.json.gz and see what's taking
the most CPU time? The code for this is in the /src folder (WebGPURenderer).
Let's see if we can find any possible performance improvements.
⏺ Bash(gunzip -c Trace-20260201T095112.json.gz | head -c 50000)
⎿ {"metadata": {
"source": "DevTools",
"startTime": "2026-02-01T00:51:12.270Z",
… +874 lines (ctrl+o to expand)
⏺ Bash(gunzip -c Trace-20260201T095112.json.gz | jq -r '.traceEvents[] |
select(.cat == "devtools.timeline" and .name == "FunctionCall") | "\(.dur
// 0) \(.args.data.f…)
⎿ 311556 update
854
120 MutationObserver.observe.childList
⏺ Let me get more detailed profiling data from the trace.
⏺ Bash(gunzip -c Trace-20260201T095112.json.gz | jq -r '.traceEvents[] |
select(.name == "Profile" or .name == "ProfileChunk")' 2>/dev/null | head
-c 100000)
⎿ {
"args": {
"data": {
… +6502 lines (ctrl+o to expand)
⏺ Bash(gunzip -c Trace-20260201T095112.json.gz | jq -c '.traceEvents[] |
select(.name == "ProfileChunk") | .args.data.cpuProfile.nodes[]? |
select(.callFrame.url != nu…)
⎿ (No content)
⏺ Bash(gunzip -c Trace-20260201T095112.json.gz | jq '.traceEvents[] |
select(.name == "ProfileChunk") | .args.data.cpuProfile.nodes' 2>/dev/null
| jq -s 'add | .[] | s…)
⎿ (No content)
⏺ Bash(gunzip -c Trace-20260201T095112.json.gz | jq '[.traceEvents[] |
select(.name == "ProfileChunk") | .args.data.cpuProfile.nodes // empty] |
add | map(select(.call…)
⎿ (No content)
⏺ Read 1 file (ctrl+o to expand)
⏺ Bash(gunzip -c Trace-20260201T095112.json.gz | python3 -c "
import json…)
⎿ Top functions by self time (microseconds):
================================================================================
… +46 lines (ctrl+o to expand)
⏺ Excellent! Now let me map these hot functions to the source code in /src.
⏺ Searched for 7 patterns, read 2 files (ctrl+o to expand)
⏺ Let me also check the ChainMap and RenderList which are high on the list.
⏺ Searched for 1 pattern, read 1 file (ctrl+o to expand)
⏺ Let me check the Renderer's _renderObjectDirect method and getCacheKey which
are also high on CPU time.
⏺ Searched for 1 pattern, read 1 file (ctrl+o to expand)
⏺ Based on my analysis of the trace file, here's a summary of the top CPU
consumers in the WebGPURenderer:
Performance Analysis Results
┌────────────────┬────────────────────────┬───────────────────────┐
│ Self Time (µs) │ Function │ Location │
├────────────────┼────────────────────────┼───────────────────────┤
│ 37,647 │ get (ChainMap) │ ChainMap.js │
├────────────────┼────────────────────────┼───────────────────────┤
│ 34,074 │ setPipelineAndBindings │ WebGPUBackend.js:1505 │
├────────────────┼────────────────────────┼───────────────────────┤
│ 22,994 │ _renderObjectDirect │ Renderer.js:3301 │
├────────────────┼────────────────────────┼───────────────────────┤
│ 19,725 │ get (DataMap) │ DataMap/other │
├────────────────┼────────────────────────┼───────────────────────┤
│ 17,913 │ setVertexBuffer │ WebGPU native │
├────────────────┼────────────────────────┼───────────────────────┤
│ 13,202 │ updateMatrixWorld │ Object3D.js │
├────────────────┼────────────────────────┼───────────────────────┤
│ 11,141 │ sort │ RenderList.js:362 │
├────────────────┼────────────────────────┼───────────────────────┤
│ 10,693 │ draw │ WebGPUBackend.js:1486 │
├────────────────┼────────────────────────┼───────────────────────┤
│ 9,943 │ setBindGroup │ WebGPU native │
├────────────────┼────────────────────────┼───────────────────────┤
│ 9,033 │ needsRenderUpdate │ WebGPUBackend.js:1799 │
├────────────────┼────────────────────────┼───────────────────────┤
│ 8,469 │ _projectObject │ Renderer.js │
├────────────────┼────────────────────────┼───────────────────────┤
│ 7,421 │ getCacheKey │ RenderObject.js │
├────────────────┼────────────────────────┼───────────────────────┤
│ 7,092 │ intersectsSphere │ Frustum.js │
├────────────────┼────────────────────────┼───────────────────────┤
│ 3,187 │ painterSortStable │ RenderList.js:12 │
├────────────────┼────────────────────────┼───────────────────────┤
│ 1,993 │ cyrb53 │ hash function │
└────────────────┴────────────────────────┴───────────────────────┘
Key Hotspots & Potential Improvements
1. ChainMap get lookups (37.6ms) — Highest cost
The ChainMap iterates through multiple nested WeakMaps per lookup. For scenes
with many objects, this compounds.
Potential improvement: Consider caching the result or using a simpler
single-key lookup when possible (e.g., combining keys into a single object
reference).
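To make the suggestion above concrete, here is a minimal sketch of a chained WeakMap lookup with a single-key cache in front of it. `NestedWeakMap` is a simplified stand-in and not the three.js `ChainMap` source; `FirstKeyCache`, `sameKeys`, and the `lookups` counter are hypothetical names introduced only for illustration. Whether this would be faster in practice would have to be measured.

```javascript
// Simplified stand-in for a chained WeakMap lookup; NOT the three.js ChainMap source.
class NestedWeakMap {
  constructor() {
    this.root = new WeakMap();
    this.lookups = 0; // counts full chain walks, for illustration only
  }

  get(keys) {
    this.lookups++;
    let map = this.root;
    for (let i = 0; i < keys.length - 1; i++) {
      map = map.get(keys[i]);
      if (map === undefined) return undefined;
    }
    return map.get(keys[keys.length - 1]);
  }

  set(keys, value) {
    let map = this.root;
    for (let i = 0; i < keys.length - 1; i++) {
      let next = map.get(keys[i]);
      if (next === undefined) {
        next = new WeakMap();
        map.set(keys[i], next);
      }
      map = next;
    }
    map.set(keys[keys.length - 1], value);
    return this;
  }
}

// Hypothetical front cache: one WeakMap hit on the first key, then a cheap
// identity check on the remaining keys, before falling back to the chain.
class FirstKeyCache {
  constructor(chain) {
    this.chain = chain;
    this.entries = new WeakMap();
  }

  get(keys) {
    const entry = this.entries.get(keys[0]);
    if (entry !== undefined && sameKeys(entry.keys, keys)) return entry.value;

    const value = this.chain.get(keys);
    if (value !== undefined) {
      // Note: the entry holds the other keys strongly while keys[0] is alive.
      this.entries.set(keys[0], { keys: keys.slice(), value });
    }
    return value;
  }

  // Must be called whenever the chain entry for these keys is replaced or removed.
  invalidate(keys) {
    this.entries.delete(keys[0]);
  }
}

// Identity comparison of two key arrays.
function sameKeys(a, b) {
  if (a.length !== b.length) return false;
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) return false;
  }
  return true;
}
```

The trade-off is that a repeated lookup with the same keys costs one WeakMap access plus a few identity checks, at the price of an extra invalidation step whenever an entry changes.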
2. needsRenderUpdate (9ms) — Expensive comparison
At WebGPUBackend.js:1799-1828, there's a massive comparison of ~25 material
properties every frame:
if ( data.material !== material || data.materialVersion !== material.version ||
data.transparent !== material.transparent || data.blending !== ... // 25+ comparisons
Potential improvement: Use a single version number or hash on the material
rather than comparing every property individually.
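A minimal sketch of the single-version idea, assuming every render-relevant property change bumps one counter. The trace output does not confirm that three.js materials behave this way, so `VersionedMaterial`, its setters, and `needsUpdate` are hypothetical and only illustrate the shape of the check.

```javascript
// Hypothetical material: each render-relevant setter bumps a single counter.
class VersionedMaterial {
  constructor() {
    this._transparent = false;
    this._blending = 1;
    this.version = 0;
  }

  get transparent() { return this._transparent; }
  set transparent(value) {
    if (value !== this._transparent) {
      this._transparent = value;
      this.version++;
    }
  }

  get blending() { return this._blending; }
  set blending(value) {
    if (value !== this._blending) {
      this._blending = value;
      this.version++;
    }
  }
}

// Two comparisons instead of one per property. `cache` is the per-object
// record that remembers what was last seen.
function needsUpdate(cache, material) {
  if (cache.material !== material || cache.version !== material.version) {
    cache.material = material;
    cache.version = material.version;
    return true;
  }
  return false;
}
```

The cost moves from the per-frame check to the property writes, which only pays off if properties change far less often than they are read.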
3. painterSortStable (3.2ms) — Called per comparison
The sort comparator at RenderList.js:12-31 has 4 levels of nested if-else
checks. With many objects, this adds up.
Potential improvement: Pre-compute a single sort key (e.g., (groupOrder << 24)
| (renderOrder << 16) | z_encoded) to allow a single numeric comparison.
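A sketch of the pre-computed key, under stated assumptions: `groupOrder` is an integer in [0, 255], `renderOrder` an integer in [0, 65535], and depth is quantized to 24 bits between hypothetical `zNear` and `zFar` bounds. It uses multiplication rather than `<<` because JavaScript bitwise operators work on signed 32-bit integers, and these three fields together need 48 bits. The field names and ranges are illustrative, not taken from `RenderList.js`.

```javascript
const Z_BITS = 24;
const RENDER_BITS = 16;
const Z_SCALE = 2 ** Z_BITS - 1;
const RENDER_MUL = 2 ** Z_BITS;                 // renderOrder sits above z
const GROUP_MUL = 2 ** (Z_BITS + RENDER_BITS);  // groupOrder sits above renderOrder

// Packs the three criteria into one number that stays below 2^53,
// so ordinary numeric comparison orders items lexicographically.
function computeSortKey(item, zNear, zFar) {
  const t = Math.min(1, Math.max(0, (item.z - zNear) / (zFar - zNear)));
  const zQuantized = Math.round(t * Z_SCALE);
  return item.groupOrder * GROUP_MUL + item.renderOrder * RENDER_MUL + zQuantized;
}

// Computes each key once, then sorts with a single subtraction per comparison.
function sortByKey(items, zNear, zFar) {
  for (const item of items) {
    item.sortKey = computeSortKey(item, zNear, zFar);
  }
  return items.sort((a, b) => a.sortKey - b.sortKey);
}
```

Quantizing depth means two objects closer together than `(zFar - zNear) / 2^24` get the same key; since `Array.prototype.sort` is stable, such ties keep their insertion order.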
4. getCacheKey in RenderObject.js (7.4ms)
Cache key generation involves string concatenation which is expensive.
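One way to avoid building a string is to mix integer ids into a single 32-bit number. The sketch below applies an FNV-1a-style mix per 32-bit word, which is a variant and not the canonical byte-wise FNV-1a. `hashInts` is a hypothetical helper, and the assumption that the key parts are available as integers is not confirmed by the trace output.

```javascript
// Mixes a list of integers into one unsigned 32-bit hash.
// FNV-1a-style, applied per 32-bit word rather than per byte.
function hashInts(values) {
  let h = 0x811c9dc5; // FNV 32-bit offset basis
  for (let i = 0; i < values.length; i++) {
    h ^= values[i] | 0;            // fold in the next value as a 32-bit int
    h = Math.imul(h, 0x01000193);  // multiply by the FNV 32-bit prime
  }
  return h >>> 0; // convert to an unsigned 32-bit integer
}
```

Unlike a concatenated string, a 32-bit hash can collide, so a cache keyed this way needs either a fallback equality check or a tolerance for rare false hits.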
5. WebGPU Native Calls (28ms total)
setVertexBuffer, setBindGroup, setIndexBuffer are already optimized by
tracking current state to avoid redundant calls. This is working correctly.
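For readers unfamiliar with the pattern, state tracking of this kind can be sketched as follows. `VertexBufferTracker` is a hypothetical wrapper and not the code in `WebGPUBackend.js`; `pass` is assumed to be any object exposing `setVertexBuffer(slot, buffer)`, as a WebGPU render pass encoder does.

```javascript
// Remembers which buffer is bound to each slot and skips the native call
// when the requested buffer is already bound there.
class VertexBufferTracker {
  constructor(pass) {
    this.pass = pass;
    this.bound = []; // bound[slot] is the last buffer set on that slot
  }

  // Returns true if a native call was issued, false if it was skipped.
  set(slot, buffer) {
    if (this.bound[slot] === buffer) return false;
    this.pass.setVertexBuffer(slot, buffer);
    this.bound[slot] = buffer;
    return true;
  }

  // Bindings do not carry over to a new pass, so the record must be cleared.
  reset(pass) {
    this.pass = pass;
    this.bound.length = 0;
  }
}
```

With this in place, drawing many objects that share a geometry issues one `setVertexBuffer` call per slot instead of one per object.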
Would you like me to implement any of these optimizations? The most impactful
would likely be:
1. Simplifying needsRenderUpdate with a version/hash approach
2. Optimizing the sort comparator with pre-computed keys
✻ Cooked for 2m 12s
Solution
Reduce CPU usage.
Alternatives
Not reduce CPU usage.
Additional context
No response