This proposal was written following the March 2026 iTowns Hackathon.
Involved contributors: @mgermerie, @PierreAntoineChiron and @airnez
Context
When developing features for iTowns, we often ask ourselves:
- To what extent does this feature / contribution impact performance?
- How do we benchmark iTowns performance?
- Which metrics should we monitor?
- Is there a memory leak somewhere?
This proposal aims at defining an architecture for automated performance tests for iTowns that answers those questions.
Description of the proposal
We'll soon provide a first PR partially fulfilling the requirements stated below. It will be a starting point for performance test development.
Identified use cases:
- Automatically checking for performance regressions against master branch before merging a pull request
- Checking for performance regressions when bumping dependencies
- Providing a reliable performance tracing toolbox for debugging and bottleneck identification
For now, we'll focus on use case n°1, while designing the test architecture so the other two can be addressed later.
What this proposal does NOT aim to solve
- Providing better live performance debugging tools: this is a different job (Improve debug tools with rendering stats #2020)
- Providing automated bottleneck identification: a human will still be needed to read the metrics exposed by those tests
Implementation
Functional Implementation
Test scenario types
- "Functional" performance tests: instantiating a view and running real-use scenarios
- "Unit" performance tests: only benchmarking an iTowns sub-system (parsing, reprojection...). This second approach is not as good as the first one, but might be needed for specific parts of the code.
We'll provide a first working "Functional" performance test to begin with.
What are the metrics to monitor?
We identified a first list of metrics that would be interesting to have for "Functional" performance tests. Here is a non-exhaustive list (feel free to suggest other ones):
| Metric | Description |
|---|---|
| Update Time | Time required to perform an update |
| Frame Time | Time to render a frame |
| Time to First Frame | Time from test start to first render |
| Time to First Tile | Time from test start to first tile rendered |
| Test Time | Time to perform the test itself |
| Data Parsing and Conversion Time | Time spent parsing / converting data |
| Draw Calls | Number of draw calls per frame |
| Shader Compilation Time | Time required to compile shaders; can be responsible for a startup slow-down |
| Textures Count | The number of active textures |
| Triangle Count | The number of rendered triangle primitives |
| Geometries Count | The number of active geometries |
| Number of Shaders | The number of shader programs |
| JS Heap Size | Heap memory used after garbage collecting |
| Number and Duration of Long Tasks | Tasks > 50 ms blocking the main thread: their count and summed duration |
Metrics that are measured for each frame / render should then be accumulated into statistics:
- Min / Max
- Average
- 95th percentile (P95)
- Standard deviation
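The accumulation above could be sketched as a small helper. The function name `summarize` and the returned shape are illustrative, not an existing iTowns API; a nearest-rank P95 and a population standard deviation are assumed:

```javascript
// Accumulate per-frame samples into the statistics listed above.
// Illustrative sketch: name and output shape are not an existing API.
function summarize(samples) {
    const sorted = [...samples].sort((a, b) => a - b);
    const n = sorted.length;
    const mean = sorted.reduce((acc, v) => acc + v, 0) / n;
    const variance = sorted.reduce((acc, v) => acc + (v - mean) ** 2, 0) / n;
    return {
        min: sorted[0],
        max: sorted[n - 1],
        average: mean,
        // Nearest-rank 95th percentile.
        p95: sorted[Math.min(n - 1, Math.ceil(0.95 * n) - 1)],
        stdDev: Math.sqrt(variance),
    };
}
```

Whether P95 should use nearest-rank or an interpolated definition is a detail to settle when implementing; the important part is that all compared runs use the same one.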
Statistical significance
When looking for reliable measurements (e.g. for automated testing), we should be able to run the tests multiple times and measure the statistical difference between the tested options in a round-robin manner.
Quoting the Google Tachometer README:
> Even if you run the same JavaScript, on the same browser, on the same machine, on the same day, you'll still get a different result every time. But if you take enough repeated samples and apply the right statistics, you can reliably identify even tiny differences in runtime.
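As a toy illustration of that round-robin idea (the helper below is ours, not part of Tachometer), interleaving the compared variants means slow environment drift during the run affects all of them roughly equally:

```javascript
// Build an interleaved run order: variant A, variant B, variant A, ...
// Illustrative sketch, not an existing tool.
function roundRobinSchedule(variants, repetitions) {
    const schedule = [];
    for (let i = 0; i < repetitions; i++) {
        schedule.push(...variants);
    }
    return schedule;
}

// roundRobinSchedule(['master', 'pr-branch'], 2)
//  -> ['master', 'pr-branch', 'master', 'pr-branch']
```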
How to compare official release performances over time?
- We cannot test performance for each version independently and expect the metrics to be comparable: results depend on the environment and the browser version. We will always have to run all compared versions during the same test run, like MapTiler did for Mapbox and MapLibre.
Technical Implementation details
Identified third-party tool candidates
- Test harness: running several tests many times and comparing them
  - Tachometer: it is designed to work only with time measures, which might not perfectly fit our needs. We will probably just implement what we need ourselves.
- Browser control API:
  - Playwright: cool and trendy, but we did not consider it (see the reason just below)
  - Puppeteer: we are already using it for functional tests, so we'll keep it for performance tests for now (arbitrary and debatable)
- Tracing performance metrics for debugging:
  - The Chrome DevTools Protocol seems unavoidable for this use case. We don't need it for automated tests, but it is the go-to for a human deep-dive investigation
Dependency on the Chrome browser
The Chrome environment seems to stand above its competitors for performance metrics collection. BUT it would be nice to be able to run those tests on at least Firefox as well. Since Puppeteer now supports Firefox, this should be feasible.
How to measure each metric?
Puppeteer coupled with standard browser APIs should already be enough to get most of what we need without modifying the iTowns core code:
- Timings should be measured using the Performance API, coupled with event listeners
- PerformanceObserver allows us to track long tasks and other things
- The Three.js WebGLRenderer exposes most required 3D metrics
- Puppeteer exposes Page.metrics(), including JSHeapUsedSize
- Other ways are available for memory monitoring:
  - performance.memory (deprecated?)
  - measureUserAgentSpecificMemory() (not standard yet)
- Tracing for debugging:
  - Puppeteer exposes a tracing API over the Chrome DevTools Protocol
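A driver-side scenario run could look like the sketch below. `page.tracing.start()` / `page.tracing.stop()`, `page.goto()`, `page.waitForFunction()` and `page.metrics()` are standard Puppeteer `Page` APIs; the URL and the `window.__scenarioDone` completion flag are placeholders that our test pages would have to provide:

```javascript
// Run one scenario through the Puppeteer Page interface and return its
// runtime metrics. Sketch only: the completion flag is an assumption.
async function runScenario(page, url) {
    // Record a Chrome DevTools Protocol trace for later human inspection.
    await page.tracing.start({ path: 'trace.json' });
    await page.goto(url);
    // Wait until the page-side test code signals completion
    // (placeholder convention: it sets window.__scenarioDone = true).
    await page.waitForFunction('window.__scenarioDone === true');
    await page.tracing.stop();
    // Runtime metrics, including JSHeapUsedSize.
    return page.metrics();
}
```

In a real run, `page` would come from `puppeteer.launch()` followed by `browser.newPage()`; keeping the function parameterized on `page` also lets us swap in a Firefox-backed page later.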
| Metric | How to measure |
|---|---|
| Update Time | MAIN_LOOP_EVENTS UPDATE_START / UPDATE_END |
| Frame Time | MAIN_LOOP_EVENTS BEFORE_RENDER / AFTER_RENDER |
| Time to First Frame | Compute it from the first AFTER_RENDER event |
| Time to First Tile | Check for a visible level0Node in view.tileLayer |
| Time to Stable View | Check for an empty scheduler and RENDERING_PAUSED |
| Data Parsing and Conversion Time | Hard to track: requires specific tests or new logs / events |
| Draw Calls | renderer.info.render.calls |
| Shader Compilation Time | Sketchy but possible: call renderer.compile beforehand |
| Textures Count | renderer.info.memory.textures |
| Triangle Count | renderer.info.render.triangles |
| Geometries Count | renderer.info.memory.geometries |
| Number of Shaders | renderer.info.programs.length |
| JS Heap Size | performance.memory after forcing a GC (how?) |
| Number and Duration of Long Tasks | PerformanceObserver: observe({ entryTypes: ['longtask'] }) |
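For the long-task row above, the in-page instrumentation could be sketched as follows. Only the `PerformanceObserver` call is the standard browser API; the accumulator shape is ours, and the accumulation logic is kept in a pure helper so it stays testable outside a browser:

```javascript
// Accumulator for tasks > 50 ms blocking the main thread.
const longTasks = { count: 0, totalDuration: 0 };

// Pure helper: count the entries and sum their durations.
function recordLongTasks(entries, acc) {
    for (const entry of entries) {
        acc.count += 1;
        acc.totalDuration += entry.duration;
    }
    return acc;
}

// In the browser, feed it real 'longtask' performance entries:
// new PerformanceObserver((list) => recordLongTasks(list.getEntries(), longTasks))
//     .observe({ entryTypes: ['longtask'] });
```

The Puppeteer driver could then read `longTasks` back with `page.evaluate()` at the end of the scenario.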
Good practices
- Any network-related behavior should be mocked: data has to be pre-fetched before testing or be available locally.
- The build used for the performance tests should be as close to the production build as possible.
- When comparing versions, we will need to build them separately. MapLibre has an interesting approach, building a minified JS test file for each version that provides all the test and core code: see the MapLibre GL JS benchmark README.
- In the end, it would be nice to have dedicated hardware WITH GPUs, unlike what we have on GitHub, to run those tests consistently. For now, the tests will run on developers' hardware.
Architecture proposal
@mgermerie will complete this section