Add case cache to repo #3162

Merged: 1 commit into gpuweb:main from ben-clayton:commit-cache on Nov 16, 2023
Conversation

@ben-clayton (Contributor) commented Nov 13, 2023

Add case cache to repo

  • Add the cache files to the CTS repo under src/resources/cache.
  • Compress & decompress the cache with gzip using the Compression Streams API (see the sketch after this list). All the cache files sum to just over 4 MB.
  • Update gen_cache --validate to check that all the cache files are present and up to date, and to error if there are old cache files that are no longer needed.
  • Update pr.yml to run gen_cache --validate before building with all (which regenerates the cache).
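
For context, a gzip round-trip with the Compression Streams API is only a few lines. A minimal sketch (the helper names are illustrative, not the CTS's actual code):

```ts
// Minimal sketch of a gzip round-trip via the Compression Streams API,
// which is available as a global in Node >= 18 and in browsers.
// gzipCompress/gzipDecompress are illustrative names.
async function gzipCompress(data: Uint8Array): Promise<Uint8Array> {
  const stream = new Blob([data]).stream().pipeThrough(new CompressionStream('gzip'));
  return new Uint8Array(await new Response(stream).arrayBuffer());
}

async function gzipDecompress(compressed: Uint8Array): Promise<Uint8Array> {
  const stream = new Blob([compressed]).stream().pipeThrough(new DecompressionStream('gzip'));
  return new Uint8Array(await new Response(stream).arrayBuffer());
}
```

Presumably the appeal of the standard API is that the same code paths work in node and in the browser without adding a compression dependency.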

Requirements for PR author:

  • All missing test coverage is tracked with "TODO" or .unimplemented().
  • New helpers are /** documented */ and new helper files are found in helper_index.txt.
  • Test behaves as expected in a WebGPU implementation. (If not passing, explain above.)

Requirements for reviewer sign-off:

  • Tests are properly located in the test tree.
  • Test descriptions allow a reader to "read only the test plans and evaluate coverage completeness", and accurately reflect the test code.
  • Tests provide complete coverage (including validation control cases). Missing coverage MUST be covered by TODOs.
  • Helpers and types promote readability and maintainability.

When landing this PR, be sure to make any necessary issue status updates.

@ben-clayton (Contributor, Author) commented:

Looks like I need to update the version of node that the presubmits use.

@ben-clayton (Contributor, Author) commented Nov 13, 2023

This is fun. The cache seems to differ based on the version of node used. Investigating.

Update: Looks like gzip compresses differently between node versions. I'll change the validation to test post-decompression tomorrow. Even so, there's one 4-byte diff between node 18 and 20 where a float value (looks non-finite, possibly f16) is encoded differently.
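
(For illustration only: a post-decompression comparison could look something like the sketch below. gzipDecompress stands in for an assumed helper wrapping DecompressionStream('gzip'); this is not gen_cache's actual logic.)

```ts
// Sketch: compare cache contents after decompression, so that differences in
// the gzip encoder between node versions don't register as cache mismatches.
// gzipDecompress is an assumed helper, e.g. the one sketched earlier.
async function cacheFileMatches(onDisk: Uint8Array, regenerated: Uint8Array): Promise<boolean> {
  const a = await gzipDecompress(onDisk);
  const b = await gzipDecompress(regenerated);
  return a.length === b.length && a.every((byte, i) => byte === b[i]);
}
```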

@kainino0x (Collaborator) left a comment:

> This is fun. The cache seems to differ based on the version of node used. Investigating.
>
> Update: Looks like gzip compresses differently between node versions. I'll change the validation to test post-decompression tomorrow. Even so, there's one 4-byte diff between node 18 and 20 where a float value (looks non-finite, possibly f16) is encoded differently.

I think it would be much preferable if we can make the compression results stable. If they're not stable we'll end up with different developers' machines fighting over the compressed files.

Another idea - not sure if this would work - would be to store in src/ in a non-compressed format, compress into gen/, and copy into out*/.
Or even store in a text format (to make git happier) and serialize+compress into gen/.
Never mind if this defeats the whole point of checking the cache into the repository. (Though come to think of it, I don't actually understand why we can't generate the cache at compile time.)

(Inline review threads on Gruntfile.js (3), src/common/runtime/server.ts, and package.json, all resolved.)
@ben-clayton (Contributor, Author) commented:

> This is fun. The cache seems to differ based on the version of node used. Investigating.
> Update: Looks like gzip compresses differently between node versions. I'll change the validation to test post-decompression tomorrow. Even so, there's one 4-byte diff between node 18 and 20 where a float value (looks non-finite, possibly f16) is encoded differently.

> I think it would be much preferable if we can make the compression results stable. If they're not stable we'll end up with different developers' machines fighting over the compressed files.

Understood. I'm not entirely sure it can be, though. What if we pinned the generation to a node version, erroring if you try to generate with the wrong version of node? It would only affect the developers who regenerate the cases, which so far has been ~3 people.
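
(Such a pin would only take a few lines at the top of gen_cache; a hypothetical sketch, with the pinned major version chosen arbitrarily:)

```ts
// Hypothetical node-version pin for gen_cache. The pinned major version and
// the error text are invented for illustration; this is not what actually landed.
const REQUIRED_NODE_MAJOR = 20;
const actualMajor = Number(process.versions.node.split('.')[0]);
if (actualMajor !== REQUIRED_NODE_MAJOR) {
  console.error(
    `gen_cache must run under node ${REQUIRED_NODE_MAJOR}.x to produce stable cache output ` +
      `(found ${process.versions.node})`
  );
  process.exit(1);
}
```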

> Another idea - not sure if this would work - would be to store in src/ in a non-compressed format, compress into gen/, and copy into out*/. Or even store in a text format (to make git happier) and serialize+compress into gen/. Never mind if this defeats the whole point of checking the cache into the repository. (Though come to think of it, I don't actually understand why we can't generate the cache at compile time.)

We did generate the cache at compile time. That was the first version. It was deemed too expensive to do each time we run the CTS.

@kainino0x (Collaborator) commented:

> We did generate the cache at compile time. That was the first version. It was deemed too expensive to do each time we run the CTS.

OK. I'd like to understand this a bit more. We already build all of Chromium in order to run the CTS in Dawn - does the cache take so long to build that we don't want to add it onto that build stage?

@ben-clayton (Contributor, Author) commented:

> We did generate the cache at compile time. That was the first version. It was deemed too expensive to do each time we run the CTS.

> OK. I'd like to understand this a bit more. We already build all of Chromium in order to run the CTS in Dawn - does the cache take so long to build that we don't want to add it onto that build stage?

It did. It previously took around 6 minutes, then spiked to 50, which caused major upset and forced me to drop everything to get the cache build removed from the CTS runs. Since then we've all been optimizing the CTS, and this has been brought down to less than 2 minutes on a moderately powerful single-core machine.

Clearly we've brought this down to something almost negligible in the grand scheme of things, however:

  • This is only going to climb again as we add more tests.
  • This was being run for each CTS bot configuration, and the data is purely static: a bunch of immutable constant-expression values that can be computed once, ahead of time.
  • CTS runs would repeat the generation N times to build this, as would each and every presubmit and CI run.
  • (I could be wrong, but) I believe the bots that were doing the cache calculation were the GPU bots, which are limited in number. Infra were not happy that we were spending their cycles doing non-GPU work.

Obviously it would be nicer if we could just build this when we build the CTS, but we've yet to find a place to do that without repeating the past infra impact or raising future scalability concerns.

@ben-clayton (Contributor, Author) commented:

I've added a follow-up change that adds a cache/hashes.json file, which holds the CRC32 hashes of all the source files used to build each cache binary file. This massively speeds up validation, and means we only have to rebuild the cases that have been modified.
It also fixes the validation issue between node versions, as the output binary is no longer regenerated as part of validation.
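
(Roughly, the staleness check amounts to something like the sketch below. The shape of hashes.json and the helper names are assumptions, not gen_cache's actual format, and the CRC32 is hand-rolled to keep the sketch self-contained:)

```ts
import * as fs from 'fs';

// Bitwise CRC32 (IEEE polynomial), written out so the sketch is self-contained.
function crc32(data: Uint8Array): number {
  let crc = 0xffffffff;
  for (const byte of data) {
    crc ^= byte;
    for (let i = 0; i < 8; i++) {
      crc = (crc >>> 1) ^ (0xedb88320 & -(crc & 1));
    }
  }
  return (crc ^ 0xffffffff) >>> 0;
}

// A cache file only needs rebuilding if the recorded hash of any of its
// source files no longer matches. hashes.json is assumed to map source
// paths to their CRC32 values.
function isStale(sourceFiles: string[], hashes: Record<string, number>): boolean {
  return sourceFiles.some(path => hashes[path] !== crc32(fs.readFileSync(path)));
}
```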

@ben-clayton force-pushed the commit-cache branch 2 times, most recently from c8d7774 to 876c02d on November 14, 2023 at 13:48
@ben-clayton (Contributor, Author) commented:

PTAL

@kainino0x (Collaborator) left a comment:

Sorry for the delay, LGTM

(Inline review threads on src/resources/cache/hashes.json, src/unittests/parse_imports.spec.ts, and src/common/tools/gen_cache.ts, all resolved.)
@kainino0x (Collaborator) commented:

> It did. It previously took around 6 minutes, then spiked to 50, which caused major upset and forced me to drop everything to get the cache build removed from the CTS runs. Since then we've all been optimizing the CTS, and this has been brought down to less than 2 minutes on a moderately powerful single-core machine.

This is a major improvement that sounds like it came from one particular change. If I knew what that change was, I could better judge how likely this is to become a problem again in the future.

> Clearly we've brought this down to something almost negligible in the grand scheme of things, however:
>
>   • This is only going to climb again as we add more tests.

It's not clear to me how much this will climb. We could triple the number of tests, and if the cache then takes 6 minutes to build, I think we're still in an OK realm.

>   • This was being run for each CTS bot configuration, and the data is purely static: a bunch of immutable constant-expression values that can be computed once, ahead of time.

This doesn't concern me that much if it's a few minutes of compiler bot time. It should be just once per platform, not once per swarming config.

I worry slightly about the maintainability of the solution you've built, though I'm pretty sure it will be fine. But if it does turn out to cause difficulties then we can reevaluate.

>   • CTS runs would repeat the generation N times to build this, as would each and every presubmit and CI run.

True on GitHub. On Dawn it's probably cached?

>   • (I could be wrong, but) I believe the bots that were doing the cache calculation were the GPU bots, which are limited in number. Infra were not happy that we were spending their cycles doing non-GPU work.

I hope not. Theoretically the C++ compile should definitely not be happening on GPU bots, so it would be OK if it's part of that build.

> Obviously it would be nicer if we could just build this when we build the CTS, but we've yet to find a place to do that without repeating the past infra impact or raising future scalability concerns.

@kainino0x (Collaborator) commented:

> I worry slightly about the maintainability of the solution you've built

I'll add that I was pleasantly surprised by how little code this nice hashing solution took, so if it turns out to be robust then that's great.

@kainino0x (Collaborator) commented:

I just remembered one more thought I had while reviewing (not blocking this, but I'll let you decide when to land it).

What if we checked in the uncompressed cache, and did only the compression in the build step? This would be fast and free us from variations in gzip results.

Depending on how these files change over time, Git might also do a better job storing incremental changes to the binary files if they're uncompressed. https://stackoverflow.com/a/20690323

@ben-clayton (Contributor, Author) commented:

> What if we checked in the uncompressed cache, and did only the compression in the build step? This would be fast and free us from variations in gzip results.

We could investigate dropping the compression.

The main purpose of adding the compression was to reduce the bloat of the repo. The binary files are substantially larger when uncompressed, but yes, git will probably end up compressing the files with a very similar compression method anyway. However, the files in the working directory will remain uncompressed, and objects outside the packfiles may possibly be stored uncompressed too.

Let's see how this goes. I'm happy to tweak things if they become problematic (4th iteration so far!).

Commit message (ec19459):

* Add the cache files to the CTS repo under `src/resources/cache`.
* Compress & decompress the cache with gzip using the Compression
  Streams API. All the cache files sum to just over 4 MB.
* Add a `src/resources/cache/hashes.json` to track the hash of the
  source files used to generate each of the cache files. Only rebuild
  the cases that are stale.
* Update `gen_cache --validate` to check that all the cache files are
  present and up to date. Also error if there are old cache files that
  are no longer needed. This is run by `npm test`, which is also called
  by the GitHub presubmits.
* Add a `wpt lint` exemption for `*.bin` files.

Optimization: Add a cache hash file

This file contains a hash of each source file used to build the cache.
Only cache files whose source files have changed need to be rebuilt, and
the validation step no longer has to perform the expensive build step to
know whether files are stale.
@ben-clayton merged commit ec19459 into gpuweb:main on Nov 16, 2023. 1 check passed.
@ben-clayton deleted the commit-cache branch on November 16, 2023 at 10:01.