Add option to use Satori GC #187
|
I'm curious about your measurement results, especially around working set. In real-world applications there are many more surviving objects than in a synthesized stress test, so the difference should be smaller. |
|
It's hard to quantify because everything's so dynamic, but I'm seeing probably a ~20% increase in RSS. Rough results look like (64GB total system memory, linux-x64):
This metric isn't very important for us though. |
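For anyone wanting to compare RSS themselves on Linux, a minimal sketch of one way to sample it is below. It reads the `VmRSS` field from `/proc/<pid>/status`; `rss_kb` is a hypothetical helper name used only for illustration, and this is not necessarily how the figures above were collected.

```shell
#!/bin/sh
# rss_kb (hypothetical helper): print the resident set size of a process in kB.
# Linux only: relies on the VmRSS field of /proc/<pid>/status.
rss_kb() {
    awk '/^VmRSS:/ { print $2 }' "/proc/$1/status"
}

# Example: sample the RSS of the current shell.
echo "RSS of PID $$: $(rss_kb "$$") kB"
```

Pass the PID of the running game instead of `$$` and sample it a few times, since the value moves around a lot during a session.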
|
Are you using SVR or SVR-DATAS on master? |
|
WKS on master (I'll clarify in the table) |
|
I recall 3GB of memory consumption under WKS during debugging, but maybe I was misremembering. ~2GB with Satori is definitely good enough. |
seems pretty high, but that's probably on us to some extent. |
|
It's high, but that only represents ~3% of total system memory. I expect this to behave differently on a more limited system, but I'm not able to test that right now. Besides that, the difference in raw performance is staggering. Here's what I would say is a "simple" case of song select (not exactly what I tested above, but leads to similar results):
WKS: 2025-05-15.02-43-57.mp4
Satori: 2025-05-15.02-44-49.mp4
Though I say simple, this is still seemingly allocating on the order of ~500MB/sec according to |
|
Am I reading this correctly that you are almost doubling the average framerate?! |
|
Yeah, but this is, as I've found out now, a pretty extreme case. During gameplay we're only allocating ~2MB/sec, so the GC isn't taking much away from the average, but Satori is smoothing out the P99 frame times. I've still seen some concerning behaviours that don't align with the general super-low pause times (still not as bad as WKS), but I haven't been able to put it into words yet or dig deeper. It's something along the lines of:
I'm not sure if any of this is a problem, or expected behaviour. I would need to test |
|
If you are talking about SustainedLowLatency mode for Satori, I don't think it is supported, unless I did something wrong in my testing. When I set the mode, it wouldn't update the actual value. Based on this, I think Satori only supports Interactive and LowLatency. WKS supports all 4 modes as far as I understand. I also observed zero Gen0 collections in my synthetic benchmarks for Satori in both modes, but hez2010 and huoyaoyuan both show plenty of Gen0s, so I don't know what to make of this. |
Yeah, you're right. I wasn't sure what the default behaviour would be - makes sense that the default behaviour is to act as Here's the same test as above with 2025-05-15.03-56-39.mp4 Looks like working set is reduced while keeping performance about the same, as expected? 👍 |
|
Keeping my eye on the FPS meters at the bottom, Interactive mode seems to produce even higher FPS, although I did see some dips in there. But these are still two great options to have!
The memory growth of Satori LL is a potential concern since it is >2x WKS. But you said this is a large memory machine that you are testing on? |
|
That was on a 64GB system. I'll have to find some time to test at lower limits but the easiest path is to get it into more people's hands in any case. |
Satori can generally run in low latency mode with no ill effects, other than that turning off compaction may result in a higher heap watermark. So I do not know in which way a "sustainable" mode would be different. Right now there is only one low latency mode internally and both |
There are some heuristics that may decide that gen0 is not worth using. A low rate of allocations is one such case. Allocating below roughly 160 Mb/sec is a low-allocation scenario (no big science behind this threshold, just had to pick something reasonable for starters).
That could be normal for a low-allocation scenario.
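To tie the threshold to the rates quoted earlier in the thread, here is a rough illustration only. It is not Satori's actual heuristic; it assumes the ~2 and ~500 per-second figures and the roughly 160 threshold are expressed in the same unit, and `classify_alloc_rate` is a hypothetical helper.

```shell
#!/bin/sh
# Threshold as quoted in this thread; the real heuristic lives inside Satori and may differ.
LOW_ALLOC_THRESHOLD=160

# classify_alloc_rate (hypothetical helper): print "low" when the given
# allocation rate is below the threshold, otherwise "high".
classify_alloc_rate() {
    if [ "$1" -lt "$LOW_ALLOC_THRESHOLD" ]; then
        echo "low"
    else
        echo "high"
    fi
}

classify_alloc_rate 2     # gameplay rate quoted above
classify_alloc_rate 500   # song select rate quoted above
```

Under that assumption, gameplay falls on the low-allocation side, where gen0 may be skipped, while song select does not.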
3ms does not seem too bad. I'd expect it to be < 1ms for a low-allocation scenario though. In low latency mode the blocking stage mostly deals with incremental work created by the app while the concurrent GC was doing its thing. There is not a lot of incremental work in general, and in a low-allocation scenario there would be even less. It would mostly be just a validation that everything that had to be done has been done. If very curious about what happens, you can disable Gen1 - as in
There is only one kind of low latency mode internally. And in that mode compactions do not happen, so no worries here. |
Ongoing discussion is taking place in dotnet/runtime#96213
I've added ppy/Satori with GHA builds for the GC. The deploy script is now able to download the latest release and attach Satori based on the installation steps provided by the author. This is exposed via the environment variable (NOTE: as opposed to App.config) USE_SATORI_GC=true.
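As a rough illustration of the opt-in behaviour, the sketch below shows how an environment-variable gate like this typically works. It is not the deploy tool's actual code: `satori_enabled` is a hypothetical helper, and treating only the exact value `true` as enabled is an assumption.

```shell
#!/bin/sh
# satori_enabled (hypothetical helper): succeeds only when USE_SATORI_GC is
# exactly "true". An unset or different value keeps the default runtime GC.
satori_enabled() {
    [ "${USE_SATORI_GC:-false}" = "true" ]
}

if satori_enabled; then
    echo "Satori GC: enabled"
else
    echo "Satori GC: disabled (default runtime GC)"
fi
```

In practice the variable would be set in the environment of the deploy run, for example by exporting `USE_SATORI_GC=true` before invoking it.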