fosselize_replay eats all RAM when background processing shaders #210

Open
tdaven opened this issue Jan 5, 2023 · 21 comments

Comments

@tdaven

tdaven commented Jan 5, 2023

This seems to happen in multiple games. Latest issue occurs with "Red Dead Redemption 2" as well as "The Elder Scrolls V: Skyrim Special Edition".

Neither game can complete shader processing without eating all system memory.

System Details:

  • Memory: 64GB
  • CPU: i7-13700k
  • GPU: RX 6700 XT
  • OS: Fedora 37
  • Kernel Version: 6.0.16-300.fc37.x86_64
  • Steam API: V020
  • Steam package version: 1671236931
  • Mesa: 22.3.2

Steam was installed from the rpmfusion.org repo.

The issue looks very similar to #194, #198, or #84.

Something seems to keep it from limiting its memory use. If I watch the memory use and toggle processing on and off, I can nurse it through the process, since it is making progress. It just doesn't complete before it uses all memory.

Happy to help troubleshoot with direction. I haven't found anything particularly helpful in ~/.local/share/Steam/logs/shader_log.txt. I'm not sure how to manually run fossilize_replay to troubleshoot further.

@sampie

sampie commented Jan 6, 2023

I have the same issue. I am running Ubuntu 22.10.

@marcosbitetti

Same issue here
Pop!_OS 22.04 LTS
nVidia GTX 1650 OC 4GB
nVidia driver version 525.60.11

@tdaven
Author

tdaven commented Jan 28, 2023

This is an example of what I see happen with memory usage. It just happened again after updating Steam. There were 6 fossilize_replay processes, each consuming over 9 GB of RAM. Swap had been disabled in the hope that the fossilize processes would just get OOM-killed, but that didn't happen.

[image: graph of memory usage over time]

The big spike fell because the fossilize processes were killed manually.

@kakra
Contributor

kakra commented Jan 29, 2023

@tdaven Does your kernel have /proc/pressure/memory? Does this happen while fossilize is background processing the shaders (Steam is idle) or foreground processing shaders (starting a game with the Vulkan shaders dialog running)?

@tdaven
Author

tdaven commented Jan 29, 2023

> @tdaven Does your kernel have /proc/pressure/memory? Does this happen while fossilize is background processing the shaders (Steam is idle) or foreground processing shaders (starting a game with the Vulkan shaders dialog running)?

@kakra Yes. Standard Fedora 37 kernel which has /proc/pressure/memory.

For example:

[tdaven@desktop ~]$ cat /proc/pressure/memory 
some avg10=0.00 avg60=0.00 avg300=0.00 total=29518146
full avg10=0.00 avg60=0.00 avg300=0.00 total=28791057
[tdaven@desktop ~]$ 

It typically happens during background processing, usually after an update for the game is installed and fossilize is triggered.

@kakra
Contributor

kakra commented Jan 29, 2023

If you watch cat /proc/pressure/{memory,io}, do these values rise before the problem hits? If they do, fossilize should throttle itself down by putting some threads into the "stopped" state... I had a lot of similar problems before PSI support was introduced, and no problems since then. I actually inspired the introduction of PSI support in fossilize. So I wonder what's different about your system? Also, the high memory usage should have been fixed a long time ago by sharing memory between processes properly.
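To see whether pressure rises before the problem hits, the values can be logged once per second. This is only a minimal sketch: the function names are made up for illustration, and it assumes the PSI file format shown earlier in this thread (a "some" line followed by avg10=, avg60= and avg300= fields).

```shell
# Print the avg10 value of the "some" line from a PSI file.
# Argument: path to a PSI file such as /proc/pressure/memory.
psi_some_avg10() {
    awk '$1 == "some" { sub(/^avg10=/, "", $2); print $2 }' "$1"
}

# Log memory and IO pressure once per second until interrupted (Ctrl-C).
watch_psi() {
    while sleep 1; do
        printf '%s mem=%s io=%s\n' "$(date +%T)" \
            "$(psi_some_avg10 /proc/pressure/memory)" \
            "$(psi_some_avg10 /proc/pressure/io)"
    done
}

# Usage: run "watch_psi" in a terminal while Steam processes shaders.
```

If the mem value stays at 0.00 right up to the point where the system runs out of memory, that would suggest the throttling never gets a signal to act on.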

Does your kernel have transparent hugepages force-enabled? If cat /sys/kernel/mm/transparent_hugepage/enabled says always, then try setting it to madvise or never. It could explain why memory usage kind of "explodes".
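The active transparent hugepage mode is the word in square brackets in that file. A small sketch for reading it (the function name is hypothetical; the sysfs path is the one mentioned above):

```shell
# Print the active mode from a transparent_hugepage "enabled" file,
# i.e. the word shown in square brackets.
# Argument: path to the file (defaults to the standard sysfs location).
thp_active_mode() {
    sed -n 's/.*\[\([a-z]*\)].*/\1/p' \
        "${1:-/sys/kernel/mm/transparent_hugepage/enabled}"
}

# Usage:
#   thp_active_mode
# To switch modes until the next reboot (needs root):
#   echo madvise | sudo tee /sys/kernel/mm/transparent_hugepage/enabled
```

A change written this way does not persist across reboots, which makes it a low-risk way to test whether the setting matters.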

@tdaven
Author

tdaven commented Feb 1, 2023

Transparent huge pages:

[tdaven@desktop ~]$ cat /sys/kernel/mm/transparent_hugepage/enabled
always [madvise] never

I see /proc/pressure/io change, but memory seems to always be zero, besides the total. This used to not happen on an older computer. I only started having this problem on this newer system which has more cores and memory.

@Noctis-Bennington

Noctis-Bennington commented Feb 11, 2023

I'm having the same problem. Image here

OS: Ubuntu 22.04
CPU: AMD Ryzen 7 4000
GPU: Radeon RX5600m

@kakra
Contributor

kakra commented Feb 11, 2023

> I'm having the same problem.

The memory usage is shared between processes. You need to look at the PSS size, or consider shared memory usage, too.
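For anyone unsure how to look at PSS: the kernel exposes it per process in /proc/<pid>/smaps_rollup. The following is a rough sketch with hypothetical function names. It assumes the replay processes can be found by matching "fossilize_replay" against the full command line; pgrep -f is used because the kernel truncates process names to 15 characters, which is one short of the full name.

```shell
# Print the Pss value in kB from one smaps_rollup file.
pss_kb() {
    awk '$1 == "Pss:" { print $2 }' "$1"
}

# Sum Pss over all processes whose command line mentions fossilize_replay.
# Note: -f also matches any other process with that string in its command line.
total_fossilize_pss_kb() {
    total=0
    for pid in $(pgrep -f fossilize_replay); do
        kb=$(pss_kb "/proc/$pid/smaps_rollup" 2>/dev/null)
        total=$((total + ${kb:-0}))
    done
    echo "$total"
}

# Usage: total_fossilize_pss_kb
```

Because PSS divides each shared page among the processes mapping it, this sum should be closer to the real footprint than adding up the RSS column in a process monitor.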

@Noctis-Bennington

> > I'm having the same problem.
>
> The memory usage is shared between processes. You need to look at the PSS size, or consider shared memory usage, too.

Sorry, I'm lost with your answer. There are four processes sharing memory as far as I can see, and these processes don't stop consuming memory until they get OOM-killed (this happens in seconds).

@cachandlerdev

cachandlerdev commented Feb 19, 2023

I have noticed this issue appearing on Fedora in both Skyrim and No Man's Sky, where Steam rapidly starts eating my 32 GB of RAM while processing Vulkan shaders and will happily lock up my PC unless I manually kill the process in time.

CPU: AMD Ryzen 7 7700X
GPU: Nvidia RTX 2070 Super

Edit: The memory bug also occurs while processing Battlefield 1's shaders.

Edit 2: This happens regardless of whether foreground (launching the game) or background processing is happening.

@Noctis-Bennington

What I've seen is that it happens only in a few games, but big ones. Left 4 Dead 2 is one of them.

@WildPenquin

WildPenquin commented Aug 25, 2023

This has recently started to happen to me, to the point where I cannot leave Steam running. It will eat up to 30 GiB of RAM, which makes the computer nearly unusable.

It will always start many, many threads of fossilize_replay (the thread count is not as much of a problem as the RAM usage). It seems that AppID 346110 (Ark: Survival Evolved) (Proton issue ValveSoftware/Proton#3218) triggers this problem more often than others. I've seen huge RAM usage for other AppIDs, too. Ark:SE triggers this so often that I've considered uninstalling it.

@nwestervelt

This has started happening to me on Arch Linux with Deep Rock Galactic recently. Runs out of memory if the shaders are processing in the background or in the foreground before the game launches.

If I disable shader precaching, it happens while the game is running (I assume from the shaders compiling while the game is open).

Current mesa version: 1:23.1.6-4

@kisak-valve
Member

Hello @WetWayne, mesa 23.1.6 (specifically) has a known memory leak which should hopefully be fixed in the next point release.

Reference: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9599

@nwestervelt

@kisak-valve In that case, disregard my comment.

@Hubro

Hubro commented Sep 15, 2023

This has been happening to me for the last few weeks. It seems to have started after I installed Armored Core 6.

A bunch of fossilize_replay processes will be running and consuming a ton of RAM, and sometimes they suddenly consume everything I have (64 GB) and lock up my PC for a few seconds before the Steam process is killed. Sometimes this even happens in a loop, where every 10-20 seconds or so, my PC will freeze for a few seconds and Steam will get killed and automatically restart, until I kill Steam myself.

It happens right after I start Steam, as well as randomly in the background as I'm doing other stuff. Strangely it hasn't stopped happening after I uninstalled Armored Core.

System Information: https://gist.github.com/Hubro/10e2be14104aecb0f0e42f4ec9fe4c82

I'm also running mesa 23.1.6, so hopefully it will resolve itself soon.

@Danternas

Same issue. Restarting Steam seems to solve the issue better than killing the process. It seems to be stuck in some kind of loop, running out of memory and then restarting. Even with 32 GB of RAM, it fills the RAM in seconds.

@kakra
Contributor

kakra commented Nov 28, 2023

Lately, I found that KDE baloo may fight over resources with fossilize. Are you running KDE with baloo? Try stopping it and see if it helps.

@CaptaiNiveau

I've run into this now. I switched my desktop to Arch Linux on zfs; before that, I didn't notice an issue like this.
If I kill Rocket League and start it again without waiting long, the shader compilation eats up all RAM and kills most of my open applications.

Not sure if this is related to zfs, but it's the only major thing that changed compared to my past installs.

@kakra
Contributor

kakra commented Apr 12, 2024

I think this is because fossilize uses more threads if it starts working in the foreground, that is, if you start the game and Steam shows the progress dialog for fossilize. If you have a spare partition with another filesystem, try moving the shader caches there: I have a spare SSD formatted with xfs, moved $HOME/.steam/steam/steamapps/shadercache there, and then created a symlink from the old location to the new one. This also takes some filesystem lock contention away from the game library while the game reads and writes shader caches and reads game data at the same time (I'm using btrfs for my rootfs).
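The move-and-symlink step could look roughly like the sketch below. The function name and the target path in the usage comment are hypothetical; adjust them to your own mount point, and close Steam before moving anything.

```shell
# Move a directory to a new location and leave a symlink behind.
# Arguments: existing directory, absolute target path (must not exist yet).
relocate_with_symlink() {
    src=$1
    dest=$2
    # Refuse to act if the source is missing or has already been relocated.
    [ -d "$src" ] && [ ! -L "$src" ] || {
        echo "not a real directory: $src" >&2
        return 1
    }
    [ ! -e "$dest" ] || {
        echo "target already exists: $dest" >&2
        return 1
    }
    mv "$src" "$dest" && ln -s "$dest" "$src"
}

# Usage (with Steam closed; /mnt/xfs is a made-up mount point):
#   relocate_with_symlink "$HOME/.steam/steam/steamapps/shadercache" /mnt/xfs/shadercache
```

The target has to be an absolute path, otherwise the symlink would be resolved relative to the old directory and end up dangling.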

If you don't have a spare partition, it may help to create an additional zvol dedicated to the shadercache, which splits its IO operations off to a dedicated volume. E.g., in my case I split IO operations in btrfs across dedicated subvolumes, which lets the system run into less lock contention during heavy IO load.

Also, check if /proc/pressure/io exists: If it does, fossilize will use that to reduce cache thrashing if IO latency spikes, so the system would not start to stall. If it doesn't exist, you may need to enable it at boot time. Check your distribution docs for how to do that (the feature is called PSI: pressure stall information).
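A quick way to run that check is sketched below (the function name is hypothetical). On kernels that are built with PSI but ship it disabled by default, the kernel command line parameter psi=1 is the usual switch, though how to set boot parameters depends on the distribution.

```shell
# Succeed if the PSI files for IO and memory are readable.
# Argument: directory to check (defaults to /proc/pressure).
psi_available() {
    dir=${1:-/proc/pressure}
    [ -r "$dir/io" ] && [ -r "$dir/memory" ]
}

if psi_available; then
    echo "PSI is available"
else
    echo "PSI is not exposed; it may need psi=1 on the kernel command line"
fi
```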

Fossilize creates a lot of random reads and writes due to how it uses its files (memory mapped and shared memory). zfs and btrfs are particularly bad at those patterns because they are copy-on-write filesystems. fossilize at least optimizes the reads by pre-caching everything needed sequentially, but the writes are still pretty non-linear, and that's a very bad pattern for copy-on-write extents. Because those writes are small, random, and slow, dirty data piles up in the page cache, leading to high memory usage. Accelerating those writes should help (e.g., by using a dedicated non-cow filesystem, or using other write accelerators like bcache on btrfs with allocation hint patches, or slog on zfs). Take note that zfs itself needs a lot of memory to be fast: a 32 GB desktop system may be way too small for using zfs effectively with high random write loads. Also take note that SSDs really cannot do magic with small random writes; they are slow at those IO patterns (still faster than spinning disks, but slow with high latency).
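Whether dirty data is actually piling up can be checked in /proc/meminfo, which reports Dirty and Writeback in kB. A minimal sketch, with a made-up helper name:

```shell
# Print the value (in kB) of one /proc/meminfo field.
# Arguments: field name without the colon, optional file path.
meminfo_kb() {
    awk -v key="$1:" '$1 == key { print $2 }' "${2:-/proc/meminfo}"
}

# Usage while shaders are processing:
#   echo "dirty=$(meminfo_kb Dirty) kB writeback=$(meminfo_kb Writeback) kB"
```

If Dirty keeps growing while fossilize runs, that would be consistent with the slow-write explanation above. If it stays small while memory still fills up, the cause is probably elsewhere.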
