Factorio: FPS drops massively when zooming out #4214

Open
kuruczgy opened this issue Dec 13, 2024 · 6 comments

@kuruczgy

kuruczgy commented Dec 13, 2024

What Game
Factorio 2.0.23 (build 80769 expansion, linux64) (Space Age mods disabled)
https://store.steampowered.com/app/427520/Factorio/

Describe the bug
Massive FPS drop when zooming out.

To Reproduce
Steps to reproduce the behavior:

  1. Create a new singleplayer Freeplay game (I used seed 1)
  2. Start the game
  3. Observe that the game is running at 60 FPS
  4. Zoom out
  5. Observe that it drops to 15 FPS

Note that some menu animations also drop to a quite low FPS; I assume it's the same issue.

Expected behavior
The game runs smoothly.

Screenshots and Video
[Screenshot: factorio_screenshot]

System information:

See this commit for the nix expressions for the bwrap based chroot: https://github.com/kuruczgy/nixos-aarch64-gaming/tree/6a8acdb69f2963a3f5f998f3a65c8520dfffa6b8

Additional context

  • Is this an x86 or x86-64 game: x86-64
  • Does this reproduce on AArch64 with Radeon/Intel/Nvidia: Untested
  • Is this a Vulkan game: Unknown

@Sonicadvance1
Member

Reproduced the slowness. Got down to 12 FPS when zooming out; it looks fully CPU-bound on one thread.

  12.61%  [JIT] tid 17020   [.] 0x00007ffd3b6f9a6c
    -> 0x16c07c0 ->
        DrawCommandBatch::drawSprite(BlendMode, SamplingMode, VideoBitmap const*, float, float, float, float, Color32, float, float, float, float, float, float, float, DrawingFlags, SpriteEffectData, GraphicsEffect)+2192
  11.99%  [JIT] tid 17020   [.] 0x00007ffd3b6f9b14
    -> 0x16c0022 ->
        DrawCommandBatch::drawSprite(BlendMode, SamplingMode, VideoBitmap const*, float, float, float, float, Color32, float, float, float, float, float, float, float, DrawingFlags, SpriteEffectData, GraphicsEffect)+242
  11.63%  [JIT] tid 17020   [.] 0x00007ffd3b6fac30
    -> 0x16c0054 ->
        DrawCommandBatch::drawSprite(BlendMode, SamplingMode, VideoBitmap const*, float, float, float, float, Color32, float, float, float, float, float, float, float, DrawingFlags, SpriteEffectData, GraphicsEffect)+292
   6.23%  [JIT] tid 17020   [.] 0x00007ffd3b702128
    -> 0x16c0450 ->
        DrawCommandBatch::drawSprite(BlendMode, SamplingMode, VideoBitmap const*, float, float, float, float, Color32, float, float, float, float, float, float, float, DrawingFlags, SpriteEffectData, GraphicsEffect)+1312
   6.11%  [JIT] tid 17020   [.] 0x00007ffd3b702148
    -> 0x16c0450 ->
        DrawCommandBatch::drawSprite(BlendMode, SamplingMode, VideoBitmap const*, float, float, float, float, Color32, float, float, float, float, float, float, float, DrawingFlags, SpriteEffectData, GraphicsEffect)+1312
   5.50%  [JIT] tid 17020   [.] 0x00007ffd3b6fa524
    -> 0x16c031b ->
        DrawCommandBatch::drawSprite(BlendMode, SamplingMode, VideoBitmap const*, float, float, float, float, Color32, float, float, float, float, float, float, float, DrawingFlags, SpriteEffectData, GraphicsEffect)+1003
   1.69%  [JIT] tid 17020   [.] 0x00007ffd3ae30ad8
   1.35%  [JIT] tid 17020   [.] 0x00007ffd3ae30ae0
   1.02%  [JIT] tid 17020   [.] 0x00007ffd3ae30ad0
   1.01%  [JIT] tid 17020   [.] 0x00007ffd3b7021a0
   0.97%  [JIT] tid 17020   [.] 0x00007ffd3b6acef4
   0.92%  [JIT] tid 17020   [.] 0x00007ffd3b6acefc
   0.89%  [JIT] tid 17020   [.] 0x00007ffd3b6aceec
   0.86%  [JIT] tid 17020   [.] 0x00007ffd3ae30ad4
   0.82%  [JIT] tid 17020   [.] 0x00007ffd3b6f954c
   0.75%  [JIT] tid 17020   [.] 0x00007ffd3b6f9a54
   0.74%  [JIT] tid 17020   [.] 0x00007ffd3ae30adc
   0.72%  [JIT] tid 17020   [.] 0x00007ffd3b6f9afc
   0.66%  [JIT] tid 17020   [.] 0x00007ffd3b6acee4
   0.60%  [JIT] tid 17020   [.] 0x00007ffd3b6acf04
   0.59%  [JIT] tid 17020   [.] 0x00007ffd3b6acef8
   0.58%  [JIT] tid 17020   [.] 0x00007ffd3bb284d8
   0.56%  [JIT] tid 17020   [.] 0x00007ffd3b6f9a74

Looks like this drawSprite function just gets absolutely hammered, taking up 53% of the CPU time on a single thread.
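
For anyone who wants to double-check the attribution, here is a minimal sketch that uses only the six annotated entries from the profile above. It subtracts each decimal offset from its guest address to confirm the samples all land in the same function, then sums their percentages. The helper names are made up for illustration; the numbers are copied from the profile.

```python
# (percent of samples, guest address, decimal offset into drawSprite),
# copied from the six annotated entries of the profile above.
SAMPLES = [
    (12.61, 0x16c07c0, 2192),
    (11.99, 0x16c0022, 242),
    (11.63, 0x16c0054, 292),
    (6.23, 0x16c0450, 1312),
    (6.11, 0x16c0450, 1312),
    (5.50, 0x16c031b, 1003),
]


def function_bases(samples):
    """Return the set of function start addresses implied by the samples."""
    return {addr - offset for _, addr, offset in samples}


def total_percent(samples):
    """Sum the sample percentages, rounded to two decimals."""
    return round(sum(pct for pct, _, _ in samples), 2)


if __name__ == "__main__":
    # A single base address means every sample is inside the same function.
    print([hex(base) for base in sorted(function_bases(SAMPLES))])
    print(total_percent(SAMPLES))
```

All six samples resolve to one function start, and the annotated entries alone add up to roughly 54% of the samples, in the same ballpark as the figure quoted above.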

@alyssarosenzweig
Collaborator

so.. the hottest blocks here are tiny, and translated optimally for single block
I believe for factorio, the fix here is multiblock
without multiblock, we're dominated by per-block overheads

(the actual block cache overhead, for one. but for the generated instructions, we're dominated by flag calcs that would get eliminated with multiblock [with the existing global flag opt pass I did last summer])

"Factorio drawSprite+0x890": {
  "x86InstructionCount": 3,
  "ExpectedInstructionCount": 5,
  "Comment": "first load should be rip relative",
  "x86Insts": [
    "movss  xmm9,dword [rbp]",
    "and    r9d,0x800000",
    "movss  dword [rbp-0x58],xmm9"
  ],
  "ExpectedArm64ASM": [
    "ldr s25, [x9]",
    "ands w26, w13, #0x800000",
    "mov x13, x26",
    "stur s25, [x9, #-88]",
    "cfinv"
  ]
},

multiblock would eliminate the move and the cfinv, and then we're optimal
note: instcountci doesn't do rip-relative loads so if that's what's slow, it's not modelled here.
I did find one thing to improve but only in the fourth block of the list
one of the hottest blocks is just a single cmp instruction and a branch
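
As a rough sanity check on the entry above, the sketch below loads it as JSON, confirms the two count fields match the lengths of their instruction lists, and counts what is left once the `mov` and `cfinv` are dropped. The helper names are hypothetical and not part of the FEX test tooling; only the entry itself comes from this thread.

```python
import json

# The test entry quoted above, wrapped in braces so it parses on its own.
ENTRY = json.loads("""
{
  "Factorio drawSprite+0x890": {
    "x86InstructionCount": 3,
    "ExpectedInstructionCount": 5,
    "Comment": "first load should be rip relative",
    "x86Insts": [
      "movss  xmm9,dword [rbp]",
      "and    r9d,0x800000",
      "movss  dword [rbp-0x58],xmm9"
    ],
    "ExpectedArm64ASM": [
      "ldr s25, [x9]",
      "ands w26, w13, #0x800000",
      "mov x13, x26",
      "stur s25, [x9, #-88]",
      "cfinv"
    ]
  }
}
""")


def counts_match(test):
    """Check that the declared counts agree with the instruction lists."""
    return (test["x86InstructionCount"] == len(test["x86Insts"])
            and test["ExpectedInstructionCount"] == len(test["ExpectedArm64ASM"]))


def count_without(asm, mnemonics):
    """Count instructions whose mnemonic is not in `mnemonics`."""
    return sum(1 for insn in asm if insn.split()[0] not in mnemonics)


if __name__ == "__main__":
    test = ENTRY["Factorio drawSprite+0x890"]
    print(counts_match(test))
    print(count_without(test["ExpectedArm64ASM"], {"mov", "cfinv"}))
```

Without the `mov` and the `cfinv`, three Arm64 instructions remain for three x86 instructions, which is the one-to-one translation the comment calls optimal.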

@kuruczgy
Author

I tried enabling Multiblock in the config, but no luck, the issue remains the same. It would be interesting to see the arm code generated with multiblock.

@alyssarosenzweig how did you get this dump of the hot spot? (I also see you added it to the test suite in #4226.) I tried enabling DumpIR but that's way too verbose (and also obviously only dumps the IR), something much more precise is needed.

@Sonicadvance1
Member

Looks like this is almost entirely bottlenecked on TSO emulation. So once again a case of LRCPC not being good enough. Disabling TSO emulation gets me up to 60 FPS on X1E. I bet Apple Silicon doesn't have any issue with this due to the hardware TSO support.

@kuruczgy
Author

Indeed, setting "TSOEnabled": false "fixes" the issue for me. I wonder if this will cause any glitches.
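
In case it helps anyone trying the same workaround, here is a minimal sketch of flipping that key in a JSON config file without touching the other settings. The `"Config"` wrapper section, the file name, and the `set_option` helper are assumptions made for illustration, so check your own config file for the actual layout and accepted values. The demo writes to a temporary directory rather than a real config.

```python
import json
import tempfile
from pathlib import Path


def set_option(config_path, key, value, section="Config"):
    """Set `key` to `value` inside `section` of a JSON file, keeping other keys.

    The `section` wrapper name is an assumption, not confirmed in this thread.
    """
    path = Path(config_path)
    data = json.loads(path.read_text()) if path.exists() else {}
    data.setdefault(section, {})[key] = value
    path.write_text(json.dumps(data, indent=2) + "\n")
    return data


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cfg = Path(tmp) / "Config.json"  # hypothetical file name
        cfg.write_text(json.dumps({"Config": {"Multiblock": True}}))
        print(set_option(cfg, "TSOEnabled", False))
```

Going through a helper like this keeps the change easy to revert, which matters if disabling TSO emulation does turn out to cause glitches.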

@alyssarosenzweig
Collaborator

> I tried enabling Multiblock in the config, but no luck, the issue remains the same. It would be interesting to see the arm code generated with multiblock.
>
> @alyssarosenzweig how did you get this dump of the hot spot? (I also see you added it to the test suite in #4226.) I tried enabling DumpIR but that's way too verbose (and also obviously only dumps the IR), something much more precise is needed.

Tangential to the original issue, but with this little script I wrote originally to scrape hot blocks out of bytemark given the RIP:

alyssa@blossom ~> cat bin/hot
python3 /home/alyssa/bin/hot-dbg.py $1 | tee /dev/shm/a.txt | wl-copy
less /dev/shm/a.txt
alyssa@blossom ~> cat bin/hot-dbg.py 
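import sys
import subprocess
import json

# The argument has the form "<binary>+<address>", e.g. the RIP of a hot block.
sym = sys.argv[1]
file, addr = sym.split('+')
SIZE = 0x800  # number of bytes to disassemble starting at addr

disasm = subprocess.run(["objdump", "-d", "-Mintel", file, "--start-address", addr, "--stop-address", hex(int(addr, 0) + SIZE), "--no-show-raw-insn"], capture_output=True).stdout.decode('utf-8')
# Drop objdump's header lines; the first remaining line becomes the entry title.
lines = disasm.split('\n')[6:]
title = lines[0]
rest = lines[1:]

# Rewrite objdump's Intel syntax into the lowercase operand-size style used
# in the x86Insts lists. 'rip' becomes 'rbp' because rip-relative addressing
# is not modelled by the test entries.
SUBS = {
        'PTR ': '',
        'DWORD': 'dword',
        'QWORD': 'qword',
        'XMMword': 'oword',
        'YMMword': 'yword',
        'WORD': 'word',
        'BYTE': 'byte',
        'rip': 'rbp',
}

x86Insts = []
for x in rest:
    # Instruction lines are tab-separated: "<address>:\t<instruction>".
    if '\t' in x:
        rep = x.split('\t')[1]
        # Two passes on purpose: the 'WORD' rule first turns 'XMMWORD' into
        # 'XMMword', which the 'XMMword' rule only matches on the second pass.
        for i in range(2):
            for key in SUBS:
                rep = rep.replace(key, SUBS[key])
        # Strip the trailing "# ..." comment objdump appends to some lines.
        rep = rep.split('        # ')[0]
        x86Insts += [rep]

# Skeleton test entry; the expected count and Arm64 assembly are left empty.
jj = {
    title: {
      "ExpectedInstructionCount": 0,
      "x86Insts": x86Insts,
      "ExpectedArm64ASM": [],
    }
}

# Print the entry without the outer braces of the JSON object.
print('\n'.join(json.dumps(jj,indent=2).split('\n')[1:-1]))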
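
The substitution loop in the script is the least obvious part, so here it is pulled out into a standalone sketch. The table is copied from the script; the `normalize` name, the `passes` parameter, and the two sample instructions are made up for illustration and are not taken from the Factorio binary.

```python
# Substitution table copied from the script above.
SUBS = {
    'PTR ': '',
    'DWORD': 'dword',
    'QWORD': 'qword',
    'XMMword': 'oword',
    'YMMword': 'yword',
    'WORD': 'word',
    'BYTE': 'byte',
    'rip': 'rbp',
}


def normalize(insn, passes=2):
    """Apply the substitution table `passes` times to one instruction."""
    for _ in range(passes):
        for key, value in SUBS.items():
            insn = insn.replace(key, value)
    return insn


if __name__ == "__main__":
    # Illustrative inputs in objdump's Intel syntax.
    print(normalize("movss  xmm9,DWORD PTR [rip+0x1234]"))
    print(normalize("movaps XMMWORD PTR [rsp],xmm0"))
    # With a single pass the XMMWORD operand is only half rewritten.
    print(normalize("movaps XMMWORD PTR [rsp],xmm0", passes=1))
```

With one pass, `XMMWORD` only gets as far as `XMMword`; the second pass is what turns it into `oword`, which appears to be why the script loops twice.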
