Skip to content

Commit

Permalink
FEX 2407
Browse files Browse the repository at this point in the history
Signed-off-by: Alyssa Rosenzweig <[email protected]>
  • Loading branch information
alyssarosenzweig authored and Sonicadvance1 committed Jul 4, 2024
1 parent d3e3400 commit d91bd55
Show file tree
Hide file tree
Showing 15 changed files with 119 additions and 0 deletions.
119 changes: 119 additions & 0 deletions _posts/2024-06-28-FEX-2407.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,119 @@
---
layout: post
title: FEX 2407 Tagged ... with AVX!
author: FEX-Emu Maintainers
---

I hope you're ready to game, because your Arm system just got support for AVX.
And AVX2. And FMA3. And F16C. And BMI1. And BMI2. And VAES. And VPCLMULQDQ.

Yeah, we've been busy.

We did a little team-building exercise this month: "bring up AVX on 128-bit
hardware in a week". Now our team is built, and AVX games run on FEX:

<a href="{{site.baseurl}}/images/posts/2407-06-28/Forspoken.png"><img title="Forspoken running on FEX" width="100%" src="{{site.baseurl}}/images/posts/2407-06-28/Forspoken.avif"/></a>
<a href="{{site.baseurl}}/images/posts/2407-06-28/MetroExodus.png"><img title="Metro Exodus running on FEX" width="100%" src="{{site.baseurl}}/images/posts/2407-06-28/MetroExodus.avif"/></a>
<a href="{{site.baseurl}}/images/posts/2407-06-28/DeathStranding.png"><img title="Death Stranding running on FEX" width="100%" src="{{site.baseurl}}/images/posts/2407-06-28/DeathStranding.avif"/></a>
<a href="{{site.baseurl}}/images/posts/2407-06-28/Helldiver.png"><img title="Helldiver running on FEX" width="100%" src="{{site.baseurl}}/images/posts/2407-06-28/Helldiver.avif"/></a>

## AVX on 128-bit Arm

Computers traditionally perform one operation at a time. The hardware decodes
an instruction, evaluates the operation on a single pair of numbers, and
repeats for the next instruction. In mathematical terms, the instructions
operate on scalars.

That design leaves performance on the table.

Many programs repeat one operation many times with different data. Modern
instruction sets exploit that repetition. A single "vector" instruction can
operate on multiple pieces of data at once. Programs will perform the same
amount of arithmetic overall, but there are fewer instructions to decode and
the arithmetic is more predictable. That enables more efficient hardware.

A "scalar" instruction adds a pair of numbers; a "vector" instruction adds
multiple pairs. How many pairs? That is, what length is the vector?

That's a design trade-off. Increasing the vector length decreases the number of
instructions we need to execute while increasing the hardware cost.
Supporting large vectors efficiently requires a large register file and many
arithmetic logic units. Besides, there are diminishing returns past a certain
vector length.

There is no-one-size-fits-all vector length. Different instruction sets make
different choices. In x86, the SSE instruction set uses 128-bit vectors, while
AVX and AVX-512 instructions support 256-bit and 512-bit respectively. For Arm,
the traditional ASIMD (NEON) instructions use 128-bit vectors. Depending on the
specific Arm hardware implementation, the flexible new SVE instructions can use
either 128-bit or 256-bit.

For performance, we try to translate each x86 instruction to an equivalent arm64
instruction. There's no perfect 1:1 correspondence, but we can get close. For
vector instructions, we translate 128-bit SSE instructions into equivalent
128-bit ASIMD instructions.

In theory, we can do the same for AVX, mapping 256-bit AVX instructions to
256-bit SVE instructions. [Mai](https://github.com/lioncash) implemented that last year, speculating that their
work would enable AVX on future 256-bit SVE hardware.

...

That hardware never came. Some recent hardware supports SVE but only 128-bit.
Others don't have that, supporting only ASIMD. We had a shiny AVX-256
implementation with no 256-bit hardware to use it with.

Our position remains that efficient AVX emulation requires 256-bit SVE. Unfortunately,
many games today have a hard requirement on AVX. We want to let you play
those games on your Arm devices, so we need to plug our noses and implement
256-bit AVX on 128-bit hardware.

The idea is simple; the implementation is not. To translate a 256-bit
instruction, we decompose it into two 128-bit instructions operating on each
half of the 256-bit vector. In effect, we partially "undo" the vectorization.

This plan has a gaping hole: the register file. Arm has more general purpose
registers than x86, so we statically assign each x86 register to an Arm
register. If we didn't, accessing certain x86 registers would require slow
memory accesses. All efficient x86-on-Arm emulators therefore statically map
registers.

This scheme unfortunately fails with AVX emulation. Our Arm hardware has 128-bit
vectors, but we're emulating 256-bit AVX vectors. We would need twice as many Arm
vector registers as x86 vector registers, and we don't have enough.

Still, running AVX games suboptimally is better than not running at all. Yes,
we're out of registers -- we'll just have to keep some in memory. The assembly
isn't pretty, but it works well enough.

Building on Mai's SVE-256 implementation of AVX, [Ryan](https://github.com/Sonicadvance1) whipped together a
version supporting both SVE-128 and ASIMD. That means it should work on
all arm64 devices, all the way back to ARMv8.0.

## F16C, FMA, and more

AVX isn't the only x86 extension new games require. After beating AVX, x86 has a
post-AVX questline for emulator developers, with extensions like F16C.
Fortunately, nothing could scare Ryan and Mai after AVX, and they tackled the
new extensions without a hitch.

## Speeding up translation

Dynamically assigning AVX registers means we can't translate instructions
directly and expect good results. While we don't need a full optimizing
compiler, we do need enough intelligence for basic inter-instruction
optimizations. FEX has had some optimizations since day one, but the priority
has always been on bringing up new games. There's been even less focus on the
*translation* time -- not how fast the generated arm64 code executes but how
fast we can generate the arm64 code. Translation overhead contributes to slow
loading screens and in-game stutter, so while it flies under the issue radar,
it does matter. So, [Alyssa](https://github.com/alyssarosenzweig) optimized the
FEX optimizer this month by merging compiler passes. That both simplifies the
code and speeds up translation. Since the start of June, we've reduced
translation time **10%**... and more optimization is coming.

Happy gaming :-)

<a href="{{site.baseurl}}/images/posts/2407-06-28/MonsterHunter.png"><img title="Monster Hunter running on FEX" width="100%" src="{{site.baseurl}}/images/posts/2407-06-28/MonsterHunter.avif"/></a>
<a href="{{site.baseurl}}/images/posts/2407-06-28/Crysis3.png"><img title="Crysis 3 running on FEX" width="100%" src="{{site.baseurl}}/images/posts/2407-06-28/Crysis3.avif"/></a>
<a href="{{site.baseurl}}/images/posts/2407-06-28/HellbladeII.png"><img title="Hellblade II running on FEX" width="100%" src="{{site.baseurl}}/images/posts/2407-06-28/HellbladeII.avif"/></a>
Binary file added images/posts/2407-06-28/Crysis3.avif
Binary file not shown.
Binary file added images/posts/2407-06-28/Crysis3.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added images/posts/2407-06-28/DeathStranding.avif
Binary file not shown.
Binary file added images/posts/2407-06-28/DeathStranding.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added images/posts/2407-06-28/Forspoken.avif
Binary file not shown.
Binary file added images/posts/2407-06-28/Forspoken.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added images/posts/2407-06-28/HellbladeII.avif
Binary file not shown.
Binary file added images/posts/2407-06-28/HellbladeII.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added images/posts/2407-06-28/Helldiver.avif
Binary file not shown.
Binary file added images/posts/2407-06-28/Helldiver.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added images/posts/2407-06-28/MetroExodus.avif
Binary file not shown.
Binary file added images/posts/2407-06-28/MetroExodus.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added images/posts/2407-06-28/MonsterHunter.avif
Binary file not shown.
Binary file added images/posts/2407-06-28/MonsterHunter.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.

0 comments on commit d91bd55

Please sign in to comment.