-
Notifications
You must be signed in to change notification settings - Fork 1
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
1 parent
8471a24
commit 8345700
Showing
1 changed file
with
172 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,172 @@ | ||
--- | ||
layout: post | ||
title: FEX 2312 Tagged! | ||
author: FEX-Emu Maintainers | ||
--- | ||
|
||
We're back with another month of changes. After last month being a bit slower, we're back in the swing of implementing more optimizations and bug fixes. No dilly dallying, let's get right in to it! | ||
|
||
## More optimizations this month! | ||
Once again this month has a whole bunch of optimizations that is very exciting! We will lightly go over the changes to talk about what changed. | ||
|
||
### Keep guest SF/ZF/CF/OF flags resident in host NZCV | ||
This is one of the bigger optimizations this month. A bit of backstory is needed for what this optimization is for. x86 has a flags register called **EFLAGS** which contains quite a few random bits of information. The subflags we care about here are the SF, ZF, CF, and OF flags inside of it. These are various flags that are set typically from ALU operations for information depending on the result. So something like an integer Add will set ZF if the result was zero, SF if the result has the sign bit set, CF if a carry occured, and OF if the operation overflowed. These are usually quite cheap for the CPU to calculate by itself, but manually calculating the flags usually takes a few additional instructions each. | ||
|
||
The original implementation inside of FEX for calculating these flags would spend the additional instructions and calculate each one manually. This | ||
would usually end up with a dozen or so of additional instructions for calculating flags. While FEX would typically optimize out the calculations if | ||
they weren't used, it would still add CPU time when we couldn't. | ||
|
||
Luckily ARM also has a flags register called NZCV which maps almost perfectly to x86's EFLAGS. This lets us optimize these instruction implementations | ||
to instead use the ARM flags directly. This has a couple of effects, not only does it remove the instructions from our code generation, it has | ||
knock-on effects that the flags are now stored inside of the NZCV which reduces memory accesses. A multi-hit combo for improving performance. | ||
|
||
While not all x86 instructions map their flags registers 1:1, this has a fairly significant performance uplift in most situations! | ||
|
||
### Dedicate registers for PF/AF | ||
Related to the previous change, x86 has two flags registers stored inside of **EFLAGS** that doesn't have a direct equivalent on ARM CPUs. These two | ||
flags are fairly uncommon but instructions will still generate them. These flags have the additional problem that they are fairly costly to calculate, | ||
with one of them requiring a GPR population count instruction which ARM doesn't even support until new instruction extensions called CSSC. While in | ||
most cases the result of these flags isn't used, the overhead of calculating them can add up a bit. This is why we are now dedicating two registers to | ||
these flags to reduce their overhead as much as possible! | ||
|
||
### Misc optimizations | ||
- Optimize BT/BTC/BTS/BTR | ||
- Optimize shifts/rotates | ||
- Optimize selects & branches & more nzcv goodies | ||
- Optimize three sha instructions | ||
- Make "not" not garbage | ||
- Optimize memcpy and memset when direction is compile time constant | ||
|
||
With all these optimizations in place this month we have a fairly significant performance uplift! | ||
|
||
<div id="bytemark_64bit" style="min-width: 250px; height: 300px; margin: 0 auto"> | ||
</div> | ||
|
||
<div id="geekbench_5" style="min-width: 250px; height: 300px; margin: 0 auto"> | ||
</div> | ||
|
||
While Geekbench is showing a fairly modest **17.6%** performance uplift, bytemark is showing up to a **60%** performance uplift! Over the course of | ||
the last three months we have had benchmarks that have improved by over 100%! These improvements can be seen in games as well, with some CPU heavy | ||
games have had their FPS improve by over 2x. In a lot of games tested they have changed from being CPU limited to GPU limited on our Lenovo X13s | ||
laptops even! We are looking forward to when these companies release new laptops based on [Snapdragon X | ||
Elite](https://en.wikipedia.org/wiki/List_of_Qualcomm_Snapdragon_systems_on_chips#Snapdragon_X_series) in the middle of next year! | ||
|
||
## Various bug fixes | ||
In addition to performance improvements, we have some bug fixes this month. | ||
- Fixed corruption in the JIT | ||
- Caused corruption with x87 heavy games | ||
- Fixes integer multiply corrupting results | ||
- Corrupted some register state, which was breaking the game Dungeon Defenders | ||
|
||
## Support extracting erofs images | ||
One of the features that FEXRootFSFetcher was missing was the ability to extract erofs images once downloaded. This was because we didn't know that | ||
erofs-utils provided an application for extracting these images without FUSE. Turns out the developers put an extractor inside of their fsck | ||
application that we had completely missed! Now if a user wants to extract an x86 rootfs image for lower overhead, they can do this directly from our | ||
FEXRootFSFetcher tool. | ||
|
||
## Preparation for improving gdbserver | ||
GDBServer is a socket interface that GDB supports for remotely debugging applications. One of the harder things about working on FEX-Emu is that the | ||
ability to debug an application is usually quite hard. GDBServer is a way to improve this situation so that GDB can remotely connect to a FEX process. | ||
|
||
There's a bunch of work this month towards cleaning up this interface and getting it to work correctly. While it is still not quite usable for | ||
debugging, we are working towards this so applications can actually be debugged! | ||
|
||
## Improvements to WOW64 compatibility for newer WINE | ||
Newer versions of WINE has changed some behaviour around WOW64 support. So this month we have added support for some of this newer behaviour. Thanks | ||
again to [Bylaws](https://github.com/bylaws) for implementing this! | ||
|
||
## FEX rootfs image updates | ||
This month we are updating our rootfs images to incorporate the latest [Mesa 23.3.0](https://docs.mesa3d.org/relnotes/23.3.0.html) release that | ||
occured a few days ago. We have updated our Ubuntu 22.04, 23.04, 23.10, ArchLinux, and Fedora 38 images with this latest version of mesa. As usual if | ||
there are any issues, let us know so we can sort them out. | ||
|
||
# Video game showcase | ||
<iframe width="560" height="315" src="https://www.youtube-nocookie.com/embed/XquvrY4tyvI" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe> | ||
|
||
See the [2312 Release Notes](https://github.com/FEX-Emu/FEX/releases/tag/FEX-2312) or the [detailed change log](https://github.com/FEX-Emu/FEX/compare/FEX-2311...FEX-2312) in Github. | ||
|
||
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.8.2/jquery.min.js"> | ||
</script> | ||
<script src="https://code.highcharts.com/highcharts.js"> | ||
</script> | ||
<script src="https://code.highcharts.com/modules/exporting.js"> | ||
</script> | ||
|
||
<script type="text/javascript"> | ||
Highcharts.chart('bytemark_64bit', { | ||
chart: { | ||
type: 'column' | ||
}, | ||
title: { | ||
text: 'Cortex-X1C bytemark 64-bit uplift between FEX-2311 and FEX-2312', | ||
align: 'left' | ||
}, | ||
xAxis: { | ||
categories: ['Numeric Sort', 'String Sort', 'Bitfield', 'FPEmu', 'Fourier', 'Assign', 'Idea', 'Huffman', 'NN', 'LU Decomp'], | ||
crosshair: true, | ||
accessibility: { | ||
description: 'sub-benchmarks' | ||
} | ||
}, | ||
yAxis: { | ||
min: -5, | ||
title: { | ||
text: 'Performance improvement %' | ||
} | ||
}, | ||
tooltip: { | ||
valueSuffix: '%' | ||
}, | ||
plotOptions: { | ||
column: { | ||
pointPadding: 0.2, | ||
borderWidth: 0 | ||
} | ||
}, | ||
series: [ | ||
{ | ||
name: 'FEX-2312', | ||
data: [15.27, 60.24, 5.08, 20.52, 13.59, 39.56, 18.96, 29.24, 2.11, 12.41] | ||
} | ||
] | ||
}); | ||
|
||
Highcharts.chart('geekbench_5', { | ||
chart: { | ||
type: 'column' | ||
}, | ||
title: { | ||
text: 'Lenovo X13s Geekbench 5.4.0 uplift between FEX-2311 and FEX-2312', | ||
align: 'left' | ||
}, | ||
xAxis: { | ||
categories: ['Single-core score', 'Multi-core score'], | ||
crosshair: true, | ||
accessibility: { | ||
description: 'sub-benchmarks' | ||
} | ||
}, | ||
yAxis: { | ||
min: 0, | ||
title: { | ||
text: 'Performance improvement %' | ||
} | ||
}, | ||
tooltip: { | ||
valueSuffix: '%' | ||
}, | ||
plotOptions: { | ||
column: { | ||
pointPadding: 0.2, | ||
borderWidth: 0 | ||
} | ||
}, | ||
series: [ | ||
{ | ||
name: 'FEX-2312', | ||
data: [17.6, 17.3] | ||
} | ||
] | ||
}); | ||
|
||
</script> |