Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Openxr: Update repository for working on it. #1

Open
wants to merge 1,974 commits into
base: openxr
Choose a base branch
from

Conversation

Uklosk
Copy link

@Uklosk Uklosk commented Apr 24, 2021

This pull request is for the issue dolphin-emu#8380. I have just updated your work to master and latest OpenXR version.

Pokechu22 and others added 30 commits March 6, 2021 19:27
This value will be used in the register description; so expose it in a way that can be re-used instead of calculating it in 2 places later.
Additional changes:
- For TevStageCombiner's ColorCombiner and AlphaCombiner, op/comparison and scale/compare_mode have been split as there are different meanings and enums if bias is set to compare.  (Shift has also been renamed to scale)
- In TexMode0, min_filter has been split into min_mip and min_filter.
- In TexImage1, image_type is now cache_manually_managed.
- The unused bit in GenMode is now exposed.
- LPSize's lineaspect is now named adjust_for_aspect_ratio.
BPMEM_TEV_COLOR_ENV + 6 (0xC6) was missing due to a typo.  BPMEM_BP_MASK (0xFE) does not lend itself well to documentation with the current FIFO analyzer implementation (since it requires remembering the values in BP memory) but still shouldn't be treated as unknown.  BPMEM_TX_SETMODE0_4 and BPMEM_TX_SETMODE1_4 (0xA4-0xAB) were missing entirely.
Graphics refactoring + add names and descriptions in FIFO analyzer
Fixes issue 11393.

The problem is that left and top make no sense for a width by height array; they only make sense in a larger array where from which a smaller part is extracted.  Thus, the overall size of the array is provided to CopyRegion in addition to the sub-region.  EncodeXFB already handles the extraction, so CopyRegion's only use there is to resize the image (and thus no sub-region is provided).
Fixes Jimmie Johnson's Anything with an Engine.
…er-skip-scan

Android: Move "skip scanning" logic to MainPresenter
Android: Don't save settings immediately after switching platform tab
Breakpoints: Change icon when disabled
The async operations may contain references to class members, so
any running async operations must end before destroying the class.
The loop in WIARVZFileReader::Chunk::Read could terminate
prematurely if the size argument was smaller than the size
of an exception list which had only been partially loaded.
Avoiding desyncs is more important than honoring what the user
specified on the command line.
Fixes netplay and movie overrides of SYSCONF settings not applying.
Whether the custom RTC setting is enabled shouldn't in itself
affect determinism (as long as the actual RTC value is properly
synced). Alters the logic added in 4b2906c.

I'm not entirely certain that this is correct, but the current
code doesn't really make sense to me... If we need to force the
RTC bias to 0 when custom RTC is enabled, why don't we need to
do it when custom RTC is disabled? The code for getting the
host system's current time doesn't contain any special handling
for the guest's RTC bias as far as I can tell.
When the dividend is known at compile time, we can eliminate some of the
branching and precompute the result for the overflow case.

Before:
B8 54 D3 E6 02       mov         eax,2E6D354h
85 FF                test        edi,edi
74 0C                je          overflow
3D 00 00 00 80       cmp         eax,80000000h
75 0C                jne         normal_path
83 FF FF             cmp         edi,0FFFFFFFFh
75 07                jne         normal_path
overflow:
C1 F8 1F             sar         eax,1Fh
8B F8                mov         edi,eax
EB 05                jmp         done
normal_path:
99                   cdq
F7 FF                idiv        eax,edi
8B F8                mov         edi,eax
done:

After:
85 FF                test        edi,edi
75 04                jne         normal_path
33 FF                xor         edi,edi
EB 0A                jmp         done
normal_path:
B8 54 D3 E6 02       mov         eax,2E6D354h
99                   cdq
F7 FF                idiv        eax,edi
8B F8                mov         edi,eax
done:

Fairly common with constant dividend of zero. Non-zero values occur
frequently in Ocarina of Time Master Quest.
Zero divided by any number is still zero. For whatever reason, this case
shows up frequently too.

Before:
B8 00 00 00 00       mov         eax,0
85 F6                test        esi,esi
74 0C                je          overflow
3D 00 00 00 80       cmp         eax,80000000h
75 0C                jne         normal_path
83 FE FF             cmp         esi,0FFFFFFFFh
75 07                jne         normal_path
overflow:
C1 F8 1F             sar         eax,1Fh
8B F8                mov         edi,eax
EB 05                jmp         done
normal_path:
99                   cdq
F7 FE                idiv        eax,esi
8B F8                mov         edi,eax
done:

After:
Nothing!
Add a function to calculate the magic constants required to optimize
signed 32-bit division.

Since this optimization is not exclusive to any particular architecture,
JitCommon seemed like a good place to put this.
Optimize division by a constant into multiplication. This method is also
used by GCC and LLVM.

We also add optimized paths for divisors 0, 1, and -1, because they
don't work using this method. They don't occur very often, but are
necessary for correctness.

- Division by 1
Before:
41 BF 01 00 00 00    mov         r15d,1
41 8B C5             mov         eax,r13d
45 85 FF             test        r15d,r15d
74 0D                je          overflow
3D 00 00 00 80       cmp         eax,80000000h
75 0E                jne         normal_path
41 83 FF FF          cmp         r15d,0FFFFFFFFh
75 08                jne         normal_path
overflow:
C1 F8 1F             sar         eax,1Fh
44 8B F8             mov         r15d,eax
EB 07                jmp         done
normal_path:
99                   cdq
41 F7 FF             idiv        eax,r15d
44 8B F8             mov         r15d,eax
done:

After:
45 8B FD             mov         r15d,r13d

- Division by 30307
Before:
41 BA 63 76 00 00    mov         r10d,7663h
41 8B C5             mov         eax,r13d
45 85 D2             test        r10d,r10d
74 0D                je          overflow
3D 00 00 00 80       cmp         eax,80000000h
75 0E                jne         normal_path
41 83 FA FF          cmp         r10d,0FFFFFFFFh
75 08                jne         normal_path
overflow:
C1 F8 1F             sar         eax,1Fh
44 8B C0             mov         r8d,eax
EB 07                jmp         done
normal_path:
99                   cdq
41 F7 FA             idiv        eax,r10d
44 8B C0             mov         r8d,eax
done:

After:
49 63 C5             movsxd      rax,r13d
48 69 C0 65 6B 32 45 imul        rax,rax,45326B65h
4C 8B C0             mov         r8,rax
48 C1 E8 3F          shr         rax,3Fh
49 C1 F8 2D          sar         r8,2Dh
44 03 C0             add         r8d,eax

- Division by 30323
Before:
41 BA 73 76 00 00    mov         r10d,7673h
41 8B C5             mov         eax,r13d
45 85 D2             test        r10d,r10d
74 0D                je          overflow
3D 00 00 00 80       cmp         eax,80000000h
75 0E                jne         normal_path
41 83 FA FF          cmp         r10d,0FFFFFFFFh
75 08                jne         normal_path
overflow:
C1 F8 1F             sar         eax,1Fh
44 8B C0             mov         r8d,eax
EB 07                jmp         00000000161737E7
normal_path:
99                   cdq
41 F7 FA             idiv        eax,r10d
44 8B C0             mov         r8d,eax
done:

After:
49 63 C5             movsxd      rax,r13d
4C 69 C0 19 25 52 8A imul        r8,rax,0FFFFFFFF8A522519h
49 C1 E8 20          shr         r8,20h
44 03 C0             add         r8d,eax
C1 E8 1F             shr         eax,1Fh
41 C1 F8 0E          sar         r8d,0Eh
44 03 C0             add         r8d,eax
When the multiplier is positive (which is the most common case), we can
generate slightly better code.

- Division by 30307
Before:
49 63 C5             movsxd      rax,r13d
48 69 C0 65 6B 32 45 imul        rax,rax,45326B65h
4C 8B C0             mov         r8,rax
48 C1 E8 3F          shr         rax,3Fh
49 C1 F8 2D          sar         r8,2Dh
44 03 C0             add         r8d,eax

After:
49 63 C5             movsxd      rax,r13d
4C 69 C0 65 6B 32 45 imul        r8,rax,45326B65h
C1 E8 1F             shr         eax,1Fh
49 C1 F8 2D          sar         r8,2Dh
44 03 C0             add         r8d,eax
Power-of-two divisors can be done more elegantly, so handle them
separately.

- Division by 4
Before:
41 BD 04 00 00 00    mov         r13d,4
41 8B C0             mov         eax,r8d
45 85 ED             test        r13d,r13d
74 0D                je          overflow
3D 00 00 00 80       cmp         eax,80000000h
75 0E                jne         normal_path
41 83 FD FF          cmp         r13d,0FFFFFFFFh
75 08                jne         normal_path
overflow:
C1 F8 1F             sar         eax,1Fh
44 8B E8             mov         r13d,eax
EB 07                jmp         done
normal_path:
99                   cdq
41 F7 FD             idiv        eax,r13d
44 8B E8             mov         r13d,eax
done:

After:
45 85 C0             test        r8d,r8d
45 8D 68 03          lea         r13d,[r8+3]
45 0F 49 E8          cmovns      r13d,r8d
41 C1 FD 02          sar         r13d,2
...and let's optimize a divisor of 2 ever so slightly for good measure.
I wouldn't have bothered, but most GameCube games seem to hit this on
launch.

- Division by 2
Before:
41 BE 02 00 00 00    mov         r14d,2
41 8B C2             mov         eax,r10d
45 85 F6             test        r14d,r14d
74 0D                je          overflow
3D 00 00 00 80       cmp         eax,80000000h
75 0E                jne         normal_path
41 83 FE FF          cmp         r14d,0FFFFFFFFh
75 08                jne         normal_path
overflow:
C1 F8 1F             sar         eax,1Fh
44 8B F0             mov         r14d,eax
EB 07                jmp         done
normal_path:
99                   cdq
41 F7 FE             idiv        eax,r14d
44 8B F0             mov         r14d,eax
done:

After:
45 8B F2             mov         r14d,r10d
41 C1 EE 1F          shr         r14d,1Fh
45 03 F2             add         r14d,r10d
41 D1 FE             sar         r14d,1
Both the normal path and the overflow path end with the same
instruction, so their tails can be merged.

Before:
41 8B C7             mov         eax,r15d
45 85 C0             test        r8d,r8d
74 0D                je          overflow
3D 00 00 00 80       cmp         eax,80000000h
75 0E                jne         normal_path
41 83 F8 FF          cmp         r8d,0FFFFFFFFh
75 08                jne         normal_path
overflow:
C1 F8 1F             sar         eax,1Fh
44 8B F0             mov         r14d,eax
EB 07                jmp         done
normal_path:
99                   cdq
41 F7 F8             idiv        eax,r8d
44 8B F0             mov         r14d,eax
done:

After:
41 8B C7             mov         eax,r15d
45 85 C0             test        r8d,r8d
74 0D                je          overflow
3D 00 00 00 80       cmp         eax,80000000h
75 0B                jne         normal_path
41 83 F8 FF          cmp         r8d,0FFFFFFFFh
75 05                jne         normal_path
overflow:
C1 F8 1F             sar         eax,1Fh
EB 04                jmp         done
normal_path:
99                   cdq
41 F7 F8             idiv        eax,r8d
done:
44 8B F0             mov         r14d,eax
Suggested by @MerryMage. Thanks!

Co-authored-by: merry <[email protected]>
Namespace-scope variable was only used in one function so move it there
Loop index int i was being compared against GetControllerCount() which
returned a size_t.  This was the only place GetControllerCount() was
called from so the change of return type doesn't disturb anything else.

Changing the loop index to size_t wouldn't work as well since it's
passed into GetController(), which takes an int and is called from many
places, so it would need a cast anyway on an already busy line.
@Uklosk Uklosk changed the title Openxr Openxr: Update repository for working on it. Apr 24, 2021
JosJuice and others added 24 commits April 25, 2021 13:01
When the interpreter writes to a discarded register, its type
must be changed so that it is no longer considered discarded.

Fixes a 62ce1c7 regression.
Jits: Fix interpreter fallback handling of discarded registers
Fixes https://bugs.dolphin-emu.org/issues/12388. Might also fix
other games that have problems with float/paired instructions
in JitArm64, but I haven't tested any.
This simplifies some of the following commits. It does require
an extra register, but hey, we have 32 of them.

Something I think would be nice to add to the register cache
in the future is the ability to keep both the single and double
version of a guest register in two different host registers
when that is useful. That way, the extra register we write to
here can be read by a later instruction, saving us from
having to perform the same conversion again.
Preparation for following commits.

This commit intentionally doesn't touch paired stores,
since paired stores are supposed to flush to zero.
(Consistent with Jit64.)
Needed because the next commit will make RW clobber flags.
Our old conversion approach became a lot more inaccurate when
enabling flush-to-zero, to the point of obviously breaking games.
If we can prove that FCVT will provide a correct conversion,
we can use FCVT. This makes the common case a bit faster
and the less likely cases (unfortunately including zero,
which FCVT actually can convert correctly) a bit slower.
I haven't observed this breaking any game, but it didn't match
the behavior of the interpreter as far as I could tell from
reading the code, in that denormals weren't being flushed.
JitArm64: Set flush-to-zero/rounding mode and improve float/double conversion accuracy
Jit: Optimize block link queries by using hash tables
- Fix GeometryShaderGen bug.
- Load xrGetD3D11GraphicsRequirementsKHR with xrGetInstanceProcAddr.
- Change Swapchain Format to SRGB.
- Use Quaternion type in OpenXR.cpp.
- Add OpenXR subproject to needed files.
- Frustum was applied to view matrix with z_near and z_far at 0
- Side effect: no more black behind 180º
jordan-woyak pushed a commit that referenced this pull request Jun 27, 2022
Before, both 1441 and 147f would disassemble as `lsr $acc0, #1`, when the second should be `lsr $acc0, #-1`, and both 14c1 and 14ff would be `asr $acc0, #1` when the second should be `asr $acc0, #-1`. I'm not entirely sure whether the minus signs actually make sense here, but this change is consistent with the assembler so that's an improvement at least.

devkitPro previously changed the formatting to not require negative signs for lsr and asr; this is probably something we should do in the future: devkitPro/gamecube-tools@8a65c85

This fixes the HermesText and HermesBinary tests (HermesText already wrote `lsr $ACC0, #-5`, so this is consistent with what it used before.)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.