Openxr: Update repository for working on it. #1

Uklosk · 2021-04-24T20:41:09Z

This pull request is for issue dolphin-emu#8380. I have just rebased your work onto master and updated it to the latest OpenXR version.

Additional changes:

- For TevStageCombiner's ColorCombiner and AlphaCombiner, op/comparison and scale/compare_mode have been split as there are different meanings and enums if bias is set to compare. (Shift has also been renamed to scale)
- In TexMode0, min_filter has been split into min_mip and min_filter.
- In TexImage1, image_type is now cache_manually_managed.
- The unused bit in GenMode is now exposed.
- LPSize's lineaspect is now named adjust_for_aspect_ratio.

BPMEM_TEV_COLOR_ENV + 6 (0xC6) was missing due to a typo. BPMEM_BP_MASK (0xFE) does not lend itself well to documentation with the current FIFO analyzer implementation (since it requires remembering the values in BP memory) but still shouldn't be treated as unknown. BPMEM_TX_SETMODE0_4 and BPMEM_TX_SETMODE1_4 (0xA4-0xAB) were missing entirely.

Fixes issue 11393. The problem is that left and top make no sense for a width by height array; they only make sense in a larger array from which a smaller part is extracted. Thus, the overall size of the array is provided to CopyRegion in addition to the sub-region. EncodeXFB already handles the extraction, so CopyRegion's only use there is to resize the image (and thus no sub-region is provided).
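To illustrate why the offsets need the full array size, here is a minimal sketch of extracting a sub-region from a row-major image. `Rect` and `ExtractRegion` are hypothetical names invented for this example; this is not Dolphin's CopyRegion, which also resizes.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical types and names, for illustration only.
struct Rect
{
  int left;
  int top;
  int width;
  int height;
};

// Copies `region` out of a row-major image of source_width x source_height.
// The source dimensions are required: left/top are offsets into the full
// image, and the row stride is source_width, not region.width.
std::vector<uint32_t> ExtractRegion(const std::vector<uint32_t>& source, int source_width,
                                    int source_height, const Rect& region)
{
  std::vector<uint32_t> result;
  const bool in_bounds =
      source_width >= 0 && source_height >= 0 && region.left >= 0 && region.top >= 0 &&
      region.width >= 0 && region.height >= 0 &&
      region.left + region.width <= source_width &&
      region.top + region.height <= source_height &&
      source.size() >= static_cast<std::size_t>(source_width) * source_height;
  if (!in_bounds)
    return result;  // empty result signals an invalid request

  result.reserve(static_cast<std::size_t>(region.width) * region.height);
  for (int y = 0; y < region.height; ++y)
  {
    const std::size_t row_start =
        static_cast<std::size_t>(region.top + y) * source_width + region.left;
    for (int x = 0; x < region.width; ++x)
      result.push_back(source[row_start + x]);
  }
  return result;
}
```

With only a width by height destination and no source size, there would be no stride to multiply `top` by, which is the ambiguity the commit message describes.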

Whether the custom RTC setting is enabled shouldn't in itself affect determinism (as long as the actual RTC value is properly synced). Alters the logic added in 4b2906c. I'm not entirely certain that this is correct, but the current code doesn't really make sense to me... If we need to force the RTC bias to 0 when custom RTC is enabled, why don't we need to do it when custom RTC is disabled? The code for getting the host system's current time doesn't contain any special handling for the guest's RTC bias as far as I can tell.

When the dividend is known at compile time, we can eliminate some of the branching and precompute the result for the overflow case.

Before:

    B8 54 D3 E6 02        mov     eax,2E6D354h
    85 FF                 test    edi,edi
    74 0C                 je      overflow
    3D 00 00 00 80        cmp     eax,80000000h
    75 0C                 jne     normal_path
    83 FF FF              cmp     edi,0FFFFFFFFh
    75 07                 jne     normal_path
    overflow:
    C1 F8 1F              sar     eax,1Fh
    8B F8                 mov     edi,eax
    EB 05                 jmp     done
    normal_path:
    99                    cdq
    F7 FF                 idiv    eax,edi
    8B F8                 mov     edi,eax
    done:

After:

    85 FF                 test    edi,edi
    75 04                 jne     normal_path
    33 FF                 xor     edi,edi
    EB 0A                 jmp     done
    normal_path:
    B8 54 D3 E6 02        mov     eax,2E6D354h
    99                    cdq
    F7 FF                 idiv    eax,edi
    8B F8                 mov     edi,eax
    done:

Fairly common with constant dividend of zero. Non-zero values occur frequently in Ocarina of Time Master Quest.

Zero divided by any number is still zero. For whatever reason, this case shows up frequently too.

Before:

    B8 00 00 00 00        mov     eax,0
    85 F6                 test    esi,esi
    74 0C                 je      overflow
    3D 00 00 00 80        cmp     eax,80000000h
    75 0C                 jne     normal_path
    83 FE FF              cmp     esi,0FFFFFFFFh
    75 07                 jne     normal_path
    overflow:
    C1 F8 1F              sar     eax,1Fh
    8B F8                 mov     edi,eax
    EB 05                 jmp     done
    normal_path:
    99                    cdq
    F7 FE                 idiv    eax,esi
    8B F8                 mov     edi,eax
    done:

After:

    Nothing!
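Both constant-dividend cases follow from what the generic sequence computes. As a rough model read off the "Before" listings (not taken from Dolphin's source), a zero divisor or 0x80000000 / -1 takes the overflow path, which fills the result with the dividend's sign bit via `sar eax,1Fh`. `DivwModel` and `DivwConstDividend` are hypothetical names for this sketch.

```cpp
#include <cstdint>
#include <limits>

// Model of the generic sequence: the overflow path returns the dividend's
// sign spread over all 32 bits (0 or -1); otherwise a truncating divide.
constexpr int32_t DivwModel(int32_t dividend, int32_t divisor)
{
  const bool overflow =
      divisor == 0 || (dividend == std::numeric_limits<int32_t>::min() && divisor == -1);
  if (overflow)
    return dividend < 0 ? -1 : 0;
  return dividend / divisor;
}

// With the dividend fixed at compile time, the overflow result becomes a
// constant and the 0x80000000 check disappears unless it can actually hit.
template <int32_t Dividend>
constexpr int32_t DivwConstDividend(int32_t divisor)
{
  if constexpr (Dividend == 0)
  {
    return 0;  // 0 / x is 0, and the overflow result for 0 is 0 as well
  }
  else
  {
    constexpr int32_t overflow_result = Dividend < 0 ? -1 : 0;
    if (divisor == 0)
      return overflow_result;
    if constexpr (Dividend == std::numeric_limits<int32_t>::min())
    {
      if (divisor == -1)
        return overflow_result;
    }
    return Dividend / divisor;
  }
}
```

For the dividend 2E6D354h from the first listing, the precomputed overflow result is 0, which matches the `xor edi,edi` in the "After" code.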

Add a function to calculate the magic constants required to optimize signed 32-bit division. Since this optimization is not exclusive to any particular architecture, JitCommon seemed like a good place to put this.

Optimize division by a constant into multiplication. This method is also used by GCC and LLVM. We also add optimized paths for divisors 0, 1, and -1, because they don't work using this method. They don't occur very often, but are necessary for correctness.

- Division by 1

Before:

    41 BF 01 00 00 00     mov     r15d,1
    41 8B C5              mov     eax,r13d
    45 85 FF              test    r15d,r15d
    74 0D                 je      overflow
    3D 00 00 00 80        cmp     eax,80000000h
    75 0E                 jne     normal_path
    41 83 FF FF           cmp     r15d,0FFFFFFFFh
    75 08                 jne     normal_path
    overflow:
    C1 F8 1F              sar     eax,1Fh
    44 8B F8              mov     r15d,eax
    EB 07                 jmp     done
    normal_path:
    99                    cdq
    41 F7 FF              idiv    eax,r15d
    44 8B F8              mov     r15d,eax
    done:

After:

    45 8B FD              mov     r15d,r13d

- Division by 30307

Before:

    41 BA 63 76 00 00     mov     r10d,7663h
    41 8B C5              mov     eax,r13d
    45 85 D2              test    r10d,r10d
    74 0D                 je      overflow
    3D 00 00 00 80        cmp     eax,80000000h
    75 0E                 jne     normal_path
    41 83 FA FF           cmp     r10d,0FFFFFFFFh
    75 08                 jne     normal_path
    overflow:
    C1 F8 1F              sar     eax,1Fh
    44 8B C0              mov     r8d,eax
    EB 07                 jmp     done
    normal_path:
    99                    cdq
    41 F7 FA              idiv    eax,r10d
    44 8B C0              mov     r8d,eax
    done:

After:

    49 63 C5              movsxd  rax,r13d
    48 69 C0 65 6B 32 45  imul    rax,rax,45326B65h
    4C 8B C0              mov     r8,rax
    48 C1 E8 3F           shr     rax,3Fh
    49 C1 F8 2D           sar     r8,2Dh
    44 03 C0              add     r8d,eax

- Division by 30323

Before:

    41 BA 73 76 00 00     mov     r10d,7673h
    41 8B C5              mov     eax,r13d
    45 85 D2              test    r10d,r10d
    74 0D                 je      overflow
    3D 00 00 00 80        cmp     eax,80000000h
    75 0E                 jne     normal_path
    41 83 FA FF           cmp     r10d,0FFFFFFFFh
    75 08                 jne     normal_path
    overflow:
    C1 F8 1F              sar     eax,1Fh
    44 8B C0              mov     r8d,eax
    EB 07                 jmp     00000000161737E7
    normal_path:
    99                    cdq
    41 F7 FA              idiv    eax,r10d
    44 8B C0              mov     r8d,eax
    done:

After:

    49 63 C5              movsxd  rax,r13d
    4C 69 C0 19 25 52 8A  imul    r8,rax,0FFFFFFFF8A522519h
    49 C1 E8 20           shr     r8,20h
    44 03 C0              add     r8d,eax
    C1 E8 1F              shr     eax,1Fh
    41 C1 F8 0E           sar     r8d,0Eh
    44 03 C0              add     r8d,eax
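For readers unfamiliar with the technique, the following is a minimal sketch of one way to find a multiplier and shift and then divide with them. It assumes a divisor of at least 2 and uses a simple 64-bit search rather than whatever the JitCommon function actually does; `SignedMagic`, `FindSignedMagic` and `DivideByMagic` are hypothetical names.

```cpp
#include <cstdint>

// Hypothetical types and names; a sketch, not the JitCommon implementation.
struct SignedMagic
{
  int64_t multiplier;  // positive and below 2^32 for 2 <= divisor < 2^31
  int shift;           // total right shift applied to the 64-bit product
};

// Finds the smallest p >= 32 such that m = floor(2^p / d) + 1 satisfies
// m * d - 2^p <= 2^(p - 31). That bound keeps the rounding error too small
// to change the quotient for any 32-bit dividend. Requires divisor >= 2.
inline SignedMagic FindSignedMagic(int32_t divisor)
{
  const int64_t d = divisor;
  for (int p = 32; p < 63; ++p)
  {
    const int64_t two_p = int64_t{1} << p;
    const int64_t m = two_p / d + 1;
    const int64_t error = m * d - two_p;  // always in the range 1..d
    if (error <= (int64_t{1} << (p - 31)))
      return {m, p};
  }
  return {0, 0};  // not reached for divisors in the supported range
}

// Multiply, shift, then add 1 for negative dividends so that the result
// truncates toward zero like idiv does. Relies on >> being an arithmetic
// shift for negative values (guaranteed since C++20, universal in practice).
inline int32_t DivideByMagic(int32_t n, SignedMagic magic)
{
  const int64_t product = int64_t{n} * magic.multiplier;
  const int64_t floored = product >> magic.shift;
  return static_cast<int32_t>(floored + (n < 0 ? 1 : 0));
}
```

For a divisor of 30307 this search yields the multiplier 45326B65h with a shift of 2Dh, the same pair that appears in the "After" listing. Divisors 0, 1 and -1 are outside what this sketch supports, in line with the special paths mentioned in the commit message.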

When the multiplier is positive (which is the most common case), we can generate slightly better code.

- Division by 30307

Before:

    49 63 C5              movsxd  rax,r13d
    48 69 C0 65 6B 32 45  imul    rax,rax,45326B65h
    4C 8B C0              mov     r8,rax
    48 C1 E8 3F           shr     rax,3Fh
    49 C1 F8 2D           sar     r8,2Dh
    44 03 C0              add     r8d,eax

After:

    49 63 C5              movsxd  rax,r13d
    4C 69 C0 65 6B 32 45  imul    r8,rax,45326B65h
    C1 E8 1F              shr     eax,1Fh
    49 C1 F8 2D           sar     r8,2Dh
    44 03 C0              add     r8d,eax

Power-of-two divisors can be done more elegantly, so handle them separately.

- Division by 4

Before:

    41 BD 04 00 00 00     mov     r13d,4
    41 8B C0              mov     eax,r8d
    45 85 ED              test    r13d,r13d
    74 0D                 je      overflow
    3D 00 00 00 80        cmp     eax,80000000h
    75 0E                 jne     normal_path
    41 83 FD FF           cmp     r13d,0FFFFFFFFh
    75 08                 jne     normal_path
    overflow:
    C1 F8 1F              sar     eax,1Fh
    44 8B E8              mov     r13d,eax
    EB 07                 jmp     done
    normal_path:
    99                    cdq
    41 F7 FD              idiv    eax,r13d
    44 8B E8              mov     r13d,eax
    done:

After:

    45 85 C0              test    r8d,r8d
    45 8D 68 03           lea     r13d,[r8+3]
    45 0F 49 E8           cmovns  r13d,r8d
    41 C1 FD 02           sar     r13d,2

...and let's optimize a divisor of 2 ever so slightly for good measure. I wouldn't have bothered, but most GameCube games seem to hit this on launch.

- Division by 2

Before:

    41 BE 02 00 00 00     mov     r14d,2
    41 8B C2              mov     eax,r10d
    45 85 F6              test    r14d,r14d
    74 0D                 je      overflow
    3D 00 00 00 80        cmp     eax,80000000h
    75 0E                 jne     normal_path
    41 83 FE FF           cmp     r14d,0FFFFFFFFh
    75 08                 jne     normal_path
    overflow:
    C1 F8 1F              sar     eax,1Fh
    44 8B F0              mov     r14d,eax
    EB 07                 jmp     done
    normal_path:
    99                    cdq
    41 F7 FE              idiv    eax,r14d
    44 8B F0              mov     r14d,eax
    done:

After:

    45 8B F2              mov     r14d,r10d
    41 C1 EE 1F           shr     r14d,1Fh
    45 03 F2              add     r14d,r10d
    41 D1 FE              sar     r14d,1
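The two power-of-two sequences can be read as the following C++, offered as a sketch of the arithmetic rather than of the emitter code. A plain arithmetic shift rounds toward negative infinity, so negative dividends are biased first to get the truncating result that division requires. `DivPow2` and `DivBy2` are hypothetical names.

```cpp
#include <cstdint>

// Divide by 2^k for 1 <= k <= 30, truncating toward zero.
// Mirrors the lea / cmovns / sar pattern: add (2^k - 1) only when n is negative.
// Relies on >> being an arithmetic shift for negative values.
inline int32_t DivPow2(int32_t n, int k)
{
  const int32_t bias = (int32_t{1} << k) - 1;
  const int32_t adjusted = n < 0 ? n + bias : n;
  return adjusted >> k;
}

// Divide by 2: the bias is just the sign bit, so shr 31 / add / sar 1 suffices.
inline int32_t DivBy2(int32_t n)
{
  const int32_t sign_bit = static_cast<int32_t>(static_cast<uint32_t>(n) >> 31);
  return (n + sign_bit) >> 1;
}
```

For example, -7 / 4 becomes (-7 + 3) >> 2 = -1, whereas an unbiased shift would give -2.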

Both the normal path and the overflow path end with the same instruction, so their tails can be merged.

Before:

    41 8B C7              mov     eax,r15d
    45 85 C0              test    r8d,r8d
    74 0D                 je      overflow
    3D 00 00 00 80        cmp     eax,80000000h
    75 0E                 jne     normal_path
    41 83 F8 FF           cmp     r8d,0FFFFFFFFh
    75 08                 jne     normal_path
    overflow:
    C1 F8 1F              sar     eax,1Fh
    44 8B F0              mov     r14d,eax
    EB 07                 jmp     done
    normal_path:
    99                    cdq
    41 F7 F8              idiv    eax,r8d
    44 8B F0              mov     r14d,eax
    done:

After:

    41 8B C7              mov     eax,r15d
    45 85 C0              test    r8d,r8d
    74 0D                 je      overflow
    3D 00 00 00 80        cmp     eax,80000000h
    75 0B                 jne     normal_path
    41 83 F8 FF           cmp     r8d,0FFFFFFFFh
    75 05                 jne     normal_path
    overflow:
    C1 F8 1F              sar     eax,1Fh
    EB 04                 jmp     done
    normal_path:
    99                    cdq
    41 F7 F8              idiv    eax,r8d
    done:
    44 8B F0              mov     r14d,eax

Loop index int i was being compared against GetControllerCount() which returned a size_t. This was the only place GetControllerCount() was called from so the change of return type doesn't disturb anything else. Changing the loop index to size_t wouldn't work as well since it's passed into GetController(), which takes an int and is called from many places, so it would need a cast anyway on an already busy line.

This simplifies some of the following commits. It does require an extra register, but hey, we have 32 of them. Something I think would be nice to add to the register cache in the future is the ability to keep both the single and double version of a guest register in two different host registers when that is useful. That way, the extra register we write to here can be read by a later instruction, saving us from having to perform the same conversion again.

Before, both 1441 and 147f would disassemble as `lsr $acc0, #1`, when the second should be `lsr $acc0, #-1`, and both 14c1 and 14ff would be `asr $acc0, #1` when the second should be `asr $acc0, #-1`. I'm not entirely sure whether the minus signs actually make sense here, but this change is consistent with the assembler so that's an improvement at least. devkitPro previously changed the formatting to not require negative signs for lsr and asr; this is probably something we should do in the future: devkitPro/gamecube-tools@8a65c85 This fixes the HermesText and HermesBinary tests (HermesText already wrote `lsr $ACC0, #-5`, so this is consistent with what it used before.)
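One reading that is consistent with the four opcodes quoted above is that the shift amount is the low 6 bits of the opcode, interpreted as a signed two's-complement value. The sketch below assumes exactly that; the field position and the name `DecodeShiftImmediate` are assumptions made for illustration, and the real disassembler may extract the operand differently.

```cpp
#include <cstdint>

// Assumption: the immediate is the low 6 bits of the opcode, sign-extended.
// 0x00..0x1f map to 0..31 and 0x20..0x3f map to -32..-1.
constexpr int DecodeShiftImmediate(uint16_t opcode)
{
  const int raw = opcode & 0x3f;
  return raw >= 0x20 ? raw - 0x40 : raw;
}

static_assert(DecodeShiftImmediate(0x1441) == 1, "lsr $acc0, #1");
static_assert(DecodeShiftImmediate(0x147f) == -1, "lsr $acc0, #-1");
static_assert(DecodeShiftImmediate(0x14c1) == 1, "asr $acc0, #1");
static_assert(DecodeShiftImmediate(0x14ff) == -1, "asr $acc0, #-1");
```

Under this reading, printing the unsigned magnitude instead of the signed value would make 1441 and 147f look identical, which is the symptom the commit message describes.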

Pokechu22 and others added 30 commits March 6, 2021 19:27

Use XFMEM_REGISTERS_START/END in XFRegWritten and LoadXFReg

81b84a5

Rename BPMEM_EFB_BR to BPMEM_EFB_WH

762fe33

Fix typo with ztex2 op in UseVertexDepthRange

f2bea67

Add FogParam0::FloatValue and FogParam3::FloatValue

db8ced7

This value will be used in the register description; so expose it in a way that can be re-used instead of calculating it in 2 places later.

Merge pull request dolphin-emu#9497 from Pokechu22/better-fifo-analyzer

089250f

Graphics refactoring + add names and descriptions in FIFO analyzer

Software: Invert backface test when viewport is positive

5b1c632

Fixes Jimmie Johnson's Anything with an Engine.

Merge pull request dolphin-emu#9569 from JosJuice/android-mainpresent…

a5555c6

…er-skip-scan Android: Move "skip scanning" logic to MainPresenter

Merge pull request dolphin-emu#9568 from JosJuice/android-delay-save-tab

ac687bc

Android: Don't save settings immediately after switching platform tab

Merge pull request dolphin-emu#9562 from sepalani/dis-icons

6119854

Breakpoints: Change icon when disabled

VolumeVerifier: Fix potential crash when cancelling

96ebf01

The async operations may contain references to class members, so any running async operations must end before destroying the class.

DiscIO: Fix reading certain WIA chunks with many exceptions

14bfc0b

The loop in WIARVZFileReader::Chunk::Read could terminate prematurely if the size argument was smaller than the size of an exception list which had only been partially loaded.

NetPlay: Sync more settings

a9862b5

Config: Give Movie and Netplay higher priority than CommandLine

359ed53

Avoiding desyncs is more important than honoring what the user specified on the command line.

Boot: Initialize Wii root before saving SYSCONF file

46dbb45

Fixes netplay and movie overrides of SYSCONF settings not applying.

JitCommon: Signed 32-bit division magic constants

5bb8798

Add a function to calculate the magic constants required to optimize signed 32-bit division. Since this optimization is not exclusive to any particular architecture, JitCommon seemed like a good place to put this.

Jit64: divwx - Simplify divisor == -1 case

defe716

Suggested by @MerryMage. Thanks! Co-authored-by: merry <[email protected]>

Arm64Gen: Remove unused constant

dffcbcc

Arm64Gen: Move constant and make constexpr

686314b

Namespace-scope variable was only used in one function so move it there

leoetlino and others added 2 commits April 24, 2021 20:13

Merge pull request dolphin-emu#9646 from PatrickFerry/sw-textureencod…

1c6232e

…er-alignedwidth SW: Fix alignedWidth in TextureEncoder

Translation resources sync with Transifex

0f563ff

Uklosk changed the title ~~Openxr~~ Openxr: Update repository for working on it. Apr 24, 2021

JosJuice and others added 24 commits April 25, 2021 13:01

Jits: Fix interpreter fallback handling of discarded registers

b3b5016

When the interpreter writes to a discarded register, its type must be changed so that it is no longer considered discarded. Fixes a 62ce1c7 regression.

Merge pull request dolphin-emu#9644 from JosJuice/jit-fallback-discard

aa3a96f

Jits: Fix interpreter fallback handling of discarded registers

Implement ArmFPURoundMode.cpp

f96ee47

Fixes https://bugs.dolphin-emu.org/issues/12388. Might also fix other games that have problems with float/paired instructions in JitArm64, but I haven't tested any.

JitArm64: Factor out single/double conversion code to functions

949686b

Preparation for following commits. This commit intentionally doesn't touch paired stores, since paired stores are supposed to flush to zero. (Consistent with Jit64.)

JitArm64: Call RW before FCMPE in fselx

39eccf6

Needed because the next commit will make RW clobber flags.

JitArm64: Use accurate single/double conversions

6e0a587

Our old conversion approach became a lot more inaccurate when enabling flush-to-zero, to the point of obviously breaking games.

JitArm64: Optimize ConvertDoubleToSingle

28e4869

JitArm64: Optimize ConvertSingleToDouble, part 1

018e247

JitArm64: Optimize ConvertSingleToDouble, part 2

1d106ce

If we can prove that FCVT will provide a correct conversion, we can use FCVT. This makes the common case a bit faster and the less likely cases (unfortunately including zero, which FCVT actually can convert correctly) a bit slower.

JitArm64: Skip accurate single/double conversion if store-safe

2a9d887

JitArm64: Add unit tests for single/double conversion

9d6263f

JitArm64: Use ConvertSingleToDoubleLower in RW when faster

54451ac

JitArm64: Fix frspx with single precision source

69c14d6

I haven't observed this breaking any game, but it didn't match the behavior of the interpreter as far as I could tell from reading the code, in that denormals weren't being flushed.

Merge pull request dolphin-emu#9458 from JosJuice/arm-fpu-round

5da85f3

JitArm64: Set flush-to-zero/rounding mode and improve float/double conversion accuracy

Merge pull request dolphin-emu#9666 from leoetlino/jit-block-hashtable

ac679eb

Jit: Optimize block link queries by using hash tables

OpenXR: Basic functionality.

5199c6a

OpenXR: Add eye view matrices to geometry shader.

600f0a5

OpenXR: Allow cmake to disable OpenXR support.

f85c1ea

OpenXR: Use xrGetD3D11GraphicsRequirementsKHR.

3e5107a

Update OpenXR SDK to 1.0.15

493538e

Complete rebase and OpenXR update:

f301375

- Fix GeometryShaderGen bug.
- Load xrGetD3D11GraphicsRequirementsKHR with xrGetInstanceProcAddr.
- Change Swapchain Format to SRGB.
- Use Quaternion type in OpenXR.cpp.
- Add OpenXR subproject to needed files.

Fixed OpenXR Init crash when runtime not available

57c1a15

Add DXGI_FORMAT_R8G8B8A8_UNORM to the list of supported XR Swapchain …

2ed45c8

…formats

Uklosk force-pushed the openxr branch from 08837db to 2ed45c8 Compare April 25, 2021 18:15

OpenXR: Fix everything black.

33ea477

- Frustum was applied to view matrix with z_near and z_far at 0
- Side effect: no more black behind 180°
