forked from dolphin-emu/dolphin
Openxr: Update repository for working on it. #1
Open
Uklosk
wants to merge
1,974
commits into
jordan-woyak:openxr
from
Uklosk:openxr
This value will be used in the register description; so expose it in a way that can be re-used instead of calculating it in 2 places later.
Additional changes:
- For TevStageCombiner's ColorCombiner and AlphaCombiner, op/comparison and scale/compare_mode have been split as there are different meanings and enums if bias is set to compare. (Shift has also been renamed to scale)
- In TexMode0, min_filter has been split into min_mip and min_filter.
- In TexImage1, image_type is now cache_manually_managed.
- The unused bit in GenMode is now exposed.
- LPSize's lineaspect is now named adjust_for_aspect_ratio.
BPMEM_TEV_COLOR_ENV + 6 (0xC6) was missing due to a typo. BPMEM_BP_MASK (0xFE) does not lend itself well to documentation with the current FIFO analyzer implementation (since it requires remembering the values in BP memory) but still shouldn't be treated as unknown. BPMEM_TX_SETMODE0_4 and BPMEM_TX_SETMODE1_4 (0xA4-0xAB) were missing entirely.
Graphics refactoring + add names and descriptions in FIFO analyzer
Fixes issue 11393. The problem is that left and top make no sense for a width by height array; they only make sense in a larger array from which a smaller part is extracted. Thus, the overall size of the array is provided to CopyRegion in addition to the sub-region. EncodeXFB already handles the extraction, so CopyRegion's only use there is to resize the image (and thus no sub-region is provided).
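To illustrate why the overall array size is needed, here is a minimal sketch of a region copy with nearest-neighbour resizing. `Rect`, `CopyRegionSketch` and `src_full_width` are hypothetical names chosen for this example; they are not Dolphin's actual types or signature.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical rectangle type for this sketch (not Dolphin's MathUtil::Rectangle).
struct Rect
{
  int left, top, width, height;
};

// Copies the sub-region `src_rect` out of a row-major image that is
// `src_full_width` elements wide, resizing it to dst_width x dst_height with
// nearest-neighbour sampling. Without `src_full_width`, `left` and `top` could
// not be turned into an index, because the row stride would be unknown.
template <typename T>
std::vector<T> CopyRegionSketch(const std::vector<T>& src, int src_full_width,
                                const Rect& src_rect, int dst_width, int dst_height)
{
  std::vector<T> dst(static_cast<std::size_t>(dst_width) * dst_height);
  for (int y = 0; y < dst_height; ++y)
  {
    const int src_y = src_rect.top + y * src_rect.height / dst_height;
    for (int x = 0; x < dst_width; ++x)
    {
      const int src_x = src_rect.left + x * src_rect.width / dst_width;
      dst[static_cast<std::size_t>(y) * dst_width + x] =
          src[static_cast<std::size_t>(src_y) * src_full_width + src_x];
    }
  }
  return dst;
}
```

Passing a rectangle that covers the whole source reduces this to a plain resize, which corresponds to the EncodeXFB case described above.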
Fixes Jimmie Johnson's Anything with an Engine.
…er-skip-scan Android: Move "skip scanning" logic to MainPresenter
Android: Don't save settings immediately after switching platform tab
Breakpoints: Change icon when disabled
The async operations may contain references to class members, so any running async operations must end before destroying the class.
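A minimal sketch of the lifetime rule described above, using `std::async` and a hypothetical `Worker` class (the class, its members and `Start` are assumptions for illustration, not code from this change):

```cpp
#include <atomic>
#include <future>

// Hypothetical class; the name and members are illustrative only.
class Worker
{
public:
  ~Worker()
  {
    // The task captures `this`, so it must finish before any member is
    // destroyed. Waiting explicitly avoids relying on member declaration order.
    if (m_task.valid())
      m_task.wait();
  }

  void Start(std::atomic<int>& completed)
  {
    m_task = std::async(std::launch::async, [this, &completed] {
      m_value = 42;  // touches a class member from the worker thread
      completed.store(m_value);
    });
  }

private:
  int m_value = 0;
  std::future<void> m_task;
};
```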
The loop in WIARVZFileReader::Chunk::Read could terminate prematurely if the size argument was smaller than the size of an exception list which had only been partially loaded.
Avoiding desyncs is more important than honoring what the user specified on the command line.
Fixes netplay and movie overrides of SYSCONF settings not applying.
Whether the custom RTC setting is enabled shouldn't in itself affect determinism (as long as the actual RTC value is properly synced). Alters the logic added in 4b2906c. I'm not entirely certain that this is correct, but the current code doesn't really make sense to me... If we need to force the RTC bias to 0 when custom RTC is enabled, why don't we need to do it when custom RTC is disabled? The code for getting the host system's current time doesn't contain any special handling for the guest's RTC bias as far as I can tell.
When the dividend is known at compile time, we can eliminate some of the branching and precompute the result for the overflow case.

Before:
B8 54 D3 E6 02       mov     eax,2E6D354h
85 FF                test    edi,edi
74 0C                je      overflow
3D 00 00 00 80       cmp     eax,80000000h
75 0C                jne     normal_path
83 FF FF             cmp     edi,0FFFFFFFFh
75 07                jne     normal_path
overflow:
C1 F8 1F             sar     eax,1Fh
8B F8                mov     edi,eax
EB 05                jmp     done
normal_path:
99                   cdq
F7 FF                idiv    eax,edi
8B F8                mov     edi,eax
done:

After:
85 FF                test    edi,edi
75 04                jne     normal_path
33 FF                xor     edi,edi
EB 0A                jmp     done
normal_path:
B8 54 D3 E6 02       mov     eax,2E6D354h
99                   cdq
F7 FF                idiv    eax,edi
8B F8                mov     edi,eax
done:

Fairly common with constant dividend of zero. Non-zero values occur frequently in Ocarina of Time Master Quest.
Zero divided by any number is still zero. For whatever reason, this case shows up frequently too.

Before:
B8 00 00 00 00       mov     eax,0
85 F6                test    esi,esi
74 0C                je      overflow
3D 00 00 00 80       cmp     eax,80000000h
75 0C                jne     normal_path
83 FE FF             cmp     esi,0FFFFFFFFh
75 07                jne     normal_path
overflow:
C1 F8 1F             sar     eax,1Fh
8B F8                mov     edi,eax
EB 05                jmp     done
normal_path:
99                   cdq
F7 FE                idiv    eax,esi
8B F8                mov     edi,eax
done:

After:
Nothing!
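The result that the "Before" listings compute can be modelled as below. This is a sketch inferred from the listings only (the `sar eax,1Fh` on the overflow path), and `GuestDivide` is a hypothetical name, not a function from the JIT.

```cpp
#include <cstdint>

// Models the value produced by the generic sequence: on divide-by-zero or
// INT32_MIN / -1 the result is the dividend's sign bit replicated across all
// 32 bits; otherwise it is an ordinary truncating division.
constexpr std::int32_t GuestDivide(std::int32_t dividend, std::int32_t divisor)
{
  if (divisor == 0 || (dividend == INT32_MIN && divisor == -1))
    return dividend < 0 ? -1 : 0;
  return dividend / divisor;
}
```

With a constant dividend of 2E6D354h, the `INT32_MIN` check can never hit and the overflow result is known to be 0, which is why only the zero test and an `xor` remain. With a constant dividend of 0, both paths give 0, so no code is needed at all.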
Add a function to calculate the magic constants required to optimize signed 32-bit division. Since this optimization is not exclusive to any particular architecture, JitCommon seemed like a good place to put this.
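One way such constants can be computed is sketched below. It follows the signed-division derivation in Hacker's Delight but uses 64-bit arithmetic for brevity; `SignedMagicSketch` and `ComputeSignedMagic` are hypothetical names, and this is not claimed to be the function added to JitCommon.

```cpp
#include <cstdint>

struct SignedMagicSketch
{
  std::int32_t multiplier;  // may wrap negative when the true value is >= 2^31
  int shift;                // extra right shift applied after taking the high 32 bits
};

// Assumes 2 <= divisor <= INT32_MAX. Searches for the smallest p >= 32 with
// 2^p > nc * (d - (2^p mod d)); the multiplier is then floor(2^p / d) + 1.
constexpr SignedMagicSketch ComputeSignedMagic(std::uint32_t divisor)
{
  const std::uint64_t d = divisor;
  const std::uint64_t two31 = std::uint64_t{1} << 31;
  const std::uint64_t nc = two31 - (two31 % d) - 1;
  int p = 32;
  while ((std::uint64_t{1} << p) <= nc * (d - (std::uint64_t{1} << p) % d))
    ++p;
  const std::uint64_t two_p = std::uint64_t{1} << p;
  const std::uint64_t m = (two_p + d - two_p % d) / d;
  return {static_cast<std::int32_t>(static_cast<std::uint32_t>(m)), p - 32};
}
```

For divisors 30307 and 30323 this sketch yields multipliers 45326B65h and 8A522519h with shifts 13 and 14, matching the constants visible in the listings of the following commit messages.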
Optimize division by a constant into multiplication. This method is also used by GCC and LLVM. We also add optimized paths for divisors 0, 1, and -1, because they don't work using this method. They don't occur very often, but are necessary for correctness.

- Division by 1

Before:
41 BF 01 00 00 00    mov     r15d,1
41 8B C5             mov     eax,r13d
45 85 FF             test    r15d,r15d
74 0D                je      overflow
3D 00 00 00 80       cmp     eax,80000000h
75 0E                jne     normal_path
41 83 FF FF          cmp     r15d,0FFFFFFFFh
75 08                jne     normal_path
overflow:
C1 F8 1F             sar     eax,1Fh
44 8B F8             mov     r15d,eax
EB 07                jmp     done
normal_path:
99                   cdq
41 F7 FF             idiv    eax,r15d
44 8B F8             mov     r15d,eax
done:

After:
45 8B FD             mov     r15d,r13d

- Division by 30307

Before:
41 BA 63 76 00 00    mov     r10d,7663h
41 8B C5             mov     eax,r13d
45 85 D2             test    r10d,r10d
74 0D                je      overflow
3D 00 00 00 80       cmp     eax,80000000h
75 0E                jne     normal_path
41 83 FA FF          cmp     r10d,0FFFFFFFFh
75 08                jne     normal_path
overflow:
C1 F8 1F             sar     eax,1Fh
44 8B C0             mov     r8d,eax
EB 07                jmp     done
normal_path:
99                   cdq
41 F7 FA             idiv    eax,r10d
44 8B C0             mov     r8d,eax
done:

After:
49 63 C5             movsxd  rax,r13d
48 69 C0 65 6B 32 45 imul    rax,rax,45326B65h
4C 8B C0             mov     r8,rax
48 C1 E8 3F          shr     rax,3Fh
49 C1 F8 2D          sar     r8,2Dh
44 03 C0             add     r8d,eax

- Division by 30323

Before:
41 BA 73 76 00 00    mov     r10d,7673h
41 8B C5             mov     eax,r13d
45 85 D2             test    r10d,r10d
74 0D                je      overflow
3D 00 00 00 80       cmp     eax,80000000h
75 0E                jne     normal_path
41 83 FA FF          cmp     r10d,0FFFFFFFFh
75 08                jne     normal_path
overflow:
C1 F8 1F             sar     eax,1Fh
44 8B C0             mov     r8d,eax
EB 07                jmp     00000000161737E7
normal_path:
99                   cdq
41 F7 FA             idiv    eax,r10d
44 8B C0             mov     r8d,eax
done:

After:
49 63 C5             movsxd  rax,r13d
4C 69 C0 19 25 52 8A imul    r8,rax,0FFFFFFFF8A522519h
49 C1 E8 20          shr     r8,20h
44 03 C0             add     r8d,eax
C1 E8 1F             shr     eax,1Fh
41 C1 F8 0E          sar     r8d,0Eh
44 03 C0             add     r8d,eax
When the multiplier is positive (which is the most common case), we can generate slightly better code.

- Division by 30307

Before:
49 63 C5             movsxd  rax,r13d
48 69 C0 65 6B 32 45 imul    rax,rax,45326B65h
4C 8B C0             mov     r8,rax
48 C1 E8 3F          shr     rax,3Fh
49 C1 F8 2D          sar     r8,2Dh
44 03 C0             add     r8d,eax

After:
49 63 C5             movsxd  rax,r13d
4C 69 C0 65 6B 32 45 imul    r8,rax,45326B65h
C1 E8 1F             shr     eax,1Fh
49 C1 F8 2D          sar     r8,2Dh
44 03 C0             add     r8d,eax
Power-of-two divisors can be done more elegantly, so handle them separately.

- Division by 4

Before:
41 BD 04 00 00 00    mov     r13d,4
41 8B C0             mov     eax,r8d
45 85 ED             test    r13d,r13d
74 0D                je      overflow
3D 00 00 00 80       cmp     eax,80000000h
75 0E                jne     normal_path
41 83 FD FF          cmp     r13d,0FFFFFFFFh
75 08                jne     normal_path
overflow:
C1 F8 1F             sar     eax,1Fh
44 8B E8             mov     r13d,eax
EB 07                jmp     done
normal_path:
99                   cdq
41 F7 FD             idiv    eax,r13d
44 8B E8             mov     r13d,eax
done:

After:
45 85 C0             test    r8d,r8d
45 8D 68 03          lea     r13d,[r8+3]
45 0F 49 E8          cmovns  r13d,r8d
41 C1 FD 02          sar     r13d,2
...and let's optimize a divisor of 2 ever so slightly for good measure. I wouldn't have bothered, but most GameCube games seem to hit this on launch.

- Division by 2

Before:
41 BE 02 00 00 00    mov     r14d,2
41 8B C2             mov     eax,r10d
45 85 F6             test    r14d,r14d
74 0D                je      overflow
3D 00 00 00 80       cmp     eax,80000000h
75 0E                jne     normal_path
41 83 FE FF          cmp     r14d,0FFFFFFFFh
75 08                jne     normal_path
overflow:
C1 F8 1F             sar     eax,1Fh
44 8B F0             mov     r14d,eax
EB 07                jmp     done
normal_path:
99                   cdq
41 F7 FE             idiv    eax,r14d
44 8B F0             mov     r14d,eax
done:

After:
45 8B F2             mov     r14d,r10d
41 C1 EE 1F          shr     r14d,1Fh
45 03 F2             add     r14d,r10d
41 D1 FE             sar     r14d,1
Both the normal path and the overflow path end with the same instruction, so their tails can be merged.

Before:
41 8B C7             mov     eax,r15d
45 85 C0             test    r8d,r8d
74 0D                je      overflow
3D 00 00 00 80       cmp     eax,80000000h
75 0E                jne     normal_path
41 83 F8 FF          cmp     r8d,0FFFFFFFFh
75 08                jne     normal_path
overflow:
C1 F8 1F             sar     eax,1Fh
44 8B F0             mov     r14d,eax
EB 07                jmp     done
normal_path:
99                   cdq
41 F7 F8             idiv    eax,r8d
44 8B F0             mov     r14d,eax
done:

After:
41 8B C7             mov     eax,r15d
45 85 C0             test    r8d,r8d
74 0D                je      overflow
3D 00 00 00 80       cmp     eax,80000000h
75 0B                jne     normal_path
41 83 F8 FF          cmp     r8d,0FFFFFFFFh
75 05                jne     normal_path
overflow:
C1 F8 1F             sar     eax,1Fh
EB 04                jmp     done
normal_path:
99                   cdq
41 F7 F8             idiv    eax,r8d
done:
44 8B F0             mov     r14d,eax
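The multiply-based sequences above can be expressed in C++ roughly as follows. This is a sketch for positive constant divisors only; `DivideByMultiplying` is a hypothetical helper, and the multiplier/shift pairs used with it are the ones visible in the listings (45326B65h with a total shift of 2Dh = 32 + 13, and 8A522519h with shifts of 20h and 0Eh).

```cpp
#include <cstdint>

// Divides n by a positive constant using a precomputed (multiplier, shift) pair.
// Assumes >> on negative signed values is an arithmetic shift (guaranteed since
// C++20, and the behaviour of mainstream compilers before that).
constexpr std::int32_t DivideByMultiplying(std::int32_t n, std::int32_t multiplier, int shift)
{
  // High 32 bits of the 64-bit signed product.
  std::int32_t q = static_cast<std::int32_t>((std::int64_t{n} * multiplier) >> 32);
  // A multiplier of 2^31 or more wraps negative as int32; adding n compensates.
  if (multiplier < 0)
    q += n;
  q >>= shift;
  // The shift rounds towards negative infinity; add 1 for negative dividends
  // to get truncation towards zero.
  return q + static_cast<std::int32_t>(static_cast<std::uint32_t>(n) >> 31);
}
```

For example, `DivideByMultiplying(n, 0x45326B65, 13)` should agree with `n / 30307` for every 32-bit `n`. The `multiplier < 0` branch corresponds to the extra `add r8d,eax` in the 30323 listing.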
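The two power-of-two sequences can be sketched in C++ as below. The function names are hypothetical, and the code assumes arithmetic right shift of negative values (guaranteed since C++20).

```cpp
#include <cstdint>

// Signed division by 2^k for 1 <= k <= 30, rounding towards zero.
constexpr std::int32_t DivideByPowerOfTwo(std::int32_t n, int k)
{
  // Bias negative dividends by 2^k - 1 so the arithmetic shift truncates
  // towards zero. For k = 2 this is the test / lea [r8+3] / cmovns / sar pattern.
  const std::int32_t bias = (std::int32_t{1} << k) - 1;
  const std::int32_t biased = n < 0 ? n + bias : n;
  return biased >> k;
}

// Divisor 2: the bias equals the sign bit, so no conditional move is needed
// (shr 1Fh / add / sar 1).
constexpr std::int32_t DivideByTwo(std::int32_t n)
{
  const std::int32_t sign = static_cast<std::int32_t>(static_cast<std::uint32_t>(n) >> 31);
  return (n + sign) >> 1;
}
```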
Suggested by @MerryMage. Thanks! Co-authored-by: merry <[email protected]>
Namespace-scope variable was only used in one function so move it there
Loop index int i was being compared against GetControllerCount() which returned a size_t. This was the only place GetControllerCount() was called from so the change of return type doesn't disturb anything else. Changing the loop index to size_t wouldn't work as well since it's passed into GetController(), which takes an int and is called from many places, so it would need a cast anyway on an already busy line.
…er-alignedwidth SW: Fix alignedWidth in TextureEncoder
When the interpreter writes to a discarded register, its type must be changed so that it is no longer considered discarded. Fixes a 62ce1c7 regression.
Jits: Fix interpreter fallback handling of discarded registers
Fixes https://bugs.dolphin-emu.org/issues/12388. Might also fix other games that have problems with float/paired instructions in JitArm64, but I haven't tested any.
This simplifies some of the following commits. It does require an extra register, but hey, we have 32 of them. Something I think would be nice to add to the register cache in the future is the ability to keep both the single and double version of a guest register in two different host registers when that is useful. That way, the extra register we write to here can be read by a later instruction, saving us from having to perform the same conversion again.
Preparation for following commits. This commit intentionally doesn't touch paired stores, since paired stores are supposed to flush to zero. (Consistent with Jit64.)
Needed because the next commit will make RW clobber flags.
Our old conversion approach became a lot more inaccurate when enabling flush-to-zero, to the point of obviously breaking games.
If we can prove that FCVT will provide a correct conversion, we can use FCVT. This makes the common case a bit faster and the less likely cases (unfortunately including zero, which FCVT actually can convert correctly) a bit slower.
I haven't observed this breaking any game, but it didn't match the behavior of the interpreter as far as I could tell from reading the code, in that denormals weren't being flushed.
JitArm64: Set flush-to-zero/rounding mode and improve float/double conversion accuracy
Jit: Optimize block link queries by using hash tables
- Fix GeometryShaderGen bug.
- Load xrGetD3D11GraphicsRequirementsKHR with xrGetInstanceProcAddr.
- Change Swapchain Format to SRGB.
- Use Quaternion type in OpenXR.cpp.
- Add OpenXR subproject to needed files.
- Frustum was applied to view matrix with z_near and z_far at 0
- Side effect: no more black behind 180º
jordan-woyak pushed a commit that referenced this pull request on Jun 27, 2022
Before, both 1441 and 147f would disassemble as `lsr $acc0, #1`, when the second should be `lsr $acc0, #-1`, and both 14c1 and 14ff would be `asr $acc0, #1` when the second should be `asr $acc0, #-1`. I'm not entirely sure whether the minus signs actually make sense here, but this change is consistent with the assembler so that's an improvement at least. devkitPro previously changed the formatting to not require negative signs for lsr and asr; this is probably something we should do in the future: devkitPro/gamecube-tools@8a65c85 This fixes the HermesText and HermesBinary tests (HermesText already wrote `lsr $ACC0, #-5`, so this is consistent with what it used before.)
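The opcode examples above are consistent with treating the low 6 bits of the instruction word as a signed immediate. A minimal sketch of that reading follows; `ShiftImmediate` is a hypothetical helper written for this illustration, not the disassembler's actual code.

```cpp
#include <cstdint>

// Interprets the low 6 bits of a DSP opcode word as a two's-complement value
// in the range [-32, 31].
constexpr int ShiftImmediate(std::uint16_t opcode)
{
  const int imm = opcode & 0x3F;
  return (imm & 0x20) ? imm - 0x40 : imm;
}
```

Under this reading, 1441 and 14c1 give `#1`, while 147f and 14ff give `#-1`, matching the disassembly described in the commit message.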
This pull request is for issue dolphin-emu#8380. I have updated your work to the latest master and the latest OpenXR version.