Add fence delay options at capture time #1155

marius-pelegrin-arm · 2023-06-20T13:24:11Z

Add two capture options:

Fence Query Delay
Fence Query Delay Unit
Fence Query Timeout Threshold

When capturing with intent to replay on a "slow" platform, we may want to force the captured application to adopt a "slow platform behavior" - for example, not re-using some resources because we know that at replay time command buffers will take a lot longer to be executed.

To emulate this, this commit implement the notion of "fence query delay". The idea is that when the captured application queries a VkFence using vkGetFenceStatus or vkWaitForFences with timeout under the "timeout threshold", we transmit the call to the driver, but if the fence is ready, we do not return VK_SUCCESS directly. Instead, we return VK_NOT_READY for "a certain amount of time" specified by the "delay" and "delay unit" capture options.

By default this amount of time is 0 call, which means no delay at all: we give to the application the result from the driver directly. This amount of time can be specified in number of calls or in number of frames. The unit (calls/frames) is specified by fence query delay unit, and the number by fence query delay.

In addition, this commit also fixes the --sgfs and --sgfr options that were not parsed correctly, and the behavior of vkGetFenceStatus and vkWaitForFences at replay time to not "spam" vkGetFenceStatus
if the call was successful at capture time but not at replay time.

ci-tester-lunarg · 2023-06-20T13:24:14Z

Author arm-marius-pelegrin not on autobuild list. Waiting for curator authorization before starting CI build.

CLAassistant · 2023-06-20T13:24:26Z

All committers have signed the CLA.

ci-tester-lunarg · 2023-06-20T15:58:37Z

CI gfxreconstruct build queued with queue ID 20199.

ci-tester-lunarg · 2023-06-20T16:47:39Z

CI gfxreconstruct build # 2873 running.

ci-tester-lunarg · 2023-06-20T17:46:27Z

CI gfxreconstruct build # 2873 passed.

charles-lunarg · 2023-12-05T00:35:32Z

Less of a question of implementation and more of design.

The purpose of the fence delay is to force the application being recorded to issue multiple vkGetFenceStatus calls, that way during replay if a single vkGetFenceStatus call returns VK_NOT_READY, the subsequent ones may succeed?

This feels like a non-deterministic solution because there is no guarantee that the number of queries is sufficient, and because it is encoded directly into the capture, it doesn't have the ability to arbitrarily repeat calls vkGetFenceStatus.

Looking at VulkanReplayConsumerBase::OverrideGetFenceStatus, I see that the replay will retry calling vkGetFenceStatus if the return code isn't VK_SUCCESS or VK_NOT_READY.

Would it make more sense to loop until the return code is VK_SUCCESS with a possible "max retry count"? My thought is to not modify the capture file at all, and rather make replay handle slow devices more gracefully.

per-mathisen-arm · 2023-12-05T11:02:26Z

Some applications use vkGetFenceStatus to query when a resource can be reused. If this reuse happens in the same frame as the original usage, then on replay the resource might not yet be ready for reuse by the time we get to this point in time again. As you point out, the replayer will in this case wait for the call to return VK_SUCCESS, and that is normally fine. However, on slower platforms, this wait changes the workload dramatically, and makes it unusable for performance investigations. The purpose of this patch is to try to defeat such over-eager resource reuse - if the app does a one-off status check it will be rebuffed, while if it does a busy loop until ready it will quickly bypass this mechanism thus avoiding hangs.

bradgrantham-lunarg · 2023-12-05T18:31:30Z

@marius-pelegrin-arm @per-mathisen-arm I'm reluctant to merge this to dev. One of our principles is to store the application's Vulkan calls as close to verbatim as possible. If someone runs convert on the output of capturing with this PR with this option enabled, the recorded commands will look like the capture had to Get or Wait a bunch of times on the captured GPU, but that wasn't actually the case.

We do modify the stream if necessary, like for #1160. There would otherwise be missing information from the capture that would prevent replay. We have attempted otherwise to store the app's calls verbatim, and then (as you probably know) add additional query data or system information as Metadata Commands.

I don't really understand how the situation describing makes the replay unusable for performance investigations because on replay it looks like we don't create more Waits or Gets. If you mean wall-clock performance, we don't make any statements about that anyway. We do try to make the replay GPU workload (e.g. shader core operations) as similar as possible to the captured workload - is replay not doing that for Fences? I would consider that a bug.

Another possibility is to just add a delay of some kind in replay before calling Wait or Get if that's important on a very slow emulation architecture. It's much more acceptable to alter the stream on replay, since that's after recording the original application's stream in the file, and we know replay is used on wildly different environments from capture.

If you really want to force the app to perform multiple Query or Wait calls in anticipation of running on a GPU emulation or simulation environment which has a very different performance level from the CPU, I think what you're discussing feels like an emulation to apply to the GPU being captured.

If a user wanted to limit a GPU's extensions or features, we'd tell that user to use the Profiles layer to do that, rather than implementing in GFXR capture layer. I think this is the same category of functionality. I think if you want to force timing behavior dependent on CPU code it's better suited to a layer in which you can specify things like delays. @per-mathisen-arm we can talk directly about that if desired.

charles-lunarg · 2023-12-06T17:44:28Z

Wanted to add a note that this PR would need to be amended to support the changes made in #1265, namely that any capture options added by this PR need to be accounted for in the logic that keeps track of non-default capture options.

ci-tester-lunarg · 2024-01-25T15:04:54Z

CI gfxreconstruct build queued with queue ID 122282.

ci-tester-lunarg · 2024-01-25T15:05:01Z

CI gfxreconstruct build # 3690 running.

ci-tester-lunarg · 2024-01-25T15:54:47Z

CI gfxreconstruct build queued with queue ID 122320.

ci-tester-lunarg · 2024-01-25T16:59:06Z

CI gfxreconstruct build # 3692 running.

ci-tester-lunarg · 2024-01-25T18:43:43Z

CI gfxreconstruct build # 3692 failed.

bradgrantham-lunarg · 2024-01-25T19:13:43Z

3692 appears to have been an internal CI machine failure and will be re-run.

ci-tester-lunarg · 2024-01-25T19:17:37Z

CI gfxreconstruct build queued with queue ID 122458.

ci-tester-lunarg · 2024-01-25T20:36:44Z

CI gfxreconstruct build # 3695 running.

ci-tester-lunarg · 2024-01-25T22:22:17Z

CI gfxreconstruct build # 3695 passed.

ci-tester-lunarg · 2024-09-18T12:32:28Z

CI gfxreconstruct build queued with queue ID 258768.

ci-tester-lunarg · 2024-09-18T12:35:01Z

CI gfxreconstruct build # 4852 running.

ci-tester-lunarg · 2024-09-18T12:43:09Z

CI gfxreconstruct build queued with queue ID 258783.

ci-tester-lunarg · 2024-09-18T13:02:00Z

CI gfxreconstruct build # 4853 running.

ci-tester-lunarg · 2024-09-18T13:04:47Z

CI gfxreconstruct build queued with queue ID 258801.

ci-tester-lunarg · 2024-09-18T13:37:20Z

CI gfxreconstruct build # 4854 running.

ci-tester-lunarg · 2024-09-18T14:06:45Z

CI gfxreconstruct build # 4854 failed.

lunarpapillo · 2024-09-18T15:59:29Z

The LunarG CI failures seem to be due to a crashed Pixel device. Restarting.

ci-tester-lunarg · 2024-09-18T15:59:39Z

CI gfxreconstruct build queued with queue ID 259009.

ci-tester-lunarg · 2024-09-18T16:33:47Z

CI gfxreconstruct build # 4856 running.

ci-tester-lunarg · 2024-09-18T17:05:02Z

CI gfxreconstruct build # 4856 passed.

ci-tester-lunarg · 2024-09-19T12:30:36Z

CI gfxreconstruct build queued with queue ID 259632.

ci-tester-lunarg · 2024-09-19T12:35:01Z

CI gfxreconstruct build # 4865 running.

ci-tester-lunarg · 2024-09-19T13:32:00Z

CI gfxreconstruct build # 4865 failed.

lunarpapillo · 2024-09-19T23:58:49Z

Last LunarG CI failure seemed to be due to X server crash. Re-running.

ci-tester-lunarg · 2024-09-19T23:59:02Z

CI gfxreconstruct build queued with queue ID 260150.

ci-tester-lunarg · 2024-09-20T00:23:41Z

CI gfxreconstruct build # 4874 running.

ci-tester-lunarg · 2024-09-20T00:58:15Z

CI gfxreconstruct build # 4874 passed.

ci-tester-lunarg · 2025-01-03T11:53:08Z

CI gfxreconstruct build queued with queue ID 336319.

ci-tester-lunarg · 2025-01-03T11:55:01Z

CI gfxreconstruct build # 5682 running.

ci-tester-lunarg · 2025-01-03T12:05:53Z

CI gfxreconstruct build queued with queue ID 336336.

ci-tester-lunarg · 2025-01-03T12:24:58Z

CI gfxreconstruct build # 5683 running.

Add three capture options: - Fence Query Delay - Fence Query Delay Unit - Fence Query Timeout Threshold When capturing with intent to replay on a "slow" platform, we may want to force the captured application to adopt a "slow platform behavior" - for example, not re-using some resources because we know that at replay time command buffers will take a lot longer to be executed. To emulate this, this commit implement the notion of "fence query delay". The idea is that when the captured application queries a `VkFence` using `vkGetFenceStatus` or `vkWaitForFences` with `timeout` under the "timeout threshold", we transmit the call to the driver, but if the fence is ready, we do not return `VK_SUCCESS` directly. Instead, we return `VK_NOT_READY` for "a certain amount of time" specified by the "delay" and "delay unit" capture options. By default this amount of time is `0 call`, which means no delay at all: we give to the application the result from the driver directly. This amount of time can be specified in number of `calls` or in number of `frames`. The unit (calls/frames) is specified by fence query delay unit, and the number by fence query delay. In addition, this commit also fixes the `--sgfs` and `--sgfr` options that were not parsed correctly, and the behavior of `vkGetFenceStatus` and `vkWaitForFences` at replay time to not "spam" `vkGetFenceStatus` if the call was successful at capture time but not at replay time. Change-Id: I9adff58c364b6a08fde2a95502e3b79152e1cbdf

ci-tester-lunarg · 2025-01-03T12:34:23Z

CI gfxreconstruct build queued with queue ID 336356.

ci-tester-lunarg · 2025-01-03T12:59:14Z

CI gfxreconstruct build # 5684 running.

ci-tester-lunarg · 2025-01-03T13:34:09Z

CI gfxreconstruct build # 5684 passed.

naomipappe mentioned this pull request Sep 20, 2023

Adding the capture options to annotation for display with info tool #1265

Merged

bradgrantham-lunarg added the P2 A high-priority code maintenance issue or a functional problem that is recoverable or not a crash. label Oct 30, 2023

bradgrantham-lunarg assigned charles-lunarg Nov 30, 2023

bradgrantham-lunarg unassigned charles-lunarg Dec 5, 2023

marius-pelegrin-arm force-pushed the fence-query-delay branch from 9648442 to cfb7965 Compare January 25, 2024 15:04

marius-pelegrin-arm force-pushed the fence-query-delay branch from cfb7965 to 3ff41f9 Compare January 25, 2024 15:54

per-mathisen-arm mentioned this pull request Jul 9, 2024

--sync for Capture #1043

Open

marius-pelegrin-arm force-pushed the fence-query-delay branch from 3ff41f9 to b4adecc Compare September 18, 2024 12:32

marius-pelegrin-arm force-pushed the fence-query-delay branch from b4adecc to fc3e9e1 Compare September 18, 2024 12:43

marius-pelegrin-arm force-pushed the fence-query-delay branch from fc3e9e1 to cdb8de8 Compare September 18, 2024 13:04

marius-pelegrin-arm force-pushed the fence-query-delay branch from cdb8de8 to 8ce01c3 Compare September 19, 2024 12:30

marius-pelegrin-arm force-pushed the fence-query-delay branch from 8ce01c3 to 21db76d Compare January 3, 2025 11:53

marius-pelegrin-arm force-pushed the fence-query-delay branch from 21db76d to 15bb5d5 Compare January 3, 2025 12:05

marius-pelegrin-arm force-pushed the fence-query-delay branch from 15bb5d5 to 2bf7eb1 Compare January 3, 2025 12:34

Add fence delay options at capture time #1155

Are you sure you want to change the base?

Add fence delay options at capture time #1155

Conversation

marius-pelegrin-arm commented Jun 20, 2023 • edited Loading

ci-tester-lunarg commented Jun 20, 2023

CLAassistant commented Jun 20, 2023 • edited Loading

ci-tester-lunarg commented Jun 20, 2023

ci-tester-lunarg commented Jun 20, 2023

ci-tester-lunarg commented Jun 20, 2023

charles-lunarg commented Dec 5, 2023

per-mathisen-arm commented Dec 5, 2023

bradgrantham-lunarg commented Dec 5, 2023

charles-lunarg commented Dec 6, 2023

ci-tester-lunarg commented Jan 25, 2024

ci-tester-lunarg commented Jan 25, 2024

ci-tester-lunarg commented Jan 25, 2024

ci-tester-lunarg commented Jan 25, 2024

ci-tester-lunarg commented Jan 25, 2024

bradgrantham-lunarg commented Jan 25, 2024 • edited Loading

ci-tester-lunarg commented Jan 25, 2024

ci-tester-lunarg commented Jan 25, 2024

ci-tester-lunarg commented Jan 25, 2024

ci-tester-lunarg commented Sep 18, 2024

ci-tester-lunarg commented Sep 18, 2024

ci-tester-lunarg commented Sep 18, 2024

ci-tester-lunarg commented Sep 18, 2024

ci-tester-lunarg commented Sep 18, 2024

ci-tester-lunarg commented Sep 18, 2024

ci-tester-lunarg commented Sep 18, 2024

lunarpapillo commented Sep 18, 2024

ci-tester-lunarg commented Sep 18, 2024

ci-tester-lunarg commented Sep 18, 2024

ci-tester-lunarg commented Sep 18, 2024

ci-tester-lunarg commented Sep 19, 2024

ci-tester-lunarg commented Sep 19, 2024

ci-tester-lunarg commented Sep 19, 2024

lunarpapillo commented Sep 19, 2024

ci-tester-lunarg commented Sep 19, 2024

ci-tester-lunarg commented Sep 20, 2024

ci-tester-lunarg commented Sep 20, 2024

ci-tester-lunarg commented Jan 3, 2025

ci-tester-lunarg commented Jan 3, 2025

ci-tester-lunarg commented Jan 3, 2025

ci-tester-lunarg commented Jan 3, 2025

ci-tester-lunarg commented Jan 3, 2025

ci-tester-lunarg commented Jan 3, 2025

ci-tester-lunarg commented Jan 3, 2025

marius-pelegrin-arm commented Jun 20, 2023 •

edited

Loading

CLAassistant commented Jun 20, 2023 •

edited

Loading

bradgrantham-lunarg commented Jan 25, 2024 •

edited

Loading