Add fence delay options at capture time #1155
base: dev
Conversation
Author arm-marius-pelegrin not on autobuild list. Waiting for curator authorization before starting CI build.
CI gfxreconstruct build queued with queue ID 20199.
CI gfxreconstruct build # 2873 running.
CI gfxreconstruct build # 2873 passed.
Less of a question of implementation and more of design. The purpose of the fence delay is to force the application being recorded to issue multiple fence status queries. This feels like a non-deterministic solution because there is no guarantee that the number of queries is sufficient, and because it is encoded directly into the capture, it doesn't have the ability to arbitrarily repeat calls.

Looking at `VulkanReplayConsumerBase::OverrideGetFenceStatus`, I see that the replay will retry calling `vkGetFenceStatus`. Would it make more sense to loop until the return code is `VK_SUCCESS`, with a possible "max retry count"? My thought is to not modify the capture file at all, and rather make replay handle slow devices more gracefully.
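For concreteness, a minimal sketch of the bounded retry being suggested here; `PollFenceWithRetryLimit`, the retry cap, and the poll interval are illustrative names and values, not part of the existing replay code:

```cpp
#include <chrono>
#include <cstdint>
#include <thread>
#include <vulkan/vulkan.h>

// Sketch of the bounded retry loop discussed above. kMaxFenceRetries and the
// poll interval are illustrative values, not existing GFXReconstruct settings.
VkResult PollFenceWithRetryLimit(VkDevice device, VkFence fence)
{
    constexpr uint32_t kMaxFenceRetries = 1000;
    constexpr auto     kPollInterval    = std::chrono::microseconds(100);

    VkResult result = VK_NOT_READY;
    for (uint32_t attempt = 0; attempt < kMaxFenceRetries; ++attempt)
    {
        result = vkGetFenceStatus(device, fence);
        if (result != VK_NOT_READY)  // VK_SUCCESS or a real error: stop polling
        {
            break;
        }
        std::this_thread::sleep_for(kPollInterval);
    }
    return result;
}
```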
Some applications use `vkGetFenceStatus` to query when a resource can be reused. If this reuse happens in the same frame as the original usage, then on replay the resource might not yet be ready for reuse by the time we get to this point in time again. As you point out, the replayer will in this case wait for the call to return `VK_SUCCESS`, and that is normally fine. However, on slower platforms, this wait changes the workload dramatically and makes it unusable for performance investigations.

The purpose of this patch is to try to defeat such over-eager resource reuse: if the app does a one-off status check it will be rebuffed, while if it does a busy loop until ready it will quickly bypass this mechanism, thus avoiding hangs.
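For illustration, the kind of same-frame reuse pattern this is aimed at might look like the following; the helper functions are hypothetical and only stand in for whatever the application does on each path:

```cpp
#include <vulkan/vulkan.h>

// Hypothetical helpers standing in for whatever the application does.
void ReuseCommandBuffer(VkCommandBuffer cmd);
void AllocateFreshCommandBuffer();

// Hypothetical application pattern: a one-off fence check in the same frame,
// reusing a command buffer only if the GPU has already finished with it.
void RecordFrame(VkDevice device, VkFence fence, VkCommandBuffer cmd)
{
    if (vkGetFenceStatus(device, fence) == VK_SUCCESS)
    {
        // Fast path normally taken on the capture machine.
        ReuseCommandBuffer(cmd);
    }
    else
    {
        // Path the fence query delay forces the application onto, matching
        // the behavior it would show on a slower replay target.
        AllocateFreshCommandBuffer();
    }
}
```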
@marius-pelegrin-arm @per-mathisen-arm I'm reluctant to merge this to `dev`. We do modify the stream if necessary, like for #1160; there would otherwise be missing information from the capture that would prevent replay. We have otherwise attempted to store the app's calls verbatim, and then (as you probably know) add additional query data or system information as Metadata Commands.

I don't really understand how the situation you're describing makes the replay unusable for performance investigations, because on replay it looks like we don't create more Waits or Gets. If you mean wall-clock performance, we don't make any statements about that anyway. We do try to make the replay GPU workload (e.g. shader core operations) as similar as possible to the captured workload - is replay not doing that for Fences? I would consider that a bug.

Another possibility is to just add a delay of some kind in replay before calling Wait or Get, if that's important on a very slow emulation architecture. It's much more acceptable to alter the stream on replay, since that happens after the original application's stream has been recorded in the file, and we know replay is used on wildly different environments from capture.

If you really want to force the app to perform multiple Query or Wait calls in anticipation of running on a GPU emulation or simulation environment whose performance level is very different from the CPU's, then what you're discussing feels like an emulation to apply to the GPU being captured. If a user wanted to limit a GPU's extensions or features, we'd tell that user to use the Profiles layer rather than implementing it in the GFXR capture layer; I think this is the same category of functionality. If you want to force timing behavior dependent on CPU code, it's better suited to a layer in which you can specify things like delays.

@per-mathisen-arm we can talk directly about that if desired.
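As a rough sketch of that replay-side alternative, under the assumption that a fixed delay option were added to replay (the option value and function name below are hypothetical):

```cpp
#include <chrono>
#include <thread>
#include <vulkan/vulkan.h>

// Sketch of the replay-side alternative: insert a configurable delay before
// forwarding the status query, instead of changing the captured stream.
// kReplayFenceDelay is an illustrative replay option, not an existing one.
VkResult DelayedReplayGetFenceStatus(VkDevice device, VkFence fence)
{
    constexpr auto kReplayFenceDelay = std::chrono::milliseconds(1);
    std::this_thread::sleep_for(kReplayFenceDelay);
    return vkGetFenceStatus(device, fence);
}
```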
Wanted to add a note that this PR would need to be amended to support the changes made in #1265, namely that any capture options added by this PR need to be accounted for in the logic that keeps track of non-default capture options.
Force-pushed from 9648442 to cfb7965
CI gfxreconstruct build queued with queue ID 122282.
CI gfxreconstruct build # 3690 running.
Force-pushed from cfb7965 to 3ff41f9
CI gfxreconstruct build queued with queue ID 122320.
CI gfxreconstruct build # 3692 running.
CI gfxreconstruct build # 3692 failed.
3692 appears to have been an internal CI machine failure and will be re-run.
CI gfxreconstruct build queued with queue ID 122458.
CI gfxreconstruct build # 3695 running.
CI gfxreconstruct build # 3695 passed.
Force-pushed from 3ff41f9 to b4adecc
CI gfxreconstruct build queued with queue ID 258768.
CI gfxreconstruct build # 4852 running.
Force-pushed from b4adecc to fc3e9e1
CI gfxreconstruct build queued with queue ID 258783.
CI gfxreconstruct build # 4853 running.
Force-pushed from fc3e9e1 to cdb8de8
CI gfxreconstruct build queued with queue ID 258801.
CI gfxreconstruct build # 4854 running.
CI gfxreconstruct build # 4854 failed.
The LunarG CI failures seem to be due to a crashed Pixel device. Restarting.
CI gfxreconstruct build queued with queue ID 259009.
CI gfxreconstruct build # 4856 running.
CI gfxreconstruct build # 4856 passed.
Force-pushed from cdb8de8 to 8ce01c3
CI gfxreconstruct build queued with queue ID 259632.
CI gfxreconstruct build # 4865 running.
CI gfxreconstruct build # 4865 failed.
Last LunarG CI failure seemed to be due to X server crash. Re-running.
CI gfxreconstruct build queued with queue ID 260150.
CI gfxreconstruct build # 4874 running.
CI gfxreconstruct build # 4874 passed.
Force-pushed from 8ce01c3 to 21db76d
CI gfxreconstruct build queued with queue ID 336319.
CI gfxreconstruct build # 5682 running.
Force-pushed from 21db76d to 15bb5d5
CI gfxreconstruct build queued with queue ID 336336.
CI gfxreconstruct build # 5683 running.
Add three capture options:

- Fence Query Delay
- Fence Query Delay Unit
- Fence Query Timeout Threshold

When capturing with the intent to replay on a "slow" platform, we may want to force the captured application to adopt a "slow platform behavior" - for example, not re-using some resources because we know that at replay time command buffers will take a lot longer to be executed.

To emulate this, this commit implements the notion of a "fence query delay". The idea is that when the captured application queries a `VkFence` using `vkGetFenceStatus`, or `vkWaitForFences` with a `timeout` under the "timeout threshold", we transmit the call to the driver, but if the fence is ready, we do not return `VK_SUCCESS` directly. Instead, we return `VK_NOT_READY` for "a certain amount of time" specified by the "delay" and "delay unit" capture options. By default this amount of time is `0` calls, which means no delay at all: we give the application the result from the driver directly. The amount of time can be specified in a number of `calls` or a number of `frames`; the unit (calls/frames) is specified by the fence query delay unit, and the number by the fence query delay.

In addition, this commit also fixes the `--sgfs` and `--sgfr` options, which were not parsed correctly, and the behavior of `vkGetFenceStatus` and `vkWaitForFences` at replay time to not "spam" `vkGetFenceStatus` if the call was successful at capture time but not at replay time.

Change-Id: I9adff58c364b6a08fde2a95502e3b79152e1cbdf
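A minimal sketch of the capture-time behavior described above, assuming a per-fence counter ticked in the configured unit (calls or frames); the names and plumbing are illustrative rather than the actual layer code:

```cpp
#include <cstdint>
#include <vulkan/vulkan.h>

// Per-fence bookkeeping; delay_remaining would be initialized from the
// "fence query delay" option and ticked in the configured unit.
struct FenceDelayState
{
    uint64_t delay_remaining = 0;
};

// Forward the query to the driver, but keep reporting VK_NOT_READY to the
// application until the configured delay has been consumed.
VkResult DelayedCaptureGetFenceStatus(VkDevice device, VkFence fence, FenceDelayState& state)
{
    VkResult driver_result = vkGetFenceStatus(device, fence);

    if (driver_result == VK_SUCCESS && state.delay_remaining > 0)
    {
        --state.delay_remaining;   // consume one "call" (or frame) of delay
        return VK_NOT_READY;
    }
    return driver_result;
}
```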
Force-pushed from 15bb5d5 to 2bf7eb1
CI gfxreconstruct build queued with queue ID 336356.
CI gfxreconstruct build # 5684 running.
CI gfxreconstruct build # 5684 passed.
Add two capture options:

When capturing with the intent to replay on a "slow" platform, we may want to force the captured application to adopt a "slow platform behavior" - for example, not re-using some resources because we know that at replay time command buffers will take a lot longer to be executed.

To emulate this, this commit implements the notion of a "fence query delay". The idea is that when the captured application queries a `VkFence` using `vkGetFenceStatus`, or `vkWaitForFences` with a `timeout` under the "timeout threshold", we transmit the call to the driver, but if the fence is ready, we do not return `VK_SUCCESS` directly. Instead, we return `VK_NOT_READY` for "a certain amount of time" specified by the "delay" and "delay unit" capture options. By default this amount of time is `0` calls, which means no delay at all: we give the application the result from the driver directly. The amount of time can be specified in a number of `calls` or a number of `frames`; the unit (calls/frames) is specified by the fence query delay unit, and the number by the fence query delay.

In addition, this commit also fixes the `--sgfs` and `--sgfr` options, which were not parsed correctly, and the behavior of `vkGetFenceStatus` and `vkWaitForFences` at replay time to not "spam" `vkGetFenceStatus` if the call was successful at capture time but not at replay time.
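One plausible reading of the replay-time fix (a guess at the intent from the description, not the actual patch): if the fence returned `VK_SUCCESS` at capture time but is not yet ready at replay time, block once on the fence instead of polling it repeatedly.

```cpp
#include <cstdint>
#include <vulkan/vulkan.h>

// Illustrative replay-side handling: when the captured call succeeded but the
// replay fence is not ready, wait once on the fence rather than repeatedly
// calling vkGetFenceStatus. A guess at the described behavior, not the patch.
VkResult ReplayGetFenceStatus(VkDevice device, VkFence fence, VkResult captured_result)
{
    VkResult replay_result = vkGetFenceStatus(device, fence);
    if (captured_result == VK_SUCCESS && replay_result == VK_NOT_READY)
    {
        replay_result = vkWaitForFences(device, 1, &fence, VK_TRUE, UINT64_MAX);
    }
    return replay_result;
}
```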