Add fence delay options at capture time
Add three capture options:
- Fence Query Delay
- Fence Query Delay Unit
- Fence Query Delay Timeout Threshold

When capturing with intent to replay on a "slow" platform, we may
want to force the captured application to adopt a "slow platform
behavior" - for example, not re-using some resources because we
know that at replay time command buffers will take a lot longer
to be executed.

To emulate this, this commit implements the notion of "fence query
delay". The idea is that when the captured application queries a
`VkFence` using `vkGetFenceStatus` or `vkWaitForFences` with `timeout`
under the "timeout threshold", we transmit the call to the driver,
but if the fence is ready, we do not return `VK_SUCCESS` directly.
Instead, we return `VK_NOT_READY` for "a certain amount of time"
specified by the "delay" and "delay unit" capture options.

By default this amount of time is `0 call`, which means no delay at
all: we pass the driver's result to the application directly.
This amount of time can be specified in number of `calls` or in
number of `frames`. The unit (calls/frames) is specified by fence
query delay unit, and the number by fence query delay.
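
The delay logic described above can be sketched roughly as follows. This is a minimal illustration, not the code added by this commit: `FenceQueryDelayer`, `IsFenceQuery`, `Result` and `DelayUnit` are hypothetical names, fences are represented by plain integer handles, and the inclusive threshold comparison is an assumption.

```cpp
#include <cstdint>
#include <unordered_map>

// Stand-ins for VK_SUCCESS / VK_NOT_READY so the sketch compiles without Vulkan headers.
enum class Result { Success, NotReady };
enum class DelayUnit { Calls, Frames };

// Assumption: a wait counts as a "query" when its timeout does not exceed the threshold.
inline bool IsFenceQuery(uint64_t timeout_ns, uint64_t threshold_ns)
{
    return timeout_ns <= threshold_ns;
}

// Hypothetical helper (not the GFXReconstruct class): hides a ready fence for `delay` units.
class FenceQueryDelayer
{
  public:
    FenceQueryDelayer(uint64_t delay, DelayUnit unit) : delay_(delay), unit_(unit) {}

    // Call once per presented frame.
    void EndFrame() { ++frame_; }

    // Call when the fence is reset so the delay starts over.
    void ResetFence(uint64_t fence) { ready_.erase(fence); }

    // Takes the driver's answer and returns what the application should see.
    Result Filter(uint64_t fence, Result driver_result)
    {
        if (driver_result != Result::Success || delay_ == 0)
        {
            return driver_result;
        }

        // Remember the frame at which the fence was first seen ready.
        auto it = ready_.try_emplace(fence, ReadyState{ 0, frame_ }).first;
        ReadyState& state = it->second;

        const uint64_t elapsed =
            (unit_ == DelayUnit::Calls) ? state.delayed_calls : frame_ - state.first_ready_frame;
        if (elapsed >= delay_)
        {
            return Result::Success;
        }
        ++state.delayed_calls;
        return Result::NotReady;
    }

  private:
    struct ReadyState
    {
        uint64_t delayed_calls;
        uint64_t first_ready_frame;
    };

    uint64_t  delay_;
    DelayUnit unit_;
    uint64_t  frame_ = 0;
    std::unordered_map<uint64_t, ReadyState> ready_;
};
```

With a delay of `2` and the `calls` unit, the first two queries on a ready fence are answered "not ready" and the third one succeeds. With the `frames` unit, the count of queries does not matter, only the number of frames ended since the fence was first seen ready.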

In addition, this commit also fixes the `--sgfs` and `--sgfr` options
that were not parsed correctly, and the behavior of `vkGetFenceStatus`
and `vkWaitForFences` at replay time to not "spam" `vkGetFenceStatus`
if the call was successful at capture time but not at replay time.
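
The replay-side change can be pictured as replacing a polling loop with a single blocking wait. The sketch below is only an illustration under assumptions: `ReplayGetFenceStatus` is a hypothetical name, and the two callbacks stand in for `vkGetFenceStatus` and for `vkWaitForFences` with an unbounded timeout.

```cpp
#include <functional>

// Stand-ins for VK_SUCCESS / VK_NOT_READY so the sketch compiles without Vulkan headers.
enum class Result { Success, NotReady };

// Hypothetical helper: query once, then block if capture saw the fence signaled
// but replay does not yet, instead of re-issuing the status query in a loop.
inline Result ReplayGetFenceStatus(Result                         original_result,
                                   const std::function<Result()>& get_status,
                                   const std::function<Result()>& wait_until_signaled)
{
    Result result = get_status();
    if (original_result == Result::Success && result == Result::NotReady)
    {
        // Later replayed calls may depend on resources guarded by this fence.
        result = wait_until_signaled();
    }
    return result;
}
```

The status query is issued exactly once per replayed call, so a slow replay device no longer triggers a burst of repeated queries.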

Change-Id: I9adff58c364b6a08fde2a95502e3b79152e1cbdf
marius-pelegrin-arm committed Jan 3, 2025
1 parent acc5f4d commit 2bf7eb1
Showing 18 changed files with 394 additions and 75 deletions.
3 changes: 3 additions & 0 deletions USAGE_android.md
@@ -330,6 +330,9 @@ option values.
| Page guard signal handler watcher | debug.gfxrecon.page_guard_signal_handler_watcher | BOOL | When the `page_guard` memory tracking mode is enabled, setting this enviroment variable to `true` will spawn a thread which will periodically reinstall the `SIGSEGV` handler if it has been replaced by the application being traced. Default is `false` |
| Page guard signal handler watcher max restores | debug.gfxrecon.page_guard_signal_handler_watcher_max_restores | INTEGER | Sets the number of times the watcher will attempt to restore the signal handler. Setting it to a negative value will make the watcher thread run indefinitely. Default is `1` |
| Force FIFO present mode | debug.gfxrecon.force_fifo_present_mode | BOOL | When the `force_fifo_present_mode` is enabled, force all present modes in vkGetPhysicalDeviceSurfacePresentModesKHR to VK_PRESENT_MODE_FIFO_KHR, app present mode is set in vkCreateSwapchain to VK_PRESENT_MODE_FIFO_KHR. Otherwise the original present mode will be used. Default is: `true` |
| Fence Query Delay | debug.gfxrecon.fence_query_delay | INTEGER | Fences queried using `vkGetFenceStatus` and `vkWaitForFences` won't return `VK_SUCCESS` before a number of such queries and will instead return `VK_NOT_READY` and `VK_TIMEOUT`. Default is `0`. |
| Fence Query Delay Unit | debug.gfxrecon.fence_query_delay_unit | STRING | Specify the "unit of time" used for the delay fence queries option. If set to `calls` then fence query delay is the number of calls to `vkGetFenceStatus`/`vkWaitForFences` that will be delayed. If set to `frames` then fence query delay is the number of frames for which calls will be delayed. Default is `calls`. |
| Fence Query Delay Timeout Threshold | debug.gfxrecon.fence_query_delay_timeout_threshold | INTEGER | Specify a timeout threshold (in nanoseconds) as to what is considered a "fence query" when calling `vkWaitForFences`. Calls to `vkWaitForFences` can either be understood as a synchronization step where you actually want to wait for the underlying command to complete and reaching the timeout is a failure in the command, or as a "delayed query" where you just want to query the fence for a certain amount of time and will try again later if timeout is reached. This option sets the threshold of the timeout value to differentiate these two usages. |

#### Settings File

4 changes: 4 additions & 0 deletions USAGE_desktop_Vulkan.md
@@ -306,6 +306,10 @@ option values.
| Force Command Serialization | GFXRECON_FORCE_COMMAND_SERIALIZATION | BOOL | Sets exclusive locks(unique_lock) for every ApiCall. It can avoid external multi-thread to cause captured issue. |
| Queue Zero Only | GFXRECON_QUEUE_ZERO_ONLY | BOOL | Forces to using only QueueFamilyIndex: 0 and queueCount: 1 on capturing to avoid replay error for unavailble VkQueue. |
| Allow Pipeline Compile Required | GFXRECON_ALLOW_PIPELINE_COMPILE_REQUIRED | BOOL | The default behaviour forces VK_PIPELINE_COMPILE_REQUIRED to be returned from Create*Pipelines calls which have VK_PIPELINE_CREATE_FAIL_ON_PIPELINE_COMPILE_REQUIRED_BIT set, and skips dispatching and recording the calls. This forces applications to fallback to recompiling pipelines without caching, the Vulkan calls for which will be captured. Enabling this option causes capture to record the application's calls and implementation's return values unmodified, but the resulting captures are fragile to changes in Vulkan implementations if they use pipeline caching. |
| Fence Query Delay | GFXRECON_FENCE_QUERY_DELAY | INTEGER | Fences queried using `vkGetFenceStatus` and `vkWaitForFences` won't return `VK_SUCCESS` before a number of such queries and will instead return `VK_NOT_READY` and `VK_TIMEOUT`. Default is `0`. |
| Fence Query Delay Unit | GFXRECON_FENCE_QUERY_DELAY_UNIT | STRING | Specify the "unit of time" used for the delay fence queries option. If set to `calls` then fence query delay is the number of calls to `vkGetFenceStatus`/`vkWaitForFences` that will be delayed. If set to `frames` then fence query delay is the number of frames for which calls will be delayed. Default is `calls`. |
| Fence Query Delay Timeout Threshold | GFXRECON_FENCE_QUERY_DELAY_TIMEOUT_THRESHOLD | INTEGER | Specify a timeout threshold (in nanoseconds) as to what is considered a "fence query" when calling `vkWaitForFences`. Calls to `vkWaitForFences` can either be understood as a synchronization step where you actually want to wait for the underlying command to complete and reaching the timeout is a failure in the command, or as a "delayed query" where you just want to query the fence for a certain amount of time and will try again later if timeout is reached. This option sets the threshold of the timeout value to differentiate these two usages. |

#### Memory Tracking Known Issues

### Capture Limitations
79 changes: 43 additions & 36 deletions framework/decode/vulkan_replay_consumer_base.cpp
@@ -3338,23 +3338,6 @@ VkResult VulkanReplayConsumerBase::OverrideWaitForFences(PFN_vkWaitForFences
const VkFence* modified_fences = nullptr;
std::vector<VkFence> valid_fences;

// Check if the call is in a frame range for being skipped (see --skip-get-fence-ranges, --skip-get-fence-status)
bool in_skip_range = options_.skip_get_fence_ranges.empty();
const uint32_t current_frame = application_->GetCurrentFrameNumber() + 1;
for (const util::UintRange& range : options_.skip_get_fence_ranges)
{
if (current_frame >= range.first && current_frame <= range.last)
{
in_skip_range = true;
break;
}
}

if (in_skip_range && options_.skip_get_fence_status == SkipGetFenceStatus::SkipAll)
{
return result;
}

// Check for fences that need to be removed.
if (shadow_fences_.empty())
{
@@ -3386,23 +3369,42 @@ VkResult VulkanReplayConsumerBase::OverrideWaitForFences(PFN_vkWaitForFences
modified_fences = valid_fences.data();
}

if (original_result == VK_SUCCESS)
// If the timeout is 0, then we suppose this "wait for fence" is in fact a "get fence status" and should be skipped
// accordingly.
bool in_skip_range = false;
if (timeout == 0)
{
// Ensure that wait for fences waits until the fences have been signaled (or error occurs) by changing the
// timeout to UINT64_MAX.
if (modified_fence_count > 0)
// Check if the call is in a frame range for being skipped (see --skip-get-fence-ranges,
// --skip-get-fence-status)
in_skip_range = options_.skip_get_fence_ranges.empty();
const uint32_t current_frame = application_->GetCurrentFrameNumber() + 1;
for (const util::UintRange& range : options_.skip_get_fence_ranges)
{
result = func(device, modified_fence_count, modified_fences, waitAll, std::numeric_limits<uint64_t>::max());
if (current_frame >= range.first && current_frame <= range.last)
{
in_skip_range = true;
break;
}
}
}
else

if (in_skip_range && options_.skip_get_fence_status == SkipGetFenceStatus::SkipAll)
{
if (in_skip_range && options_.skip_get_fence_status == SkipGetFenceStatus::SkipUnsuccessful)
// Nothing.
}
else if (modified_fence_count > 0)
{
if (original_result == VK_SUCCESS)
{
return result;
// Ensure that wait for fences waits until the fences have been signaled (or error occurs) by changing the
// timeout to UINT64_MAX.
result = func(device, modified_fence_count, modified_fences, waitAll, std::numeric_limits<uint64_t>::max());
}

if (original_result == VK_TIMEOUT)
else if (in_skip_range && options_.skip_get_fence_status == SkipGetFenceStatus::SkipUnsuccessful)
{
// Nothing.
}
else if (original_result == VK_TIMEOUT)
{
// Try to get a timeout result with a 0 timeout.
result = func(device, modified_fence_count, modified_fences, waitAll, 0);
@@ -3427,6 +3429,11 @@ VkResult VulkanReplayConsumerBase::OverrideGetFenceStatus(PFN_vkGetFenceStatus
VkDevice device = device_info->handle;
VkFence fence = fence_info->handle;

if (shadow_fences_.find(fence) != shadow_fences_.end())
{
return result;
}

// Check if the call is in a frame range for being skipped (see --skip-get-fence-ranges, --skip-get-fence-status)
bool in_skip_range = options_.skip_get_fence_ranges.empty();
const uint32_t current_frame = application_->GetCurrentFrameNumber() + 1;
@@ -3446,17 +3453,17 @@ VkResult VulkanReplayConsumerBase::OverrideGetFenceStatus(PFN_vkGetFenceStatus
return result;
}

if (shadow_fences_.find(fence) != shadow_fences_.end())
{
return result;
}
result = func(device, fence);

// If you find this loop to be infinite consider adding a limit in the same way
// it is done for GetEventStatus and GetQueryPoolResults.
do
// We don't want the replay to continue if fence was ready at capture time but is not at replay time because future
// calls might use the resources depending on that fence...
if (original_result == VK_SUCCESS && result == VK_NOT_READY)
{
result = func(device, fence);
} while ((original_result == VK_SUCCESS) && (result == VK_NOT_READY));
const encode::VulkanDeviceTable* device_table = GetDeviceTable(device);
GFXRECON_ASSERT(device_table != nullptr);

result = device_table->WaitForFences(device, 1, &fence, VK_TRUE, UINT64_MAX);
}

return result;
}
2 changes: 1 addition & 1 deletion framework/encode/api_capture_manager.h
@@ -103,7 +103,7 @@ class ApiCaptureManager

void WriteFrameMarker(format::MarkerType marker_type) { common_manager_->WriteFrameMarker(marker_type); }

void EndFrame(std::shared_lock<CommonCaptureManager::ApiCallMutexT>& current_lock)
virtual void EndFrame(std::shared_lock<CommonCaptureManager::ApiCallMutexT>& current_lock)
{
common_manager_->EndFrame(api_family_, current_lock);
}