Add fence delay options at capture time
Add two capture options:
- Fence Query Delay
- Fence Query Delay Unit

When capturing with intent to replay on a "slow" platform, we may
want to force the captured application to adopt a "slow platform
behavior" - for example, not re-using some resources because we
know that at replay time command buffers will take a lot longer
to be executed.

To emulate this, this commit implements the notion of "fence query
delay". The idea is that when the captured application queries a
`VkFence` using `vkGetFenceStatus` or `vkWaitForFences` with `timeout`
set to `0`, we transmit the call to the driver, but if the fence is
ready, we do not return `VK_SUCCESS` directly. Instead, we return
`VK_NOT_READY` for "a certain amount of time" specified by the two
capture options.

By default this amount of time is `0 call`, which means no delay at
all: the application receives the driver's result directly.
This amount of time can be specified in number of `calls` or in
number of `frames`. The unit (calls/frames) is specified by fence
query delay unit, and the number by fence query delay.
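
The delay logic described above can be sketched roughly as follows. This
is a minimal illustration, not the code from this commit: the class
`FenceQueryDelayer`, its methods, and the `QueryResult`/`DelayUnit` enums
are hypothetical stand-ins for the capture layer's real types, and the
way the actual implementation tracks per-fence state may differ.

```cpp
#include <cstdint>
#include <unordered_map>

// Hypothetical stand-ins: the real layer works with VkFence and VkResult.
enum class QueryResult { kSuccess, kNotReady };
enum class DelayUnit { kCalls, kFrames };
using FenceId = std::uint64_t;

class FenceQueryDelayer
{
  public:
    FenceQueryDelayer(std::uint32_t delay, DelayUnit unit) : delay_(delay), unit_(unit) {}

    // Call once per frame boundary (assumed here to be tracked by the caller).
    void EndFrame() { ++current_frame_; }

    // Forget a fence's progress, for example when the application resets it.
    void Reset(FenceId fence) { progress_.erase(fence); }

    // Takes the result the driver returned and decides what the application sees.
    QueryResult Filter(FenceId fence, QueryResult driver_result)
    {
        // No delay configured, or the fence is genuinely not ready: pass through.
        if (delay_ == 0 || driver_result != QueryResult::kSuccess)
        {
            return driver_result;
        }

        // Remember the frame in which the driver first reported the fence as ready.
        Progress& p = progress_.try_emplace(fence, Progress{ current_frame_, 0 }).first->second;

        const std::uint32_t elapsed =
            (unit_ == DelayUnit::kCalls) ? p.delayed_calls : (current_frame_ - p.first_ready_frame);
        if (elapsed >= delay_)
        {
            return QueryResult::kSuccess;
        }

        // Still inside the delay window: hide the success from the application.
        ++p.delayed_calls;
        return QueryResult::kNotReady;
    }

  private:
    struct Progress
    {
        std::uint32_t first_ready_frame;
        std::uint32_t delayed_calls;
    };

    std::uint32_t                         delay_;
    DelayUnit                             unit_;
    std::uint32_t                         current_frame_ = 0;
    std::unordered_map<FenceId, Progress> progress_;
};
```

In this sketch, a delay of `2` with the `calls` unit hides the first two
successful queries of a fence and lets the third one through, while a
delay of `1` with the `frames` unit hides successes until one frame
boundary has passed since the fence was first seen ready.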

In addition, this commit fixes the `--sgfs` and `--sgfr` options, which
were not parsed correctly, and changes the replay-time behavior of
`vkGetFenceStatus` and `vkWaitForFences` so that replay no longer "spams"
`vkGetFenceStatus` when the call succeeded at capture time but not at
replay time.

Change-Id: I9adff58c364b6a08fde2a95502e3b79152e1cbdf
marius-pelegrin-arm committed Sep 18, 2024
1 parent 8f763b8 commit b4adecc
Showing 18 changed files with 346 additions and 56 deletions.
2 changes: 2 additions & 0 deletions USAGE_android.md
@@ -328,6 +328,8 @@ option values.
| Page guard signal handler watcher | debug.gfxrecon.page_guard_signal_handler_watcher | BOOL | When the `page_guard` memory tracking mode is enabled, setting this enviroment variable to `true` will spawn a thread which will periodically reinstall the `SIGSEGV` handler if it has been replaced by the application being traced. Default is `false` |
| Page guard signal handler watcher max restores | debug.gfxrecon.page_guard_signal_handler_watcher_max_restores | INTEGER | Sets the number of times the watcher will attempt to restore the signal handler. Setting it to a negative value will make the watcher thread run indefinitely. Default is `1` |
| Force FIFO present mode | debug.gfxrecon.force_fifo_present_mode | BOOL | When the `force_fifo_present_mode` is enabled, force all present modes in vkGetPhysicalDeviceSurfacePresentModesKHR to VK_PRESENT_MODE_FIFO_KHR, app present mode is set in vkCreateSwapchain to VK_PRESENT_MODE_FIFO_KHR. Otherwise the original present mode will be used. Default is: `true` |
| Fence Query Delay | debug.gfxrecon.fence_query_delay | INTEGER | Fences queried using `vkGetFenceStatus` and `vkWaitForFences` won't return `VK_SUCCESS` before a number of such queries and will instead return `VK_NOT_READY` and `VK_TIMEOUT`. Default is `0`. |
| Fence Query Delay unit | debug.gfxrecon.fence_query_delay_unit | STRING | Specify the "unit of time" used for the delay fence queries option. If set to `calls` then fence query delay is the number of calls to `vkGetFenceStatus`/`vkWaitForFences` that will be delayed. If set to `frames` then fence query delay is the number of frames for which called will be delayed. Default is `calls`. |

#### Settings File

3 changes: 3 additions & 0 deletions USAGE_desktop_Vulkan.md
@@ -288,6 +288,9 @@ option values.
| Force Command Serialization | GFXRECON_FORCE_COMMAND_SERIALIZATION | BOOL | Sets exclusive locks(unique_lock) for every ApiCall. It can avoid external multi-thread to cause captured issue. |
| Queue Zero Only | GFXRECON_QUEUE_ZERO_ONLY | BOOL | Forces to using only QueueFamilyIndex: 0 and queueCount: 1 on capturing to avoid replay error for unavailble VkQueue. |
| Allow Pipeline Compile Required | GFXRECON_ALLOW_PIPELINE_COMPILE_REQUIRED | BOOL | The default behaviour forces VK_PIPELINE_COMPILE_REQUIRED to be returned from Create*Pipelines calls which have VK_PIPELINE_CREATE_FAIL_ON_PIPELINE_COMPILE_REQUIRED_BIT set, and skips dispatching and recording the calls. This forces applications to fallback to recompiling pipelines without caching, the Vulkan calls for which will be captured. Enabling this option causes capture to record the application's calls and implementation's return values unmodified, but the resulting captures are fragile to changes in Vulkan implementations if they use pipeline caching. |
| Fence Query Delay | GFXRECON_FENCE_QUERY_DELAY | INTEGER | Fences queried using `vkGetFenceStatus` and `vkWaitForFences` won't return `VK_SUCCESS` before a number of such queries and will instead return `VK_NOT_READY` and `VK_TIMEOUT`. Default is `0`. |
| Fence Query Delay unit | GFXRECON_FENCE_QUERY_DELAY_UNIT | STRING | Specify the "unit of time" used for the delay fence queries option. If set to `calls` then fence query delay is the number of calls to `vkGetFenceStatus`/`vkWaitForFences` that will be delayed. If set to `frames` then fence query delay is the number of frames for which called will be delayed. Default is `calls`. |

#### Memory Tracking Known Issues

### Capture Limitations
79 changes: 43 additions & 36 deletions framework/decode/vulkan_replay_consumer_base.cpp
@@ -3151,23 +3151,6 @@ VkResult VulkanReplayConsumerBase::OverrideWaitForFences(PFN_vkWaitForFences
const VkFence* modified_fences = nullptr;
std::vector<VkFence> valid_fences;

// Check if the call is in a frame range for being skipped (see --skip-get-fence-ranges, --skip-get-fence-status)
bool in_skip_range = options_.skip_get_fence_ranges.empty();
const uint32_t current_frame = application_->GetCurrentFrameNumber() + 1;
for (const util::UintRange& range : options_.skip_get_fence_ranges)
{
if (current_frame >= range.first && current_frame <= range.last)
{
in_skip_range = true;
break;
}
}

if (in_skip_range && options_.skip_get_fence_status == SkipGetFenceStatus::SkipAll)
{
return result;
}

// Check for fences that need to be removed.
if (shadow_fences_.empty())
{
@@ -3199,23 +3182,42 @@ VkResult VulkanReplayConsumerBase::OverrideWaitForFences(PFN_vkWaitForFences
modified_fences = valid_fences.data();
}

if (original_result == VK_SUCCESS)
// If the timeout is 0, then we suppose this "wait for fence" is in fact a "get fence status" and should be skipped
// accordingly.
bool in_skip_range = false;
if (timeout == 0)
{
// Ensure that wait for fences waits until the fences have been signaled (or error occurs) by changing the
// timeout to UINT64_MAX.
if (modified_fence_count > 0)
// Check if the call is in a frame range for being skipped (see --skip-get-fence-ranges,
// --skip-get-fence-status)
in_skip_range = options_.skip_get_fence_ranges.empty();
const uint32_t current_frame = application_->GetCurrentFrameNumber() + 1;
for (const util::UintRange& range : options_.skip_get_fence_ranges)
{
result = func(device, modified_fence_count, modified_fences, waitAll, std::numeric_limits<uint64_t>::max());
if (current_frame >= range.first && current_frame <= range.last)
{
in_skip_range = true;
break;
}
}
}
else

if (in_skip_range && options_.skip_get_fence_status == SkipGetFenceStatus::SkipAll)
{
if (in_skip_range && options_.skip_get_fence_status == SkipGetFenceStatus::SkipUnsuccessful)
// Nothing.
}
else if (modified_fence_count > 0)
{
if (original_result == VK_SUCCESS)
{
return result;
// Ensure that wait for fences waits until the fences have been signaled (or error occurs) by changing the
// timeout to UINT64_MAX.
result = func(device, modified_fence_count, modified_fences, waitAll, std::numeric_limits<uint64_t>::max());
}

if (original_result == VK_TIMEOUT)
else if (in_skip_range && options_.skip_get_fence_status == SkipGetFenceStatus::SkipUnsuccessful)
{
// Nothing.
}
else if (original_result == VK_TIMEOUT)
{
// Try to get a timeout result with a 0 timeout.
result = func(device, modified_fence_count, modified_fences, waitAll, 0);
@@ -3240,6 +3242,11 @@ VkResult VulkanReplayConsumerBase::OverrideGetFenceStatus(PFN_vkGetFenceStatus f
VkDevice device = device_info->handle;
VkFence fence = fence_info->handle;

if (shadow_fences_.find(fence) != shadow_fences_.end())
{
return result;
}

// Check if the call is in a frame range for being skipped (see --skip-get-fence-ranges, --skip-get-fence-status)
bool in_skip_range = options_.skip_get_fence_ranges.empty();
const uint32_t current_frame = application_->GetCurrentFrameNumber() + 1;
@@ -3259,17 +3266,17 @@ VkResult VulkanReplayConsumerBase::OverrideGetFenceStatus(PFN_vkGetFenceStatus f
return result;
}

if (shadow_fences_.find(fence) != shadow_fences_.end())
{
return result;
}
result = func(device, fence);

// If you find this loop to be infinite consider adding a limit in the same way
// it is done for GetEventStatus and GetQueryPoolResults.
do
// We don't want the replay to continue if fence was ready at capture time but is not at replay time because future
// calls might use the resources depending on that fence...
if (original_result == VK_SUCCESS && result == VK_NOT_READY)
{
result = func(device, fence);
} while ((original_result == VK_SUCCESS) && (result == VK_NOT_READY));
const encode::VulkanDeviceTable* device_table = GetDeviceTable(device);
GFXRECON_ASSERT(device_table != nullptr);

result = device_table->WaitForFences(device, 1, &fence, VK_TRUE, UINT64_MAX);
}

return result;
}
2 changes: 1 addition & 1 deletion framework/encode/api_capture_manager.h
@@ -95,7 +95,7 @@ class ApiCaptureManager

void WriteFrameMarker(format::MarkerType marker_type) { common_manager_->WriteFrameMarker(marker_type); }

void EndFrame() { common_manager_->EndFrame(api_family_); }
virtual void EndFrame() { common_manager_->EndFrame(api_family_); }

// Pre/PostQueueSubmit to be called immediately before and after work is submitted to the GPU by vkQueueSubmit for
// Vulkan or by ID3D12CommandQueue::ExecuteCommandLists for DX12.
17 changes: 17 additions & 0 deletions framework/encode/capture_manager.cpp
@@ -309,6 +309,8 @@ bool CommonCaptureManager::Initialize(format::ApiFamilyId api_
queue_zero_only_ = trace_settings.queue_zero_only;
allow_pipeline_compile_required_ = trace_settings.allow_pipeline_compile_required;
force_fifo_present_mode_ = trace_settings.force_fifo_present_mode;
fence_query_delay_ = trace_settings.fence_query_delay;
fence_query_delay_unit_ = trace_settings.fence_query_delay_unit;

rv_annotation_info_.gpuva_mask = trace_settings.rv_anotation_info.gpuva_mask;
rv_annotation_info_.descriptor_mask = trace_settings.rv_anotation_info.descriptor_mask;
@@ -1400,6 +1402,21 @@ void CommonCaptureManager::WriteCaptureOptions(std::string& operation_annotation
buffer += force_command_serialization_ ? "true," : "false,";
}

if (fence_query_delay_ != default_settings.fence_query_delay)
{
buffer += "\n \"fence-query-delay\": " + std::to_string(fence_query_delay_) + ',';
buffer += "\n \"fence-query-delay-unit\": \"";
if (fence_query_delay_unit_ == CaptureSettings::FenceQueryDelayUnit::kCalls)
{
buffer += "calls";
}
else if (fence_query_delay_unit_ == CaptureSettings::FenceQueryDelayUnit::kFrames)
{
buffer += "frames";
}
buffer += "\",";
}

if (queue_zero_only_ != default_settings.queue_zero_only)
{
buffer += "\n \"queue-zero-only\": ";
11 changes: 11 additions & 0 deletions framework/encode/capture_manager.h
@@ -169,6 +169,15 @@ class CommonCaptureManager
}

static bool CreateInstance(ApiCaptureManager* api_instance_, const std::function<void()>& destroyer);
uint32_t GetFenceQueryDelay() const
{
return fence_query_delay_;
}
CaptureSettings::FenceQueryDelayUnit GetFenceQueryDelayUnit() const
{
return fence_query_delay_unit_;
}

template <typename Derived>
static bool CreateInstance()
{
@@ -382,6 +391,8 @@ class CommonCaptureManager
bool allow_pipeline_compile_required_;
bool quit_after_frame_ranges_;
bool force_fifo_present_mode_;
uint32_t fence_query_delay_;
CaptureSettings::FenceQueryDelayUnit fence_query_delay_unit_;

struct
{