Add fence delay options at capture time
Add a capture option (the GFXRECON_FENCE_QUERY_DELAY environment variable on desktop
and the debug.gfxrecon.fence_query_delay property on Android) that stops fence
queries made with vkGetFenceStatus and vkWaitForFences from returning VK_SUCCESS
until the fence has been queried a given number of times. Until then,
vkGetFenceStatus returns VK_NOT_READY and vkWaitForFences returns VK_TIMEOUT.

Change-Id: I9adff58c364b6a08fde2a95502e3b79152e1cbdf
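
The per-fence bookkeeping this commit adds can be modelled outside the layer. The sketch below is a minimal, hypothetical model, not code from this commit: `Result` stands in for `VkResult`, `ModelFence` combines the driver's signal state with the `query_delay` counter from `FenceWrapper`, and the three functions mirror what `ProcessFenceSubmit`, `PostProcess_vkResetFences` and `PostProcess_vkGetFenceStatus` do to that counter.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for the VkResult values involved; not the real Vulkan type.
enum class Result
{
    kSuccess,
    kNotReady
};

// Hypothetical model: the driver's signal state plus the layer's per-fence counter.
struct ModelFence
{
    bool     signaled{ false }; // what the driver would report
    uint32_t query_delay{ 0 };  // successful queries still to be hidden
};

// Like ProcessFenceSubmit: a submission that will signal the fence arms the counter.
void SubmitWithFence(ModelFence& fence, uint32_t configured_delay)
{
    fence.query_delay = configured_delay;
}

// Like vkResetFences plus its post-call: the driver unsignals the fence and the
// layer drops any pending delay.
void ResetFence(ModelFence& fence)
{
    fence.signaled    = false;
    fence.query_delay = 0;
}

// Like vkGetFenceStatus plus its post-call: only a successful result is hidden,
// and each hidden result consumes one unit of the counter.
Result GetFenceStatus(ModelFence& fence)
{
    Result result = fence.signaled ? Result::kSuccess : Result::kNotReady;
    if (result == Result::kSuccess && fence.query_delay != 0)
    {
        --fence.query_delay;
        result = Result::kNotReady;
    }
    return result;
}
```

With a delay of 2, a signaled fence reports not-ready twice and then succeeds. On this path, queries made before the fence signals do not consume the counter.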
marius-pelegrin-arm committed Jan 25, 2024
1 parent 118b535 commit 3ff41f9
Showing 12 changed files with 129 additions and 5 deletions.
1 change: 1 addition & 0 deletions USAGE_android.md
@@ -377,6 +377,7 @@ Omit calls with NULL AHardwareBuffer* | debug.gfxrecon.omit_null_hardware_buffer
Page guard unblock SIGSEGV | debug.gfxrecon.page_guard_unblock_sigsegv | BOOL | When the `page_guard` memory tracking mode is enabled and SIGSEGV has been marked as blocked in the thread's signal mask, setting this environment variable to `true` will forcibly re-enable the signal in the thread's signal mask. Default is `false`
Page guard signal handler watcher | debug.gfxrecon.page_guard_signal_handler_watcher | BOOL | When the `page_guard` memory tracking mode is enabled, setting this environment variable to `true` will spawn a thread which will periodically reinstall the `SIGSEGV` handler if it has been replaced by the application being traced. Default is `false`
Page guard signal handler watcher max restores | debug.gfxrecon.page_guard_signal_handler_watcher_max_restores | INTEGER | Sets the number of times the watcher will attempt to restore the signal handler. Setting it to a negative value will make the watcher thread run indefinitely. Default is `1`
+Delay fence queries | debug.gfxrecon.fence_query_delay | INTEGER | Fences queried with `vkGetFenceStatus` or `vkWaitForFences` do not return `VK_SUCCESS` until they have been queried this many times since their last submission; until then, `vkGetFenceStatus` returns `VK_NOT_READY` and `vkWaitForFences` returns `VK_TIMEOUT`. `vkWaitForFences` calls with a timeout of `UINT64_MAX` are not delayed. Default is `0`.

#### Settings File

2 changes: 2 additions & 0 deletions USAGE_desktop_Vulkan.md
@@ -192,6 +192,8 @@ Page Guard Signal Handler Watcher Max Restores | GFXRECON_PAGE_GUARD_SIGNAL_HAND
Force Command Serialization | GFXRECON_FORCE_COMMAND_SERIALIZATION | BOOL | Sets an exclusive lock (unique_lock) for every API call. This can prevent external multithreading from causing issues in the capture.
Queue Zero Only | GFXRECON_QUEUE_ZERO_ONLY | BOOL | Forces capture to use only QueueFamilyIndex 0 and queueCount 1, to avoid replay errors caused by an unavailable VkQueue.
Allow Pipeline Compile Required | GFXRECON_ALLOW_PIPELINE_COMPILE_REQUIRED | BOOL | The default behaviour forces VK_PIPELINE_COMPILE_REQUIRED to be returned from Create*Pipelines calls which have VK_PIPELINE_CREATE_FAIL_ON_PIPELINE_COMPILE_REQUIRED_BIT set, and skips dispatching and recording the calls. This forces applications to fall back to recompiling pipelines without caching, and the Vulkan calls for that recompilation are captured. Enabling this option causes capture to record the application's calls and the implementation's return values unmodified, but the resulting captures are fragile to changes in Vulkan implementations if they use pipeline caching.
+Delay fence queries | GFXRECON_FENCE_QUERY_DELAY | INTEGER | Fences queried with `vkGetFenceStatus` or `vkWaitForFences` do not return `VK_SUCCESS` until they have been queried this many times since their last submission; until then, `vkGetFenceStatus` returns `VK_NOT_READY` and `vkWaitForFences` returns `VK_TIMEOUT`. `vkWaitForFences` calls with a timeout of `UINT64_MAX` are not delayed. Default is `0`.
+
#### Memory Tracking Known Issues

There is a known issue with the page guard memory tracking method. The logic behind that method is to apply a memory protection to the guarded/shadowed regions so that accesses made by the user trigger a segmentation fault, which is handled by GFXReconstruct.
1 change: 1 addition & 0 deletions framework/encode/capture_manager.cpp
@@ -263,6 +263,7 @@ bool CaptureManager::Initialize(std::string base_filename, const CaptureSettings
    force_command_serialization_ = trace_settings.force_command_serialization;
    queue_zero_only_ = trace_settings.queue_zero_only;
    allow_pipeline_compile_required_ = trace_settings.allow_pipeline_compile_required;
+    fence_query_delay_ = trace_settings.fence_query_delay;

    rv_annotation_info_.gpuva_mask = trace_settings.rv_anotation_info.gpuva_mask;
    rv_annotation_info_.descriptor_mask = trace_settings.rv_anotation_info.descriptor_mask;
3 changes: 3 additions & 0 deletions framework/encode/capture_manager.h
@@ -167,6 +167,8 @@ class CaptureManager
        return thread_data->block_index_ == 0 ? 0 : thread_data->block_index_ - 1;
    }

+    uint32_t GetFenceQueryDelay() const { return fence_query_delay_; }
+
  protected:
    enum CaptureModeFlags : uint32_t
    {
@@ -364,6 +366,7 @@ class CaptureManager
    bool allow_pipeline_compile_required_;
    bool quit_after_frame_ranges_;
    static std::function<void()> delete_instance_func_;
+    uint32_t fence_query_delay_;

    struct
    {
10 changes: 10 additions & 0 deletions framework/encode/capture_settings.cpp
@@ -134,6 +134,8 @@ GFXRECON_BEGIN_NAMESPACE(encode)
#define RV_ANNOTATION_GPUVA_UPPER "RV_ANNOTATION_GPUVA"
#define RV_ANNOTATION_DESCRIPTOR_LOWER "rv_annotation_descriptor"
#define RV_ANNOTATION_DESCRIPTOR_UPPER "RV_ANNOTATION_DESCRIPTOR"
+#define FENCE_QUERY_DELAY_LOWER "fence_query_delay"
+#define FENCE_QUERY_DELAY_UPPER "FENCE_QUERY_DELAY"

#if defined(__ANDROID__)
// Android Properties
@@ -187,6 +189,7 @@ const char kAnnotationExperimentalEnvVar[] = GFXRECON_ENV_VAR_
const char kAnnotationRandEnvVar[] = GFXRECON_ENV_VAR_PREFIX RV_ANNOTATION_RAND_LOWER;
const char kAnnotationGPUVAEnvVar[] = GFXRECON_ENV_VAR_PREFIX RV_ANNOTATION_GPUVA_LOWER;
const char kAnnotationDescriptorEnvVar[] = GFXRECON_ENV_VAR_PREFIX RV_ANNOTATION_DESCRIPTOR_LOWER;
+const char kFenceQueryDelayEnvVar[] = GFXRECON_ENV_VAR_PREFIX FENCE_QUERY_DELAY_LOWER;

#else
// Desktop environment settings
@@ -239,6 +242,7 @@ const char kAnnotationExperimentalEnvVar[] = GFXRECON_ENV_VAR_
const char kAnnotationRandEnvVar[] = GFXRECON_ENV_VAR_PREFIX RV_ANNOTATION_RAND_UPPER;
const char kAnnotationGPUVAEnvVar[] = GFXRECON_ENV_VAR_PREFIX RV_ANNOTATION_GPUVA_UPPER;
const char kAnnotationDescriptorEnvVar[] = GFXRECON_ENV_VAR_PREFIX RV_ANNOTATION_DESCRIPTOR_UPPER;
+const char kFenceQueryDelayEnvVar[] = GFXRECON_ENV_VAR_PREFIX FENCE_QUERY_DELAY_UPPER;

#endif
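
Each option name in these hunks is assembled from a platform prefix and a shared suffix by adjacent string-literal concatenation, which is why every setting has a `_LOWER` and an `_UPPER` spelling. The sketch below assumes the prefixes are `GFXRECON_` on desktop and `debug.gfxrecon.` on Android, inferred from the names documented in the USAGE files; the real `GFXRECON_ENV_VAR_PREFIX` definition is not part of this diff, and `DESKTOP_PREFIX`, `ANDROID_PREFIX` and `kFenceQueryDelayName` are hypothetical names.

```cpp
#include <cassert>
#include <cstring>

// Assumed prefix values, inferred from the documented option names; the real
// GFXRECON_ENV_VAR_PREFIX is not shown in this diff.
#define DESKTOP_PREFIX "GFXRECON_"
#define ANDROID_PREFIX "debug.gfxrecon."

// Suffixes as added by this commit.
#define FENCE_QUERY_DELAY_LOWER "fence_query_delay"
#define FENCE_QUERY_DELAY_UPPER "FENCE_QUERY_DELAY"

// Adjacent string literals are joined at compile time, so each name is a single
// constant array and no string is built at run time.
const char kDesktopFenceQueryDelayEnvVar[]   = DESKTOP_PREFIX FENCE_QUERY_DELAY_UPPER;
const char kAndroidFenceQueryDelayProperty[] = ANDROID_PREFIX FENCE_QUERY_DELAY_LOWER;

// Hypothetical platform selection, mirroring the #if defined(__ANDROID__) split above.
#if defined(__ANDROID__)
const char* const kFenceQueryDelayName = kAndroidFenceQueryDelayProperty;
#else
const char* const kFenceQueryDelayName = kDesktopFenceQueryDelayEnvVar;
#endif
```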

@@ -290,6 +294,7 @@ const std::string kOptionKeyAnnotationExperimental = std::stri
const std::string kOptionKeyAnnotationRand = std::string(kSettingsFilter) + std::string(RV_ANNOTATION_RAND_LOWER);
const std::string kOptionKeyAnnotationGPUVA = std::string(kSettingsFilter) + std::string(RV_ANNOTATION_GPUVA_LOWER);
const std::string kOptionKeyAnnotationDescriptor = std::string(kSettingsFilter) + std::string(RV_ANNOTATION_DESCRIPTOR_LOWER);
+const std::string kOptionFenceQueryDelay = std::string(kSettingsFilter) + std::string(FENCE_QUERY_DELAY_LOWER);

#if defined(GFXRECON_ENABLE_LZ4_COMPRESSION)
const format::CompressionType kDefaultCompressionType = format::CompressionType::kLz4;
@@ -445,6 +450,8 @@ void CaptureSettings::LoadOptionsEnvVar(OptionsMap* options)
    LoadSingleOptionEnvVar(options, kAnnotationRandEnvVar, kOptionKeyAnnotationRand);
    LoadSingleOptionEnvVar(options, kAnnotationGPUVAEnvVar, kOptionKeyAnnotationGPUVA);
    LoadSingleOptionEnvVar(options, kAnnotationDescriptorEnvVar, kOptionKeyAnnotationDescriptor);
+
+    LoadSingleOptionEnvVar(options, kFenceQueryDelayEnvVar, kOptionFenceQueryDelay);
}

void CaptureSettings::LoadOptionsFile(OptionsMap* options)
@@ -616,6 +623,9 @@ void CaptureSettings::ProcessOptions(OptionsMap* options, CaptureSettings* setti
    settings->trace_settings_.rv_anotation_info.descriptor_mask =
        ParseUnsignedInteger16String(FindOption(options, kOptionKeyAnnotationDescriptor),
                                     settings->trace_settings_.rv_anotation_info.descriptor_mask);
+
+    settings->trace_settings_.fence_query_delay =
+        ParseIntegerString(FindOption(options, kOptionFenceQueryDelay), settings->trace_settings_.fence_query_delay);
}

void CaptureSettings::ProcessLogOptions(OptionsMap* options, CaptureSettings* settings)
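
Because `fence_query_delay` is a `uint32_t`, whatever parses the option string has to decide what a negative or malformed value means. The following is a hypothetical helper, not the `ParseIntegerString` used in the hunk above: it keeps the previous value for empty, non-numeric, negative, or out-of-range input, relying on C++17 `std::from_chars`, which does not accept a leading `-` when parsing into an unsigned type.

```cpp
#include <cassert>
#include <charconv>
#include <cstdint>
#include <string>
#include <system_error>

// Hypothetical helper, not ParseIntegerString: parse an option value into the
// uint32_t fence_query_delay setting, keeping the previous value when the option
// is unset or the text is not a plain non-negative decimal number in range.
uint32_t ParseFenceQueryDelay(const std::string& value, uint32_t default_value)
{
    if (value.empty())
    {
        return default_value; // option not set
    }

    uint32_t    parsed = 0;
    const char* first  = value.data();
    const char* last   = value.data() + value.size();

    // For unsigned types, std::from_chars rejects a leading '-' and reports
    // std::errc::result_out_of_range for values above UINT32_MAX.
    const std::from_chars_result res = std::from_chars(first, last, parsed);
    if (res.ec != std::errc() || res.ptr != last)
    {
        return default_value; // malformed, negative, trailing junk, or too large
    }
    return parsed;
}
```

Falling back to the previous value mirrors how the surrounding `Parse*String` calls receive the current setting as their default argument.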
1 change: 1 addition & 0 deletions framework/encode/capture_settings.h
@@ -113,6 +113,7 @@ class CaptureSettings
        bool queue_zero_only{ false };
        bool allow_pipeline_compile_required{ false };
        bool quit_after_frame_ranges{ false };
+        uint32_t fence_query_delay{ 0 };

        // An optimization for the page_guard memory tracking mode that eliminates the need for shadow memory by
        // overriding vkAllocateMemory so that all host visible allocations use the external memory extension with a
20 changes: 20 additions & 0 deletions framework/encode/custom_vulkan_encoder_commands.h
@@ -740,6 +740,26 @@ struct CustomEncoderPostCall<format::ApiCallId::ApiCall_vkQueueBindSparse>
    }
};

+template <>
+struct CustomEncoderPostCall<format::ApiCallId::ApiCall_vkResetFences>
+{
+    template <typename... Args>
+    static void Dispatch(VulkanCaptureManager* manager, VkResult result, Args... args)
+    {
+        manager->PostProcess_vkResetFences(result, args...);
+    }
+};
+
+template <>
+struct CustomEncoderPostCall<format::ApiCallId::ApiCall_vkGetFenceStatus>
+{
+    template <typename... Args>
+    static void Dispatch(VulkanCaptureManager* manager, VkResult& result, Args... args)
+    {
+        manager->PostProcess_vkGetFenceStatus(result, args...);
+    }
+};
+
template <>
struct CustomEncoderPostCall<format::ApiCallId::ApiCall_vkUpdateDescriptorSets>
{
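
The two new specializations rely on `CustomEncoderPostCall` being selected at compile time by the API call ID. Taking the result as `VkResult&` is what lets the `vkGetFenceStatus` hook replace `VK_SUCCESS` with `VK_NOT_READY` before the generated wrapper returns, while the `vkResetFences` hook only reads it. The sketch below models that pattern with stand-in types; `ApiCallId`, `Result`, `Manager`, `PostCall` and `GetFenceStatusWrapper` are all hypothetical, and it assumes the primary template is a no-op and that the generated wrapper passes a local result variable as an lvalue.

```cpp
#include <cassert>
#include <cstdint>

// Stand-ins for format::ApiCallId and VkResult.
enum class ApiCallId
{
    kGetFenceStatus,
    kOther
};

enum class Result
{
    kSuccess,
    kNotReady
};

// Hypothetical manager tracking a single fence's delay counter.
struct Manager
{
    uint32_t query_delay{ 0 };

    void PostProcessGetFenceStatus(Result& result, int /* fence */)
    {
        if (result == Result::kSuccess && query_delay != 0)
        {
            --query_delay;
            result = Result::kNotReady; // written through the reference
        }
    }
};

// Primary template: commands without a custom hook do nothing after the call
// (assumed to match the real primary template's behaviour).
template <ApiCallId Id>
struct PostCall
{
    template <typename... Args>
    static void Dispatch(Manager*, Result&, Args...)
    {
    }
};

// Specialization chosen at compile time for the status query. The non-const
// reference lets the hook change what the wrapper returns to the application.
template <>
struct PostCall<ApiCallId::kGetFenceStatus>
{
    template <typename... Args>
    static void Dispatch(Manager* manager, Result& result, Args... args)
    {
        manager->PostProcessGetFenceStatus(result, args...);
    }
};

// Hypothetical generated wrapper: take the driver's result, run the hook, return.
Result GetFenceStatusWrapper(Manager* manager, Result driver_result, int fence)
{
    Result result = driver_result;
    PostCall<ApiCallId::kGetFenceStatus>::Dispatch(manager, result, fence);
    return result;
}
```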
45 changes: 43 additions & 2 deletions framework/encode/vulkan_capture_manager.cpp
@@ -1541,6 +1541,37 @@ void VulkanCaptureManager::OverrideGetPhysicalDeviceQueueFamilyProperties2KHR(
    }
}

+VkResult VulkanCaptureManager::OverrideWaitForFences(
+    VkDevice device, uint32_t fenceCount, const VkFence* pFences, VkBool32 waitAll, uint64_t timeout)
+{
+    if (timeout == UINT64_MAX)
+    {
+        // If the caller signals that it explicitly intends to wait until success, then it is less likely to handle a
+        // timeout return value here.
+        return GetDeviceTable(device)->WaitForFences(device, fenceCount, pFences, waitAll, timeout);
+    }
+
+    bool delay = false;
+    for (uint32_t i = 0; i < fenceCount; ++i)
+    {
+        FenceWrapper* wrapper = GetWrapper<FenceWrapper>(pFences[i]);
+        assert(wrapper != nullptr);
+        if (wrapper->query_delay != 0)
+        {
+            // Make sure we decrement every fence, if multiple.
+            delay = true;
+            --wrapper->query_delay;
+        }
+    }
+
+    if (delay)
+    {
+        return VK_TIMEOUT;
+    }
+
+    return GetDeviceTable(device)->WaitForFences(device, fenceCount, pFences, waitAll, timeout);
+}
+
void VulkanCaptureManager::ProcessEnumeratePhysicalDevices(VkResult result,
VkInstance instance,
uint32_t count,
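
`OverrideWaitForFences` behaves differently from the `vkGetFenceStatus` hook: it bypasses the delay for `UINT64_MAX` timeouts, and it consumes one unit of every listed fence's counter whether or not that fence has signaled. The model below reproduces that control flow with hypothetical stand-ins (`ModelFence`, `DriverWaitForFences`, `Result`); the model driver never blocks and reports `kTimeout` where a real finite wait would time out.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Stand-ins for the VkResult values involved.
enum class Result
{
    kSuccess,
    kTimeout
};

// The sentinel the override checks for "wait forever".
constexpr uint64_t kWaitForever = UINT64_MAX;

// Hypothetical fence model: driver signal state plus the layer's counter.
struct ModelFence
{
    bool     signaled{ false };
    uint32_t query_delay{ 0 };
};

// Model driver: never blocks; reports kTimeout where a finite wait would time out.
Result DriverWaitForFences(const std::vector<ModelFence*>& fences, bool wait_all)
{
    bool all = true;
    bool any = false;
    for (const ModelFence* fence : fences)
    {
        all = all && fence->signaled;
        any = any || fence->signaled;
    }
    return (wait_all ? all : any) ? Result::kSuccess : Result::kTimeout;
}

// Same control flow as OverrideWaitForFences in the hunk above.
Result OverrideWaitForFences(const std::vector<ModelFence*>& fences, bool wait_all, uint64_t timeout)
{
    if (timeout == kWaitForever)
    {
        return DriverWaitForFences(fences, wait_all); // no delay for infinite waits
    }

    bool delay = false;
    for (ModelFence* fence : fences)
    {
        if (fence->query_delay != 0)
        {
            delay = true;         // keep looping so every delayed fence is decremented
            --fence->query_delay; // consumed even if the fence has not signaled
        }
    }
    return delay ? Result::kTimeout : DriverWaitForFences(fences, wait_all);
}
```

A delayed fence forces a timeout even when `waitAll` is false and another fence in the list has already signaled.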
@@ -2366,8 +2397,8 @@ void VulkanCaptureManager::PreProcess_vkQueueSubmit(VkQueue queue,
    GFXRECON_UNREFERENCED_PARAMETER(queue);
    GFXRECON_UNREFERENCED_PARAMETER(submitCount);
    GFXRECON_UNREFERENCED_PARAMETER(pSubmits);
-    GFXRECON_UNREFERENCED_PARAMETER(fence);

+    ProcessFenceSubmit(fence);
    QueueSubmitWriteFillMemoryCmd();

    PreQueueSubmit();
@@ -2393,8 +2424,8 @@ void VulkanCaptureManager::PreProcess_vkQueueSubmit2(VkQueue queue,
    GFXRECON_UNREFERENCED_PARAMETER(queue);
    GFXRECON_UNREFERENCED_PARAMETER(submitCount);
    GFXRECON_UNREFERENCED_PARAMETER(pSubmits);
-    GFXRECON_UNREFERENCED_PARAMETER(fence);

+    ProcessFenceSubmit(fence);
    QueueSubmitWriteFillMemoryCmd();

    PreQueueSubmit();
@@ -2420,6 +2451,16 @@ void VulkanCaptureManager::PreProcess_vkQueueSubmit2(VkQueue queue,
    }
}

+void VulkanCaptureManager::ProcessFenceSubmit(VkFence fence)
+{
+    if (fence != VK_NULL_HANDLE)
+    {
+        FenceWrapper* wrapper = GetWrapper<FenceWrapper>(fence);
+        assert(wrapper != nullptr);
+        wrapper->query_delay = GetFenceQueryDelay();
+    }
+}
+
void VulkanCaptureManager::QueueSubmitWriteFillMemoryCmd()
{
    if (GetMemoryTrackingMode() == CaptureSettings::MemoryTrackingMode::kPageGuard)
41 changes: 40 additions & 1 deletion framework/encode/vulkan_capture_manager.h
@@ -330,6 +330,9 @@ class VulkanCaptureManager : public CaptureManager
        uint32_t* pQueueFamilyPropertyCount,
        VkQueueFamilyProperties2* pQueueFamilyProperties);

+    VkResult OverrideWaitForFences(
+        VkDevice device, uint32_t fenceCount, const VkFence* pFences, VkBool32 waitAll, uint64_t timeout);
+
    void PostProcess_vkEnumeratePhysicalDevices(VkResult result,
                                                VkInstance instance,
                                                uint32_t* pPhysicalDeviceCount,
@@ -495,6 +498,8 @@ class VulkanCaptureManager : public CaptureManager
            state_tracker_->TrackSemaphoreSignalState(semaphore);
            state_tracker_->TrackAcquireImage(*index, swapchain, semaphore, fence, 0);
        }
+
+        ProcessFenceSubmit(fence);
    }

    void PostProcess_vkAcquireNextImage2KHR(VkResult result,
@@ -513,6 +518,8 @@ class VulkanCaptureManager : public CaptureManager
                pAcquireInfo->fence,
                pAcquireInfo->deviceMask);
        }
+
+        ProcessFenceSubmit(pAcquireInfo->fence);
    }

    void PostProcess_vkQueuePresentKHR(VkResult result, VkQueue queue, const VkPresentInfoKHR* pPresentInfo)
@@ -531,7 +538,7 @@ class VulkanCaptureManager : public CaptureManager
    }

    void PostProcess_vkQueueBindSparse(
-        VkResult result, VkQueue, uint32_t bindInfoCount, const VkBindSparseInfo* pBindInfo, VkFence)
+        VkResult result, VkQueue, uint32_t bindInfoCount, const VkBindSparseInfo* pBindInfo, VkFence fence)
    {
        if (((GetCaptureMode() & kModeTrack) == kModeTrack) && (result == VK_SUCCESS))
        {
@@ -544,6 +551,36 @@ class VulkanCaptureManager : public CaptureManager
                    pBindInfo[i].pSignalSemaphores);
            }
        }
+
+        ProcessFenceSubmit(fence);
+    }
+
+    void PostProcess_vkResetFences(VkResult result, VkDevice device, uint32_t fenceCount, const VkFence* pFences)
+    {
+        GFXRECON_UNREFERENCED_PARAMETER(device);
+
+        for (uint32_t i = 0; i < fenceCount; ++i)
+        {
+            FenceWrapper* wrapper = GetWrapper<FenceWrapper>(pFences[i]);
+            assert(wrapper != nullptr);
+            wrapper->query_delay = 0;
+        }
+    }
+
+    void PostProcess_vkGetFenceStatus(VkResult& result, VkDevice device, VkFence fence)
+    {
+        GFXRECON_UNREFERENCED_PARAMETER(device);
+
+        if (result == VK_SUCCESS)
+        {
+            FenceWrapper* wrapper = GetWrapper<FenceWrapper>(fence);
+            assert(wrapper != nullptr);
+            if (wrapper->query_delay != 0)
+            {
+                --wrapper->query_delay;
+                result = VK_NOT_READY;
+            }
+        }
    }

    void PostProcess_vkGetBufferMemoryRequirements(VkDevice device,
@@ -1309,6 +1346,8 @@ class VulkanCaptureManager : public CaptureManager

    bool CheckCommandBufferWrapperForFrameBoundary(const CommandBufferWrapper* command_buffer_wrapper);

+    void ProcessFenceSubmit(VkFence fence);
+
  private:
    void QueueSubmitWriteFillMemoryCmd();

5 changes: 5 additions & 0 deletions framework/encode/vulkan_handle_wrappers.h
@@ -153,6 +153,11 @@ struct FenceWrapper : public HandleWrapper<VkFence>
    // create parameters will need to be modified to reflect the state at snapshot write.
    bool created_signaled{ false };
    DeviceWrapper* device{ nullptr };
+
+    // The fence cannot be reported as signaled until it has been queried a certain number of times (which might
+    // correspond to a number of frames). So if query_delay is not zero, vkGetFenceStatus still returns VK_NOT_READY
+    // when Vulkan reports the fence as signaled, and vkWaitForFences with a finite timeout returns VK_TIMEOUT.
+    uint32_t query_delay{ 0 };
};

struct EventWrapper : public HandleWrapper<VkEvent>
2 changes: 1 addition & 1 deletion framework/generated/generated_vulkan_api_call_encoders.cpp
@@ -1368,7 +1368,7 @@ VKAPI_ATTR VkResult VKAPI_CALL WaitForFences(

    CustomEncoderPreCall<format::ApiCallId::ApiCall_vkWaitForFences>::Dispatch(manager, device, fenceCount, pFences, waitAll, timeout);

-    VkResult result = GetDeviceTable(device)->WaitForFences(device, fenceCount, pFences, waitAll, timeout);
+    VkResult result = VulkanCaptureManager::Get()->OverrideWaitForFences(device, fenceCount, pFences, waitAll, timeout);

    auto encoder = manager->BeginApiCallCapture(format::ApiCallId::ApiCall_vkWaitForFences);
    if (encoder)
3 changes: 2 additions & 1 deletion framework/generated/vulkan_generators/capture_overrides.json
@@ -12,6 +12,7 @@
        "vkGetPhysicalDeviceQueueFamilyProperties": "manager->OverrideGetPhysicalDeviceQueueFamilyProperties",
        "vkGetPhysicalDeviceQueueFamilyProperties2": "manager->OverrideGetPhysicalDeviceQueueFamilyProperties2",
        "vkGetPhysicalDeviceQueueFamilyProperties2KHR": "manager->OverrideGetPhysicalDeviceQueueFamilyProperties2KHR",
-        "vkCmdBuildAccelerationStructuresKHR": "manager->OverrideCmdBuildAccelerationStructuresKHR"
+        "vkCmdBuildAccelerationStructuresKHR": "manager->OverrideCmdBuildAccelerationStructuresKHR",
+        "vkWaitForFences": "VulkanCaptureManager::Get()->OverrideWaitForFences"
    }
}
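
The override table changes what the generator emits for the listed commands: `vkWaitForFences` now calls `VulkanCaptureManager::Get()->OverrideWaitForFences(...)` instead of `GetDeviceTable(device)->WaitForFences(...)`, as the generated-encoder hunk shows. The sketch below is a hypothetical model of that substitution, not the project's generator (which lives under `framework/generated/vulkan_generators` and is not shown in this commit); `DispatchExpression` is an invented name, and the model only covers device-level commands whose names start with `vk`.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical model of the override substitution: commands found in the table
// call the named override expression, everything else calls through the device
// dispatch table. Assumes a device-level command whose name starts with "vk".
std::string DispatchExpression(const std::map<std::string, std::string>& overrides,
                               const std::string&                        command,
                               const std::string&                        args)
{
    auto it = overrides.find(command);
    if (it != overrides.end())
    {
        return it->second + "(" + args + ")";
    }
    // Dropping the "vk" prefix gives the dispatch-table member, e.g. "WaitForFences".
    return "GetDeviceTable(device)->" + command.substr(2) + "(" + args + ")";
}
```

With an empty table, the model reproduces the removed line of the generated encoder; with the new entry, it reproduces the added one.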
