Crash in MVKCommandEncoder::finishQueries() with VK_QUERY_TYPE_TIMESTAMP #2698

@Randalix

Summary

Intermittent segfault (use-after-free) in MVKCommandEncoder::finishQueries() when using VK_QUERY_TYPE_TIMESTAMP query pools. The crash occurs on a dispatch worker thread during Metal command buffer completion. Likely the same class of missing-retain bug as #1178 but in the query path, and possibly related to the race condition in #2378.

Environment

  • MoltenVK: 1.4.0 and 1.4.1 (both affected)
  • macOS: 26.2 (Tahoe)
  • Hardware: Apple M1, Metal 4
  • Application: vkdt (Vulkan compute photo editor)

Reproduction

The application creates a double-buffered timestamp query pool (2000 queries each) and writes timestamps via vkCmdWriteTimestamp() around compute dispatches. The pool is reset via vkCmdResetQueryPool() at the start of each command buffer recording, and results are read with vkGetQueryPoolResults(..., VK_QUERY_RESULT_64_BIT | VK_QUERY_RESULT_WAIT_BIT) after proper timeline semaphore synchronization.

The crash is intermittent, occurring in roughly 1 of every 3 to 5 launches. Setting MVK_CONFIG_SYNCHRONOUS_QUEUE_SUBMITS=1 did not help.

Crash stack (identical across all reports)

 0: libMoltenVK.dylib  invocation function for block in MVKCommandEncoder::finishQueries()
 1: Metal              MTLDispatchListApply
 2: Metal              -[_MTLCommandBuffer didCompleteWithStartTime:endTime:error:]
 3: IOGPU              -[IOGPUMetalCommandBuffer didCompleteWithStartTime:endTime:error:]
 4: Metal              -[_MTLCommandQueue commandBufferDidComplete:startTime:completionTime:error:]
 5: IOGPU              IOGPUNotificationQueueDispatchAvailableCompletionNotifications
 6: IOGPU              __IOGPUNotificationQueueSetDispatchQueue_block_invoke
 7: libdispatch.dylib  _dispatch_client_callout4

Faulting addresses vary and are always "not in any region" (e.g. 0x728b0b990cf8, 0xffffe6b668cd52cb, 0xba44753ce7e), consistent with use-after-free.
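
To make the suspected bug class concrete, here is a minimal single-threaded sketch of how a missing retain produces this kind of fault, and how retaining at schedule time avoids it. This is not MoltenVK code: Pool, Completion, and every function name below are hypothetical, and the plain int refcount stands in for what would need to be an atomic (or an Objective-C retain) in a real multi-threaded completion path.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an object a completion handler touches
   (for example, a query pool). Not a MoltenVK type. */
typedef struct {
    int refcount;
    int results_ready;
} Pool;

static int g_pools_freed = 0;  /* lets callers observe when the free happens */

static Pool *pool_create(void) {
    Pool *p = calloc(1, sizeof *p);
    if (p) p->refcount = 1;    /* the creator holds the first reference */
    return p;
}

static void pool_retain(Pool *p) { p->refcount++; }

static void pool_release(Pool *p) {
    assert(p->refcount > 0);
    if (--p->refcount == 0) { free(p); g_pools_freed++; }
}

/* A deferred completion callback plus the object it will touch. */
typedef struct {
    void (*fn)(Pool *);
    Pool *pool;
} Completion;

static void on_complete(Pool *p) { p->results_ready = 1; }

/* Schedule: take a reference so the pool outlives the pending callback.
   Omitting this retain is the bug class described above. */
static Completion completion_schedule(Pool *p) {
    pool_retain(p);
    Completion c = { on_complete, p };
    return c;
}

/* Fire: run the callback, then drop the reference taken at schedule time. */
static void completion_fire(Completion *c) {
    c->fn(c->pool);
    pool_release(c->pool);
}
```

With the retain in place, the owner can release its reference while a completion is still pending and the free is deferred until completion_fire() has run. Without it, the callback would write through a dangling pointer, which would match the varying "not in any region" fault addresses.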

Application query usage pattern

// Init (double-buffered, i=0..1):
VkQueryPoolCreateInfo info = {
    .sType      = VK_STRUCTURE_TYPE_QUERY_POOL_CREATE_INFO,
    .queryType  = VK_QUERY_TYPE_TIMESTAMP,
    .queryCount = 2000,
};
vkCreateQueryPool(device, &info, NULL, &query[i].pool);

// Record (each frame):
vkBeginCommandBuffer(cmd, &begin_info);
vkCmdResetQueryPool(cmd, query[buf].pool, 0, query[buf].max);
// ... compute dispatches with vkCmdWriteTimestamp() pairs ...
vkEndCommandBuffer(cmd);

// Read results (after timeline semaphore wait for previous frame):
vkGetQueryPoolResults(device, query[prev_buf].pool, 0, cnt, ...
    VK_QUERY_RESULT_64_BIT | VK_QUERY_RESULT_WAIT_BIT);
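
For context on what the read-back values represent, the following sketch converts begin/end timestamp pairs into milliseconds. It assumes the 64-bit results are laid out as consecutive begin/end pairs and that ns_per_tick comes from VkPhysicalDeviceLimits::timestampPeriod (nanoseconds per tick); the helper names are hypothetical and this is not taken from vkdt.

```c
#include <stdint.h>

/* Elapsed milliseconds between two raw timestamp values.
   ns_per_tick is assumed to be VkPhysicalDeviceLimits::timestampPeriod. */
static double ticks_to_ms(uint64_t begin, uint64_t end, float ns_per_tick) {
    return (double)(end - begin) * (double)ns_per_tick * 1e-6;
}

/* Sum the durations of n_pairs begin/end pairs, assuming the layout
   [begin0, end0, begin1, end1, ...] in a buffer of 64-bit results. */
static double total_ms(const uint64_t *ts, uint32_t n_pairs, float ns_per_tick) {
    double sum = 0.0;
    for (uint32_t i = 0; i < n_pairs; i++)
        sum += ticks_to_ms(ts[2 * i], ts[2 * i + 1], ns_per_tick);
    return sum;
}
```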

Workaround

Not creating the VkQueryPool / not issuing any vkCmdWriteTimestamp() calls eliminates the crash.
