Fix: Ensure DTW cost tensor uses the same device as the input tensor (x.device) #2561
Problem
In multi-GPU environments, the dtw_cuda() function in whisper/timing.py raises a device access error from Triton during transcription when word timestamps are enabled.
This occurs because the cost tensor is sent to .cuda() without specifying a device. Without an explicit device, .cuda() places the tensor on the current CUDA device (cuda:0 by default), which leads to a device mismatch when the input tensor x resides on a different GPU (e.g., cuda:1).
Root Cause
In dtw_cuda(), the cost tensor is moved to the GPU with a bare .cuda() call, sketched below.
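A sketch of the offending pattern, paraphrased from dtw_cuda() in whisper/timing.py (the exact surrounding lines and tensor shapes may differ slightly):

```python
# cost matrix for the DTW recurrence, created on the CPU
cost = torch.ones(N + M + 2, M + 2) * np.inf
cost[0, 0] = 0
cost = cost.cuda()  # implicitly targets the current device (cuda:0), not x.device
```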
This assumes all data lives on cuda:0, which is not true in multi-GPU setups. As a result, Triton throws a device access error when trying to launch the kernel with mismatched tensors.
Solution
The fix is to ensure the cost tensor is sent to the same device as the input tensor:
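A minimal sketch of the change, assuming the same cost construction as above:

```python
cost = torch.ones(N + M + 2, M + 2) * np.inf
cost[0, 0] = 0
cost = cost.to(x.device)  # place the cost matrix on whichever GPU holds x
```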
This guarantees consistency and allows the Triton kernel to access all pointers correctly.
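For reference, a minimal way to exercise the affected code path on a multi-GPU host; the model name and audio file below are placeholders, not part of this PR:

```python
import whisper

# Load the model on a GPU other than cuda:0 to reproduce the original mismatch.
model = whisper.load_model("base", device="cuda:1")

# word_timestamps=True routes through dtw_cuda(); this previously failed
# because the cost tensor was created on cuda:0 while x lived on cuda:1.
result = model.transcribe("audio.mp3", word_timestamps=True)
print(result["text"])
```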