
๐Ÿ› ๏ธ Fix: Ensure DTW cost tensor uses the same device as the input tensor (x.device) #2561


Open · wants to merge 1 commit into main

Conversation

nmharmon8

๐Ÿ“ Problem
In multi-GPU environments, the dtw_cuda() function in whisper/timing.py raises the following error during transcription with word timestamps enabled:

ValueError: Pointer argument (at 2) cannot be accessed from Triton (cpu tensor?)

This occurs because the cost tensor is moved to the GPU with a bare .cuda() call. Without an explicit device argument, .cuda() targets the current CUDA device (cuda:0 by default), which leads to a device mismatch when the input tensor x resides on a different GPU (e.g., cuda:1).
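
For context, a minimal way to trigger the mismatch (a sketch, not a verified repro: it assumes a machine with at least two GPUs, and "audio.wav" is a placeholder path):

```python
import whisper

# Place the model on a GPU other than cuda:0.
model = whisper.load_model("base", device="cuda:1")

# Word-timestamp alignment goes through dtw_cuda() in whisper/timing.py;
# before this fix, its cost tensor was moved with a bare .cuda(), landing
# on cuda:0 while x lives on cuda:1 -> the Triton pointer error above.
result = model.transcribe("audio.wav", word_timestamps=True)
```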

๐Ÿ” Root Cause
In dtw_cuda(), the cost tensor is moved to the GPU with:

cost = cost.cuda()

This assumes all data lives on cuda:0, which does not hold in multi-GPU setups. As a result, Triton raises a device-access error when the kernel is launched with tensors on mismatched devices.
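
A stripped-down illustration of the failing pattern (not the upstream function itself, just the device behavior it runs into):

```python
import torch

x = torch.randn(4, 4, device="cuda:1")    # input on a non-default GPU

cost = torch.full((6, 6), float("inf"))   # initialized on the CPU
cost[0, 0] = 0
cost = cost.cuda()                        # bare .cuda() -> current device, i.e. cuda:0

print(x.device, cost.device)              # cuda:1 cuda:0 -- the mismatch Triton rejects
```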

✅ Solution
The fix is to ensure the cost tensor is sent to the same device as the input tensor:

cost = cost.to(device=x.device)

This guarantees consistency and allows the Triton kernel to access all pointers correctly.
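
The same sketch with the fix applied, showing the tensors end up co-located:

```python
import torch

x = torch.randn(4, 4, device="cuda:1")

cost = torch.full((6, 6), float("inf"))
cost[0, 0] = 0
cost = cost.to(device=x.device)           # follow x, whichever GPU it is on

assert cost.device == x.device            # now holds for any device placement
```

Note that .to(device=...) is a no-op when the tensor is already on the target device, so the change is also safe on single-GPU setups.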
