Could at least de-tokenization be done directly on CUDA? For example, like my hack `bpedecode_vec` in pytorch/pytorch#135704 (comment), which indexes into a detokenization vocab byte table via `repeat_interleave`.
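For reference, the indexing scheme described above can be sketched on CPU with NumPy. The tiny vocab and the helper name `detokenize` are illustrative assumptions, not an existing API; the same operations (`repeat_interleave`/`np.repeat`, `cumsum`, and advanced indexing into a flat byte table) are all available on CUDA tensors in PyTorch:

```python
import numpy as np

# Hypothetical tiny vocab for illustration: token id -> byte string.
vocab = [b"Hel", b"lo", b", ", b"world", b"!"]

# Build the flat byte table plus per-token starts/lengths once
# (in the CUDA version these would live on the device).
lens = np.array([len(t) for t in vocab], dtype=np.int64)
starts = np.concatenate(([0], np.cumsum(lens)[:-1]))
flat = np.frombuffer(b"".join(vocab), dtype=np.uint8)

def detokenize(ids: np.ndarray) -> bytes:
    tok_lens = lens[ids]                          # bytes contributed by each token
    tok_starts = starts[ids]                      # table offset of each token
    total = int(tok_lens.sum())
    # where each token's bytes begin in the output
    out_start = np.concatenate(([0], np.cumsum(tok_lens)[:-1]))
    # position of every output byte within its own token
    intra = np.arange(total) - np.repeat(out_start, tok_lens)
    # gather: each byte's token start + its offset inside that token
    return flat[np.repeat(tok_starts, tok_lens) + intra].tobytes()

print(detokenize(np.array([0, 1, 2, 3, 4])))  # b'Hello, world!'
```

Everything here is elementwise or a gather, so a PyTorch port runs entirely on-device with no Python-level loop over tokens.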
Also, for better CUDAGraph capturability / to avoid CPU syncs, maybe there should be a static-sized, pre-allocated `out=` variant, like `torch.nonzero_static`?
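To make the requested contract concrete, here is a CPU sketch in NumPy (the function name is hypothetical) of `nonzero_static`-style semantics: a fixed output size chosen up front, with truncation or `fill_value` padding, so the output shape never depends on the data and no device-to-host sync is needed:

```python
import numpy as np

def nonzero_static_like(x: np.ndarray, size: int, fill_value: int = -1) -> np.ndarray:
    """Sketch of a static-shaped op: always returns exactly `size` indices,
    truncating extras and padding the tail with `fill_value`."""
    idx = np.flatnonzero(x)[:size]
    out = np.full(size, fill_value, dtype=np.int64)
    out[: len(idx)] = idx
    return out

print(nonzero_static_like(np.array([0, 3, 0, 5, 7]), size=4))
```

A detokenizer with the same shape discipline (fixed byte capacity, padded tail, returned length) would be capturable in a CUDA graph, since every intermediate allocation has a known size.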
Off-topic: the naming is also a bit inconsistent between `batch_decode` and `batch_encode_plus`... What is the motivation for the `_plus` suffix?