superclip/training/train.py", line 432, in maybe_compute_generative_loss
return F.cross_entropy(token_logits.permute(0, 2, 1), token_labels)
RuntimeError: permute(sparse_coo): number of dimensions in the tensor input does not match the length of the desired ordering of dimensions i.e. input.dim() = 2 is not equal to len(dims) = 3
run===== torch.Size([32, 49408]) torch.Size([32, 77]) torch.Size([32, 512])run=====
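
The shapes in the debug line suggest the cause, assuming the first printed shape is `token_logits` and the second is `token_labels` (the log does not say which is which). `permute(0, 2, 1)` needs a 3-D tensor, and `F.cross_entropy` on sequence data expects logits of shape (batch, vocab, seq_len) with labels of shape (batch, seq_len). A 2-D `[32, 49408]` tensor has no sequence dimension, so the permute fails before the loss is computed. The helper below is a hypothetical diagnostic sketch, not part of the superclip code. It works on plain shape tuples, so `tensor.shape` can be passed in directly.

```python
def check_generative_loss_shapes(logits_shape, labels_shape):
    """Return None if the shapes fit
    F.cross_entropy(token_logits.permute(0, 2, 1), token_labels),
    otherwise a short message describing the mismatch.

    Hypothetical helper for debugging; the names are assumptions.
    """
    logits_shape = tuple(logits_shape)
    labels_shape = tuple(labels_shape)

    # permute(0, 2, 1) requires exactly three dimensions.
    if len(logits_shape) != 3:
        return (
            f"token_logits has {len(logits_shape)} dims {logits_shape}, "
            "expected 3: (batch, seq_len, vocab)"
        )

    # Labels hold one class index per token: (batch, seq_len).
    if len(labels_shape) != 2:
        return (
            f"token_labels has {len(labels_shape)} dims {labels_shape}, "
            "expected 2: (batch, seq_len)"
        )

    # Batch and sequence length must agree between logits and labels.
    if logits_shape[:2] != labels_shape:
        return (
            f"token_logits {logits_shape[:2]} and token_labels "
            f"{labels_shape} disagree on (batch, seq_len)"
        )

    return None


# Shapes taken from the debug line above (assumed order: logits, labels).
print(check_generative_loss_shapes((32, 49408), (32, 77)))
# A shape that would satisfy the call: one vocab distribution per token.
print(check_generative_loss_shapes((32, 77, 49408), (32, 77)))
```

If the check reports a 2-D `token_logits`, the likely place to look is wherever the logits are produced: the model may be returning one vocabulary distribution per sample (for example from a pooled embedding) instead of one per token position.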