After adjusting the code to make it run, I get a dimension error in the forward pass:
```
Creating model instance...
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/lazy.py:180: UserWarning: Lazy modules are a new feature under heavy development so changes to the API or functionality can happen at any moment.
  warnings.warn('Lazy modules are a new feature under heavy development '
VoViT pre-trained weights loaded
Lead Voice enhancer pre-trained weights loaded
Done
Forwarding speaker1...
/usr/local/lib/python3.10/dist-packages/torchaudio/functional/functional.py:109: UserWarning: `return_complex` argument is now deprecated and is not effective. `torchaudio.functional.spectrogram(power=None)` always returns a tensor with complex dtype. Please remove the argument in the function call.
  warnings.warn(
/content/VoViT/vovit/core/models/production_model.py:102: UserWarning: Casting complex values to real discards the imaginary part (Triggered internally at ../aten/src/ATen/native/Copy.cpp:276.)
  return s.to(dtype)
```
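The last warning seems relevant: with newer torchaudio the spectrogram is complex-valued, and `s.to(dtype)` collapses it to a real tensor, silently dropping the imaginary part. A minimal sketch of that behavior (not the repo's code, just an illustration):

```python
import torch

# Casting a complex tensor to a real dtype keeps the shape but
# silently discards the imaginary part (with a UserWarning):
z = torch.tensor([1.0 + 2.0j])
r = z.to(torch.float32)
print(r)  # tensor([1.])

# torch.view_as_real instead keeps both components as a trailing axis:
ri = torch.view_as_real(z)
print(ri)  # tensor([[1., 2.]])
```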
```
---------------------------------------------------------------------------
EinopsError                               Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/einops/einops.py in reduce(tensor, pattern, reduction, **axes_lengths)
    411         recipe = _prepare_transformation_recipe(pattern, reduction, axes_lengths=hashable_axes_lengths)
--> 412         return _apply_recipe(recipe, tensor, reduction_type=reduction)
    413     except EinopsError as e:

/usr/local/lib/python3.10/dist-packages/einops/einops.py in _apply_recipe(recipe, tensor, reduction_type)
    234     init_shapes, reduced_axes, axes_reordering, added_axes, final_shapes = \
--> 235         _reconstruct_from_shape(recipe, backend.shape(tensor))
    236     tensor = backend.reshape(tensor, init_shapes)

/usr/local/lib/python3.10/dist-packages/einops/einops.py in _reconstruct_from_shape_uncached(self, shape)
    164     if len(shape) != len(self.input_composite_axes):
--> 165         raise EinopsError('Expected {} dimensions, got {}'.format(len(self.input_composite_axes), len(shape)))
    166

EinopsError: Expected 4 dimensions, got 3

During handling of the above exception, another exception occurred:

EinopsError                               Traceback (most recent call last)
<ipython-input-7-faaa648e3dcd> in <cell line: 28>()
     28 with torch.no_grad():
     29     print('Forwarding speaker1...')
---> 30     pred_s1 = model.forward_unlimited(mixture, speaker1_face)
     31     print('Forwarding speaker2...')
     32     pred_s2 = model.forward_unlimited(mixture, speaker2_face)

/content/VoViT/vovit/__init__.py in forward_unlimited(self, mixture, visuals)
     78         visuals = visuals[:n_chunks * fps * 2].view(n_chunks, fps * 2, 3, 68)
     79         mixture = mixture[:n_chunks * length].view(n_chunks, -1)
---> 80         pred = self.forward(mixture, visuals)
     81         pred_unraveled = {}
     82         for k, v in pred.items():

/content/VoViT/vovit/__init__.py in forward(self, mixture, visuals, extract_landmarks)
     56         mixture /= mixture.abs().max()
     57
---> 58         return self.vovit(mixture, ld)
     59
     60     def forward_unlimited(self, mixture, visuals):

/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py in _call_impl(self, *args, **kwargs)
   1499                 or _global_backward_pre_hooks or _global_backward_hooks
   1500                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501             return forward_call(*args, **kwargs)
   1502         # Do not call functions when jit is used
   1503         full_backward_hooks, non_full_backward_hooks = [], []

/content/VoViT/vovit/core/models/production_model.py in forward(self, mixture, landmarks)
    378         """
    379         inputs = {'src': mixture, 'landmarks': landmarks}
--> 380         return self.avse(inputs)

/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py in _call_impl(self, *args, **kwargs)
   1499                 or _global_backward_pre_hooks or _global_backward_hooks
   1500                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501             return forward_call(*args, **kwargs)
   1502         # Do not call functions when jit is used
   1503         full_backward_hooks, non_full_backward_hooks = [], []

/content/VoViT/vovit/core/models/production_model.py in forward(self, *args, **kwargs)
    325
    326     def forward(self, *args, **kwargs):
--> 327         return self.inference(*args, **kwargs)
    328
    329     def inference(self, inputs: dict, n_iter=1):

/content/VoViT/vovit/core/models/production_model.py in inference(self, inputs, n_iter)
    329     def inference(self, inputs: dict, n_iter=1):
    330         with torch.no_grad():
--> 331             output = self.forward_avse(inputs, compute_istft=False)
    332             estimated_sp = output['estimated_sp']
    333             for i in range(n_iter):

/content/VoViT/vovit/core/models/production_model.py in forward_avse(self, inputs, compute_istft)
    321     def forward_avse(self, inputs, compute_istft: bool):
    322         self.av_se.eval()
--> 323         output = self.av_se(inputs, compute_wav=compute_istft)
    324         return output
    325

/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py in _call_impl(self, *args, **kwargs)
   1499                 or _global_backward_pre_hooks or _global_backward_hooks
   1500                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501             return forward_call(*args, **kwargs)
   1502         # Do not call functions when jit is used
   1503         full_backward_hooks, non_full_backward_hooks = [], []

/content/VoViT/vovit/core/models/production_model.py in forward(self, inputs, compute_wav)
    223         # ==========================================
    224
--> 225         audio_feats = self.audio_processor.preprocess_audio(inputs['src'])
    226
    227         """

/content/VoViT/vovit/core/models/production_model.py in preprocess_audio(self, n_sources, *src)
    135         # Contiguous required to address memory problems in certain gpus
    136         sp_mix = sp_mix_raw[:, ::2, ...].contiguous()  # BxFxTx2
--> 137         x = rearrange(sp_mix, 'b f t c -> b c f t')
    138         output = {'mixture': x, 'sp_mix_raw': sp_mix_raw}
    139

/usr/local/lib/python3.10/dist-packages/einops/einops.py in rearrange(tensor, pattern, **axes_lengths)
    481         raise TypeError("Rearrange can't be applied to an empty list")
    482     tensor = get_backend(tensor[0]).stack_on_zeroth_dimension(tensor)
--> 483     return reduce(cast(Tensor, tensor), pattern, reduction='rearrange', **axes_lengths)
    484
    485

/usr/local/lib/python3.10/dist-packages/einops/einops.py in reduce(tensor, pattern, reduction, **axes_lengths)
    418             message += '\n  Input is list. '
    419             message += 'Additional info: {}.'.format(axes_lengths)
--> 420         raise EinopsError(message + '\n  {}'.format(e))
    421
    422

EinopsError: Error while processing rearrange-reduction pattern "b f t c -> b c f t".
  Input tensor shape: torch.Size([4, 256, 128]). Additional info: {}.
  Expected 4 dimensions, got 3
```
This error occurs both in the Colab notebook and when I clone the repo locally. The main changes I made were updating the requirements to newer versions of the PyTorch packages and CUDA, and fixing the bugs caused by the removed `np.int` alias.
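For reference, the `np.int` fixes amount to the usual substitution for the alias removed in NumPy 1.24 (a sketch, with illustrative names):

```python
import numpy as np

# np.int was deprecated in NumPy 1.20 and removed in 1.24.
# The builtin int, or an explicit dtype, is the drop-in replacement:
idx = np.arange(10, dtype=np.int64)  # instead of dtype=np.int
n = int(idx.max())                   # instead of np.int(idx.max())
print(n)  # 9
```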
What do you suggest?