Using feature extraction layers #455

desh2608 · 2021-11-05T21:36:46Z

desh2608
Nov 5, 2021
Collaborator

I am trying to use lhotse.features.kaldi.layers.Wav2Spec as a layer in a network, as follows:

feat_extractor = lhotse.features.kaldi.layers.Wav2Spec()

x = feat_extractor.forward(x)

What should be the shape of the input x to the forward() method?

Answered by csukuangfj

csukuangfj · 2021-11-06T00:41:58Z

csukuangfj
Nov 6, 2021

As far as I can tell, the input shape of x should be

(batch_size, num_samples)

However, the documentation in the code is incorrect.

You can find the correct shape for x by reading the following code:

lhotse/lhotse/features/kaldi/layers.py

Line 297 in 9b6e3e4

x_strided = self.wav2win(x)

lhotse/lhotse/features/kaldi/layers.py

Lines 135 to 151 in 9b6e3e4

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Add dither
        if self.dither != 0.0:
            n = torch.randn(x.shape, device=x.device)
            x = x + self.dither * n

        # remove offset
        if self.remove_dc_offset:
            mu = torch.mean(x, dim=1, keepdim=True)
            x = x - mu

        if self.return_log_energy and self.raw_energy:
            # Compute the log energy of each frame
            x_strided = _get_strided_batch(
                x, self._length, self._shift, self.snip_edges
            )
            log_energy = _get_log_energy(x_strided, self.energy_floor)  # size (m)

You can see from line 143 (mu = torch.mean(x, dim=1, keepdim=True)) that x is at least a 2-D tensor.
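To make that point concrete, here is a minimal sketch in plain PyTorch (it does not call lhotse). The helper name `remove_dc_offset` is made up for illustration and only mirrors the mean-subtraction lines quoted above; it shows that `dim=1` works on a `(batch_size, num_samples)` tensor and fails on a 1-D waveform.

```python
import torch


def remove_dc_offset(x: torch.Tensor) -> torch.Tensor:
    # Hypothetical helper mirroring the quoted lines: subtract the per-row mean.
    mu = torch.mean(x, dim=1, keepdim=True)
    return x - mu


batch = torch.randn(4, 16000)  # (batch_size, num_samples)
out = remove_dc_offset(batch)  # shape is preserved

single = torch.randn(16000)  # 1-D waveform, no batch dimension
try:
    remove_dc_offset(single)
    failed = False
except IndexError:
    # dim=1 does not exist on a 1-D tensor
    failed = True

# A single waveform can be given a batch dimension first.
out_single = remove_dc_offset(single.unsqueeze(0))
```

So a single recording would need `unsqueeze(0)` (or an equivalent reshape) before being passed in.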

lhotse/lhotse/features/kaldi/layers.py

Lines 580 to 595 in 9b6e3e4

    def _get_strided_batch(waveform, window_length, window_shift, snip_edges):
        r"""Given a waveform (1D tensor of size ``num_samples``), it returns a 2D tensor (m, ``window_size``)
        representing how the window is shifted along the waveform. Each row is a frame.
        Args:
            waveform (torch.Tensor): Tensor of size ``num_samples``
            window_size (int): Frame length
            window_shift (int): Frame shift
            snip_edges (bool): If True, end effects will be handled by outputting only frames that completely fit
                in the file, and the number of frames depends on the frame_length.  If False, the number of frames
                depends only on the frame_shift, and we reflect the data at the ends.
        Returns:
            torch.Tensor: 3D tensor of size (m, ``window_size``) where each row is a frame
        """
        assert waveform.dim() == 2
        batch_size = waveform.size(0)
        num_samples = waveform.size(-1)

Lines 594 and 595 (batch_size = waveform.size(0) and num_samples = waveform.size(-1)) give you the shape you wanted.

BTW, the documentation at line 581 is incorrect: the waveform is not a 1-D tensor but a 2-D tensor.

lhotse/lhotse/features/kaldi/layers.py

Line 581 in 9b6e3e4

    r"""Given a waveform (1D tensor of size ``num_samples``), it returns a 2D tensor (m, ``window_size``)

I believe the documentation in
https://github.com/lhotse-speech/lhotse/blob/master/lhotse/features/kaldi/layers.py
needs to be improved.
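As a rough illustration of what framing a 2-D batch looks like, the sketch below uses `Tensor.unfold`. It is not lhotse's `_get_strided_batch`: the function name `frame_batch` is hypothetical, and it only covers the case where every frame must fit completely inside the signal. The window and shift values assume 25 ms and 10 ms at 16 kHz.

```python
import torch


def frame_batch(x: torch.Tensor, window_length: int, window_shift: int) -> torch.Tensor:
    # Illustrative stand-in, not lhotse's implementation: slide a window over
    # the last dimension, keeping only frames that fit completely.
    assert x.dim() == 2, "expected (batch_size, num_samples)"
    return x.unfold(-1, window_length, window_shift)


x = torch.randn(3, 16000)          # 3 waveforms of 16000 samples each
frames = frame_batch(x, 400, 160)  # assumed 25 ms window, 10 ms shift at 16 kHz

# Number of complete frames per waveform.
num_frames = 1 + (16000 - 400) // 160
print(frames.shape)  # (batch_size, num_frames, window_length)
```

The result keeps the batch dimension in front, which is why the input has to be 2-D and the framed output is 3-D.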

5 replies

desh2608 Nov 6, 2021
Collaborator Author

Thanks @csukuangfj. So I guess multi-channel tensors are not handled at the moment?

csukuangfj Nov 6, 2021

How about treating the batch dimension as the channel dimension, so that you can extract features for each channel independently?
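A minimal sketch of that idea, assuming a multi-channel input of shape `(batch_size, num_channels, num_samples)` and an extractor that maps `(N, num_samples)` to `(N, num_frames, num_features)`. Both `extract_per_channel` and `toy_extractor` are hypothetical names; the toy extractor is a stand-in (per-frame log energy) so the snippet runs without lhotse, and a layer such as `Wav2Spec` could presumably be passed in its place.

```python
import torch


def extract_per_channel(extractor, x: torch.Tensor) -> torch.Tensor:
    # x: (batch_size, num_channels, num_samples)
    b, c, t = x.shape
    flat = x.reshape(b * c, t)  # fold channels into the batch dimension
    feats = extractor(flat)     # assumed output: (b * c, num_frames, num_features)
    # Unfold the channel dimension again.
    return feats.reshape(b, c, *feats.shape[1:])


def toy_extractor(x: torch.Tensor) -> torch.Tensor:
    # Hypothetical stand-in for a real feature layer: per-frame log energy.
    assert x.dim() == 2
    frames = x.unfold(-1, 400, 160)
    return torch.log(frames.pow(2).sum(-1, keepdim=True) + 1e-10)


x = torch.randn(2, 4, 16000)  # 2 recordings, 4 channels each
feats = extract_per_channel(toy_extractor, x)
print(feats.shape)  # (batch_size, num_channels, num_frames, num_features)
```

Each channel is processed independently this way, so cross-channel (spatial) information is not captured.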

desh2608 Nov 6, 2021
Collaborator Author

Thanks, that makes sense. A desirable addition for the future may be spatial features (e.g. IPD).

pzelasko Nov 8, 2021
Maintainer

Thanks for reporting this, I'll try to find some time to fix the docs.

Open to PRs with new features :)

pzelasko Nov 12, 2021
Maintainer

The documentation issues should be solved in #467
