
Speechjoey master #12


Open
wants to merge 182 commits into base: speechjoey

Conversation

B-Czarnetzki

Well, the problem with this merge is that speechjoey is a good bit behind the up-to-date joeynmt, since I developed it early last year, mainly before the multi-GPU changes. I am not sure how easy a merge would be, or whether it is even possible with the current structure, since I am not familiar with the multi-GPU implementation. Can we force the branch to be the current speechjoey master and merge with the up-to-date joeynmt down the line? Or @may-, do you think a merge could easily be done?

…ev, test) using a new AudioDataset class (MonoAudioDataset is also available), logging audio info
@may-

may- commented Jan 11, 2021

@B-Czarnetzki Ah ok, could you tell me which commit id you want to go back to? I'll roll back to that point.

@may-

may- commented Jan 11, 2021

@B-Czarnetzki, I rolled back to the commit point of Sariya's master branch. Now I see no conflicts in your pull request. I'll go through your extension and give you feedback this week.

@B-Czarnetzki
Author

@may- Ok, thanks. I would have said roll back to 48baf21, but I think it should be the same.

@may-

may- commented Jan 15, 2021

@B-Czarnetzki thank you again for your enormous effort. I'm still reading your code, but I'd like to share my comments so far, so that you can start fixing them.

  • speechjoey/data.py

    • [fix] Where did you import librosa? ([suggestion] I personally think it would be better to replace it with torchaudio; see the second sketch after this list.)
    • [fix] An instance in self.examples doesn't have an "audio" field, does it? Which object does def getaudio() refer to, and where will this function be called?
    • [fix] You shouldn't construct src_vocab and src_field in the load_audio_data() function.
    • [question] Why do you need the "src" (audio_dummy) and "conv" (conv_dummy) fields?
    • [suggestion] It would be better to get rid of the sklearn dependency.
    • [suggestion] For a big audio dataset, loading all features at once could easily trigger a memory error. How about looking up the file (np.load()) every time the iterator constructs a batch? (See the first sketch after this list.)
    • [suggestion] How about moving the scaling part to scripts/audio_preprocessing? That is, joeynmt would expect the saved numpy data to already be scaled. (Joeynmt intentionally excludes tokenization for text, so we could respect that philosophy of minimalism in audio processing, too; see the second sketch after this list.)
  • speechjoey/model.py

    • [fix] What is self.src_emb? You don't use Embeddings for audio input, do you? You shouldn't construct src_embed at all in def build_model().
  • speechjoey/encoders.py

    • [fix] Typo in line 225: NotImmplementedError.
    • [suggestion] How about separating the Conv layers from the RNN and making the Conv part configurable as well? Currently the conv layers are hard-coded.
  • speechjoey/decoders.py

    • Could you please add comments on why you need the self.rnn2 layer in ConditionalRecurrentDecoder? ([suggestion] You could integrate it into the RecurrentDecoder; ConditionalRecurrentDecoder is almost a duplicate of it, as far as I understand.)
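
To illustrate the lazy-loading suggestion above: a minimal sketch, assuming a plain torch.utils.data.Dataset rather than the torchtext-based classes in this PR (the class and field names are illustrative, not part of the speechjoey code). Each feature matrix is read from its .npy file only when the iterator asks for it, so the whole corpus never has to fit into memory at once.

```python
import numpy as np
from torch.utils.data import Dataset


class LazyAudioDataset(Dataset):
    """Keeps only the .npy paths in memory and loads features on demand."""

    def __init__(self, feature_paths, transcripts):
        # feature_paths: list of paths to pre-extracted .npy feature files
        # transcripts:   list of target sentences aligned with the features
        assert len(feature_paths) == len(transcripts)
        self.feature_paths = feature_paths
        self.transcripts = transcripts

    def __len__(self):
        return len(self.feature_paths)

    def __getitem__(self, idx):
        # np.load() is called only when a batch is constructed,
        # instead of loading the full feature set up front.
        features = np.load(self.feature_paths[idx])  # shape: (frames, n_mels)
        return features, self.transcripts[idx]
```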

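And a minimal sketch of the preprocessing-script idea, assuming torchaudio replaces librosa and that the saved numpy data is already scaled before training, mirroring how joeynmt leaves text tokenization to an external step (the function name, paths, and parameters below are illustrative, not part of the PR):

```python
# Extract log-mel features once, normalize them, and store them as .npy,
# so the data loading code only has to call np.load().
import numpy as np
import torch
import torchaudio


def extract_features(wav_path: str, out_path: str, n_mels: int = 80) -> None:
    waveform, sample_rate = torchaudio.load(wav_path)          # (channels, time)
    mel = torchaudio.transforms.MelSpectrogram(
        sample_rate=sample_rate, n_mels=n_mels)(waveform[0])   # (n_mels, frames)
    log_mel = torch.log(mel + 1e-10).transpose(0, 1)           # (frames, n_mels)
    # per-utterance mean/variance normalization, done here once
    # instead of inside the data loading code
    scaled = (log_mel - log_mel.mean(dim=0)) / (log_mel.std(dim=0) + 1e-10)
    np.save(out_path, scaled.numpy())


if __name__ == "__main__":
    extract_features("data/sample.wav", "data/sample.npy")
```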