
Question about training dataset #453

Open
Reinliu opened this issue May 23, 2022 · 2 comments

Comments

@Reinliu

Reinliu commented May 23, 2022

Hi, I have a question regarding the training of DDSP. Since this model has a harmonic + noise architecture, I would assume it could work on bird vocalisation datasets. I have collected a small dataset (around 1400 samples) of sounds from the same bird species, each 4 seconds in length, and I've listened to most of them, so there is no strange noise in the dataset. However, after I put them into the DDSP model as in the Colab demo train_autoencoder, the generated sounds are not very good. The spectral loss and total loss quickly go below 5 in my case, and I've trained for 10,000 steps. Below is an example of the generated output:
WeChat Screenshot_20220523155104
WeChat Screenshot_20220523155113
I have tried modifying the conditioning parameters in the Colab demo, but it did not help. I'm not sure if I'm doing something wrong with the training or if it's a problem with the dataset. Could anyone offer any suggestions?

@jesseengel
Contributor

It seems like the CREPE model might be having trouble with the pitch detection. You probably want to turn off the automatic adjustments, as they aren't detecting any "notes" because the f0_confidence is so low. That will stop it from pitch shifting down, which it is currently doing.
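As a rough illustration of why the auto-adjust misbehaves on this data: a gate of this kind typically checks how many frames are confidently pitched before applying any shift. This is a minimal numpy sketch, not the actual Colab code; `should_auto_adjust` and the threshold/fraction values are illustrative assumptions:

```python
import numpy as np

def should_auto_adjust(f0_confidence, threshold=0.85, min_fraction=0.3):
    """Apply automatic pitch adjustment only if enough frames are
    confidently pitched; otherwise skip it (illustrative values)."""
    confident = np.asarray(f0_confidence) > threshold
    return bool(confident.mean() >= min_fraction)

# Bird clips are mostly silence with short chirps, so few frames pass
# the confidence threshold and the adjustment should be skipped.
conf = np.concatenate([np.zeros(80), np.full(20, 0.9)])
print(should_auto_adjust(conf))  # False
```

When the gate returns False, no pitch shift is applied, avoiding the spurious downward shift described above.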

It also looks like it might be outputting only noise, with very low harmonic amplitude (you can check this). If that's the case, it is likely because f0 is not being detected correctly during training, so the model learns to simply not use the harmonic synthesizer, which is an important part of bird sounds.
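One way to run that check, once you have the harmonic and noise audio streams from the model's outputs (how to extract them from the outputs dict varies by DDSP version, so that part is not shown here), is to compare their RMS energies. A minimal sketch, assuming you already hold the two signals as arrays:

```python
import numpy as np

def rms(x):
    """Root-mean-square amplitude of an audio signal."""
    x = np.asarray(x, dtype=np.float64)
    return float(np.sqrt(np.mean(x ** 2)))

def harmonic_energy_ratio(harmonic_audio, noise_audio, eps=1e-8):
    """Fraction of total RMS energy coming from the harmonic synth.
    Values near 0 mean the model is relying almost entirely on noise."""
    h, n = rms(harmonic_audio), rms(noise_audio)
    return h / (h + n + eps)

# Toy check: a pure sine as 'harmonic' output vs. silence as 'noise'.
t = np.linspace(0, 1, 16000, endpoint=False)
print(harmonic_energy_ratio(0.5 * np.sin(2 * np.pi * 440 * t),
                            np.zeros_like(t)))  # ~1.0
```

If this ratio is near zero on your trained model's outputs, that confirms the harmonic synthesizer has been effectively switched off.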

@Reinliu
Author

Reinliu commented May 28, 2022

Hi Jesse, thank you for your message. Do you mean turning off the automatic adjustments during training? I took a look at the gin files but couldn't find any automatic adjustment settings there. From what I can see, the f0 confidence is zero when there is silence between bird chirps, while during chirps the confidence quickly rises from 0.2 up to 1, which makes sense, right?

My guess is that the onsets of the bird chirps differ across the dataset, so although the CREPE model is detecting some pitches, they occur at different time positions. This irregular timing and duration might have confused the autoencoder, so it tends to put more emphasis on the noise synth part instead. I listened to the NSynth dataset, and it seems every instrument note starts at the beginning of the clip and ends at the end, so it causes no trouble for the autoencoder. That's my intuition, although I'm not sure if this is actually what happens in training.
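The onset-variability hypothesis above is easy to quantify from the per-clip f0_confidence curves that the dataset pipeline already produces. A minimal sketch with synthetic confidence curves (`first_confident_frame` and the threshold are illustrative assumptions, not DDSP API):

```python
import numpy as np

def first_confident_frame(f0_confidence, threshold=0.5):
    """Index of the first frame whose CREPE confidence exceeds the
    threshold, i.e. an estimate of the chirp onset; -1 if none."""
    idx = np.flatnonzero(np.asarray(f0_confidence) > threshold)
    return int(idx[0]) if idx.size else -1

# Two synthetic 4 s clips at a 250 Hz frame rate (1000 frames each),
# with chirps starting at different times.
clip_a = np.concatenate([np.zeros(100), np.full(900, 0.9)])
clip_b = np.concatenate([np.zeros(600), np.full(400, 0.9)])
onsets = [first_confident_frame(c) for c in (clip_a, clip_b)]
print(onsets, np.std(onsets))  # [100, 600] 250.0
```

A large standard deviation of onset frames across the dataset would support the idea that chirps occur at inconsistent positions, unlike NSynth notes that all start at the beginning of the clip.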
