Hi, I have a question regarding training DDSP. Since the model has a harmonic-plus-noise architecture, I assumed it could work on bird vocalisation datasets. I have collected a small dataset (around 1,400 samples) of sounds from the same bird species, each 4 seconds long, and I've listened to most of them, so there is no odd noise in the dataset. However, after running them through the DDSP model as in the train_autoencoder Colab demo, the generated sounds are not very good. The spectral loss and total loss quickly drop below 5 in my case, and I've trained for 10,000 steps. Below is an example of the generated audio:
I have tried modifying the conditioning parameters in the Colab demo, but it did not help. I'm not sure whether I'm doing something wrong in training or whether it's a problem with the dataset. Could anyone offer any suggestions?
It seems like the CREPE model might be having trouble with pitch detection. You probably want to turn off the automatic adjustments, since it isn't detecting any "notes" because the f0_confidence is so low. That will stop it from pitch-shifting down, which it is currently doing.
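To see what the pitch tracker actually reports on one of your clips, something like this rough sketch can help (it uses the standalone `crepe` and `librosa` packages, which is the same tracker DDSP calls under the hood; the filename and the 0.5 threshold are placeholders):

```python
# Rough sketch: inspect CREPE's pitch confidence on a single clip.
# Assumes the standalone `crepe` and `librosa` packages are installed;
# 'bird_clip.wav' is a placeholder path.
import crepe
import librosa
import numpy as np

audio, sr = librosa.load('bird_clip.wav', sr=16000, mono=True)
time, frequency, confidence, _ = crepe.predict(audio, sr, viterbi=True)

# Fraction of frames where CREPE is reasonably sure it hears a pitch
# (the 0.5 cutoff is an arbitrary choice for this sanity check).
voiced = confidence > 0.5
print(f'confident frames: {voiced.mean():.0%}')
if voiced.any():
    print(f'median f0 in confident frames: {np.median(frequency[voiced]):.1f} Hz')
```

If the confident-frame fraction is very low, that would be consistent with the model falling back on the noise synth.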
It also looks like it might be outputting only noise, with very low harmonic amplitude (you can check this). If that's the case, it's likely because f0 is not being detected correctly during training, so the model learns to simply not use the harmonic synthesizer, which is an important part of bird sounds.
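A quick way to check is to compare the energy of the two synth outputs on a batch. This is only a sketch, assuming the demo's default Autoencoder whose processor group exposes `'harmonic'` and `'filtered_noise'` entries; those key names differ across DDSP versions, so adjust to whatever `outputs.keys()` shows in your run:

```python
# Rough sketch: compare harmonic vs. filtered-noise energy in the model output.
# Assumes `model` is a trained ddsp.training Autoencoder and `dataset` is the
# tf.data pipeline from the demo; the 'harmonic' / 'filtered_noise' keys match
# the demo's gin config but may be named differently in other versions.
import numpy as np

batch = next(iter(dataset))
outputs = model(batch, training=False)

harmonic = np.squeeze(outputs['harmonic']['signal'])
noise = np.squeeze(outputs['filtered_noise']['signal'])

def rms(x):
    return np.sqrt(np.mean(x ** 2))

print(f'harmonic RMS: {rms(harmonic):.4f}')
print(f'noise RMS:    {rms(noise):.4f}')
```

A harmonic RMS that is orders of magnitude below the noise RMS would confirm the model has effectively switched the harmonic synthesizer off.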
Hi Jesse, thank you for your message. Do you mean turning off the automatic adjustments during training? I took a look at the gin files but couldn't find any automatic-adjustment settings there. From what I can see, the f0_confidence is zero during the silence between bird chirps, while during chirps the confidence quickly rises from 0.2 to 1, which makes sense, right?
My guess is that the onsets of the bird chirps fall at different times across the dataset, so although the CREPE model is detecting some pitches, they occur at different time positions. This irregular timing and duration might have tricked the autoencoder into putting more emphasis on the noise synth instead. I listened to the NSynth dataset, and it seems every instrument note starts at the beginning of the clip and ends at the end, so it causes no trouble for the autoencoder. That's my instinct, although I'm not sure whether this is really what happens in training. One way I could test it is to trim the silence so every clip starts at an onset; see the sketch below.
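A rough sketch of that trimming idea with librosa (the `top_db` threshold is a guess and would need tuning per recording; the paths are placeholders):

```python
# Rough sketch: split each 4-second clip into chirp-aligned segments by
# removing the silent gaps, so every training example starts at an onset.
# Assumes `librosa` and `soundfile`; 'bird_clip.wav' and the output naming
# are placeholders, and top_db=30 is an untuned guess.
import librosa
import soundfile as sf

audio, sr = librosa.load('bird_clip.wav', sr=16000, mono=True)
intervals = librosa.effects.split(audio, top_db=30)  # non-silent regions

for i, (start, end) in enumerate(intervals):
    sf.write(f'chirp_{i:03d}.wav', audio[start:end], sr)
```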