I have the following use case: I would like to train Urhythmic on a target speaker's voice and do any-to-one voice conversion that preserves the target timbre, but I would also like to variably change the rhythm, pitch, speed, intonation, etc. between predictions based on the intonation of a separate source audio clip. Is this possible?
I have successfully trained the vocoder, which sounds really good, and I have tried running inference with different fine-grained rhythm models, but I can't really hear them affecting the output as much as the rhythm of the source clip does. In your samples, the outputs seem to have a rhythm closer to the target voices (though it's unclear whether you somehow trained on just that sample, or on the speaker in general).
Any tips or insights would be appreciated :)
Currently, one approach I'm taking is to retrain the rhythm model at inference time for both the source and the target speaker, letting it overfit to that single sample.
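Concretely, the kind of thing I have in mind looks roughly like the sketch below. This is not Urhythmic's actual API; the segment types, the Gamma duration model, and the quantile mapping are my assumptions about how the fine-grained rhythm model behaves, just to illustrate fitting per-speaker duration statistics from a single utterance each and mapping source segment durations onto the target's distribution.

```python
# Hypothetical sketch (not Urhythmic's API): fit per-segment-type Gamma
# duration models for the source and target from one utterance each, then
# map source durations to target durations by quantile (CDF) matching.
import numpy as np
from scipy import stats

def fit_duration_model(durations_by_type):
    """Fit a Gamma distribution per segment type from one utterance's durations (in frames)."""
    return {
        seg_type: stats.gamma.fit(np.asarray(durs), floc=0)  # (shape, loc, scale)
        for seg_type, durs in durations_by_type.items()
    }

def map_duration(src_params, tgt_params, duration):
    """Map a source duration to the target distribution via quantile matching."""
    a_s, loc_s, scale_s = src_params
    a_t, loc_t, scale_t = tgt_params
    q = stats.gamma.cdf(duration, a_s, loc=loc_s, scale=scale_s)
    return stats.gamma.ppf(q, a_t, loc=loc_t, scale=scale_t)

# Hypothetical segment durations (frames) pulled from a single source and a
# single target utterance, grouped by sound type.
source = {"sonorant": [6, 8, 7, 9, 5], "obstruent": [3, 4, 3, 5], "silence": [12, 20]}
target = {"sonorant": [9, 11, 10, 12, 8], "obstruent": [4, 5, 4, 6], "silence": [8, 10]}

src_model = fit_duration_model(source)
tgt_model = fit_duration_model(target)

# Stretch factor for a 7-frame sonorant segment from the source utterance.
new_dur = map_duration(src_model["sonorant"], tgt_model["sonorant"], 7)
print(f"stretch factor: {new_dur / 7:.2f}")
```

The worry, of course, is that Gamma fits on a handful of segments from one clip will be noisy, which may be why I'm not hearing a big difference between rhythm models.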
I also want to commend how clean this codebase is. This is how all OSS ML repos should be.