Thanks for open-sourcing this excellent work!
I trained an AR (GPT) model on DAC and X-Codec tokens using LibriTTS. In my experiments, the top-10 accuracy for the first token is very close for DAC and X-Codec, around 50% to 60%. Is that reasonable?
Intuitively, the accuracy for X-Codec should be higher.
Thank you for your interest in our work and for conducting experiments with it!
Based on your results, a top-10 accuracy of around 50% to 60% for the first token of both DAC and X-Codec seems reasonable, because we integrate both acoustic and semantic information to enhance model performance. In our previous experience training the autoregressive (AR) stage of VALL-E, we observed top-10 accuracy typically ranging from 60% to 70%.
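For anyone comparing numbers across codecs, it may help to confirm that top-10 accuracy is measured the same way in each run. Below is a minimal sketch in plain Python of one common definition: a position counts as a hit when fewer than k tokens score strictly higher than the target. The function name `top_k_accuracy` and the toy logits are illustrative assumptions, not code from either codec's repository or training scripts.

```python
from typing import Sequence


def top_k_accuracy(
    logits: Sequence[Sequence[float]],
    targets: Sequence[int],
    k: int = 10,
) -> float:
    """Fraction of positions whose target token is among the k highest logits."""
    if len(logits) != len(targets):
        raise ValueError("logits and targets must have the same length")
    if not targets:
        raise ValueError("need at least one position")

    hits = 0
    for row, target in zip(logits, targets):
        # Count tokens scored strictly higher than the target.
        # Ties are resolved in favour of the target (an assumption of this sketch).
        better = sum(1 for score in row if score > row[target])
        if better < k:
            hits += 1
    return hits / len(targets)


# Toy example: 3 positions, vocabulary of 5 tokens.
# Real codec codebooks are much larger; these values are made up.
logits = [
    [0.1, 2.0, 0.3, 0.0, -1.0],  # target 1 has the highest score
    [1.5, 0.2, 0.9, 0.1, 0.0],   # target 2 has the second-highest score
    [0.0, 0.1, 0.2, 0.3, 3.0],   # target 0 has the lowest score
]
targets = [1, 2, 0]

top1 = top_k_accuracy(logits, targets, k=1)
top2 = top_k_accuracy(logits, targets, k=2)
print(f"top-1: {top1:.3f}, top-2: {top2:.3f}")
```

With a vocabulary of V tokens, uniform random guessing gives a top-10 accuracy of 10/V, so the reported figures should be read against that baseline for each codec's codebook size.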