Based on the example code and API in:
https://huggingface.co/nari-labs/Dia2-2B#programmatic-usage
I've run a couple of inferences and noticed that the prefix speakers' IDs don't always match what's expected. For example, I ran:
result = dia.generate(
    "[S1] Hey, what's up? [S2] Not too much.",
    config=config,
    prefix_speaker_1="example_prefix1.wav",
    prefix_speaker_2="example_prefix2.wav",
    output_wav="./dia2.wav",
    verbose=True,
)
It turned out that speaker 1 ([S1] in the text script) was the female voice, which doesn't match the specified prefix_speaker_1="example_prefix1.wav": example_prefix1.wav contains the male voice.
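One rough way to confirm which clip is which, rather than judging by ear, is to compare their typical pitch: adult male speech usually sits lower (very roughly 85 to 180 Hz) than adult female speech (very roughly 165 to 255 Hz). Below is a minimal, stdlib-only sketch of a time-domain autocorrelation pitch estimator. The helpers `estimate_pitch_hz`, `read_mono_pcm16`, and `write_sine_wav` are hypothetical and not part of dia2; the sketch assumes 16-bit PCM WAV input, and its numbers are approximate.

```python
import math
import statistics
import struct
import wave


def read_mono_pcm16(path):
    """Return (samples, sample_rate) for a 16-bit PCM WAV, keeping only channel 0."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError(f"{path}: expected 16-bit PCM, got {8 * wf.getsampwidth()}-bit")
        channels = wf.getnchannels()
        sr = wf.getframerate()
        raw = wf.readframes(wf.getnframes())
    samples = struct.unpack(f"<{len(raw) // 2}h", raw)
    return list(samples[::channels]), sr  # frames are interleaved: take every Nth sample


def estimate_pitch_hz(path, frame_ms=40.0, fmin=60.0, fmax=400.0, n_frames=5):
    """Rough median f0 (Hz) over the loudest frames, via time-domain autocorrelation."""
    samples, sr = read_mono_pcm16(path)
    frame_len = int(sr * frame_ms / 1000)
    frames = [samples[i:i + frame_len]
              for i in range(0, len(samples) - frame_len + 1, frame_len)]
    if not frames:
        raise ValueError(f"{path}: clip shorter than one {frame_ms} ms frame")
    # Assumption: voiced speech is louder than silence/unvoiced noise, so only the
    # loudest frames are analysed.
    frames.sort(key=lambda f: sum(s * s for s in f), reverse=True)
    min_lag = max(1, int(sr / fmax))            # shortest period considered
    max_lag = min(int(sr / fmin), frame_len - 1)  # longest period considered
    estimates = []
    for frame in frames[:n_frames]:
        mean = sum(frame) / frame_len
        x = [s - mean for s in frame]  # remove DC offset
        best_lag, best_corr = 0, 0.0
        for lag in range(min_lag, max_lag + 1):
            corr = sum(x[i] * x[i + lag] for i in range(frame_len - lag))
            if corr > best_corr:
                best_lag, best_corr = lag, corr
        if best_lag:
            estimates.append(sr / best_lag)  # period in samples -> frequency in Hz
    return statistics.median(estimates) if estimates else 0.0


def write_sine_wav(path, freq_hz, sr=16000, seconds=1.0, amplitude=12000):
    """Write a mono 16-bit test tone, handy for sanity-checking the estimator."""
    n = int(sr * seconds)
    data = struct.pack(
        f"<{n}h",
        *(int(amplitude * math.sin(2 * math.pi * freq_hz * i / sr)) for i in range(n)),
    )
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(sr)
        wf.writeframes(data)
```

If example_prefix1.wav really is the male voice, `estimate_pitch_hz("example_prefix1.wav")` should come out noticeably lower than the value for example_prefix2.wav. Plain autocorrelation can make octave errors on real speech, so this is only a sanity check, not a definitive speaker label.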
Are there any known causes of this issue? Or has anyone investigated or fixed it?
Thanks!