Hi there,
I am working on building a new dataset in Spanish (a polysyllabic language). I have gone through MakeDiffSinger, but I still have some gaps. I would be grateful if you could sanity-check my understanding and share any thoughts you might have.
Questions for clarification:
- ph_seq: Are these sequences of phonemes or of syllables?
  Currently I am using phonemes and their timestamps as produced by MFA, with the pre-trained Spanish model that MFA provides. Would you recommend training a new model on my specific data?
- note_dur: Should the MIDI notes be estimated over phonemes, syllables, or words?
  For now I estimated one note per phoneme and assumed ph_dur == note_dur.
- ph_num: Is this the number of phonemes in each word or in each syllable?
  For now I assumed the number of phonemes in each word.
- note_seq: Do you think SOME would suffice for a first pass? I would guess so.
- is_slur: How would you define a slur in this context? I have not found many resources on this topic.
  For now I assumed no slurs at all.
- SPs and APs: Would you recommend labeling them manually, or would the enhance script be OK for a first pass?
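To make the current assumptions concrete, here is a minimal Python sketch of how they would turn MFA output into one transcription row. It is a sketch under stated assumptions, not MakeDiffSinger's own code. The input is assumed to be already read from the TextGrid as a list of words, each a list of (start, end, label) phone intervals. `build_row` is a hypothetical helper. The output keys just mirror the field names asked about above, not a confirmed schema. Silent intervals with an empty, "sil" or "sp" label are mapped to SP, which is also an assumption. The note names are passed in (for example from SOME) rather than estimated here.

```python
from typing import Dict, List, Tuple

# Hypothetical input shape: (start_sec, end_sec, label) for one phone,
# e.g. taken from the phones tier of an MFA TextGrid.
Phone = Tuple[float, float, str]

# Assumption: these labels mark silence in the aligner output; they become SP.
SILENCE_LABELS = {"", "sil", "sp"}


def build_row(name: str, words: List[List[Phone]], note_names: List[str]) -> Dict[str, str]:
    """Build one row under the current assumptions: one note per phoneme,
    note_dur == ph_dur, ph_num counted per word, and no slurs."""
    phones = [phone for word in words for phone in word]
    if len(note_names) != len(phones):
        raise ValueError("expected exactly one note per phoneme")

    ph_seq = ["SP" if label.strip().lower() in SILENCE_LABELS else label
              for _, _, label in phones]
    # Round away floating-point noise such as 0.14999999999999997.
    ph_dur = [round(end - start, 6) for start, end, _ in phones]
    # Per-word phoneme counts; pass syllable groups instead to count per syllable.
    ph_num = [len(word) for word in words]
    durations = " ".join(str(d) for d in ph_dur)

    return {
        "name": name,
        "ph_seq": " ".join(ph_seq),
        "ph_dur": durations,
        "ph_num": " ".join(str(n) for n in ph_num),
        "note_seq": " ".join(note_names),
        # Current assumption: each note lasts exactly as long as its phoneme.
        "note_dur": durations,
        # Current assumption: no slurs at all.
        "is_slur": " ".join("0" for _ in phones),
    }


if __name__ == "__main__":
    example_words = [
        [(0.0, 0.2, "")],                       # leading silence as its own group
        [(0.2, 0.35, "k"), (0.35, 0.6, "a")],
        [(0.6, 0.7, "s"), (0.7, 1.0, "a")],
    ]
    # "rest" is only a placeholder note name for the silent segment.
    print(build_row("example_0001", example_words, ["rest", "C4", "C4", "D4", "D4"]))
```

Because ph_num is derived from how the phones are grouped, moving from per-word to per-syllable counts would only change the grouping passed in, not the rest of the row.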
Thanks!