
Speech synthesis and machine learning #31

Closed · r12a opened this issue Mar 15, 2023 · 2 comments · Fixed by #42

Comments

r12a commented Mar 15, 2023

[@dontcallmedom suggested that I should raise this here. It was originally a comment at https://github.com/w3c/strategy/issues/367#issuecomment-1431300909]

The description of areas in scope in the charter appears a little lop-sided to me, since Speech Recognition is called out but not Speech Synthesis. Specifically, just as speech recognition has to infer text from waveform input, speech synthesis has to infer meaning and pronunciation guides ('text understanding') from the text as a preparatory step, and I wonder why machine learning is not being applied to that and included in the scope of the charter (i.e. for the linguistic analysis of the text, rather than for what drives the audio hardware). Such analysis could also provide input to other TTS-related specs at the W3C, such as SSML, CSS, etc. If that doesn't fall within the scope of the work, I think the charter should briefly indicate why it is not addressing the use of machine learning for that function, whereas it does cover inference for speech recognition.

anssiko (Member) commented Mar 15, 2023

Thanks for your review and comment! Speech synthesis with deep learning is another interesting and developing area, spearheaded by WaveNet. Generation of raw audio waveforms has many applications.

The list in the Motivation and Background section was not meant to be all-inclusive, but to mention a few widely understood use cases that set the context. The intent is certainly not to discourage the use of the WebNN API for other use cases to which it may be applicable.

Given that the charter is under AC review, I'll leave it to @dontcallmedom to say whether such informative tweaks are appropriate at this stage, or whether we should keep this text as is for now.

dontcallmedom (Member) commented

I agree that it's a bit late to consider this addition to the charter at this stage of the process. With that said:

  • we could leave the issue open so that it can be integrated into a possible next revision of the charter
  • @r12a I think it may be best to file a use-case issue on the WebNN repo, where a more detailed discussion can take place of whether the current operations in WebNN match what's typically used in Speech Synthesis models (see the illustrative sketch below)
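
As a rough illustration of the kind of mapping such a use-case issue might explore, the sketch below expresses a single WaveNet-style gated residual block using WebNN MLGraphBuilder operations (conv2d with dilations, tanh, sigmoid, mul, add). The operand shapes, zero-filled weights, and descriptor/build details are assumptions for illustration only, based on a WebNN spec draft, and are not normative API usage.

```js
// Illustrative sketch only: one WaveNet-style gated residual block built from
// WebNN MLGraphBuilder operations. Descriptor and compilation details have
// changed between spec drafts, so treat the exact signatures as assumptions.
async function buildGatedResidualBlock() {
  const context = await navigator.ml.createContext();
  const builder = new MLGraphBuilder(context);

  // 1-D audio laid out as NCHW with height 1: [batch, channels, 1, time].
  const x = builder.input('x', { dataType: 'float32', dimensions: [1, 64, 1, 1024] });

  // Hypothetical weights (zero-filled here); a real model would load trained values.
  const filterDesc = { dataType: 'float32', dimensions: [64, 64, 1, 2] };
  const wFilter = builder.constant(filterDesc, new Float32Array(64 * 64 * 2));
  const wGate = builder.constant(filterDesc, new Float32Array(64 * 64 * 2));

  // Causal dilated convolutions along the time axis (example dilation of 4,
  // with left padding so the output length matches the input length).
  const convOptions = { dilations: [1, 4], padding: [0, 0, 4, 0] };
  const filterOut = builder.conv2d(x, wFilter, convOptions);
  const gateOut = builder.conv2d(x, wGate, convOptions);

  // WaveNet-style gated activation, tanh(filter) * sigmoid(gate), plus a
  // residual connection back to the block input.
  const gated = builder.mul(builder.tanh(filterOut), builder.sigmoid(gateOut));
  const residual = builder.add(x, gated);

  return builder.build({ output: residual });
}
```

If operations like these compose cleanly, that would be one data point suggesting acoustic-model inference for speech synthesis is already served by the current operator set; the 'text understanding' front end raised above is a separate question.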
