
Speech synthesis and machine learning #31

Closed · r12a opened this issue Mar 15, 2023 · 2 comments · Fixed by #42

Comments

r12a commented Mar 15, 2023

[@dontcallmedom suggested that I should raise this here. It was originally a comment at https://github.com/w3c/strategy/issues/367#issuecomment-1431300909]

The description of areas in scope in the charter appears a little lop-sided to me, since Speech Recognition is called out but not Speech Synthesis. Specifically, just as speech recognition has to infer text from waveform input, speech synthesis has to infer meaning and pronunciation guides ('text understanding') from the text as a preparatory step, and I wonder why machine learning is not being applied to that and included in the scope of the charter (i.e. for the linguistic analysis of the text, rather than for what drives the audio hardware). Such analysis could also provide input to other TTS-related specs at the W3C, such as SSML, CSS, etc. If that doesn't fall within the scope of the work, I think the charter should briefly indicate why it is not addressing the use of machine learning for that function, whereas it does cover inference for speech recognition.

anssiko (Member) commented Mar 15, 2023

Thanks for your review and comment! Speech synthesis with deep learning is another interesting and developing area, spearheaded by WaveNet. Generation of raw audio waveforms has many applications.

The list in the Motivation and Background section was not meant to be all-inclusive, but to mention a few widely understood use cases that set the context. The intent is certainly not to discourage the use of the WebNN API for other use cases to which it may be applicable.

Given that the charter is under AC review, I'll leave it to @dontcallmedom to say whether such informative tweaks are appropriate at this stage, or whether we should keep this text as is for now.

dontcallmedom (Member) commented

I agree that it's a bit late to consider this addition to the charter at this stage of the process. With that said:

  • we could leave the issue open so that it can be integrated into a possible next revision of the charter
  • @r12a I think it may be best to file a use-case issue on the WebNN repo, where a more detailed discussion can take place of whether the current operations in WebNN match what's typically used in Speech Synthesis models (see the illustrative sketch below)
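
As a rough illustration of the kind of mapping such a use-case issue might explore, the sketch below expresses a single WaveNet-style gated residual block using WebNN MLGraphBuilder operations (conv2d with dilations, tanh, sigmoid, mul, add). The operand shapes, zero-filled weights, and descriptor/build details are assumptions for illustration only, based on a WebNN spec draft, and are not normative API usage.

```js
// Illustrative sketch only: one WaveNet-style gated residual block built from
// WebNN MLGraphBuilder operations. Descriptor and compilation details have
// changed between spec drafts, so treat the exact signatures as assumptions.
async function buildGatedResidualBlock() {
  const context = await navigator.ml.createContext();
  const builder = new MLGraphBuilder(context);

  // 1-D audio laid out as NCHW with height 1: [batch, channels, 1, time].
  const x = builder.input('x', { dataType: 'float32', dimensions: [1, 64, 1, 1024] });

  // Hypothetical weights (zero-filled here); a real model would load trained values.
  const filterDesc = { dataType: 'float32', dimensions: [64, 64, 1, 2] };
  const wFilter = builder.constant(filterDesc, new Float32Array(64 * 64 * 2));
  const wGate = builder.constant(filterDesc, new Float32Array(64 * 64 * 2));

  // Causal dilated convolutions along the time axis (example dilation of 4,
  // with left padding so the output length matches the input length).
  const convOptions = { dilations: [1, 4], padding: [0, 0, 4, 0] };
  const filterOut = builder.conv2d(x, wFilter, convOptions);
  const gateOut = builder.conv2d(x, wGate, convOptions);

  // WaveNet-style gated activation, tanh(filter) * sigmoid(gate), plus a
  // residual connection back to the block input.
  const gated = builder.mul(builder.tanh(filterOut), builder.sigmoid(gateOut));
  const residual = builder.add(x, gated);

  return builder.build({ output: residual });
}
```

If operations like these compose cleanly, that would be one data point suggesting acoustic-model inference for speech synthesis is already served by the current operator set; the 'text understanding' front end raised above is a separate question.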
