Introduce start session algorithm #138

Merged 7 commits on Feb 18, 2025
38 changes: 30 additions & 8 deletions index.bs
@@ -103,7 +103,7 @@ This does not preclude adding support for this as a future API enhancement, and
User consent can include, for example:
<ul>
<li>User click on a visible speech input element which has an obvious graphical representation showing that it will start speech input.</li>
<li>Accepting a permission prompt shown as the result of a call to <code>SpeechRecognition.start</code>.</li>
<li>Accepting a permission prompt shown as the result of a call to <a method for=SpeechRecognition>start()</a>.</li>
<li>Consent previously granted to always allow speech input for this web page.</li>
</ul>
</li>
@@ -148,6 +148,14 @@ This does not preclude adding support for this as a future API enhancement, and
The term "final result" indicates a SpeechRecognitionResult in which the final attribute is true.
The term "interim result" indicates a SpeechRecognitionResult in which the final attribute is false.

{{SpeechRecognition}} has the following internal slots:

<dl dfn-type=attribute dfn-for="SpeechRecognition">
: <dfn>[[started]]</dfn>
::
A boolean flag representing whether speech recognition has started. The initial value is <code>false</code>.
</dl>
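The lifecycle of this slot can be sketched outside the browser as follows. This is an illustrative model only, not the user agent's implementation; the class name and the <code>end()</code> helper are invented for this sketch.

```javascript
// Illustrative model of the [[started]] internal slot. The class name and
// the end() helper are invented for this sketch; they are not part of the API.
class RecognitionState {
  #started = false; // [[started]]: the initial value is false

  start() {
    // A second start() before an "end" or "error" event must fail.
    if (this.#started) {
      throw new DOMException("recognition already started", "InvalidStateError");
    }
    this.#started = true;
  }

  // Firing "end" (or "error") makes the object startable again.
  end() {
    this.#started = false;
  }
}
```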

<pre class="idl">
[Exposed=Window]
interface SpeechRecognition : EventTarget {
@@ -307,15 +315,19 @@ See <a href="https://lists.w3.org/Archives/Public/public-speech-api/2012Sep/0072

<dl>
<dt><dfn method for=SpeechRecognition>start()</dfn> method</dt>
<dd>When the start method is called it represents the moment in time the web application wishes to begin recognition.
When the speech input is streaming live through the input media stream, then this start call represents the moment in time that the service must begin to listen and try to match the grammars associated with this request.
Once the system is successfully listening to the recognition the user agent must raise a start event.
If the start method is called on an already started object (that is, start has previously been called, and no <a event for=SpeechRecognition>error</a> or <a event for=SpeechRecognition>end</a> event has fired on the object), the user agent must throw an "{{InvalidStateError!!exception}}" {{DOMException}} and ignore the call.</dd>
<dd>
1. Let <var>requestMicrophonePermission</var> be <code>true</code>.
1. Run the <a>start session algorithm</a> with <var>requestMicrophonePermission</var>.
</dd>

<dt><dfn method for=SpeechRecognition>start({{MediaStreamTrack}} audioTrack)</dfn> method</dt>
<dd>The overloaded start method does the same thing as the parameterless start method except it performs speech recognition on provided {{MediaStreamTrack}} instead of the input media stream.
If the {{MediaStreamTrack/kind}} attribute of the {{MediaStreamTrack}} is not "audio" or the {{MediaStreamTrack/readyState}} attribute is not "live", the user agent must throw an "{{InvalidStateError!!exception}}" {{DOMException}} and ignore the call.
Unlike the parameterless start method, the user agent does not check whether [=this=]'s [=relevant global object=]'s [=associated Document=] is [=allowed to use=] the [=policy-controlled feature=] named "<code>microphone</code>".</dd>
<dd>
1. Let <var>audioTrack</var> be the first argument.
1. If <var>audioTrack</var>'s {{MediaStreamTrack/kind}} attribute is NOT <code>"audio"</code>, throw an {{InvalidStateError}} and abort these steps.
1. If <var>audioTrack</var>'s {{MediaStreamTrack/readyState}} attribute is NOT <code>"live"</code>, throw an {{InvalidStateError}} and abort these steps.
1. Let <var>requestMicrophonePermission</var> be <code>false</code>.
1. Run the <a>start session algorithm</a> with <var>requestMicrophonePermission</var>.
</dd>
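The two validation steps above can be sketched as a standalone check. This is a hedged sketch: <code>validateAudioTrack</code> is a name invented here, and any object exposing <code>kind</code> and <code>readyState</code> properties stands in for a real {{MediaStreamTrack}}.

```javascript
// Sketch of the track validation in start(audioTrack). The function name is
// invented; "track" is any object exposing kind and readyState.
function validateAudioTrack(track) {
  if (track.kind !== "audio") {
    throw new DOMException('track kind is not "audio"', "InvalidStateError");
  }
  if (track.readyState !== "live") {
    throw new DOMException("track is not live", "InvalidStateError");
  }
}
```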

<dt><dfn method for=SpeechRecognition>stop()</dfn> method</dt>
<dd>The stop method represents an instruction to the recognition service to stop listening to more audio, and to try and return a result using just the audio that it has already received for this recognition.
@@ -339,6 +351,16 @@ See <a href="https://lists.w3.org/Archives/Public/public-speech-api/2012Sep/0072

</dl>

<p>When the <dfn>start session algorithm</dfn> is invoked with <var>requestMicrophonePermission</var>, the user agent MUST run the following steps:

1. If the [=current settings object=]'s [=relevant global object=]'s [=associated Document=] is NOT [=fully active=], throw an {{InvalidStateError}} and abort these steps.
1. If {{[[started]]}} is <code>true</code> and no <a event for=SpeechRecognition>error</a> or <a event for=SpeechRecognition>end</a> event has fired, throw an {{InvalidStateError}} and abort these steps.
1. Set {{[[started]]}} to <code>true</code>.
1. If <var>requestMicrophonePermission</var> is <code>true</code> and [=request permission to use=] "<code>microphone</code>" is [=permission/"denied"=], abort these steps.
1. Once the system is successfully listening to the recognition, [=fire an event=] named <a event for=SpeechRecognition>start</a> at [=this=].

</p>
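The steps above can be modeled outside the browser as follows. This is an illustrative sketch only: the argument object and its property names are assumptions made for this example, and permission querying is reduced to a precomputed state.

```javascript
// Model of the start session algorithm. Returns "start" when the start event
// would fire, or undefined when the algorithm aborts after a permission denial.
class SpeechRecognitionModel {
  #started = false; // the [[started]] internal slot

  startSession({ fullyActive, requestMicrophonePermission, permissionState }) {
    // 1. The associated Document must be fully active.
    if (!fullyActive) {
      throw new DOMException("document is not fully active", "InvalidStateError");
    }
    // 2. Reject a start while a session is already running.
    if (this.#started) {
      throw new DOMException("recognition already started", "InvalidStateError");
    }
    // 3. Mark the session as started.
    this.#started = true;
    // 4. Abort (without throwing) when microphone permission is denied.
    if (requestMicrophonePermission && permissionState === "denied") {
      return undefined;
    }
    // 5. Fire the "start" event once the system is listening.
    return "start";
  }
}
```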

<h4 id="speechreco-events">SpeechRecognition Events</h4>

<p>The DOM Level 2 Event Model is used for speech recognition events.