Introduce start session algorithm #138

Merged 7 commits on Feb 18, 2025
38 changes: 30 additions & 8 deletions index.bs
@@ -103,7 +103,7 @@ This does not preclude adding support for this as a future API enhancement, and
User consent can include, for example:
<ul>
<li>User click on a visible speech input element which has an obvious graphical representation showing that it will start speech input.</li>
<li>Accepting a permission prompt shown as the result of a call to <code>SpeechRecognition.start</code>.</li>
<li>Accepting a permission prompt shown as the result of a call to <a method for=SpeechRecognition>start()</a>.</li>
<li>Consent previously granted to always allow speech input for this web page.</li>
</ul>
</li>
@@ -148,6 +148,14 @@ This does not preclude adding support for this as a future API enhancement, and
The term "final result" indicates a SpeechRecognitionResult in which the final attribute is true.
The term "interim result" indicates a SpeechRecognitionResult in which the final attribute is false.

{{SpeechRecognition}} has the following internal slots:

<dl dfn-type=attribute dfn-for="SpeechRecognition">
: <dfn>[[started]]</dfn>
::
A boolean flag representing whether speech recognition has started. The initial value is <code>false</code>.
</dl>
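The lifecycle of this slot can be sketched outside the browser as follows. This is an illustrative model only, not the user agent's implementation; the class name and the <code>end()</code> helper are invented for this sketch.

```javascript
// Illustrative model of the [[started]] internal slot. The class name and
// the end() helper are invented for this sketch; they are not part of the API.
class RecognitionState {
  #started = false; // [[started]]: the initial value is false

  start() {
    // A second start() before an "end" or "error" event must fail.
    if (this.#started) {
      throw new DOMException("recognition already started", "InvalidStateError");
    }
    this.#started = true;
  }

  // Firing "end" (or "error") makes the object startable again.
  end() {
    this.#started = false;
  }
}
```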

<pre class="idl">
[Exposed=Window]
interface SpeechRecognition : EventTarget {
@@ -307,15 +315,19 @@ See <a href="https://lists.w3.org/Archives/Public/public-speech-api/2012Sep/0072

<dl>
<dt><dfn method for=SpeechRecognition>start()</dfn> method</dt>
<dd>When the start method is called it represents the moment in time the web application wishes to begin recognition.
When the speech input is streaming live through the input media stream, then this start call represents the moment in time that the service must begin to listen and try to match the grammars associated with this request.
Once the system is successfully listening to the recognition the user agent must raise a start event.
If the start method is called on an already started object (that is, start has previously been called, and no <a event for=SpeechRecognition>error</a> or <a event for=SpeechRecognition>end</a> event has fired on the object), the user agent must throw an "{{InvalidStateError!!exception}}" {{DOMException}} and ignore the call.</dd>
<dd>
1. Let <var>requestMicrophonePermission</var> be <code>true</code>.
1. Run the <a>start session algorithm</a> with <var>requestMicrophonePermission</var>.
</dd>

<dt><dfn method for=SpeechRecognition>start({{MediaStreamTrack}} audioTrack)</dfn> method</dt>
<dd>The overloaded start method does the same thing as the parameterless start method except it performs speech recognition on provided {{MediaStreamTrack}} instead of the input media stream.
If the {{MediaStreamTrack/kind}} attribute of the {{MediaStreamTrack}} is not "audio" or the {{MediaStreamTrack/readyState}} attribute is not "live", the user agent must throw an "{{InvalidStateError!!exception}}" {{DOMException}} and ignore the call.
Unlike the parameterless start method, the user agent does not check whether [=this=]'s [=relevant global object=]'s [=associated Document=] is [=allowed to use=] the [=policy-controlled feature=] named "<code>microphone</code>".</dd>
<dd>
1. Let <var>audioTrack</var> be the first argument.
1. If <var>audioTrack</var>'s {{MediaStreamTrack/kind}} attribute is NOT <code>"audio"</code>, throw an {{InvalidStateError}} and abort these steps.
1. If <var>audioTrack</var>'s {{MediaStreamTrack/readyState}} attribute is NOT <code>"live"</code>, throw an {{InvalidStateError}} and abort these steps.
1. Let <var>requestMicrophonePermission</var> be <code>false</code>.
1. Run the <a>start session algorithm</a> with <var>requestMicrophonePermission</var>.
</dd>
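The two validation steps above can be sketched as a standalone check. This is a hedged sketch: <code>validateAudioTrack</code> is a name invented here, and any object exposing <code>kind</code> and <code>readyState</code> properties stands in for a real {{MediaStreamTrack}}.

```javascript
// Sketch of the track validation in start(audioTrack). The function name is
// invented; "track" is any object exposing kind and readyState.
function validateAudioTrack(track) {
  if (track.kind !== "audio") {
    throw new DOMException('track kind is not "audio"', "InvalidStateError");
  }
  if (track.readyState !== "live") {
    throw new DOMException("track is not live", "InvalidStateError");
  }
}
```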

<dt><dfn method for=SpeechRecognition>stop()</dfn> method</dt>
<dd>The stop method represents an instruction to the recognition service to stop listening to more audio, and to try and return a result using just the audio that it has already received for this recognition.
@@ -339,6 +351,16 @@ See <a href="https://lists.w3.org/Archives/Public/public-speech-api/2012Sep/0072

</dl>

<p>When the <dfn>start session algorithm</dfn> is invoked with <var>requestMicrophonePermission</var>, the user agent MUST run the following steps:

1. If the [=current settings object=]'s [=relevant global object=]'s [=associated Document=] is NOT [=fully active=], throw an {{InvalidStateError}} and abort these steps.
1. If {{[[started]]}} is <code>true</code> and no <a event for=SpeechRecognition>error</a> or <a event for=SpeechRecognition>end</a> event has fired, throw an {{InvalidStateError}} and abort these steps.
1. Set {{[[started]]}} to <code>true</code>.
1. If <var>requestMicrophonePermission</var> is <code>true</code> and [=request permission to use=] "<code>microphone</code>" is [=permission/"denied"=], abort these steps.
1. Once the system is successfully listening to the recognition, [=fire an event=] named <a event for=SpeechRecognition>start</a> at [=this=].

</p>
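The steps above can be modeled outside the browser as follows. This is an illustrative sketch only: the argument object and its property names are assumptions made for this example, and permission querying is reduced to a precomputed state.

```javascript
// Model of the start session algorithm. Returns "start" when the start event
// would fire, or undefined when the algorithm aborts after a permission denial.
class SpeechRecognitionModel {
  #started = false; // the [[started]] internal slot

  startSession({ fullyActive, requestMicrophonePermission, permissionState }) {
    // 1. The associated Document must be fully active.
    if (!fullyActive) {
      throw new DOMException("document is not fully active", "InvalidStateError");
    }
    // 2. Reject a start while a session is already running.
    if (this.#started) {
      throw new DOMException("recognition already started", "InvalidStateError");
    }
    // 3. Mark the session as started.
    this.#started = true;
    // 4. Abort (without throwing) when microphone permission is denied.
    if (requestMicrophonePermission && permissionState === "denied") {
      return undefined;
    }
    // 5. Fire the "start" event once the system is listening.
    return "start";
  }
}
```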

<h4 id="speechreco-events">SpeechRecognition Events</h4>

<p>The DOM Level 2 Event Model is used for speech recognition events.