Merged
2 changes: 1 addition & 1 deletion .github/workflows/release_workflow.yml
@@ -1,13 +1,13 @@
name: Release Alpha and Propose Stable

on:
workflow_dispatch:
pull_request:
types: [closed]
branches: [dev]

jobs:
publish_alpha:
if: github.event.pull_request.merged == true
uses: TigreGotico/gh-automations/.github/workflows/publish-alpha.yml@master
secrets: inherit
with:
34 changes: 31 additions & 3 deletions CHANGELOG.md
@@ -1,12 +1,40 @@
# Changelog

## [0.4.1a1](https://github.com/OpenVoiceOS/ovos-dinkum-listener/tree/0.4.1a1) (2025-06-08)
## [0.4.2a5](https://github.com/OpenVoiceOS/ovos-dinkum-listener/tree/0.4.2a5) (2025-11-04)

[Full Changelog](https://github.com/OpenVoiceOS/ovos-dinkum-listener/compare/0.4.0...0.4.1a1)
[Full Changelog](https://github.com/OpenVoiceOS/ovos-dinkum-listener/compare/0.4.2a4...0.4.2a5)

**Merged pull requests:**

- fix: opm 1.X.X compat [\#177](https://github.com/OpenVoiceOS/ovos-dinkum-listener/pull/177) ([JarbasAl](https://github.com/JarbasAl))
- Default to onnx [\#193](https://github.com/OpenVoiceOS/ovos-dinkum-listener/pull/193) ([JarbasAl](https://github.com/JarbasAl))

## [0.4.2a4](https://github.com/OpenVoiceOS/ovos-dinkum-listener/tree/0.4.2a4) (2025-11-04)

[Full Changelog](https://github.com/OpenVoiceOS/ovos-dinkum-listener/compare/0.4.2a3...0.4.2a4)

**Merged pull requests:**

- Add pre-wakeword VAD state for reduced false activations [\#189](https://github.com/OpenVoiceOS/ovos-dinkum-listener/pull/189) ([JarbasAl](https://github.com/JarbasAl))

## [0.4.2a3](https://github.com/OpenVoiceOS/ovos-dinkum-listener/tree/0.4.2a3) (2025-06-18)

[Full Changelog](https://github.com/OpenVoiceOS/ovos-dinkum-listener/compare/0.4.2a2...0.4.2a3)

**Merged pull requests:**

- Update pytest requirement from ~=7.1 to ~=8.1 in /requirements [\#95](https://github.com/OpenVoiceOS/ovos-dinkum-listener/pull/95) ([dependabot[bot]](https://github.com/apps/dependabot))

## [0.4.2a2](https://github.com/OpenVoiceOS/ovos-dinkum-listener/tree/0.4.2a2) (2025-06-18)

[Full Changelog](https://github.com/OpenVoiceOS/ovos-dinkum-listener/compare/0.4.2a1...0.4.2a2)

## [0.4.2a1](https://github.com/OpenVoiceOS/ovos-dinkum-listener/tree/0.4.2a1) (2025-06-12)

[Full Changelog](https://github.com/OpenVoiceOS/ovos-dinkum-listener/compare/0.4.1...0.4.2a1)

**Merged pull requests:**

- fix: ensure minimum extra plugins version [\#179](https://github.com/OpenVoiceOS/ovos-dinkum-listener/pull/179) ([JarbasAl](https://github.com/JarbasAl))



18 changes: 6 additions & 12 deletions README.md
@@ -4,20 +4,10 @@ Documentation can be found in [the technical manual](https://openvoiceos.github.

## Install

`pip install ovos-dinkum-listener[extras]` to install this package and the default
plugins. Note that by default, either `tensorflow` or `tflite_runtime` will need
to be installed separately for wakeword detection.
`pip install ovos-dinkum-listener[extras]` to install this package and the default plugins.

> If unable to install tflite_runtime in your platform, you can find wheels
> here https://whl.smartgic.io/. eg, for pyhon 3.11 in x86
> `pip install https://whl.smartgic.io/tflite_runtime-2.13.0-cp311-cp311-linux_x86_64.whl`
Without `extras` you will also need to manually install, and possibly configure STT, WW, and VAD modules as described below.

Without `extras`, wakeword and STT audio upload will be disabled unless you install
[`ovos-backend-client`](https://github.com/OpenVoiceOS/ovos-backend-client) separately. You will also need to manually install,
and possibly configure STT, WW, and VAD modules as described below.

Using [ovos-vad-plugin-silero](https://github.com/OpenVoiceOS/ovos-vad-plugin-silero)
is strongly recommended

## Configuration

@@ -42,6 +32,10 @@ non exhaustive list of config options
"microphone": {
"module": "ovos-microphone-plugin-alsa"
},
// If enabled will only check for wakeword if VAD also detected speech
// this should reduce false activations
"vad_pre_wake_enabled": true,
// Voice Activity Detection is used to determine when users are speaking
"VAD": {
// recommended plugin: "ovos-vad-plugin-silero"
"module": "ovos-vad-plugin-silero",
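The README hunk above documents `vad_pre_wake_enabled`, and `voice_loop.py` in this PR reads it with `Configuration().get("listener", {}).get("vad_pre_wake_enabled", False)`. A minimal sketch of that lookup follows, assuming the flag lives under the `listener` section as that call suggests. A plain dict stands in for `ovos_config.Configuration`, and `read_pre_wake_flag` is a hypothetical helper name that is not part of this PR.

```python
from typing import Any, Mapping


def read_pre_wake_flag(config: Mapping[str, Any]) -> bool:
    """Hypothetical helper: return the listener's vad_pre_wake_enabled flag."""
    # A missing "listener" section and a missing key both fall back to False,
    # which is the default the dataclass field in this PR uses.
    return bool(config.get("listener", {}).get("vad_pre_wake_enabled", False))


# Assumed config shapes, written as plain dicts for illustration only.
enabled_config = {"listener": {"vad_pre_wake_enabled": True}}
empty_config = {}

print(read_pre_wake_flag(enabled_config))
print(read_pre_wake_flag(empty_config))
```

Note that the README example sets the flag to `true`, while the code default is `False`, so the feature appears to be opt-in unless the shipped default configuration says otherwise.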
6 changes: 3 additions & 3 deletions ovos_dinkum_listener/version.py
@@ -1,6 +1,6 @@
# START_VERSION_BLOCK
VERSION_MAJOR = 0
VERSION_MINOR = 4
VERSION_BUILD = 1
VERSION_ALPHA = 0
VERSION_MINOR = 5
VERSION_BUILD = 0
VERSION_ALPHA = 2
# END_VERSION_BLOCK
94 changes: 79 additions & 15 deletions ovos_dinkum_listener/voice_loop/voice_loop.py
@@ -10,24 +10,23 @@
# See the License for the specific language governing permissions and
# limitations under the License.
#
import audioop

import time
from collections import deque
from dataclasses import dataclass, field
from enum import Enum
from threading import Event
from typing import Callable, Deque, Optional

import audioop
from ovos_config import Configuration
from ovos_plugin_manager.stt import StreamingSTT
from ovos_plugin_manager.templates.microphone import Microphone
from ovos_plugin_manager.vad import VADEngine
from ovos_utils.log import LOG
from ovos_bus_client.session import SessionManager
from ovos_dinkum_listener.transformers import AudioTransformersService
from ovos_dinkum_listener.voice_loop.hotwords import HotwordContainer, HotwordState, HotWordException
from ovos_plugin_manager.templates.microphone import Microphone

from ovos_dinkum_listener.plugins import FakeStreamingSTT
from ovos_dinkum_listener.transformers import AudioTransformersService
from ovos_dinkum_listener.voice_loop.hotwords import HotwordContainer, HotwordState, HotWordException


class ListeningState(str, Enum):
@@ -44,6 +43,7 @@ class ListeningState(str, Enum):
BEFORE_COMMAND = "before_cmd"
IN_COMMAND = "in_cmd"
AFTER_COMMAND = "after_cmd"
PRE_WAKE_VAD = "pre_wake_vad"


class ListeningMode(str, Enum):
@@ -144,11 +144,18 @@ class DinkumVoiceLoop(VoiceLoop):
is_muted: bool = False
_is_running: bool = False
_chunk_info: ChunkInfo = field(default_factory=ChunkInfo)
_vad_window_start: float = 0.0

# config flag for VAD-before-wakeword feature
vad_pre_wake_enabled: bool = field(default_factory=lambda: Configuration().get("listener", {}).get("vad_pre_wake_enabled", False))

@property
def running(self) -> bool:
"""
Return true while the loop is running
Indicates whether the voice loop is currently running.

Returns:
`True` if the loop is running, `False` otherwise.
"""
return self._is_running is True

@@ -159,13 +166,12 @@ def reset_speech_timer(self):

def start(self):
"""
Start the Voice Loop; sets the listening mode based on configuration and
prepares the loop to be run.
Initialize and start the voice loop using configured listening mode.

Sets the internal running flag, selects ListeningMode from configuration (continuous, hybrid, or wakeword), sets the initial ListeningState to PRE_WAKE_VAD when vad_pre_wake_enabled is true otherwise DETECT_WAKEWORD, and resets the last wake-word timestamp.
"""

self._is_running = True
self.state = ListeningState.DETECT_WAKEWORD
self.last_ww = -1
listener_config = Configuration().get("listener", {})
if listener_config.get("continuous_listen", False):
self.listen_mode = ListeningMode.CONTINUOUS
@@ -174,19 +180,65 @@ def start(self):
else:
self.listen_mode = ListeningMode.WAKEWORD

# choose initial state based on config
if self.vad_pre_wake_enabled:
self.state = ListeningState.PRE_WAKE_VAD
else:
self.state = ListeningState.DETECT_WAKEWORD

self.last_ww = -1
LOG.info(f"Listening mode: {self.listen_mode}")
LOG.info(f"VAD pre-wake enabled: {self.vad_pre_wake_enabled}")
LOG.debug(f"STATE: {self.state}")

def _pre_wake_vad(self, chunk: bytes):
"""
Monitor an audio chunk with VAD and transition to wake-word detection when speech is detected.

Sets self._chunk_info.is_speech according to the VAD result. If speech is detected, sets the loop state to ListeningState.DETECT_WAKEWORD and records the current time in self._vad_window_start. If no speech is detected, forwards the chunk to the audio transformers. On VAD errors, logs the error and treats the chunk as non-speech.

Parameters:
chunk (bytes): Raw audio bytes for VAD analysis.
"""
self.hotword_chunks.append(chunk) # we still keep chunks for wake word detection
try:
self._chunk_info.is_speech = not self.vad.is_silence(chunk)
except Exception as e:
LOG.error(f"VAD error in pre-wake: {e}")
self._chunk_info.is_speech = False
if self._chunk_info.is_speech:
LOG.debug("Speech detected - switching to wake word detection")
self.state = ListeningState.DETECT_WAKEWORD
self._vad_window_start = time.time()

rewind_chunks = []
while self.hotword_chunks:
rewind_chunks.append(self.hotword_chunks.popleft())

n_to_rewind = 5 # TODO from config
for chunk in rewind_chunks[-n_to_rewind:]:
# feed some pre-VAD detection audio to hotwords
# VAD is usually a bit too late
self.hotwords.update(chunk)
else:
self.transformers.feed_audio(chunk)

def run(self):
"""
Run the VoiceLoop so long as `self._is_running` is True
Run the voice loop state machine, processing incoming audio chunks until the loop is stopped.

This method reads audio chunks from the microphone and advances the listening finite-state machine (pre-wake VAD, wakeword/hotword detection, waiting/recording/command handling, confirmation and teardown). It feeds audio to transformers and STT as appropriate, updates timers and per-chunk metadata, and invokes configured callbacks (chunk_callback, wake_callback, stt/audio/text callbacks, etc.). The loop continues while self._is_running and exits when the loop is stopped or the microphone read returns no audio.
"""
# Voice command state
self.speech_seconds_left = self.speech_seconds
self.silence_seconds_left = self.silence_seconds
self.timeout_seconds_left = self.timeout_seconds
self.timeout_seconds_with_silence_left = self.timeout_seconds_with_silence
self.state = ListeningState.DETECT_WAKEWORD
self.timeout_seconds_with_silence_left = self.timeout_seconds_with_silence

if self.vad_pre_wake_enabled:
self.state = ListeningState.PRE_WAKE_VAD
else:
self.state = ListeningState.DETECT_WAKEWORD

# Keep hotword/STT audio so they can (optionally) be saved to disk
self.hotword_chunks = deque(maxlen=self.num_hotword_keep_chunks)
@@ -226,7 +278,12 @@ def run(self):
# AFTER_COMMAND -> DETECT_HOTWORD
#

if self.state == ListeningState.DETECT_WAKEWORD:
if self.state == ListeningState.PRE_WAKE_VAD:
self._pre_wake_vad(chunk) # might change state to ListeningState.DETECT_WAKEWORD

elif self.state == ListeningState.DETECT_WAKEWORD:
if self.vad_pre_wake_enabled and not self._vad_window_start:
self._vad_window_start = time.time()
try:
if self.listen_mode == ListeningMode.CONTINUOUS:
LOG.info(f"Continuous listening mode, updating state")
@@ -238,6 +295,11 @@ def run(self):
LOG.info("Hotword detected")
else:
self.transformers.feed_audio(chunk)
# handle timeout to return to VAD stage
if self.vad_pre_wake_enabled and time.time() - self._vad_window_start > 5:
LOG.debug("Wakeword not found within 5s - returning to PRE_WAKE_VAD")
self.state = ListeningState.PRE_WAKE_VAD
self._vad_window_start = 0
except HotWordException as e:
if self.hotwords.reload_on_failure:
LOG.warning(e)
@@ -808,6 +870,8 @@ def _after_cmd(self, chunk: bytes):
if self.listen_mode == ListeningMode.CONTINUOUS or \
self.listen_mode == ListeningMode.HYBRID:
self.state = ListeningState.WAITING_CMD
elif self.vad_pre_wake_enabled:
self.state = ListeningState.PRE_WAKE_VAD
else:
self.state = ListeningState.DETECT_WAKEWORD
LOG.debug(f"STATE: {self.state}")
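The hunks above add a `PRE_WAKE_VAD` state that gates wake word detection on VAD, replays a few buffered chunks once speech is found, and falls back to VAD gating if no wake word arrives within 5 seconds. The sketch below isolates that flow so it can be run without a microphone or plugins. It is a simplified stand-in, not the real `DinkumVoiceLoop`: `PreWakeGate`, `State`, `fed_to_wakeword`, and the injected `is_speech` and `clock` callables are all hypothetical names, and the wake-word-found transition is deliberately left out.

```python
from collections import deque
from enum import Enum
from typing import Callable, Deque, List


class State(str, Enum):
    PRE_WAKE_VAD = "pre_wake_vad"
    DETECT_WAKEWORD = "detect_ww"


class PreWakeGate:
    """Simplified model of the VAD-before-wakeword gating in this PR."""

    def __init__(self, is_speech: Callable[[bytes], bool],
                 clock: Callable[[], float],
                 window_seconds: float = 5.0,
                 rewind: int = 5, keep: int = 15):
        self.is_speech = is_speech          # stand-in for `not vad.is_silence(chunk)`
        self.clock = clock                  # stand-in for time.time
        self.window_seconds = window_seconds
        self.rewind = rewind
        self.chunks: Deque[bytes] = deque(maxlen=keep)
        self.state = State.PRE_WAKE_VAD
        self.window_start = 0.0
        self.fed_to_wakeword: List[bytes] = []  # records what a detector would see

    def process(self, chunk: bytes) -> None:
        if self.state == State.PRE_WAKE_VAD:
            self.chunks.append(chunk)
            if self.is_speech(chunk):
                # Speech found: open the wake word window and replay the most
                # recent chunks, since VAD tends to fire slightly late.
                self.state = State.DETECT_WAKEWORD
                self.window_start = self.clock()
                recent = list(self.chunks)[-self.rewind:]
                self.chunks.clear()
                self.fed_to_wakeword.extend(recent)
        else:
            self.fed_to_wakeword.append(chunk)
            if self.clock() - self.window_start > self.window_seconds:
                # No wake word inside the window: go back to VAD gating.
                self.state = State.PRE_WAKE_VAD
                self.window_start = 0.0


# Usage with a fake clock and a fake VAD that flags one specific chunk.
now = [0.0]
gate = PreWakeGate(is_speech=lambda c: c == b"speech",
                   clock=lambda: now[0], rewind=2)
for c in [b"s1", b"s2", b"s3", b"speech"]:
    gate.process(c)
print(gate.state, gate.fed_to_wakeword)

now[0] = 6.0  # pretend six seconds passed without a wake word
gate.process(b"later")
print(gate.state, gate.fed_to_wakeword)
```

Injecting the clock keeps the timeout testable without sleeping. In the PR itself both the rewind count (`n_to_rewind = 5`, marked `TODO from config`) and the 5 second window are hard-coded, so the constructor parameters here are one possible way to expose them rather than something the PR does.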
10 changes: 4 additions & 6 deletions requirements/extras.txt
@@ -1,14 +1,12 @@
# STT plugins
ovos-stt-plugin-server>=0.1.2,<1.0.0
ovos-stt-plugin-chromium>=0.0.1,<1.0.0

# VAD plugins
ovos-vad-plugin-noise>=0.1.2,<1.0.0
ovos-vad-plugin-silero>=0.0.5,<1.0.0

# Microphone plugins
ovos-microphone-plugin-sounddevice>=0.0.1,<1.0.0

# Wake Word plugins
# note: tflite_runtime also need to be installed
ovos-ww-plugin-precise-lite>=0.1,<1.0.0
ovos-ww-plugin-vosk>=0.1,<1.0.0
# WakeWord plugins
ovos-ww-plugin-vosk>=0.1.7,<1.0.0
ovos_ww_plugin_precise_onnx>=0.0.1,<1.0.0
7 changes: 0 additions & 7 deletions requirements/onnx.txt

This file was deleted.

8 changes: 4 additions & 4 deletions requirements/requirements.txt
@@ -1,5 +1,5 @@
ovos-plugin-manager>=1.0.2,<2.0.0
ovos-utils>=0.0.38,<1.0.0
ovos-config>=0.4.3,<2.0.0
ovos_bus_client>=0.0.10,<2.0.0
ovos-plugin-manager>=1.0.2,<3.0.0
ovos-utils>=0.8.1,<1.0.0
ovos-config>=1.2.2,<3.0.0
ovos_bus_client>=1.3.4,<2.0.0
SpeechRecognition~=3.9
2 changes: 1 addition & 1 deletion requirements/tests.txt
@@ -1,4 +1,4 @@
pytest~=7.1
pytest~=8.4
ovos-vad-plugin-webrtcvad
xdoctest~=1.1.5
pytest-cov>=3.0.0
1 change: 0 additions & 1 deletion setup.py
@@ -84,7 +84,6 @@ def get_description():
extras_require={
"extras": required("requirements/extras.txt"),
"linux": required("requirements/linux.txt"),
"onnx": required("requirements/onnx.txt"),
"tests": required("requirements/tests.txt"),
},
classifiers=[