feat: phoneme alignment — tests + docs (rebased onto dev) by JarbasAl · Pull Request #125 · TigreGotico/phoonnx

JarbasAl · 2026-05-30T16:27:55Z

Replaces / supersedes #41 (fix/alignment).

Rebases the 3 phoneme-alignment commits from fix/alignment onto current dev
(which already has #124 merged), resolves the conflicts, and adds tests and documentation.

What's in this PR

From #41 (cherry-picked):

- PhonemeAlignment dataclass (phoneme, num_samples)
- AudioChunk extended with phonemes, phoneme_ids, phoneme_id_samples, phoneme_alignments
- TTSVoice.synthesize(include_alignments=False): optional alignment reconstruction
- TTSVoice.phoneme_ids_to_audio(include_alignments=False): returns (audio, samples) tuple when model has a second output
- VoiceConfig.hop_length (default 256): frame→sample conversion
- phoonnx_train export-onnx --add-phoneme-alignment flag
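
The frame→sample conversion that hop_length enables can be sketched as follows. This is a minimal illustration rather than the code from this PR: the PhonemeAlignment class below is a stand-in that only mirrors the two fields listed above, and frames_to_samples is a hypothetical helper name.

```python
from dataclasses import dataclass
from typing import List

DEFAULT_HOP_LENGTH = 256  # default value named in this PR


@dataclass
class PhonemeAlignment:
    """Stand-in mirroring the fields listed above; not the real class."""
    phoneme: str
    num_samples: int


def frames_to_samples(frame_counts: List[int],
                      hop_length: int = DEFAULT_HOP_LENGTH) -> List[int]:
    """Hypothetical helper: scale per-id frame counts to audio samples."""
    return [int(n) * hop_length for n in frame_counts]


# Three ids lasting 2, 5 and 3 frames respectively.
samples = frames_to_samples([2, 5, 3])
alignments = [PhonemeAlignment(p, n) for p, n in zip(["h", "ə", "l"], samples)]
```

With the default hop length each frame covers 256 samples, so a model trained with a different hop length needs the matching value in its config for the durations to line up with the audio.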

New in this PR:

- tests/test_alignment.py: 24 hermetic unit tests (no models, no network):
  - PhonemeAlignment fields and equality
  - AudioChunk new fields, optional defaults
  - hop_length constant, VoiceConfig field default, from_dict parsing
  - phoneme_ids_to_audio return shapes (ndarray vs tuple) and hop scaling
  - synthesize integration: fields None when not requested, populated when supported
  - Alignment reconstruction edge cases: length mismatch → None, empty text → no chunks
- docs/alignment.md: usage guide covering include_alignments, AudioChunk fields, PhonemeAlignment, hop_length, export CLI, viseme and karaoke examples
- docs/README.md: linked to new alignment page, added feature bullet
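
For the viseme and karaoke use cases, per-phoneme sample counts have to become start and end times. The sketch below is an assumption about how a consumer might do that and is not taken from docs/alignment.md: phoneme_timings is a hypothetical helper, alignments are plain (phoneme, num_samples) pairs, and the 22050 Hz sample rate is only an example value.

```python
from typing import List, Tuple


def phoneme_timings(alignments: List[Tuple[str, int]],
                    sample_rate: int) -> List[Tuple[str, float, float]]:
    """Turn (phoneme, num_samples) pairs into (phoneme, start_s, end_s)."""
    timings = []
    cursor = 0  # running sample offset from the start of the chunk
    for phoneme, num_samples in alignments:
        start = cursor / sample_rate
        cursor += num_samples
        timings.append((phoneme, start, cursor / sample_rate))
    return timings


# Example values only: 22050 Hz is an assumed sample rate.
timings = phoneme_timings([("h", 11025), ("i", 22050)], sample_rate=22050)
```

Accumulating in integer samples and dividing once per boundary avoids the drift that summing floating-point durations would introduce.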

License note (from #41)

The alignment logic is ported from piper (OHF-Voice/piper1-gpl commit eb6be6b).
@synesthesiam confirmed in the piper1-gpl discussions that piper code not using
espeak-ng is covered by Apache 2 — this PR does not use espeak-ng code.

Summary by CodeRabbit

Release Notes

New Features
- Added phoneme-level alignment support to speech synthesis, enabling per-phoneme timing data extraction
- Added capability to export ONNX models with alignment output
Documentation
- Added comprehensive phoneme alignment documentation covering API usage, configuration, and practical use cases such as viseme scheduling and karaoke timing

24 hermetic unit tests covering PhonemeAlignment, AudioChunk new fields, phoneme_ids_to_audio return shapes, hop_length scaling, synthesize() integration, and alignment reconstruction edge cases (length mismatch, alignment failure, empty text).

docs/alignment.md: usage guide for include_alignments, AudioChunk fields, PhonemeAlignment, hop_length config, export CLI flag, and viseme/karaoke use-case examples. Linked from docs/README.md.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

coderabbitai · 2026-05-30T16:28:02Z

📝 Walkthrough

Walkthrough

This PR implements optional phoneme-level alignment for synthesized audio. It extends the Voice API with alignment metadata, introduces hop_length configuration, adds ONNX export tooling to expose alignment tensors, and includes comprehensive tests and user documentation.

Changes

Phoneme Alignment Feature

Layer / File(s)	Summary
Configuration foundation: hop_length support `phoonnx/config.py`	Added `DEFAULT_HOP_LENGTH` constant (256) and `hop_length: int` field to `VoiceConfig` with default value; `VoiceConfig.from_dict()` loads `hop_length` from model config or uses default.
Phoneme-to-ID conversion utilities `phoonnx/phoneme_ids.py`	New module provides `phonemes_to_ids()` function with default IPA-to-ID mappings, special tokens (pad, BOS, EOS, word separator), `BlankBetween` enum for configurable blank insertion, `load_phoneme_ids()` and `load_phoneme_map()` file loaders, plus reference test harness for validation.
Voice API alignment contracts and implementation `phoonnx/voice.py`	Added `PhonemeAlignment` dataclass (`phoneme: str`, `num_samples: int`); extended `AudioChunk` with phoneme metadata fields (`phonemes`, `phoneme_ids`, `phoneme_id_samples`, `phoneme_alignments`); `synthesize()` and `phoneme_ids_to_audio()` accept `include_alignments` flag and return per-phoneme sample durations; alignment reconstruction absorbs BOS/blank/EOS durations into adjacent phonemes.
ONNX export alignment tooling `phoonnx_train/export_onnx.py`	New `add_phoneme_alignment_output()` function modifies ONNX models to expose alignment tensor (autodetected from Ceil node outputs or explicit tensor name); CLI gains `--add-phoneme-alignment` flag to invoke post-export patching with error logging.
User documentation and API guides `docs/README.md`, `docs/alignment.md`	Updated README with phoneme alignment feature bullet; new alignment.md documents `synthesize(include_alignments=True)` API, `PhonemeAlignment` dataclass structure, `hop_length` role in frame-to-sample conversion, ONNX export procedures, and viseme/karaoke/subtitle timing use cases.
Comprehensive alignment tests (unit and integration) `tests/test_alignment.py`	Hermetic unit tests validate `PhonemeAlignment` dataclass semantics, `AudioChunk` field storage, `hop_length` defaults and parsing, `phoneme_ids_to_audio` return types with hop-length scaling; reconstruction tests verify alignment consistency (BOS/blank/EOS absorption, length matching, unknown token handling); optional integration tests download real voices, patch ONNX models, and verify end-to-end alignment invariants across both patched and unpatched models.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

TigreGotico/phoonnx#19: Both PRs extend the Click-based CLI export workflow in phoonnx_train/export_onnx.py; this PR adds alignment export capability while the related PR refactors the CLI structure.

Poem

🐰 Phonemes now sing in measured time,
Each syllable marked with samples fine,
From voice to frame to lip and screen,
The prettiest alignment ever seen!
Frame length defaults to two-five-six,
Alignment magic in the mix. ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 43.10% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'feat: phoneme alignment — tests + docs (rebased onto dev)' clearly summarizes the main changes: introducing phoneme alignment features with corresponding tests and documentation updates.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.


github-actions · 2026-05-30T16:28:33Z

Ready for review! The automated tests have finished. ✅

I've aggregated the results of the automated checks for this PR below.

🏷️ Release Preview

Ensuring the 'Contributors' list is sorted and complete. 👥

Current: 1.3.5a1 → Next: 1.4.0a1

Signal	Value
Label	`feature`
PR title	`feat: phoneme alignment — tests + docs (rebased onto dev)`
Bump	minor

✅ PR title follows conventional commit format.

🚀 Release Channel Compatibility

Predicted next version: 1.4.0a1

Channel	Status	Note	Current Constraint
Stable	⚪	Not in channel	-
Testing	⚪	Not in channel	-
Alpha	⚪	Not in channel	-

🔒 Security (pip-audit)

I've scanned the dependencies for any hidden surprises. 🔍

✅ No known vulnerabilities found (61 packages scanned).

📊 Coverage

A forensic analysis of your test coverage. 🔍

❌ 31.5% total coverage

Files below 80% coverage (37 files)

File	Coverage	Missing lines
`phoonnx/cli.py`	0.0%	98
`phoonnx/phoneme_ids.py`	0.0%	149
`phoonnx/thirdparty/kog2p/__init__.py`	0.0%	203
`phoonnx/thirdparty/mantoq/unicode_symbol2label.py`	0.0%	1
`phoonnx/version.py`	0.0%	8
`phoonnx/thirdparty/bw2ipa.py`	7.5%	86
`phoonnx/thirdparty/mantoq/pyarabic/number.py`	7.7%	371
`phoonnx/thirdparty/mantoq/buck/phonetise_buckwalter.py`	10.4%	180
`phoonnx/thirdparty/hangul2ipa.py`	16.6%	372
`phoonnx/phonemizers/en.py`	17.5%	104
`phoonnx/thirdparty/mantoq/pyarabic/trans.py`	18.2%	135
`phoonnx/model_manager.py`	22.2%	168
`phoonnx/thirdparty/zh_num.py`	23.1%	83
`phoonnx/phonemizers/mul.py`	23.9%	236
`phoonnx/thirdparty/tashkeel/__init__.py`	23.9%	89
`phoonnx/phonemizers/zh.py`	27.0%	92
`phoonnx/phonemizers/ko.py`	30.4%	32
`phoonnx/phonemizers/gl.py`	31.1%	42
`phoonnx/phonemizers/ar.py`	31.2%	44
`phoonnx/tokenizer.py`	31.4%	212
`phoonnx/thirdparty/mantoq/buck/tokenization.py`	32.5%	27
`phoonnx/thirdparty/phonikud/__init__.py`	35.3%	11
`phoonnx/phonemizers/ja.py`	36.0%	32
`phoonnx/phonemizers/fa.py`	36.4%	14
`phoonnx/phonemizers/pt.py`	38.1%	13
`phoonnx/thirdparty/mantoq/pyarabic/normalize.py`	38.1%	13
`phoonnx/thirdparty/mantoq/pyarabic/araby.py`	39.7%	298
`phoonnx/phonemizers/he.py`	40.0%	12
`phoonnx/phonemizers/vi.py`	40.0%	12
`phoonnx/phonemizers/base.py`	40.8%	71
`phoonnx/thirdparty/mantoq/pyarabic/stack.py`	45.5%	6
`phoonnx/voice.py`	46.8%	167
`phoonnx/thirdparty/mantoq/num2words.py`	47.6%	11
`phoonnx/config.py`	50.0%	160
`phoonnx/phonemizers/mwl.py`	50.0%	8
`phoonnx/thirdparty/mantoq/__init__.py`	60.0%	10
`phoonnx/thirdparty/mantoq/pyarabic/arabrepr.py`	60.0%	6

Full report: download the coverage-report artifact.

🔍 Lint

I've finished my task! Here's the data you need. 📊

❌ ruff: issues found — see job log

⚖️ License Check

Auditing the legal lineage of this contribution. 📜

❌ License violations detected (43 packages) — review required before merging.

Dependency	License Name	License Type	Misc
phoonnx:1.3.3	Error	Error	

License Type	Found
Error	1

License distribution: 14× MIT License, 7× Apache Software License, 5× MIT, 3× Apache-2.0, 2× BSD-3-Clause, 2× ISC License (ISCL), 1× 3-Clause BSD License, 1× Apache Software License; BSD License, +8 more

Full breakdown — 43 packages

Package	Version	License	URL
`build`	1.5.0	MIT	link
`certifi`	2026.5.20	Mozilla Public License 2.0 (MPL 2.0)	link
`charset-normalizer`	3.4.7	MIT	link
`click`	8.4.1	BSD-3-Clause	link
`combo_lock`	0.3.1	Apache-2.0	link
`dateparser`	1.4.0	BSD License	link
`filelock`	3.29.0	MIT	link
`flatbuffers`	25.12.19	Apache Software License	link
`idna`	3.17	BSD-3-Clause	link
`json-database`	0.10.1	MIT	link
`kthread`	0.2.3	MIT License	link
`langcodes`	3.5.1	MIT License	link
`markdown-it-py`	4.2.0	MIT License	link
`mdurl`	0.1.2	MIT License	link
`memory-tempfile`	2.2.3	MIT License	link
`numpy`	2.4.6	BSD-3-Clause AND 0BSD AND MIT AND Zlib AND CC0-1.0	link
`onnxruntime`	1.26.0	MIT License	link
`ovos-config`	2.1.1	Apache-2.0	link
`ovos-date-parser`	0.7.0a5	Apache Software License	link
`ovos-number-parser`	0.5.1	Apache Software License	link
`ovos-utils`	0.8.5	Apache-2.0	link
`packaging`	26.2	Apache-2.0 OR BSD-2-Clause	link
`pexpect`	4.9.0	ISC License (ISCL)	link
`phoonnx`	1.3.5a1	Apache Software License	link
`protobuf`	7.35.0	3-Clause BSD License	link
`ptyprocess`	0.7.0	ISC License (ISCL)	link
`pyee`	13.0.1	MIT License	link
`Pygments`	2.20.0	BSD-2-Clause	link
`pyproject_hooks`	1.2.0	MIT License	link
`python-dateutil`	2.9.0.post0	Apache Software License; BSD License	link
`pytz`	2026.2	MIT License	link
`PyYAML`	6.0.3	MIT License	link
`quebra-frases`	0.3.7	Apache Software License	link
`regex`	2026.5.9	Apache-2.0 AND CNRI-Python	link
`requests`	2.34.2	Apache Software License	link
`rich`	13.9.4	MIT License	link
`rich-click`	1.9.8	MIT License

Policy: Apache 2.0 (universal donor). StrongCopyleft / NetworkCopyleft / WeakCopyleft / Other / Error categories fail. MPL allowed.

🔨 Build Tests

I've finished the digital carpentry on this PR. 🔨

✅ All versions pass

Python	Build	Install	Tests
3.10	✅	✅	✅
3.11	✅	✅	✅
3.12	✅	✅	✅
3.13	✅	✅	✅
3.14	✅	✅	✅

Providing clarity through automated analysis 🔍

Three bugs fixed:

1. AttributeError crash: VoiceConfig has no phoneme_id_map attribute; the code referenced it, but it does not exist on the dataclass. Fix: read blank_id/bos_id/eos_id and idx2char from the tokenizer's vocabulary, which is where the mapping actually lives.
2. get(phoneme, []) type error: missing phonemes returned [] (a list), not an int, so the id comparison always failed and alignments silently returned None for any phoneme not found in the map. Fix: the new walk uses idx2char (reverse map) so unknown ids become "?" strings instead of crashing or silently failing.
3. Incomplete blank handling: the old loop tried to re-derive the tokenizer's internal token sequence (bos/blank/eos insertion) manually, getting it wrong for any non-default blank_at_start/blank_at_end configuration. Fix: walk phoneme_ids + phoneme_id_samples directly in lockstep; fold blank/bos durations forward into the next real phoneme and eos backward into the last one. Works for all tokenizer configurations.

Also removes the now-unused DEFAULT_PAD/BOS/EOS_TOKEN imports from voice.py.

Tests: 10 new unit regression tests + 9 integration tests against a real eu-ES espeak model validating that total aligned samples == audio length.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
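
The lockstep walk described in the third fix above can be sketched roughly as below. This is an illustration under stated assumptions, not the code in voice.py: reconstruct_alignments is a hypothetical name, alignments are returned as plain (phoneme, num_samples) pairs, and folding trailing blanks backward together with EOS is a choice made here so that no samples are lost.

```python
from typing import Dict, List, Optional, Tuple


def reconstruct_alignments(
    phoneme_ids: List[int],
    phoneme_id_samples: List[int],
    idx2char: Dict[int, str],
    blank_id: Optional[int] = None,
    bos_id: Optional[int] = None,
    eos_id: Optional[int] = None,
) -> Optional[List[Tuple[str, int]]]:
    """Hypothetical sketch of the lockstep walk over ids and durations."""
    if len(phoneme_ids) != len(phoneme_id_samples):
        return None  # length mismatch: no alignment can be built

    # Ids whose duration is folded forward; undefined (None) ids are ignored.
    fold_forward = {i for i in (blank_id, bos_id) if i is not None}
    result: List[Tuple[str, int]] = []
    carry = 0  # samples waiting to be attached to a real phoneme

    for pid, n in zip(phoneme_ids, phoneme_id_samples):
        if pid in fold_forward:
            carry += n
        elif eos_id is not None and pid == eos_id and result:
            # EOS (plus any pending blanks) folds backward into the last phoneme.
            phoneme, total = result[-1]
            result[-1] = (phoneme, total + carry + n)
            carry = 0
        elif eos_id is not None and pid == eos_id:
            carry += n  # nothing to fold into yet; keep carrying
        else:
            # Unknown ids become "?" instead of raising or aborting.
            result.append((idx2char.get(pid, "?"), carry + n))
            carry = 0

    if carry and result:  # trailing blanks with no following phoneme
        phoneme, total = result[-1]
        result[-1] = (phoneme, total + carry)
    return result


aligned = reconstruct_alignments(
    [1, 0, 5, 0, 6, 0, 2], [10, 20, 30, 40, 50, 60, 70],
    idx2char={5: "a", 6: "b"}, blank_id=0, bos_id=1, eos_id=2,
)
```

Because every duration is attached to some real phoneme, the per-phoneme totals sum to the total of phoneme_id_samples whenever at least one real phoneme is present, which mirrors the invariant the integration tests check.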

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (2)

phoonnx/voice.py (1)
315-317: ⚡ Quick win

Type annotations claim int but tokenizer properties return Optional[int].

Lines 315-317 annotate _blank_id, _bos_id, and _eos_id as int, but the tokenizer vocabulary properties can return None when the corresponding token is not defined (see context snippets showing blank_id(), bos_id(), eos_id() return Optional[int]).

The comparison logic at lines 357, 359 handles None correctly (no match when the ID is None), but the type hints are misleading.
📝 Proposed fix
         _tok = self.config.tokenizer
         _vocab = _tok.vocabulary
         _idx2char: dict = _vocab.idx2char
-        _blank_id: int = _tok.blank_id
-        _bos_id: int = _vocab.bos_id
-        _eos_id: int = _vocab.eos_id
+        _blank_id: Optional[int] = _tok.blank_id
+        _bos_id: Optional[int] = _vocab.bos_id
+        _eos_id: Optional[int] = _vocab.eos_id
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@phoonnx/voice.py` around lines 315 - 317, The type hints for _blank_id,
_bos_id, and _eos_id are incorrect: they are declared as int but _tok.blank_id,
_vocab.bos_id and _vocab.eos_id return Optional[int]; update the annotations to
Optional[int] (or None | int) for _blank_id, _bos_id, and _eos_id so the hints
match the actual values and existing comparison logic (in the region around the
checks that compare these IDs) remains correct.
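
A small sketch of why the existing comparisons stay safe once the annotations become Optional[int]: an integer token id never compares equal to None, so an undefined special token simply never matches. The is_special helper is hypothetical and only illustrates the typing point.

```python
from typing import Optional


def is_special(token_id: int,
               blank_id: Optional[int],
               bos_id: Optional[int],
               eos_id: Optional[int]) -> bool:
    """True if token_id equals any *defined* special id (None never matches)."""
    return token_id in (blank_id, bos_id, eos_id)


matches_blank = is_special(0, blank_id=0, bos_id=None, eos_id=None)
matches_nothing = is_special(0, blank_id=None, bos_id=None, eos_id=None)
```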
phoonnx/phoneme_ids.py (1)
378-457: ⚡ Quick win

Consider moving the __main__ test harness to the test suite.

The __main__ block contains valuable validation logic that compares this implementation against external reference implementations (mimic3_phonemes2ids and piper_phonemes_to_ids). However, it relies on hardcoded file paths and external dependencies that may not be available in all environments.

Moving this to tests/test_phoneme_ids.py as optional integration tests (skipped when dependencies are missing) would improve maintainability and make the validation reproducible in CI.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@phoonnx/phoneme_ids.py` around lines 378 - 457, The inline __main__ test
harness (starting at the module's bottom) uses hardcoded paths and external libs
(phoneme_ids_path, phoneme_map_path, load_phoneme_ids, EspeakPhonemizer,
phonemes_to_ids, mimic3_phonemes2ids, piper_phonemes_to_ids) and should be moved
to a new test file; extract the logic into tests/test_phoneme_ids.py as pytest
functions that mock or skip when dependencies/files are missing, replace
hardcoded paths with temporary fixtures or package resources, import and call
load_phoneme_ids/load_phoneme_map and the phonemizer inside tests (or mock
EspeakPhonemizer), and remove or reduce the __main__ block to a minimal example
or delete it so the module has no heavy side-effect code.
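
One way to make such a reference comparison optional is to skip it when the reference package is absent. The sketch below uses only the standard library so it runs anywhere; in a pytest suite, pytest.importorskip serves the same purpose. The module name is a placeholder, since the import path of the reference implementations is not shown in this review.

```python
import importlib.util
import unittest

# Placeholder: the real import path of the reference implementation is not
# given in this review, so this name is purely hypothetical.
REFERENCE_MODULE = "reference_phoneme_ids_placeholder"
HAS_REFERENCE = importlib.util.find_spec(REFERENCE_MODULE) is not None


class TestAgainstReference(unittest.TestCase):
    @unittest.skipUnless(HAS_REFERENCE, f"{REFERENCE_MODULE} is not installed")
    def test_matches_reference(self):
        reference = importlib.import_module(REFERENCE_MODULE)
        # A real test would compare phonemes_to_ids() output with the
        # reference implementation's output here.
        self.assertIsNotNone(reference)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestAgainstReference)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

When the placeholder module is missing the test is reported as skipped rather than failed, so CI stays green on machines without the optional dependency.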

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 712a8b41-f132-45e7-ad83-7f7eefad9aeb

📥 Commits

Reviewing files that changed from the base of the PR and between 0700a98 and 675017e.

📒 Files selected for processing (7)

docs/README.md
docs/alignment.md
phoonnx/config.py
phoonnx/phoneme_ids.py
phoonnx/voice.py
phoonnx_train/export_onnx.py
tests/test_alignment.py

coderabbitai · 2026-05-30T17:34:54Z

+    eos_id = eos_token if isinstance(eos_token, int) \
+        else id_map.get(eos_token, [len(id_map)]) if eos_token \
+        else [len(id_map)]
+    bos_id = eos_token if isinstance(bos_token, int) \


⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Critical typo: bos_id assigned from eos_token instead of bos_token.

Line 242 reads bos_id = eos_token if isinstance(bos_token, int) ... but the value should be bos_token. As written, whenever bos_token is passed as an integer, bos_id is set to eos_token instead, so BOS ends up with the EOS value. The map-lookup branch already uses bos_token and is unaffected.

🐛 Proposed fix

-    bos_id = eos_token if isinstance(bos_token, int) \
-        else id_map.get(bos_token, [len(id_map)]) if bos_token \
+    bos_id = bos_token if isinstance(bos_token, int) \
+        else id_map.get(bos_token, [len(id_map)]) if bos_token \
         else [len(id_map)]


🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@phoonnx/phoneme_ids.py` at line 242, The assignment to bos_id incorrectly uses eos_token; update the expression so bos_id is derived from bos_token (i.e., bos_id = bos_token if isinstance(bos_token, int) ...), and keep the existing fallback to look up token2id[bos_token] when bos_token is a string; modify the line that sets bos_id (and ensure it parallels the eos_id logic) so BOS and EOS IDs are computed from their respective inputs.
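
Routing both tokens through one helper makes this kind of copy-paste slip harder to repeat. The sketch below is a hypothetical refactor, not the code in phoneme_ids.py: resolve_token_id is an invented name, and the assumption that id_map values are lists of ints is inferred from the [len(id_map)] fallback in the diff.

```python
from typing import Dict, List, Union

TokenId = Union[int, List[int]]


def resolve_token_id(token: Union[int, str, None],
                     id_map: Dict[str, List[int]]) -> TokenId:
    """Resolve a special token to its id using the same three-way logic
    as the diff: int passes through, str is looked up, empty falls back."""
    fallback = [len(id_map)]
    if isinstance(token, int):
        return token
    if token:
        return id_map.get(token, fallback)
    return fallback


# Example map; the symbols and ids here are illustrative only.
id_map = {"_": [0], "^": [1], "$": [2]}
bos_id = resolve_token_id(1, id_map)     # integer BOS passes through unchanged
eos_id = resolve_token_id("$", id_map)   # string EOS is looked up in the map
```

Each id is then computed from its own input only, so BOS can no longer pick up the EOS value.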

coderabbitai · 2026-05-30T17:34:54Z

+        if line.strip().isdigit(): # phoneme is whitespace
+            phoneme_str = " "
+            phoneme_id = int(line)
+        else:
+            phoneme_id, phoneme_str = line.split(" ", maxsplit=1)
+            if phoneme_str.isdigit():
+                phoneme_id, phoneme_str = phoneme_str, phoneme_id


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Test load_phoneme_ids with various whitespace and format edge cases
python3 << 'EOF'
from io import StringIO
from phoonnx.phoneme_ids import load_phoneme_ids

# Test 1: whitespace phoneme (ID-only line)
test1 = "3\n"
result1 = load_phoneme_ids(StringIO(test1))
print("Test 1 (whitespace phoneme):", result1)

# Test 2: reversed format
test2 = "a 5\n"
result2 = load_phoneme_ids(StringIO(test2))
print("Test 2 (normal format):", result2)

# Test 3: what happens with "5 a" - should reverse?
test3 = "5 a\n"
result3 = load_phoneme_ids(StringIO(test3))
print("Test 3 (reversed format):", result3)

# Test 4: ID with trailing space before newline
test4 = "3 \n"
result4 = load_phoneme_ids(StringIO(test4))
print("Test 4 (ID + trailing space):", result4)
EOF

Repository: TigreGotico/phoonnx

Length of output: 202

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Show the relevant section around the snippet in phoonnx/phoneme_ids.py
sed -n '280,380p' phoonnx/phoneme_ids.py

# Find where whitespace phoneme_str " " is handled/added or skipped
rg -n --fixed-string 'phoneme_str' phoonnx/phoneme_ids.py
rg -n --fixed-string '" "' phoonnx/phoneme_ids.py
rg -n --fixed-string 'isdigit()' phoonnx/phoneme_ids.py

# Show the full load_phoneme_ids function header + body region (best-effort)
python3 - <<'PY'
import inspect
import phoonnx.phoneme_ids as m
print(inspect.getsource(m.load_phoneme_ids))
PY

Repository: TigreGotico/phoonnx

Length of output: 3735

🏁 Script executed:

#!/bin/bash
set -euo pipefail

nl -ba phoonnx/phoneme_ids.py | sed -n '300,380p'

python3 <<'PY'
from io import StringIO
from phoonnx.phoneme_ids import load_phoneme_ids

def run(name, s):
    try:
        out = load_phoneme_ids(StringIO(s))
        print(f"{name}: {out}")
    except Exception as e:
        print(f"{name}: EXC {type(e).__name__}: {e}")

# whitespace-phoneme line variants
run("T1: '3\\n'", "3\n")  # expected to be skipped by (" " not in line)
run("T2: '3 \\n'", "3 \n")  # passes guard; should map ' ':3
run("T3: '3\\t\\n'", "3\t\n")  # no literal space; should be skipped
run("T4: '3 \\n'", "3 \n")  # literal spaces present; should map ' ':3

# reversed/regular formats
run("T5: 'a 5\\n'", "a 5\n")  # should parse
run("T6: '5 a\\n'", "5 a\n")  # should parse

# malformed / potentially crashing
run("T7: '3 a \\n'", "3 a \n")  # trailing spaces in phoneme
run("T8: '3 `#x`\\n'", "3 `#x`\n")  # int('3 `#x`') would crash if reached
run("T9: 'a 5 extra\\n'", "a 5 extra\n")  # phoneme_str='5 extra' => not digit, stored as key
PY

Repository: TigreGotico/phoonnx

Length of output: 106

🏁 Script executed:

#!/bin/bash
set -euo pipefail

sed -n '300,380p' phoonnx/phoneme_ids.py

python3 <<'PY'
from io import StringIO
from phoonnx.phoneme_ids import load_phoneme_ids

def run(name, s):
    try:
        out = load_phoneme_ids(StringIO(s))
        print(f"{name}: {out}")
    except Exception as e:
        print(f"{name}: EXC {type(e).__name__}: {e}")

# whitespace-phoneme line variants
run("T1 '3\\n'", "3\n")  # no literal space in line => skipped
run("T2 '3 \\n'", "3 \n")  # literal space => parses as phoneme ' '
run("T3 '3\\t\\n'", "3\t\n")  # tab => no literal space => skipped
run("T4 '3 \\n'", "3 \n")  # literal spaces => phoneme key ' '

# reversed/regular formats
run("T5 'a 5\\n'", "a 5\n")
run("T6 '5 a\\n'", "5 a\n")

# malformed / edge cases
run("T7 '3 a \\n'", "3 a \n")  # phoneme_str includes spaces => key may include trailing spaces
run("T8 '3 `#x`\\n'", "3 `#x`\n")  # currently would attempt int(...) if reached => likely crash
run("T9 'a 5 extra\\n'", "a 5 extra\n")  # phoneme_str not just digits => stored key '5 extra'
PY

Repository: TigreGotico/phoonnx

Length of output: 2783

Fix whitespace-phoneme ID-only parsing (and numeric-token swapping) in load_phoneme_ids

The line.strip().isdigit() “phoneme is whitespace” branch can’t run for digit-only lines like "3\n" because earlier the loader skips any line where " " (literal space) is not present; as a result "3\n" produced {}, while "3 \n" produced {' ': 3} (tabs after the ID are also skipped).

The if phoneme_str.isdigit(): phoneme_id, phoneme_str = phoneme_str, phoneme_id swap can misparse numeric phoneme tokens: e.g. "5 3\n" would be treated as reversed and inverted, even though the spec format is ID PHONEME.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@phoonnx/phoneme_ids.py` around lines 335 - 341, The parsing in load_phoneme_ids is brittle: change the logic to split on any whitespace and handle single-token numeric lines as ID-only (whitespace phoneme) without relying on a literal space; specifically, use a whitespace split (e.g., line.rstrip().split(None, 1)) to get tokens, then if len(tokens) == 1 and tokens[0].isdigit() set phoneme_id = int(tokens[0]) and phoneme_str = " " (or the intended whitespace marker), else if len(tokens) == 2 parse phoneme_id = int(tokens[0]) and phoneme_str = tokens[1]; remove the phoneme_str.isdigit() swap so numeric phoneme tokens are not incorrectly flipped. Ensure conversions to int are guarded/validated and refer to load_phoneme_ids, phoneme_id, and phoneme_str when making the fixes.
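
The whitespace-tolerant parsing suggested above might look like the sketch below. It is an assumption-laden illustration rather than the project's loader: parse_phoneme_ids is a hypothetical name, it takes an iterable of lines instead of a file path, it returns a phoneme → id dict as in the {' ': 3} output quoted above, and raising ValueError on a non-numeric id is one possible policy (skipping the line would be another).

```python
from typing import Dict, Iterable


def parse_phoneme_ids(lines: Iterable[str]) -> Dict[str, int]:
    """Hypothetical parser for 'ID PHONEME' lines, tolerant of any whitespace."""
    phoneme_to_id: Dict[str, int] = {}
    for line_no, raw in enumerate(lines, start=1):
        # split(None, 1) splits on any whitespace run (spaces or tabs).
        tokens = raw.rstrip().split(None, 1)
        if not tokens:
            continue  # blank line
        try:
            phoneme_id = int(tokens[0])
        except ValueError:
            raise ValueError(
                f"line {line_no}: expected a numeric id, got {tokens[0]!r}"
            ) from None
        # An ID-only line means the phoneme itself is whitespace.
        phoneme_str = tokens[1] if len(tokens) == 2 else " "
        phoneme_to_id[phoneme_str] = phoneme_id
    return phoneme_to_id


id_only = parse_phoneme_ids(["3\n"])
tab_after_id = parse_phoneme_ids(["3\t\n"])
numeric_phoneme = parse_phoneme_ids(["5 3\n"])
```

Since the first token is always treated as the id, a numeric phoneme such as "3" on the line "5 3" is stored under id 5 and is never swapped.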

JarbasAl and others added 4 commits May 30, 2026 17:22

feat: phoneme alignment

5e2d038

tokens from config if available

56e763c

comment

eb87b7e

github-actions Bot added feature and removed feature labels May 30, 2026

JarbasAl marked this pull request as ready for review May 30, 2026 17:27

github-actions Bot added feature and removed feature labels May 30, 2026

coderabbitai Bot reviewed May 30, 2026

View reviewed changes

JarbasAl mentioned this pull request Jun 2, 2026

Phoneme alignment support #135

Open

4 tasks

JarbasAl wants to merge 5 commits into dev from feat/phoneme-alignment
