feat: phoneme alignment — tests + docs (rebased onto dev)#125

Open
JarbasAl wants to merge 5 commits into dev from feat/phoneme-alignment

@JarbasAl JarbasAl commented May 30, 2026

Contributor

Replaces / supersedes #41 (fix/alignment).

Rebases the 3 phoneme-alignment commits from fix/alignment onto current dev
(which already has #124 merged), resolves the conflicts, and adds tests and docs.

What's in this PR

From #41 (cherry-picked):

  • PhonemeAlignment dataclass (phoneme, num_samples)
  • AudioChunk extended with phonemes, phoneme_ids, phoneme_id_samples, phoneme_alignments
  • TTSVoice.synthesize(include_alignments=False) — optional alignment reconstruction
  • TTSVoice.phoneme_ids_to_audio(include_alignments=False) — returns (audio, samples) tuple when model has a second output
  • VoiceConfig.hop_length (default 256) — frame→sample conversion
  • phoonnx_train export-onnx --add-phoneme-alignment flag
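
To make the frame→sample conversion concrete, here is a minimal sketch. It assumes the conversion is a plain multiplication of per-phoneme frame counts by `hop_length`; the `PhonemeAlignment` class below is a local stand-in with the two fields listed above, and `frames_to_samples` plus the example values are hypothetical names used only for illustration, not the phoonnx implementation.

```python
from dataclasses import dataclass
from typing import List

DEFAULT_HOP_LENGTH = 256  # default value named in this PR


@dataclass
class PhonemeAlignment:
    """Local stand-in with the (phoneme, num_samples) fields listed above."""
    phoneme: str
    num_samples: int


def frames_to_samples(frame_counts: List[int],
                      hop_length: int = DEFAULT_HOP_LENGTH) -> List[int]:
    """Scale per-phoneme frame durations to audio sample counts (assumed: frames * hop)."""
    return [int(n) * hop_length for n in frame_counts]


frames = [2, 5, 3]                   # hypothetical per-phoneme frame durations
samples = frames_to_samples(frames)  # one sample count per phoneme
alignments = [PhonemeAlignment(p, n) for p, n in zip(["h", "ə", "l"], samples)]
```

With a different `hop_length` in the voice config, the same frame counts would map to proportionally different sample counts.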

New in this PR:

  • tests/test_alignment.py — 24 hermetic unit tests (no models, no network):
    • PhonemeAlignment fields and equality
    • AudioChunk new fields, optional defaults
    • hop_length constant, VoiceConfig field default, from_dict parsing
    • phoneme_ids_to_audio return shapes (ndarray vs tuple) and hop scaling
    • synthesize integration: fields None when not requested, populated when supported
    • Alignment reconstruction edge cases: length mismatch → None, empty text → no chunks
  • docs/alignment.md — usage guide covering include_alignments, AudioChunk fields, PhonemeAlignment, hop_length, export CLI, viseme and karaoke examples
  • docs/README.md — linked to new alignment page, added feature bullet
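
For the viseme and karaoke use cases, per-phoneme sample counts usually need to become start and end times. The following is a rough sketch of that step, not code from docs/alignment.md: `PhonemeAlignment` is a local stand-in, while `to_timeline` and the 22050 Hz sample rate are assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PhonemeAlignment:
    """Local stand-in with the (phoneme, num_samples) fields from this PR."""
    phoneme: str
    num_samples: int


def to_timeline(alignments: List[PhonemeAlignment],
                sample_rate: int) -> List[Tuple[str, float, float]]:
    """Turn consecutive sample counts into (phoneme, start_s, end_s) entries."""
    timeline = []
    cursor = 0  # running position in samples
    for item in alignments:
        start = cursor / sample_rate
        cursor += item.num_samples
        timeline.append((item.phoneme, start, cursor / sample_rate))
    return timeline


demo = [PhonemeAlignment("h", 2205), PhonemeAlignment("i", 4410)]
timeline = to_timeline(demo, sample_rate=22050)  # sample rate is an assumption
```

A lip-sync or karaoke scheduler could then fire an event at each `start_s`, using the real sample rate of the voice in place of the assumed one.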

License note (from #41)

The alignment logic is ported from piper (OHF-Voice/piper1-gpl commit eb6be6b).
@synesthesiam confirmed in the piper1-gpl discussions that piper code not using
espeak-ng is covered by Apache 2 — this PR does not use espeak-ng code.

Summary by CodeRabbit

Release Notes

  • New Features

    • Added phoneme-level alignment support to speech synthesis, enabling per-phoneme timing data extraction
    • Added capability to export ONNX models with alignment output
  • Documentation

    • Added comprehensive phoneme alignment documentation covering API usage, configuration, and practical use cases such as viseme scheduling and karaoke timing

Review Change Stack

JarbasAl and others added 4 commits May 30, 2026 17:22
24 hermetic unit tests covering PhonemeAlignment, AudioChunk new fields,
phoneme_ids_to_audio return shapes, hop_length scaling, synthesize()
integration, and alignment reconstruction edge cases (length mismatch,
alignment failure, empty text).

docs/alignment.md: usage guide for include_alignments, AudioChunk fields,
PhonemeAlignment, hop_length config, export CLI flag, and viseme/karaoke
use-case examples. Linked from docs/README.md.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented May 30, 2026

Contributor
📝 Walkthrough

Walkthrough

This PR implements optional phoneme-level alignment for synthesized audio. It extends the Voice API with alignment metadata, introduces hop_length configuration, adds ONNX export tooling to expose alignment tensors, and includes comprehensive tests and user documentation.

Changes

Phoneme Alignment Feature

| Layer | File(s) | Summary |
| --- | --- | --- |
| Configuration foundation: hop_length support | phoonnx/config.py | Added DEFAULT_HOP_LENGTH constant (256) and hop_length: int field to VoiceConfig with default value; VoiceConfig.from_dict() loads hop_length from model config or uses default. |
| Phoneme-to-ID conversion utilities | phoonnx/phoneme_ids.py | New module provides phonemes_to_ids() function with default IPA-to-ID mappings, special tokens (pad, BOS, EOS, word separator), BlankBetween enum for configurable blank insertion, load_phoneme_ids() and load_phoneme_map() file loaders, plus reference test harness for validation. |
| Voice API alignment contracts and implementation | phoonnx/voice.py | Added PhonemeAlignment dataclass (phoneme: str, num_samples: int); extended AudioChunk with phoneme metadata fields (phonemes, phoneme_ids, phoneme_id_samples, phoneme_alignments); synthesize() and phoneme_ids_to_audio() accept include_alignments flag and return per-phoneme sample durations; alignment reconstruction absorbs BOS/blank/EOS durations into adjacent phonemes. |
| ONNX export alignment tooling | phoonnx_train/export_onnx.py | New add_phoneme_alignment_output() function modifies ONNX models to expose alignment tensor (autodetected from Ceil node outputs or explicit tensor name); CLI gains --add-phoneme-alignment flag to invoke post-export patching with error logging. |
| User documentation and API guides | docs/README.md, docs/alignment.md | Updated README with phoneme alignment feature bullet; new alignment.md documents synthesize(include_alignments=True) API, PhonemeAlignment dataclass structure, hop_length role in frame-to-sample conversion, ONNX export procedures, and viseme/karaoke/subtitle timing use cases. |
| Comprehensive alignment tests (unit and integration) | tests/test_alignment.py | Hermetic unit tests validate PhonemeAlignment dataclass semantics, AudioChunk field storage, hop_length defaults and parsing, phoneme_ids_to_audio return types with hop-length scaling; reconstruction tests verify alignment consistency (BOS/blank/EOS absorption, length matching, unknown token handling); optional integration tests download real voices, patch ONNX models, and verify end-to-end alignment invariants across both patched and unpatched models. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • TigreGotico/phoonnx#19: Both PRs extend the Click-based CLI export workflow in phoonnx_train/export_onnx.py; this PR adds alignment export capability while the related PR refactors the CLI structure.

Poem

🐰 Phonemes now sing in measured time,
Each syllable marked with samples fine,
From voice to frame to lip and screen,
The prettiest alignment ever seen!
Frame length defaults to two-five-six,
Alignment magic in the mix. ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 43.10% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'feat: phoneme alignment — tests + docs (rebased onto dev)' clearly summarizes the main changes: introducing phoneme alignment features with corresponding tests and documentation updates. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@github-actions github-actions Bot added feature and removed feature labels May 30, 2026
@github-actions

github-actions Bot commented May 30, 2026


Ready for review! The automated tests have finished. ✅

I've aggregated the results of the automated checks for this PR below.

🏷️ Release Preview

Ensuring the 'Contributors' list is sorted and complete. 👥

Current: 1.3.5a1 · Next: 1.4.0a1

| Signal | Value |
| --- | --- |
| Label | feature |
| PR title | feat: phoneme alignment — tests + docs (rebased onto dev) |
| Bump | minor |

✅ PR title follows conventional commit format.


🚀 Release Channel Compatibility

Predicted next version: 1.4.0a1

Channel Status Note Current Constraint
Stable Not in channel -
Testing Not in channel -
Alpha Not in channel -

🔒 Security (pip-audit)

I've scanned the dependencies for any hidden surprises. 🔍

✅ No known vulnerabilities found (61 packages scanned).

📊 Coverage

A forensic analysis of your test coverage. 🔍

31.5% total coverage

Files below 80% coverage (37 files)
File Coverage Missing lines
phoonnx/cli.py 0.0% 98
phoonnx/phoneme_ids.py 0.0% 149
phoonnx/thirdparty/kog2p/__init__.py 0.0% 203
phoonnx/thirdparty/mantoq/unicode_symbol2label.py 0.0% 1
phoonnx/version.py 0.0% 8
phoonnx/thirdparty/bw2ipa.py 7.5% 86
phoonnx/thirdparty/mantoq/pyarabic/number.py 7.7% 371
phoonnx/thirdparty/mantoq/buck/phonetise_buckwalter.py 10.4% 180
phoonnx/thirdparty/hangul2ipa.py 16.6% 372
phoonnx/phonemizers/en.py 17.5% 104
phoonnx/thirdparty/mantoq/pyarabic/trans.py 18.2% 135
phoonnx/model_manager.py 22.2% 168
phoonnx/thirdparty/zh_num.py 23.1% 83
phoonnx/phonemizers/mul.py 23.9% 236
phoonnx/thirdparty/tashkeel/__init__.py 23.9% 89
phoonnx/phonemizers/zh.py 27.0% 92
phoonnx/phonemizers/ko.py 30.4% 32
phoonnx/phonemizers/gl.py 31.1% 42
phoonnx/phonemizers/ar.py 31.2% 44
phoonnx/tokenizer.py 31.4% 212
phoonnx/thirdparty/mantoq/buck/tokenization.py 32.5% 27
phoonnx/thirdparty/phonikud/__init__.py 35.3% 11
phoonnx/phonemizers/ja.py 36.0% 32
phoonnx/phonemizers/fa.py 36.4% 14
phoonnx/phonemizers/pt.py 38.1% 13
phoonnx/thirdparty/mantoq/pyarabic/normalize.py 38.1% 13
phoonnx/thirdparty/mantoq/pyarabic/araby.py 39.7% 298
phoonnx/phonemizers/he.py 40.0% 12
phoonnx/phonemizers/vi.py 40.0% 12
phoonnx/phonemizers/base.py 40.8% 71
phoonnx/thirdparty/mantoq/pyarabic/stack.py 45.5% 6
phoonnx/voice.py 46.8% 167
phoonnx/thirdparty/mantoq/num2words.py 47.6% 11
phoonnx/config.py 50.0% 160
phoonnx/phonemizers/mwl.py 50.0% 8
phoonnx/thirdparty/mantoq/__init__.py 60.0% 10
phoonnx/thirdparty/mantoq/pyarabic/arabrepr.py 60.0% 6

Full report: download the coverage-report artifact.

🔍 Lint

I've finished my task! Here's the data you need. 📊

ruff: issues found — see job log

⚖️ License Check

Auditing the legal lineage of this contribution. 📜

❌ License violations detected (43 packages) — review required before merging.

Dependency                          License Name                                            License Type         Misc                                    
phoonnx:1.3.3                       Error                                                   Error                                                        

License Type                        Found                                                  
Error                               1

License distribution: 14× MIT License, 7× Apache Software License, 5× MIT, 3× Apache-2.0, 2× BSD-3-Clause, 2× ISC License (ISCL), 1× 3-Clause BSD License, 1× Apache Software License; BSD License, +8 more

Full breakdown — 43 packages
Package Version License URL
build 1.5.0 MIT link
certifi 2026.5.20 Mozilla Public License 2.0 (MPL 2.0) link
charset-normalizer 3.4.7 MIT link
click 8.4.1 BSD-3-Clause link
combo_lock 0.3.1 Apache-2.0 link
dateparser 1.4.0 BSD License link
filelock 3.29.0 MIT link
flatbuffers 25.12.19 Apache Software License link
idna 3.17 BSD-3-Clause link
json-database 0.10.1 MIT link
kthread 0.2.3 MIT License link
langcodes 3.5.1 MIT License link
markdown-it-py 4.2.0 MIT License link
mdurl 0.1.2 MIT License link
memory-tempfile 2.2.3 MIT License link
numpy 2.4.6 BSD-3-Clause AND 0BSD AND MIT AND Zlib AND CC0-1.0 link
onnxruntime 1.26.0 MIT License link
ovos-config 2.1.1 Apache-2.0 link
ovos-date-parser 0.7.0a5 Apache Software License link
ovos-number-parser 0.5.1 Apache Software License link
ovos-utils 0.8.5 Apache-2.0 link
packaging 26.2 Apache-2.0 OR BSD-2-Clause link
pexpect 4.9.0 ISC License (ISCL) link
phoonnx 1.3.5a1 Apache Software License link
protobuf 7.35.0 3-Clause BSD License link
ptyprocess 0.7.0 ISC License (ISCL) link
pyee 13.0.1 MIT License link
Pygments 2.20.0 BSD-2-Clause link
pyproject_hooks 1.2.0 MIT License link
python-dateutil 2.9.0.post0 Apache Software License; BSD License link
pytz 2026.2 MIT License link
PyYAML 6.0.3 MIT License link
quebra-frases 0.3.7 Apache Software License link
regex 2026.5.9 Apache-2.0 AND CNRI-Python link
requests 2.34.2 Apache Software License link
rich 13.9.4 MIT License link
rich-click 1.9.8 MIT License link
six 1.17.0 MIT License link
typing_extensions 4.15.0 PSF-2.0 link
tzlocal 5.3.1 MIT License link
unicode-rbnf 2.4.0 MIT License
urllib3 2.7.0 MIT link
watchdog 6.0.0 Apache Software License link

Policy: Apache 2.0 (universal donor). StrongCopyleft / NetworkCopyleft / WeakCopyleft / Other / Error categories fail. MPL allowed.

🔨 Build Tests

I've finished the digital carpentry on this PR. 🔨

✅ All versions pass

Python Build Install Tests
3.10
3.11
3.12
3.13
3.14

Providing clarity through automated analysis 🔍

Three bugs fixed:

1. AttributeError crash: VoiceConfig has no phoneme_id_map attribute —
   the code referenced it but it does not exist on the dataclass.
   Fix: read blank_id/bos_id/eos_id and idx2char from the tokenizer's
   vocabulary, which is where the mapping actually lives.

2. get(phoneme, []) type error: missing phonemes returned [] (a list),
   not an int, so the id comparison always failed and alignments silently
   returned None for any phoneme not found in the map.
   Fix: the new walk uses idx2char (reverse map) so unknown ids become
   "?" strings instead of crashing or silently failing.

3. Incomplete blank handling: the old loop tried to re-derive the tokenizer's
   internal token sequence (bos/blank/eos insertion) manually, getting it
   wrong for any non-default blank_at_start/blank_at_end configuration.
   Fix: walk phoneme_ids + phoneme_id_samples directly in lockstep; fold
   blank/bos durations forward into the next real phoneme and eos backward
   into the last one. Works for all tokenizer configurations.

Also removes the now-unused DEFAULT_PAD/BOS/EOS_TOKEN imports from voice.py.

Tests: 10 new unit regression tests + 9 integration tests against a real
eu-ES espeak model validating that total aligned samples == audio length.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
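
The lockstep walk described in fix 3 can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in phoonnx/voice.py: `fold_alignments` and the example IDs are hypothetical, and folding a trailing blank backward together with EOS is one possible reading of the description.

```python
from typing import Dict, List, Optional, Tuple


def fold_alignments(phoneme_ids: List[int],
                    phoneme_id_samples: List[int],
                    idx2char: Dict[int, str],
                    blank_id: Optional[int] = None,
                    bos_id: Optional[int] = None,
                    eos_id: Optional[int] = None) -> Optional[List[Tuple[str, int]]]:
    """Walk ids and sample counts together, folding special tokens into real phonemes."""
    if len(phoneme_ids) != len(phoneme_id_samples):
        return None  # length mismatch: no alignment, as in the edge case tested by this PR
    result: List[Tuple[str, int]] = []
    carry = 0  # samples from blank/BOS tokens waiting for the next real phoneme
    for pid, n in zip(phoneme_ids, phoneme_id_samples):
        if eos_id is not None and pid == eos_id:
            if result:  # fold EOS (and any pending carry) backward
                ph, total = result[-1]
                result[-1] = (ph, total + carry + n)
                carry = 0
            else:
                carry += n
        elif pid in (blank_id, bos_id):  # a None id never equals an int id
            carry += n
        else:
            result.append((idx2char.get(pid, "?"), n + carry))  # unknown id -> "?"
            carry = 0
    if carry and result:  # leftover blanks when there is no EOS token
        ph, total = result[-1]
        result[-1] = (ph, total + carry)
    return result


ids = [1, 0, 10, 0, 11, 0, 2]                 # hypothetical: BOS=1, blank=0, EOS=2
durations = [256, 256, 512, 256, 768, 256, 256]
aligned = fold_alignments(ids, durations, {10: "a", 11: "b"},
                          blank_id=0, bos_id=1, eos_id=2)
```

Because every special-token duration lands on some real phoneme, the summed `num_samples` equals the summed input durations whenever at least one real phoneme exists, which is the invariant the integration tests check against audio length.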
@JarbasAl JarbasAl marked this pull request as ready for review May 30, 2026 17:27
@github-actions github-actions Bot added feature and removed feature labels May 30, 2026

@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 2

🧹 Nitpick comments (2)
phoonnx/voice.py (1)

315-317: ⚡ Quick win

Type annotations claim int but tokenizer properties return Optional[int].

Lines 315-317 annotate _blank_id, _bos_id, and _eos_id as int, but the tokenizer vocabulary properties can return None when the corresponding token is not defined (see context snippets showing blank_id(), bos_id(), eos_id() return Optional[int]).

The comparison logic at lines 357, 359 handles None correctly (no match when the ID is None), but the type hints are misleading.

📝 Proposed fix
         _tok = self.config.tokenizer
         _vocab = _tok.vocabulary
         _idx2char: dict = _vocab.idx2char
-        _blank_id: int = _tok.blank_id
-        _bos_id: int = _vocab.bos_id
-        _eos_id: int = _vocab.eos_id
+        _blank_id: Optional[int] = _tok.blank_id
+        _bos_id: Optional[int] = _vocab.bos_id
+        _eos_id: Optional[int] = _vocab.eos_id
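
A small sketch of why the comparison logic tolerates missing special tokens: an integer phoneme id never compares equal to `None`. The helper name `is_special` is hypothetical and only illustrates the `Optional[int]` contract discussed here.

```python
from typing import Optional


def is_special(pid: int,
               blank_id: Optional[int],
               bos_id: Optional[int],
               eos_id: Optional[int]) -> bool:
    """True when pid matches a defined special-token id; undefined (None) ids never match."""
    return pid in (blank_id, bos_id, eos_id)


no_specials = is_special(0, None, None, None)  # id 0 is an ordinary phoneme here
blank_zero = is_special(0, 0, None, None)      # id 0 is the blank token here
```

Annotating the ids as `Optional[int]` documents this behaviour for type checkers without changing it at runtime.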
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@phoonnx/voice.py` around lines 315 - 317, The type hints for _blank_id,
_bos_id, and _eos_id are incorrect: they are declared as int but _tok.blank_id,
_vocab.bos_id and _vocab.eos_id return Optional[int]; update the annotations to
Optional[int] (or None | int) for _blank_id, _bos_id, and _eos_id so the hints
match the actual values and existing comparison logic (in the region around the
checks that compare these IDs) remains correct.
phoonnx/phoneme_ids.py (1)

378-457: ⚡ Quick win

Consider moving the __main__ test harness to the test suite.

The __main__ block contains valuable validation logic that compares this implementation against external reference implementations (mimic3_phonemes2ids and piper_phonemes_to_ids). However, it relies on hardcoded file paths and external dependencies that may not be available in all environments.

Moving this to tests/test_phoneme_ids.py as optional integration tests (skipped when dependencies are missing) would improve maintainability and make the validation reproducible in CI.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@phoonnx/phoneme_ids.py` around lines 378 - 457, The inline __main__ test
harness (starting at the module's bottom) uses hardcoded paths and external libs
(phoneme_ids_path, phoneme_map_path, load_phoneme_ids, EspeakPhonemizer,
phonemes_to_ids, mimic3_phonemes2ids, piper_phonemes_to_ids) and should be moved
to a new test file; extract the logic into tests/test_phoneme_ids.py as pytest
functions that mock or skip when dependencies/files are missing, replace
hardcoded paths with temporary fixtures or package resources, import and call
load_phoneme_ids/load_phoneme_map and the phonemizer inside tests (or mock
EspeakPhonemizer), and remove or reduce the __main__ block to a minimal example
or delete it so the module has no heavy side-effect code.
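
One way to structure such an optional test is sketched below with the standard library only, so the snippet stays self-contained; a pytest suite could use `pytest.importorskip` for the same purpose. The module name and the `phonemes2ids` attribute are placeholders, not real dependencies of this project.

```python
import importlib.util
import unittest

# Placeholder name for an optional reference implementation (assumption).
REFERENCE_MODULE = "hypothetical_reference_phonemes2ids"
HAS_REFERENCE = importlib.util.find_spec(REFERENCE_MODULE) is not None


class TestAgainstReference(unittest.TestCase):
    @unittest.skipUnless(HAS_REFERENCE, f"{REFERENCE_MODULE} is not installed")
    def test_matches_reference(self):
        # Import lazily so a missing dependency can never break test collection.
        reference = importlib.import_module(REFERENCE_MODULE)
        self.assertTrue(hasattr(reference, "phonemes2ids"))


suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestAgainstReference)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

On machines without the reference package the test is reported as skipped rather than failed, which keeps CI green while still allowing the comparison to run locally.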

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 712a8b41-f132-45e7-ad83-7f7eefad9aeb

📥 Commits

Reviewing files that changed from the base of the PR and between 0700a98 and 675017e.

📒 Files selected for processing (7)
  • docs/README.md
  • docs/alignment.md
  • phoonnx/config.py
  • phoonnx/phoneme_ids.py
  • phoonnx/voice.py
  • phoonnx_train/export_onnx.py
  • tests/test_alignment.py

Comment thread phoonnx/phoneme_ids.py
eos_id = eos_token if isinstance(eos_token, int) \
else id_map.get(eos_token, [len(id_map)]) if eos_token \
else [len(id_map)]
bos_id = eos_token if isinstance(bos_token, int) \

Contributor

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Critical typo: bos_id assigned from eos_token instead of bos_token.

Line 242 reads bos_id = eos_token if isinstance(bos_token, int) ... but the value assigned should be bos_token, not eos_token. As written, bos_id receives eos_token whenever bos_token is passed as an integer, so BOS silently takes the EOS value; the map-lookup branch already uses bos_token and is unaffected.

🐛 Proposed fix
-    bos_id = eos_token if isinstance(bos_token, int) \
-        else id_map.get(bos_token, [len(id_map)]) if bos_token \
+    bos_id = bos_token if isinstance(bos_token, int) \
+        else id_map.get(bos_token, [len(id_map)]) if bos_token \
         else [len(id_map)]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@phoonnx/phoneme_ids.py` at line 242, The assignment to bos_id incorrectly
uses eos_token; update the expression so bos_id is derived from bos_token (i.e.,
bos_id = bos_token if isinstance(bos_token, int) ...), and keep the existing
fallback that looks up id_map.get(bos_token, [len(id_map)]) when bos_token is a
string; modify the line that sets bos_id (and ensure it parallels the eos_id
logic) so BOS and EOS IDs are computed from their respective inputs.
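
Copy-paste slips like this one are easier to avoid when both special tokens go through a single helper. The sketch below mirrors the three-way expression quoted in this thread; `resolve_special_id` and the sample `id_map` are hypothetical and not part of phoonnx.

```python
from typing import Dict, List, Union


def resolve_special_id(token: Union[int, str, None],
                       id_map: Dict[str, List[int]]) -> Union[int, List[int]]:
    """Resolve a special token the way the quoted expression does for eos_id."""
    if isinstance(token, int):
        return token                              # explicit integer id is used as-is
    if token:
        return id_map.get(token, [len(id_map)])   # symbol lookup with fallback id
    return [len(id_map)]                          # no token given: fallback id


id_map = {"_": [0], "^": [1], "$": [2]}  # hypothetical symbol -> ids map
bos_id = resolve_special_id("^", id_map)
eos_id = resolve_special_id("$", id_map)
```

Since each call names its token exactly once, BOS cannot accidentally pick up the EOS value.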

Comment thread phoonnx/phoneme_ids.py
Comment on lines +335 to +341
if line.strip().isdigit(): # phoneme is whitespace
phoneme_str = " "
phoneme_id = int(line)
else:
phoneme_id, phoneme_str = line.split(" ", maxsplit=1)
if phoneme_str.isdigit():
phoneme_id, phoneme_str = phoneme_str, phoneme_id

Contributor

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Test load_phoneme_ids with various whitespace and format edge cases

python3 << 'EOF'
from io import StringIO
from phoonnx.phoneme_ids import load_phoneme_ids

# Test 1: whitespace phoneme (ID-only line)
test1 = "3\n"
result1 = load_phoneme_ids(StringIO(test1))
print("Test 1 (whitespace phoneme):", result1)

# Test 2: reversed format
test2 = "a 5\n"
result2 = load_phoneme_ids(StringIO(test2))
print("Test 2 (normal format):", result2)

# Test 3: what happens with "5 a" - should reverse?
test3 = "5 a\n"
result3 = load_phoneme_ids(StringIO(test3))
print("Test 3 (reversed format):", result3)

# Test 4: ID with trailing space before newline
test4 = "3 \n"
result4 = load_phoneme_ids(StringIO(test4))
print("Test 4 (ID + trailing space):", result4)
EOF

Repository: TigreGotico/phoonnx

Length of output: 202


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Show the relevant section around the snippet in phoonnx/phoneme_ids.py
sed -n '280,380p' phoonnx/phoneme_ids.py

# Find where whitespace phoneme_str " " is handled/added or skipped
rg -n --fixed-string 'phoneme_str' phoonnx/phoneme_ids.py
rg -n --fixed-string '" "' phoonnx/phoneme_ids.py
rg -n --fixed-string 'isdigit()' phoonnx/phoneme_ids.py

# Show the full load_phoneme_ids function header + body region (best-effort)
python3 - <<'PY'
import inspect
import phoonnx.phoneme_ids as m
print(inspect.getsource(m.load_phoneme_ids))
PY

Repository: TigreGotico/phoonnx

Length of output: 3735


🏁 Script executed:

#!/bin/bash
set -euo pipefail

nl -ba phoonnx/phoneme_ids.py | sed -n '300,380p'

python3 <<'PY'
from io import StringIO
from phoonnx.phoneme_ids import load_phoneme_ids

def run(name, s):
    try:
        out = load_phoneme_ids(StringIO(s))
        print(f"{name}: {out}")
    except Exception as e:
        print(f"{name}: EXC {type(e).__name__}: {e}")

# whitespace-phoneme line variants
run("T1: '3\\n'", "3\n")                 # expected to be skipped by (" " not in line)
run("T2: '3 \\n'", "3 \n")              # passes guard; should map ' ':3
run("T3: '3\\t\\n'", "3\t\n")           # no literal space; should be skipped
run("T4: '3  \\n'", "3  \n")            # literal spaces present; should map ' ':3

# reversed/regular formats
run("T5: 'a 5\\n'", "a 5\n")           # should parse
run("T6: '5 a\\n'", "5 a\n")           # should parse

# malformed / potentially crashing
run("T7: '3 a  \\n'", "3 a  \n")       # trailing spaces in phoneme
run("T8: '3  `#x`\\n'", "3  `#x`\n")      # int('3  `#x`') would crash if reached
run("T9: 'a 5 extra\\n'", "a 5 extra\n") # phoneme_str='5 extra' => not digit, stored as key
PY

Repository: TigreGotico/phoonnx

Length of output: 106


🏁 Script executed:

#!/bin/bash
set -euo pipefail

sed -n '300,380p' phoonnx/phoneme_ids.py

python3 <<'PY'
from io import StringIO
from phoonnx.phoneme_ids import load_phoneme_ids

def run(name, s):
    try:
        out = load_phoneme_ids(StringIO(s))
        print(f"{name}: {out}")
    except Exception as e:
        print(f"{name}: EXC {type(e).__name__}: {e}")

# whitespace-phoneme line variants
run("T1 '3\\n'", "3\n")                 # no literal space in line => skipped
run("T2 '3 \\n'", "3 \n")              # literal space => parses as phoneme ' '
run("T3 '3\\t\\n'", "3\t\n")           # tab => no literal space => skipped
run("T4 '3  \\n'", "3  \n")            # literal spaces => phoneme key ' '

# reversed/regular formats
run("T5 'a 5\\n'", "a 5\n")
run("T6 '5 a\\n'", "5 a\n")

# malformed / edge cases
run("T7 '3 a  \\n'", "3 a  \n")       # phoneme_str includes spaces => key may include trailing spaces
run("T8 '3  `#x`\\n'", "3  `#x`\n")      # currently would attempt int(...) if reached => likely crash
run("T9 'a 5 extra\\n'", "a 5 extra\n") # phoneme_str not just digits => stored key '5 extra'
PY

Repository: TigreGotico/phoonnx

Length of output: 2783


Fix whitespace-phoneme ID-only parsing (and numeric-token swapping) in load_phoneme_ids

  • The line.strip().isdigit() “phoneme is whitespace” branch can’t run for digit-only lines like "3\n" because earlier the loader skips any line where " " (literal space) is not present; as a result "3\n" produced {}, while "3 \n" produced {' ': 3} (tabs after the ID are also skipped).
  • The if phoneme_str.isdigit(): phoneme_id, phoneme_str = phoneme_str, phoneme_id swap can misparse numeric phoneme tokens: e.g. "5 3\n" would be treated as reversed and inverted, even though the spec format is ID PHONEME.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@phoonnx/phoneme_ids.py` around lines 335 - 341, The parsing in
load_phoneme_ids is brittle: change the logic to split on any whitespace and
handle single-token numeric lines as ID-only (whitespace phoneme) without
relying on a literal space; specifically, use a whitespace split (e.g.,
line.rstrip().split(None, 1)) to get tokens, then if len(tokens) == 1 and
tokens[0].isdigit() set phoneme_id = int(tokens[0]) and phoneme_str = " " (or
the intended whitespace marker), else if len(tokens) == 2 parse phoneme_id =
int(tokens[0]) and phoneme_str = tokens[1]; remove the phoneme_str.isdigit()
swap so numeric phoneme tokens are not incorrectly flipped. Ensure conversions
to int are guarded/validated and refer to load_phoneme_ids, phoneme_id, and
phoneme_str when making the fixes.
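
The parsing rules suggested above could look roughly like this for a single line. It is a sketch only: `parse_phoneme_id_line` is a hypothetical helper, it assumes the `ID PHONEME` order, and it deliberately drops support for the reversed `PHONEME ID` order, which the real loader may still need.

```python
from typing import Optional, Tuple


def parse_phoneme_id_line(line: str) -> Optional[Tuple[str, int]]:
    """Parse one 'ID PHONEME' line; an ID-only line maps to the space phoneme."""
    tokens = line.rstrip().split(None, 1)  # split on any whitespace, at most once
    if not tokens:
        return None                        # blank line
    raw_id = tokens[0]
    if not (raw_id.isascii() and raw_id.isdigit()):
        return None                        # guard the int() conversion
    phoneme_id = int(raw_id)
    if len(tokens) == 1:
        return " ", phoneme_id             # whitespace phoneme
    return tokens[1], phoneme_id           # numeric phonemes are kept, never swapped


parsed = [parse_phoneme_id_line(s) for s in ("3\n", "3\t\n", "5 a\n", "5 3\n", "a 5\n")]
```

Here the ID-only variants `"3\n"` and `"3\t\n"` both resolve to the space phoneme, and `"5 3\n"` keeps `"3"` as the phoneme instead of swapping the two fields.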

@JarbasAl JarbasAl mentioned this pull request Jun 2, 2026
4 tasks