feat(train): Mixer-TTS training (Lightning) #146

Draft
JarbasAl wants to merge 1 commit into dev from feat/mixertts-training

JarbasAl commented Jun 5, 2026

Contributor

Adds Mixer-TTS training to phoonnx_train, ported from nipponjo/mixer-tts-pytorch into phoonnx_train's Lightning framework (complements the inference engine in the merged Mixer-TTS PR).

What's in

  • phoonnx_train/mixertts/ — vendored pure-torch model + losses + unsupervised aligner + dataset (rewritten to the package namespace).
  • MixerTTSModel(LightningModule) — wraps the model's training forward + _metrics (mel/duration/pitch/energy/CTC/binarization losses) and an optional LSGAN PatchDiscriminator for mel naturalness. Manual optimization with generator + discriminator optimizers, matching the upstream GAN loop.
  • train CLI: python -m phoonnx_train.mixertts.train --dataset-dir … --quality {x-low,medium,high} (1.74M/3.17M/20.6M params).
  • [train] extra gains einops + numba (the aligner).
  • Docs: training section in docs/mixertts.md.
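
For readers unfamiliar with the LSGAN objective named in the list above, here is a minimal sketch of the two least-squares losses in plain Python. It is illustrative only: the function names are hypothetical, the real module presumably works on tensors of patch scores from the PatchDiscriminator, and any loss weighting or 0.5 scaling used upstream is not stated in this PR, so none is assumed here.

```python
from statistics import fmean


def lsgan_discriminator_loss(real_scores, fake_scores):
    """Least-squares critic loss: push real patch scores toward 1, fake toward 0.

    Hypothetical helper; plain floats stand in for per-patch critic outputs.
    """
    real_term = fmean([(s - 1.0) ** 2 for s in real_scores])
    fake_term = fmean([s ** 2 for s in fake_scores])
    return real_term + fake_term


def lsgan_generator_loss(fake_scores):
    """Least-squares generator loss: push the critic's scores on generated mels toward 1."""
    return fmean([(s - 1.0) ** 2 for s in fake_scores])


# Toy patch scores (made up for illustration, not taken from any training run).
real = [0.9, 1.1, 1.0]
fake = [0.2, -0.1, 0.0]
print("d_loss:", lsgan_discriminator_loss(real, fake))
print("g_loss:", lsgan_generator_loss(fake))
```

With manual optimization, a training step would typically compute the critic loss on detached generator output and step the discriminator optimizer, then compute the generator losses and step the generator optimizer. The exact ordering in this PR follows the upstream loop and is not reproduced here.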

Verified

4 smoke tests — the Lightning module builds the model + GAN critic, exposes two optimizers (one without GAN), and the inference/export path runs. Full suite 215 passed.

Scope: this wires up the training pipeline + CLI. Full convergence needs a preprocessed dataset (mel + pitch dirs) and a GPU — out of scope for CI. A trained checkpoint exports to ONNX via the same contract as the mirrored voices.
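
As a rough illustration of the CLI surface described above, the sketch below builds a stand-in argument parser with the two flags and three quality tiers named in this PR. Everything else is an assumption: the helper names, the help strings, making both flags required, the example dataset path, and the pairing of tiers to parameter counts (taken from the order in which they are listed). It is not the actual phoonnx_train.mixertts.train implementation.

```python
import argparse

# Parameter counts quoted in the PR. Pairing them with tiers in listed order
# is an assumption, not something the PR states explicitly.
QUALITY_PARAMS = {"x-low": "1.74M", "medium": "3.17M", "high": "20.6M"}


def build_parser():
    """Hypothetical stand-in for the training CLI's argument parser."""
    parser = argparse.ArgumentParser(prog="phoonnx_train.mixertts.train")
    parser.add_argument(
        "--dataset-dir",
        required=True,
        help="preprocessed dataset directory (mel + pitch dirs)",
    )
    parser.add_argument(
        "--quality",
        choices=list(QUALITY_PARAMS),
        required=True,
        help="model size tier",
    )
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
# "data/my_voice" is a made-up path used only for illustration.
args = build_parser().parse_args(["--dataset-dir", "data/my_voice", "--quality", "high"])
print(args.dataset_dir, args.quality, QUALITY_PARAMS[args.quality])
```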

🤖 Generated with Claude Code

Add phoonnx_train/mixertts/ — Mixer-TTS training ported from
nipponjo/mixer-tts-pytorch into phoonnx_train's Lightning framework:

- MixerTTSModel(LightningModule): wraps the model's training forward + _metrics
  (mel/duration/pitch/energy/CTC/binarization losses + unsupervised aligner) and
  an optional LSGAN PatchDiscriminator (manual optimization, generator + disc
  optimizers), matching the upstream GAN loop.
- vendored pure-torch model + losses + aligner + dataset under mixertts/models
  and mixertts/utils (rewritten to the package namespace).
- train CLI (phoonnx_train.mixertts.train) with quality tiers (1.74M/3.17M/20.6M).
- [train] extra gains einops + numba (aligner).

4 smoke tests (builds, g+d optimizers, no-gan single optimizer, inference path);
full suite 215 passed. Convergence needs a preprocessed dataset + GPU.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

coderabbitai Bot commented Jun 5, 2026

Contributor

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 09925b58-0e13-47cb-bf9c-f5a5d4463479

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

github-actions Bot commented Jun 5, 2026


Look what I found! The automated check results are in. 🔍

I've aggregated the results of the automated checks for this PR below.

📋 Repo Health

Scanning for any signs of 'large file' weight gain. ⚖️

⚠️ Some required files are missing.

Latest Version: 1.10.0a1

phoonnx/version.py — Version file
README.md — README
LICENSE — License file
pyproject.toml — pyproject.toml
⚠️ setup.py — setup.py
CHANGELOG.md — Changelog
phoonnx/version.py has valid version block markers

🏷️ Release Preview

I've generated a preview of the upcoming changes. 🎬

Current: 1.10.0a1, Next: 1.10.0a2

| Signal | Value |
|---|---|
| Label | (none) |
| PR title | feat(train): Mixer-TTS training (Lightning) |
| Bump | alpha |

⚠️ No conventional commit prefix — alpha-only bump.
Suggested: fix: update the thing or feat: update the thing


🚀 Release Channel Compatibility

Predicted next version: 1.10.0a2

Channel Status Note Current Constraint
Stable Not in channel -
Testing Not in channel -
Alpha Not in channel -

📊 Coverage

Diving deep into the code to see what's covered! 🤿

39.4% total coverage

Files below 80% coverage (37 files)
| File | Coverage | Missing lines |
|---|---|---|
| phoonnx/cli.py | 0.0% | 98 |
| phoonnx/thirdparty/kog2p/__init__.py | 0.0% | 203 |
| phoonnx/thirdparty/mantoq/unicode_symbol2label.py | 0.0% | 1 |
| phoonnx/thirdparty/bw2ipa.py | 7.5% | 86 |
| phoonnx/thirdparty/mantoq/pyarabic/number.py | 7.7% | 371 |
| phoonnx/thirdparty/mantoq/buck/phonetise_buckwalter.py | 10.4% | 180 |
| phoonnx/thirdparty/hangul2ipa.py | 16.6% | 372 |
| phoonnx/phonemizers/en.py | 17.5% | 104 |
| phoonnx/thirdparty/mantoq/pyarabic/trans.py | 18.2% | 135 |
| phoonnx/model_manager.py | 20.0% | 212 |
| phoonnx/voice.py | 21.7% | 220 |
| phoonnx/thirdparty/zh_num.py | 23.1% | 83 |
| phoonnx/phonemizers/mul.py | 23.9% | 236 |
| phoonnx/thirdparty/tashkeel/__init__.py | 23.9% | 89 |
| phoonnx/phonemizers/zh.py | 27.0% | 92 |
| phoonnx/phonemizers/ko.py | 30.4% | 32 |
| phoonnx/phonemizers/gl.py | 31.1% | 42 |
| phoonnx/phonemizers/ar.py | 31.2% | 44 |
| phoonnx/thirdparty/mantoq/buck/tokenization.py | 32.5% | 27 |
| phoonnx/thirdparty/phonikud/__init__.py | 35.3% | 11 |
| phoonnx/phonemizers/ja.py | 36.0% | 32 |
| phoonnx/phonemizers/fa.py | 36.4% | 14 |
| phoonnx/phonemizers/pt.py | 38.1% | 13 |
| phoonnx/thirdparty/mantoq/pyarabic/normalize.py | 38.1% | 13 |
| phoonnx/thirdparty/mantoq/pyarabic/araby.py | 39.7% | 298 |
| phoonnx/phonemizers/he.py | 40.0% | 12 |
| phoonnx/phonemizers/vi.py | 40.0% | 12 |
| phoonnx/phonemizers/base.py | 40.8% | 71 |
| phoonnx/thirdparty/mantoq/pyarabic/stack.py | 45.5% | 6 |
| phoonnx/thirdparty/mantoq/num2words.py | 47.6% | 11 |
| phoonnx/phonemizers/mwl.py | 50.0% | 8 |
| phoonnx/tokenizer.py | 52.4% | 147 |
| phoonnx/thirdparty/mantoq/__init__.py | 60.0% | 10 |
| phoonnx/thirdparty/mantoq/pyarabic/arabrepr.py | 60.0% | 6 |
| phoonnx/config.py | 61.0% | 130 |
| phoonnx/engines/vocoders/griffinlim.py | 61.4% | 27 |
| phoonnx/engines/optispeech.py | 69.6% | 24 |

Full report: download the coverage-report artifact.

🔍 Lint

Ensuring your contribution is moving forward. 🚀

ruff: issues found — see job log

🔒 Security (pip-audit)

Ensuring our encryption is top-notch. 🔐

✅ No known vulnerabilities found (61 packages scanned).

⚖️ License Check

Checking for any restrictive patent clauses. 📜

❌ License violations detected (43 packages) — review required before merging.

Dependency                          License Name                                            License Type         Misc                                    
phoonnx:1.3.3                       Error                                                   Error                                                        

License Type                        Found                                                  
Error                               1

License distribution: 14× MIT License, 7× Apache Software License, 5× MIT, 3× Apache-2.0, 2× BSD-3-Clause, 2× ISC License (ISCL), 1× 3-Clause BSD License, 1× Apache Software License; BSD License, +8 more

Full breakdown — 43 packages
| Package | Version | License | URL |
|---|---|---|---|
| build | 1.5.0 | MIT | link |
| certifi | 2026.5.20 | Mozilla Public License 2.0 (MPL 2.0) | link |
| charset-normalizer | 3.4.7 | MIT | link |
| click | 8.4.1 | BSD-3-Clause | link |
| combo_lock | 0.3.1 | Apache-2.0 | link |
| dateparser | 1.4.0 | BSD License | link |
| filelock | 3.29.1 | MIT | link |
| flatbuffers | 25.12.19 | Apache Software License | link |
| idna | 3.18 | BSD-3-Clause | link |
| json-database | 0.10.1 | MIT | link |
| kthread | 0.2.3 | MIT License | link |
| langcodes | 3.5.1 | MIT License | link |
| markdown-it-py | 4.2.0 | MIT License | link |
| mdurl | 0.1.2 | MIT License | link |
| memory-tempfile | 2.2.3 | MIT License | link |
| numpy | 2.4.6 | BSD-3-Clause AND 0BSD AND MIT AND Zlib AND CC0-1.0 | link |
| onnxruntime | 1.26.0 | MIT License | link |
| ovos-config | 2.1.1 | Apache-2.0 | link |
| ovos-date-parser | 0.7.0a5 | Apache Software License | link |
| ovos-number-parser | 0.5.1 | Apache Software License | link |
| ovos-utils | 0.8.5 | Apache-2.0 | link |
| packaging | 26.2 | Apache-2.0 OR BSD-2-Clause | link |
| pexpect | 4.9.0 | ISC License (ISCL) | link |
| phoonnx | 1.10.0a1 | Apache Software License | link |
| protobuf | 7.35.0 | 3-Clause BSD License | link |
| ptyprocess | 0.7.0 | ISC License (ISCL) | link |
| pyee | 13.0.1 | MIT License | link |
| Pygments | 2.20.0 | BSD-2-Clause | link |
| pyproject_hooks | 1.2.0 | MIT License | link |
| python-dateutil | 2.9.0.post0 | Apache Software License; BSD License | link |
| pytz | 2026.2 | MIT License | link |
| PyYAML | 6.0.3 | MIT License | link |
| quebra-frases | 0.3.7 | Apache Software License | link |
| regex | 2026.5.9 | Apache-2.0 AND CNRI-Python | link |
| requests | 2.34.2 | Apache Software License | link |
| rich | 13.9.4 | MIT License | link |
| rich-click | 1.9.8 | MIT License (full license text reported in this field; Copyright (c) 2022 Phil Ewels) | link |
| six | 1.17.0 | MIT License | link |
| typing_extensions | 4.15.0 | PSF-2.0 | link |
| tzlocal | 5.3.1 | MIT License | link |
| unicode-rbnf | 2.4.0 | MIT License | |
| urllib3 | 2.7.0 | MIT | link |
| watchdog | 6.0.0 | Apache Software License | link |

Policy: Apache 2.0 (universal donor). StrongCopyleft / NetworkCopyleft / WeakCopyleft / Other / Error categories fail. MPL allowed.

🔨 Build Tests

Checking the plumbing of your data flows. 🚰

✅ All versions pass

Python Build Install Tests
3.10
3.11
3.12
3.13
3.14

From the digital workshop of OpenVoiceOS. 🛠️

JarbasAl commented Jun 8, 2026

Contributor Author

Companion tracking issue: #169
