New ML yaml + changes to allow for Spectral Codec training with text context #14894

blisc · 2025-10-07T16:41:27Z

No description provided.

Signed-off-by: Jason <[email protected]>

Copilot

Pull Request Overview

This PR adds support for Spectral Codec training with text context in the Magpie-TTS model. It modifies codec model loading to disable loss modules during inference and reorganizes codebook-related variables to handle different codebook configurations.

Codec model loading enhancement to disable SCL loss during inference for memory optimization
Variable restructuring to distinguish between data and model codebook configurations
New multilingual configuration file for Magpie-TTS with text conditioning support

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File	Description
nemo/collections/tts/models/magpietts.py	Modified codec loading logic and reorganized codebook variables to support different training scenarios
examples/tts/conf/magpietts/magpietts_multilingual_v2_lhotse.yaml	Added new configuration file for multilingual Magpie-TTS with text conditioning

nemo/collections/tts/models/magpietts.py

Signed-off-by: Jason <[email protected]>

Signed-off-by: blisc <[email protected]>

rlangman

Looks good to me.

rlangman · 2025-10-20T22:52:44Z

nemo/collections/tts/data/text_to_speech_dataset.py

+                # @blisc: Added a +1. If we send in exactly 882 samples, then a conv layer complains about padding.
+                #         Adding 883 works. This occurs when we use text context during inference.
+                context_audio = torch.zeros(self.codec_model_samples_per_frame + 1, dtype=torch.float32)
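One plausible reading of the failure described in this comment is sketched below. It assumes that `codec_model_samples_per_frame` is 882 and that the offending conv layer reflect-pads its input by that many samples on each side; the thread does not confirm which layer fails. PyTorch's reflect padding requires the pad width to be strictly smaller than the input length, so an input of exactly 882 samples is rejected and 883 is accepted. The helper `reflect_pad_ok` is hypothetical and only probes that constraint.

```python
import torch
import torch.nn.functional as F


def reflect_pad_ok(num_samples: int, pad: int) -> bool:
    """Return True if reflect-padding a [batch, channels, time] signal by `pad` per side succeeds."""
    x = torch.zeros(1, 1, num_samples)
    try:
        F.pad(x, (pad, pad), mode="reflect")
    except (RuntimeError, ValueError):
        # PyTorch raises when the reflect pad width is >= the input length.
        return False
    return True


# Assumed frame size, matching the 882 samples mentioned in the diff comment.
samples_per_frame = 882
print(reflect_pad_ok(samples_per_frame, samples_per_frame))      # False
print(reflect_pad_ok(samples_per_frame + 1, samples_per_frame))  # True
```

Under this assumption the `+ 1` works only because it makes the input one sample longer than the pad width. It does not address other short inputs.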


I am not sure if there is a clean way to handle reflect/replicate padding when input is short. We could have each codec architecture define a minimum length it can handle, and then have pad_audio zero pad to at least that length: https://github.com/blisc/NeMo/blob/magpietts_2503/nemo/collections/tts/models/audio_codec.py#L453

blisc · Oct 21, 2025

I agree, we should do that.
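The following is a minimal sketch of the proposal above. It assumes that each codec exposes a minimum input length and that shorter audio is zero-padded on the right before encoding. The attribute name `min_input_samples`, the class `ToyCodecSpec`, and the helper `pad_to_min_length` are hypothetical and are not existing NeMo APIs. Constant (zero) padding has no minimum-length requirement, unlike reflect padding, so it is safe even for very short context clips.

```python
import torch
import torch.nn.functional as F


class ToyCodecSpec:
    # Hypothetical per-architecture attribute; 883 mirrors the "+ 1" workaround in the diff above.
    min_input_samples = 883


def pad_to_min_length(audio: torch.Tensor, min_samples: int) -> torch.Tensor:
    """Zero-pad the last (time) dimension of `audio` on the right so it has at least `min_samples`."""
    num_samples = audio.shape[-1]
    if num_samples >= min_samples:
        # Already long enough: return the input unchanged.
        return audio
    return F.pad(audio, (0, min_samples - num_samples), mode="constant", value=0.0)


# Example: one frame of silent context audio, as in the dataset code.
context_audio = torch.zeros(882, dtype=torch.float32)
padded = pad_to_min_length(context_audio, ToyCodecSpec.min_input_samples)
print(padded.shape)  # torch.Size([883])
```

If the minimum is declared on the codec side, the dataset no longer needs to hard-code a `+ 1`. How this would plug into the `pad_audio` function linked above is not shown here.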

blisc added 8 commits September 22, 2025 11:58

Add new config

cebb6ee

Signed-off-by: Jason <[email protected]>

update wandb configs

d1f6e49

Signed-off-by: Jason <[email protected]>

update config

2b01cb1

Signed-off-by: Jason <[email protected]>

add separate tokenizer for text condition

7d2b988

Signed-off-by: Jason <[email protected]>

update codec loading

3480b46

Signed-off-by: Jason <[email protected]>

merge latest changes

9131c64

Signed-off-by: Jason <[email protected]>

add it tokenizer

a436004

Signed-off-by: Jason <[email protected]>

fix attempt 1

5c895e1

Signed-off-by: Jason <[email protected]>

github-actions bot added the TTS label Oct 7, 2025

blisc requested a review from subhankar-ghosh October 7, 2025 16:41

blisc added the Run CICD label Oct 7, 2025

blisc requested a review from Copilot October 7, 2025 16:41

Copilot AI reviewed Oct 7, 2025

nemo/collections/tts/models/magpietts.py (outdated, resolved)

nemo/collections/tts/models/magpietts.py (resolved)

blisc temporarily deployed to test on October 7, 2025 16:42 with GitHub Actions (inactive)

add an additional +1 for dataset

f977815

Signed-off-by: Jason <[email protected]>

blisc requested a review from rlangman October 10, 2025 18:00

chtruong814 added Run CICD and removed Run CICD labels Oct 10, 2025

Merge branch 'magpietts_2508' into magpietts_2508_jasondev0

f4c7181

chtruong814 added Run CICD and removed Run CICD labels Oct 10, 2025

Apply isort and black reformatting

c91a808

Signed-off-by: blisc <[email protected]>

chtruong814 added Run CICD and removed Run CICD labels Oct 10, 2025

chtruong814 temporarily deployed to test on October 10, 2025 18:04 with GitHub Actions (inactive)

rlangman approved these changes Oct 20, 2025

blisc merged commit 22be3f4 into NVIDIA-NeMo:magpietts_2508 Oct 21, 2025
63 checks passed

blisc deleted the magpietts_2508_jasondev0 branch October 21, 2025 18:16

3 participants