[TTS][Magpietts] Unify Longform and Standard Inference logic #15375

Open

subhankar-ghosh wants to merge 17 commits into main from magpietts_longform_unify

Conversation

@subhankar-ghosh
Collaborator

Important

The Update branch button should only be pressed on very rare occasions.
An outdated branch is never blocking the merge of a PR.
Please reach out to the automation team before pressing that button.

What does this PR do?

This pull request refactors and unifies the text chunking and inference logic for TTS (Text-to-Speech) in the MagpieTTS pipeline. The main change is the replacement of the previous "longform" inference logic with a new, language-aware, unified chunked inference path. This affects dataset preparation, model state management, argument parsing, and the inference runner, making the codebase simpler and more robust for both short and long texts.

Key changes:

Unified Inference and Text Chunking

  • Replaced the old longform inference logic with a unified, automatic text chunking approach that decides chunking per sample based on language-specific thresholds: short texts are processed as single chunks, while long texts are split into sentences automatically (a minimal sketch follows this list). (examples/tts/magpietts_inference.py, nemo/collections/tts/data/text_to_speech_dataset.py, nemo/collections/tts/models/magpietts.py) [1] [2] [3]
  • Removed all command-line arguments related to explicit longform control (--longform_mode, --longform_word_threshold, etc.), simplifying the inference interface. (examples/tts/magpietts_inference.py) [1] [2] [3]
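
For illustration, a minimal sketch of this per-sample decision (the function name, threshold values, and sentence-splitting rule below are assumptions, not the actual helpers added in tts_dataset_utils.py):

import re

# Assumed per-language word-count thresholds above which a text gets split;
# the real helpers and values live in tts_dataset_utils.py.
CHUNK_WORD_THRESHOLDS = {"en": 60, "de": 50, "zh": 40}
DEFAULT_CHUNK_WORD_THRESHOLD = 50


def chunk_text(text: str, language: str) -> list[str]:
    """Return the whole text as one chunk if it is short, else split it into sentences."""
    threshold = CHUNK_WORD_THRESHOLDS.get(language, DEFAULT_CHUNK_WORD_THRESHOLD)
    if len(text.split()) <= threshold:
        return [text]  # short text: a single chunk, same as standard inference
    # Long text: naive sentence split on terminal punctuation.
    sentences = re.split(r"(?<=[.!?。])\s+", text)
    return [s.strip() for s in sentences if s.strip()]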

Dataset and Collation Refactor

  • Introduced ChunkedTTSInferenceDataset (replacing LongFormTTSInferenceDataset) with per-sample, language-aware chunking and tokenizer selection. The dataset now automatically decides chunking strategy based on language and text length. (nemo/collections/tts/data/text_to_speech_dataset.py) [1] [2] [3]
  • Updated the dataset's collate_fn to handle variable-length chunked batches, padding as needed, and generalized it beyond the previous longform-specific logic (a simplified collation sketch follows below). (nemo/collections/tts/data/text_to_speech_dataset.py) [1] [2]
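
A simplified sketch of the mixed-chunk collation idea (the function name, tensor layout, and return values are assumptions, not the PR's actual collate_fn):

import torch


def collate_chunked_batch(token_chunks: list[list[torch.Tensor]]):
    """token_chunks[i] holds the 1-D integer token tensors for all chunks of sample i."""
    flat = [chunk for sample in token_chunks for chunk in sample]
    lengths = torch.tensor([len(c) for c in flat], dtype=torch.long)
    padded = torch.zeros(len(flat), int(lengths.max()), dtype=torch.long)
    for i, chunk in enumerate(flat):
        padded[i, : len(chunk)] = chunk
    # Track how many chunks each sample contributed so generated audio can be
    # re-assembled per sample after inference.
    chunks_per_sample = torch.tensor([len(s) for s in token_chunks], dtype=torch.long)
    return padded, lengths, chunks_per_sample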

Model and State Naming Consistency

  • Renamed all "longform" classes and configs to "chunked" (e.g., LongformDecoderState → ChunkedDecoderState, LongformConfig → ChunkedInferenceConfig) throughout the model code for clarity and consistency with the new unified approach. (nemo/collections/tts/models/magpietts.py) [1] [2] [3] [4]
  • Removed the _needs_longform_inference method and all language threshold logic from the model, as chunking is now handled in a unified, language-aware way. (nemo/collections/tts/models/magpietts.py)

Utility and Import Updates

  • Added and updated utility imports for chunked inference and tokenizer selection to support the new pipeline. (nemo/collections/tts/models/magpietts.py, nemo/collections/tts/data/text_to_speech_dataset.py) [1] [2]

These changes make the TTS inference pipeline easier to use and maintain, while improving support for multilingual and variable-length text inputs.

Collection: TTS

Changelog

  • Add specific line-by-line info of high-level changes in this PR.

Usage

  • Example usage (a hedged sketch, see below):
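
The snippet below is a hedged sketch of the unified flow; anything not confirmed by this PR (class name, import path, checkpoint, and the do_tts() signature) is an assumption. See examples/tts/magpietts_inference.py for the actual entry point.

# Hedged sketch only: the class name, import path, checkpoint name, and do_tts()
# signature are assumptions; see examples/tts/magpietts_inference.py for real usage.
from nemo.collections.tts.models import MagpieTTSModel  # assumed import path

model = MagpieTTSModel.restore_from("magpietts.nemo")  # hypothetical checkpoint
model.eval()

texts = [
    "Hello world.",                          # short text: handled as a single chunk
    "A long multi-sentence passage. " * 30,  # long text: split into sentence chunks
]
# Same call for both; chunking is decided per sample and per language inside the dataset.
audio = model.do_tts(texts)  # assumed signature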

GitHub Actions CI

The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.

The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI remove and add the label again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items, you can still open a "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs in various areas.

Additional Information

  • Related to # (issue)

Copilot AI (Contributor) left a comment

Pull request overview

Refactors MagpieTTS inference to use a single “chunked” inference path for both short and long texts, with dataset-driven automatic sentence chunking based on per-sample language thresholds.

Changes:

  • Introduces language-aware thresholding + unified chunk_text_for_inference() chunking utility (replacing prior longform detection logic).
  • Replaces LongFormTTSInferenceDataset with ChunkedTTSInferenceDataset and updates the inference runner to always use the unified multi/single-chunk loop.
  • Updates CLI/example script to remove explicit longform args and align with the unified inference flow.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 8 comments.

Summary per file:

  • nemo/collections/tts/parts/utils/tts_dataset_utils.py: Adds language-aware sentence splitting, thresholds, tokenizer mapping, and a unified chunking helper.
  • nemo/collections/tts/data/text_to_speech_dataset.py: Replaces the longform inference dataset with a unified chunked inference dataset plus mixed-chunk collation.
  • nemo/collections/tts/modules/magpietts_inference/inference.py: Removes standard/longform branching; always runs the unified chunk loop via generate_speech().
  • nemo/collections/tts/models/magpietts.py: Renames longform state/config to chunked equivalents; updates do_tts() to the unified chunked generation path.
  • examples/tts/magpietts_inference.py: Removes longform CLI controls and updates messaging for the unified chunking behavior.
  • tests/collections/tts/parts/utils/test_tts_dataset_utils.py: Adds unit tests for the new thresholds and the unified chunking helper.


@blisc
Collaborator

blisc commented Feb 10, 2026

  • Can you resolve conflicts?
  • Can you add a unit test for magpietts.generate_speech that contains a batch of short and long texts?

@blisc
Collaborator

blisc commented Feb 12, 2026

The GitHub UI still says that there are conflicts.

subhankar-ghosh and others added 2 commits February 12, 2026 09:58
import wandb
from hydra.utils import instantiate
from lightning.pytorch import Trainer
from lhotse.serialization import load_yaml

Check notice (Code scanning / CodeQL): Unused import

Import of 'load_yaml' is not used.

Copilot Autofix

To fix an unused import, we simply remove the import statement (or the specific symbol) that is not used anywhere in the file. This reduces unnecessary dependencies and cleans up the code without affecting runtime behavior.

In this case, the best fix is to delete the import line from lhotse.serialization import load_yaml at line 28 in nemo/collections/tts/models/magpietts.py. No other code changes are necessary, since we are not altering any used functionality and there are no visible references to load_yaml. Ensure that only this line is removed and that all other imports remain untouched.

Suggested changeset 1
nemo/collections/tts/models/magpietts.py

Autofix patch

Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/nemo/collections/tts/models/magpietts.py b/nemo/collections/tts/models/magpietts.py
--- a/nemo/collections/tts/models/magpietts.py
+++ b/nemo/collections/tts/models/magpietts.py
@@ -25,7 +25,6 @@
 import torch
 import wandb
 from hydra.utils import instantiate
-from lhotse.serialization import load_yaml
 from lightning.pytorch.loggers import TensorBoardLogger, WandbLogger
 
 from omegaconf import DictConfig, OmegaConf, open_dict
EOF
subhankar-ghosh and others added 2 commits February 17, 2026 02:57
Comment on lines +861 to +863
# First try sample's tokenizer_names (from dataset config)
if data.tokenizer_names is not None:
    return data.tokenizer_names[0]  # Use first (deterministic for inference)
Collaborator

Where does this tokenizer_names parameter come from? It seems to assume that it is read from the dataset JSON, which doesn't seem ideal.

Collaborator

And do we only support this in the non-Lhotse path? What happens if we try a Lhotse dataset?

Collaborator Author

We do inference on the non-Lhotse path only.
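
For readers following the thread, a rough sketch of the selection order under discussion; only the tokenizer_names lookup mirrors the snippet above, the rest is assumed.

# Rough sketch of the selection order discussed above; the helper name and the
# language fallback are illustrative assumptions, not the actual model code.
def select_tokenizer_name(data, language_to_tokenizer: dict, default: str) -> str:
    # 1) Prefer the sample's own tokenizer_names (from the non-Lhotse dataset config).
    if getattr(data, "tokenizer_names", None):
        return data.tokenizer_names[0]  # use the first entry, deterministic for inference
    # 2) Otherwise fall back to a per-language default.
    language = getattr(data, "language", None)
    return language_to_tokenizer.get(language, default)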

@github-actions
Contributor

[🤖]: Hi @subhankar-ghosh 👋,

We wanted to let you know that a CICD pipeline for this PR just finished successfully.

So it might be time to merge this PR or get some approvals.

//cc @chtruong814 @ko3n1g @pablo-garay @thomasdhc

chunk_state = self.model.create_longform_chunk_state(batch_size=batch_size)
# Clear stale KV cache from prior inference calls (e.g., the previous batch or dataset
# may have left it populated).
print(f"Resetting KV cache for decoder: {self.model.use_kv_cache_for_inference}")
Collaborator Author

@XuesongYang Let me know if this piece of code looks good to you.

@XuesongYang (Collaborator) Feb 20, 2026

LGTM!

@rfejgin We never reset the cache for the local transformer before, for either standard or longform inference. Do we always disable the KV cache, or does it depend on how many frames are stacked?

@rfejgin (Collaborator) Feb 20, 2026

Thanks for being careful about this @XuesongYang and @subhankar-ghosh. Still, I think no special handling is needed for the local transformer: the LT already resets its KV cache automatically every timestep (or frame stack), since separate timesteps are completely independent for the LT. The reset happens at the start of local_transformer_sample_autoregressive() and local_transformer_sample_maskgit().

Do we always disable the KV cache, or does it depend on how many frames are stacked?

It depends on the LT type. For MaskGit, we keep the KV cache off because that type of LT is non-causal, which makes standard KV caching impossible. For the autoregressive LT, the KV cache is on and we reset it on every timestep (i.e., once per frame or frame stack).
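
A minimal sketch of the per-timestep pattern described here; everything except the sampler behavior rfejgin describes is an assumption.

# Minimal sketch of the per-timestep reset pattern described above; the cache and
# sampling method names are illustrative assumptions, not the module's real API.
def local_transformer_sample_autoregressive_sketch(local_transformer, frame_context):
    local_transformer.reset_kv_cache()  # fresh cache at the start of every timestep / frame stack
    tokens = []
    for _ in range(local_transformer.num_codebooks):
        # Codebooks within one timestep are decoded autoregressively and reuse the
        # cache; nothing carries over to the next timestep, so no cross-step reset
        # is needed elsewhere.
        tokens.append(local_transformer.sample_next(frame_context, tokens))
    return tokens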
