subhankar-ghosh (Collaborator) commented Aug 25, 2025

Important

The Update branch button must only be pressed on very rare occasions.
An outdated branch is never blocking the merge of a PR.
Please reach out to the automation team before pressing that button.

What does this PR do ?

Adds a streaming inference algorithm to MagpieTTS.

Collection: [Note which collection this PR will affect]

TTS

  • Add specific line by line info of high level changes in this PR.

Usage

  • You can potentially add a usage example below
python scripts/magpietts/infer_and_evaluate_streaming.py \
--checkpoint_files ${CKPT} \
--hparams_files ${HPARAM} \
--codecmodel_path ${CODEC} \
--out_dir ${OUT_DIR} \
--datasets ${DATASET} \
--use_cfg \
--disable_fcd \
--apply_attention_prior

GitHub Actions CI

The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.

The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI remove and add the label again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items, you can still open a "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
Contributor guidelines contains specific people who can review PRs to various areas.

Additional Information

  • Related to # (issue)

Copilot AI (Contributor) left a comment

Pull Request Overview

This PR adds streaming inference functionality to MagpieTTS, enabling real-time text-to-speech generation by processing text input incrementally rather than all at once. The streaming algorithm maintains sliding windows for both text and audio history while managing attention priors to ensure coherent audio generation across text chunks.

Key Changes:

  • Implementation of streaming inference algorithm with windowing mechanisms for text and audio tokens
  • Addition of specialized attention prior handling for streaming mode with exponential weight support
  • Extraction of common argument parsing functionality to support both streaming and non-streaming inference scripts
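
To make the windowing idea in the overview concrete, here is a minimal sketch of chunked text processing with a bounded history. It is not the PR's implementation: the function name `stream_windows` and the parameters `chunk_size` and `text_window` are hypothetical, and only the text side is shown. Audio-token history could presumably be bounded the same way.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, List, Tuple


def stream_windows(
    text_tokens: Iterable[int],
    chunk_size: int = 4,
    text_window: int = 8,
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (new_chunk, text_context) pairs as tokens arrive.

    Hypothetical helper: `text_context` is the most recent `text_window`
    tokens, i.e. the sliding window a model would condition on.
    """
    # A deque with maxlen drops the oldest tokens automatically.
    history: Deque[int] = deque(maxlen=text_window)
    chunk: List[int] = []
    for tok in text_tokens:
        chunk.append(tok)
        if len(chunk) == chunk_size:
            history.extend(chunk)
            yield chunk, list(history)
            chunk = []  # start a fresh list so the yielded chunk is not mutated
    if chunk:
        # Flush the final, possibly shorter, chunk.
        history.extend(chunk)
        yield chunk, list(history)


if __name__ == "__main__":
    for new_chunk, context in stream_windows(range(10), chunk_size=4, text_window=6):
        print(new_chunk, context)
```

Each step sees only the newest chunk plus a fixed-size context, so memory stays bounded however long the input text grows.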

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| scripts/magpietts/infer_and_evaluate_streaming.py | New streaming inference script with chunked text processing and windowed generation |
| nemo/collections/tts/models/magpietts.py | Core streaming methods including windowed text processing and streaming-specific attention prior construction |
| scripts/magpietts/infer_and_evaluate.py | Refactored to extract common argument parsing logic and removed combined violin plot functionality |
| scripts/magpietts/README.md | Added documentation and usage example for the new streaming inference capability |
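
The shared argument parsing mentioned for `infer_and_evaluate.py` could be structured with `argparse` parent parsers, as in the sketch below. This is an assumption about one possible layout, not the PR's code. The flag names come from the usage example in the PR description, their types are guesses, and `--text_chunk_size` is a hypothetical streaming-only flag added purely for illustration.

```python
import argparse


def build_common_parser() -> argparse.ArgumentParser:
    """Flags assumed to be shared by the streaming and non-streaming scripts."""
    # add_help=False avoids a duplicate -h/--help when used as a parent parser.
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--checkpoint_files", type=str)
    parser.add_argument("--hparams_files", type=str)
    parser.add_argument("--codecmodel_path", type=str)
    parser.add_argument("--out_dir", type=str)
    parser.add_argument("--datasets", type=str)
    parser.add_argument("--use_cfg", action="store_true")
    parser.add_argument("--disable_fcd", action="store_true")
    parser.add_argument("--apply_attention_prior", action="store_true")
    return parser


def build_streaming_parser() -> argparse.ArgumentParser:
    """Streaming script parser: common flags plus streaming-only ones."""
    parser = argparse.ArgumentParser(parents=[build_common_parser()])
    # Hypothetical flag, not taken from the PR.
    parser.add_argument("--text_chunk_size", type=int, default=4)
    return parser


if __name__ == "__main__":
    args = build_streaming_parser().parse_args(["--out_dir", "out", "--use_cfg"])
    print(args.out_dir, args.use_cfg, args.text_chunk_size)
```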

rfejgin (Collaborator) commented Aug 27, 2025

Hi Subhankar, nice work!

I have not dived into all the details but here are a few things that come to mind:

  1. There seems to be some code duplication between the streaming and non-streaming versions. I wonder if it will become hard to maintain over time. Specifically, in:
  • infer_and_evaluate_streaming.py vs infer_and_evaluate.py - things like setting up the checkpoint name, logging of metrics, etc. I do see that certain functions from infer_and_evaluate.py are reused, but maybe there is more commonality to extract?
  • in magpietts.py, does construct_streaming_inference_prior() have major differences (that we actively use) from construct_streaming_prior() aside from including the offset?

I know that there is a tradeoff between eliminating code duplication and making unified code overly complex, but maybe the above are worth another look?

  2. A README or pointer to documentation on the design of the streaming algorithm would be of interest since it's non-trivial.
  3. More of a minor point, but I wonder if logic that needs to know model-specific details, like creating a BOS token, would be better placed inside the MagpieTTSModel class (accessible to the external script via some API). That way it would also be easier to reuse when folks call infer_batch() directly (not through infer_and_evaluate_streaming.py), which is what I believe they do in Riva.
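
As an illustration of the first point, two near-identical prior builders can often be merged by giving one function an `offset` argument that defaults to zero. The sketch below is hypothetical and does not reflect the actual logic of construct_streaming_inference_prior() or construct_streaming_prior(). The name `build_prior`, its parameters, and the weighting scheme are all assumptions chosen only to show the refactoring pattern.

```python
from typing import List


def build_prior(
    text_len: int,
    attended_pos: int,
    offset: int = 0,
    lookahead: int = 2,
    epsilon: float = 0.1,
) -> List[float]:
    """Return per-position prior weights for one decoding step.

    Hypothetical helper. `attended_pos` is a global text index; `offset` is
    the global index of the first token in the current window, so the
    non-streaming case is simply offset=0.
    """
    local = attended_pos - offset  # position relative to the current window
    prior = [epsilon] * text_len
    # Favor the attended position and a few tokens ahead of it.
    for i in range(max(local, 0), min(local + lookahead + 1, text_len)):
        prior[i] = 1.0
    return prior


if __name__ == "__main__":
    print(build_prior(6, 2))             # non-streaming call
    print(build_prior(6, 10, offset=8))  # same window-relative position
```

With this shape, the non-streaming path calls the function without `offset`, and the streaming path passes the window start, so only one code path needs maintaining.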

subhankar-ghosh and others added 22 commits November 4, 2025 05:13
  • Add padding to text context tokens based on max duration.
  • Removed unused imports and variables from the script.

Signed-off-by: subhankar-ghosh, Subhankar Ghosh, Xuesong Yang
Co-authored-by: Copilot

github-actions bot (Contributor) commented Nov 4, 2025

[🤖]: Hi @subhankar-ghosh 👋,

We wanted to let you know that a CICD pipeline for this PR just finished successfully.

So it might be time to merge this PR or get some approvals.

//cc @chtruong814 @ko3n1g @pablo-garay @thomasdhc

@subhankar-ghosh subhankar-ghosh merged commit 0e8bab6 into magpietts_2508 Nov 4, 2025
62 checks passed
@subhankar-ghosh subhankar-ghosh deleted the magpitts_2503_small branch November 4, 2025 19:49