feat: enable timestamp support for batched beam search in RNN-T and TDT #15411

Open

pherber3 wants to merge 1 commit into NVIDIA-NeMo:main from pherber3:main

Conversation


@pherber3 pherber3 commented Feb 17, 2026

Important

The Update branch button should only be pressed on very rare occasions.
An outdated branch never blocks the merge of a PR.
Please reach out to the automation team before pressing that button.

What does this PR do?

Enable compute_timestamps=True for MALSD batch and MAES batch beam search strategies in RNN-T and TDT models (such as parakeet v3). Previously, these strategies raised NotImplementedError("Preserve alignments is not supported"), blocking timestamp generation even though the beam search infrastructure already tracks timestamp data internally via BatchedBeamHyps.

Collection: ASR

Changelog

  • Replace NotImplementedError with a warning in ModifiedALSDBatchedRNNTComputer, ModifiedALSDBatchedTDTComputer, and ModifiedAESBatchedRNNTComputer. Full alignment logprobs are unavailable in beam search, but timestamps are.
  • Add token_durations tensor to BatchedBeamHyps for TDT models so _compute_offsets_tdt() receives both start-frame timestamps and per-token durations (see the sketch after this list).
  • Store the start frame (not the end frame) in TDT beam timestamps, following the greedy decoding implementation.
  • Populate Hypothesis.token_duration in to_hyps_list() and to_nbest_hyps_list() for TDT.
  • Add 'malsd_batch' and 'maes_batch' to the beam strategy lists for preserve_alignments / compute_timestamps config resolution in RNNTDecoding.
  • Add tests for timestamp generation with both RNN-T and TDT beam decoding.
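
For context, a minimal sketch (not the NeMo implementation) of why _compute_offsets_tdt() needs both pieces of information for TDT: the start-frame timestamp marks where a token is emitted, and the TDT-predicted duration says how many frames it spans, so together they define a per-token frame range. The values below are made up for illustration.

import torch

# Hypothetical data for a single hypothesis.
start_frames = torch.tensor([0, 4, 9, 12])     # frame at which each token was emitted
token_durations = torch.tensor([4, 5, 3, 2])   # TDT-predicted duration (in frames) per token

end_frames = start_frames + token_durations
for i, (s, e) in enumerate(zip(start_frames.tolist(), end_frames.tolist())):
    print(f"token {i}: frames [{s}, {e})")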

Usage

import nemo.collections.asr as nemo_asr
from omegaconf import open_dict

# Load a pretrained TDT model (Parakeet TDT v3).
model = nemo_asr.models.ASRModel.from_pretrained("nvidia/parakeet-tdt-0.6b-v3")

# Switch to batched MALSD beam search with timestamp computation enabled.
cfg = model.cfg.decoding
with open_dict(cfg):
    cfg.strategy = "malsd_batch"
    cfg.compute_timestamps = True
    cfg.preserve_alignments = True
    cfg.beam.beam_size = 4
    cfg.beam.search_type = "malsd_batch"
    cfg.beam.return_best_hypothesis = True
model.change_decoding_strategy(cfg)

output = model.transcribe(["audio.wav"], timestamps=True)
print(output[0].timestamp)  # {'char': [...], 'word': [...], 'segment': [...]}
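
Each entry in those lists can be inspected directly; a minimal follow-up sketch (the exact dictionary keys of each entry depend on the NeMo version in use):

# Print the word-level timestamp entries returned by transcribe().
for word_entry in output[0].timestamp['word']:
    print(word_entry)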

GitHub Actions CI

The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.

The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI, remove and add the label again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation? (N/A - didn't find any related docs to change for this)
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc) (No)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items, you can still open a "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs to various areas.

@nithinraok - Tagging according to above guidelines for ASR-related changes

Additional Information

  • No existing issue; I discovered while trying to use model.transcribe(timestamps=True) with the malsd_batch strategy that it wasn't supported, even though I believe the capability is already there under the hood.
  • All new tensors are pre-allocated with a fixed size, which should preserve CUDA graph compatibility (see the sketch after this list).
  • Tested on L40S GPU with stt_en_conformer_transducer_small (RNN-T) and nvidia/stt_en_fastconformer_tdt_large (TDT).
  • Also tested on production call center audio (~60 min stereo calls mixed to mono) with nvidia/parakeet-tdt-0.6b-v3 using malsd_batch + GPU-PB phrase boosting + NGPU-LM shallow fusion trained using the kenlm script. Timestamps are generated correctly and match the greedy decoder's segment/word boundaries.
  • Note 1: malsd_batch with high LM weights (e.g., ngram_lm_alpha=0.75) can cause content dropout on long audio where segments of speech get skipped entirely. I believe this is a pre-existing beam pruning interaction with LM scoring, not related to this PR's changes. Lower LM weights (0.2), greedy decoding with the same LM, or just using the phrase boosting alone do not exhibit this behavior.
  • Note 2: MAES batch is RNN-T only; there is no TDT MAES computer (i.e., tdt_maes_batched_computer.py does not exist). If one is added, this change should support it, but until then TDT models are only supported with MALSD.
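
As a generic illustration (not NeMo code) of the fixed-size pre-allocation pattern mentioned above: tensor shapes stay constant across decoding steps, which is what CUDA graph capture requires, so per-step data is written into a pre-allocated buffer rather than concatenated. The sizes and helper below are hypothetical.

import torch

batch_size, beam_size, max_tokens = 2, 4, 8  # assumed sizes for illustration

# Pre-allocate once with a fixed shape, as BatchedBeamHyps does for its buffers.
token_durations = torch.zeros(batch_size, beam_size, max_tokens, dtype=torch.long)
lengths = torch.zeros(batch_size, beam_size, dtype=torch.long)

def append_durations(new_durations: torch.Tensor) -> None:
    """Write this step's durations in place; no reallocation, no shape change."""
    token_durations.scatter_(2, lengths.unsqueeze(-1), new_durations.unsqueeze(-1))
    lengths.add_(1)

append_durations(torch.full((batch_size, beam_size), 4, dtype=torch.long))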

@github-actions github-actions bot added the ASR label Feb 17, 2026
Signed-off-by: Patrick Herbert <pherbert@gohealth.com>
@pherber3 pherber3 marked this pull request as ready for review February 17, 2026 23:05
@pherber3 pherber3 changed the title fix: enable timestamps for MALSD/MAES batch beam search in RNN-T and TDT feat: enable timestamps for MALSD/MAES batch beam search in RNN-T and TDT Feb 17, 2026
@pherber3 pherber3 changed the title feat: enable timestamps for MALSD/MAES batch beam search in RNN-T and TDT feat: enable timestamp support for batched beam search in RNN-T and TDT Feb 17, 2026
@nithinraok nithinraok requested a review from artbataev February 19, 2026 05:10