Any audio longer than roughly 20-30 seconds will run out of memory (OOM) with Conformer. You can use buffered audio evaluation with Conformer to work around that issue:

https://github.com/NVIDIA/NeMo/blob/main/examples/asr/speech_to_text_buffered_infer.py

Note: change the model stride to 4 instead of 8 for Conformer, and use a larger chunk size (up to 10-15 seconds) for more accurate transcriptions.

This tutorial describes the approach: https://github.com/NVIDIA/NeMo/blob/main/tutorials/asr/Streaming_ASR.ipynb
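To make the stride and chunk-size note concrete, here is a rough sketch in plain Python of how buffered evaluation can be thought of. It is not the code from the linked script. The function names `chunk_windows` and `tokens_per_chunk` are made up for illustration, and the 10 ms feature frame step is an assumption about the preprocessor config (check your own model's config).

```python
def chunk_windows(total_sec, chunk_sec, buffer_sec):
    """Split an utterance into chunks, each padded with surrounding context.

    Returns a list of (buffer_start, buffer_end, chunk_start, chunk_end) in
    seconds. Only the chunk region is kept from each decode; the extra
    buffer on both sides just gives the model acoustic context.
    Hypothetical helper, not part of NeMo.
    """
    if not 0 < chunk_sec <= buffer_sec:
        raise ValueError("need 0 < chunk_sec <= buffer_sec")
    context = (buffer_sec - chunk_sec) / 2.0  # context on each side of the chunk
    windows = []
    index = 0
    while index * chunk_sec < total_sec:
        chunk_start = index * chunk_sec
        chunk_end = min(chunk_start + chunk_sec, total_sec)
        windows.append((
            max(0.0, chunk_start - context),
            min(total_sec, chunk_end + context),
            chunk_start,
            chunk_end,
        ))
        index += 1
    return windows


def tokens_per_chunk(chunk_sec, model_stride=4, feature_stride_sec=0.01):
    """Number of encoder output steps that cover one chunk.

    Assumes a 10 ms feature frame step (an assumption, not read from any
    model config) and a model that downsamples by `model_stride`.
    """
    return round(chunk_sec / (feature_stride_sec * model_stride))


if __name__ == "__main__":
    # A 25 s file, 10 s chunks, 14 s total buffer (2 s of context per side).
    for window in chunk_windows(25.0, 10.0, 14.0):
        print(window)
    print(tokens_per_chunk(10.0, model_stride=4))
    print(tokens_per_chunk(10.0, model_stride=8))
```

Under these assumptions, a stride of 4 means each output step covers 40 ms of audio rather than 80 ms, so using the wrong stride would misalign which output steps are kept from each buffer when the chunks are stitched back together.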

Answer selected by titu1994