OOM when running model stt_en_conformer_ctc_large
#2834
Hi,

Audio longer than about 20-30 seconds will typically OOM with Conformer, because self-attention memory grows quadratically with audio length. You can use buffered audio evaluation with Conformer to work around the issue:
https://github.com/NVIDIA/NeMo/blob/main/examples/asr/speech_to_text_buffered_infer.py
Note: change the model stride to 4 instead of 8 for Conformer, and use a larger chunk size (up to 10-15 sec) for more accurate transcriptions.
Tutorial describing this - https://github.com/NVIDIA/NeMo/blob/main/tutorials/asr/Streaming_ASR.ipynb
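
To make the chunk size and stride advice concrete, here is a rough sketch of how buffered inference slices a long file. It is not the code from the NeMo script above. The function names, the 16 kHz sample rate, the 10 ms feature frame shift, and the 14 sec buffer are all assumptions for illustration. Each chunk is padded with context on both sides, and only that padded buffer would be fed to the model, so memory is bounded by the buffer length rather than the file length.

```python
from typing import List, Tuple


def output_frames_per_chunk(chunk_ms: int, model_stride: int,
                            frame_shift_ms: int = 10) -> int:
    """Encoder output frames that fall inside one chunk.

    Assumes a 10 ms feature frame shift, so each output frame covers
    frame_shift_ms * model_stride milliseconds of audio.
    """
    return chunk_ms // (frame_shift_ms * model_stride)


def plan_buffers(num_samples: int, sample_rate: int, chunk_secs: float,
                 buffer_secs: float) -> List[Tuple[int, int, int, int]]:
    """Return (buf_start, buf_end, chunk_start, chunk_end) in samples.

    The model would see audio[buf_start:buf_end], but only the tokens
    that fall inside audio[chunk_start:chunk_end] would be kept.
    """
    chunk = int(chunk_secs * sample_rate)
    # Extra context on each side of the chunk.
    context = (int(buffer_secs * sample_rate) - chunk) // 2
    if chunk <= 0 or context < 0:
        raise ValueError("need 0 < chunk_secs <= buffer_secs")
    plan = []
    for chunk_start in range(0, num_samples, chunk):
        chunk_end = min(chunk_start + chunk, num_samples)
        buf_start = max(0, chunk_start - context)
        buf_end = min(num_samples, chunk_end + context)
        plan.append((buf_start, buf_end, chunk_start, chunk_end))
    return plan


if __name__ == "__main__":
    sr = 16000  # assumed sample rate
    plan = plan_buffers(60 * sr, sr, chunk_secs=10.0, buffer_secs=14.0)
    longest = max(end - start for start, end, _, _ in plan) / sr
    print(len(plan), "buffers, longest", longest, "sec")
    print("frames per 10 sec chunk, stride 4:",
          output_frames_per_chunk(10_000, 4))
```

With these assumed numbers, a 60 sec file never puts more than 14 sec of audio through the model at once, and a stride of 4 yields twice as many output frames per chunk as a stride of 8, which is why the stride setting has to match the model for the kept tokens to line up.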