OOM when running model `stt_en_conformer_ctc_large` #2834

Answered by titu1994

eschmidbauer asked this question in Q&A

eschmidbauer
Sep 16, 2021

Hi,
I'm trying to test ASR with the `stt_en_conformer_ctc_large` model on an instance with 15 GB of RAM. It works well for small files, but when I try a 2.8 MB file, the Python script uses all 15 GB of RAM and the kernel's OOM killer terminates the process. I've read that the solution for larger files is to use VAD, but I haven't been able to find any documentation or examples on how this can be done.
Here is an example output where OOM kills the script:

.


Replies: 1 comment 2 replies

titu1994
Sep 16, 2021
Maintainer

Any audio longer than 20-30 seconds will OOM with Conformer. You can use buffered audio evaluation with Conformer to sidestep that issue:

https://github.com/NVIDIA/NeMo/blob/main/examples/asr/speech_to_text_buffered_infer.py

Note: change the model stride to 4 instead of 8 for Conformer, and use a larger chunk size (up to 10-15 sec) for more accurate transcriptions.

Tutorial describing this - https://github.com/NVIDIA/NeMo/blob/main/tutorials/asr/Streaming_ASR.ipynb
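For readers who want a feel for what the chunk size and model stride mean in practice, here is a rough sketch of the arithmetic behind buffered evaluation. It is not the NeMo implementation: the function names `tokens_per_chunk` and `buffer_windows` are made up for illustration, the 10 ms feature stride is an assumed default, and the centered-context windowing is a simplification. The point is only that each forward pass sees a bounded slice of audio, so memory no longer grows with the length of the file.

```python
import math


def tokens_per_chunk(chunk_len_secs, model_stride, feature_stride_secs=0.01):
    """Approximate number of output tokens the model emits for one chunk.

    feature_stride_secs=0.01 is an assumption (10 ms between feature frames);
    model_stride is the encoder's downsampling factor (4 for Conformer per the
    note above).
    """
    # Count feature frames first, then downsample by the model stride.
    frames = round(chunk_len_secs / feature_stride_secs)
    return math.ceil(frames / model_stride)


def buffer_windows(duration_secs, chunk_len_secs, total_buffer_secs):
    """Hypothetical helper: (start, end) times of the audio fed to the model.

    Each chunk is padded with equal left/right context so that the whole
    window is at most total_buffer_secs long, clamped to the file boundaries.
    """
    context = (total_buffer_secs - chunk_len_secs) / 2
    windows = []
    for i in range(math.ceil(duration_secs / chunk_len_secs)):
        chunk_start = i * chunk_len_secs
        chunk_end = min(chunk_start + chunk_len_secs, duration_secs)
        windows.append(
            (max(0.0, chunk_start - context), min(duration_secs, chunk_end + context))
        )
    return windows


if __name__ == "__main__":
    print(tokens_per_chunk(10, model_stride=4))  # 250 tokens per 10 s chunk
    print(tokens_per_chunk(10, model_stride=8))  # 125 tokens per 10 s chunk
    for start, end in buffer_windows(60, chunk_len_secs=10, total_buffer_secs=20):
        print(f"{start:5.1f}s -> {end:5.1f}s")
```

With a 10 second chunk and a 20 second buffer, a one-minute file is processed as six windows of at most 20 seconds each, which stays under the 20-30 second limit mentioned above.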

2 replies

eschmidbauer Sep 17, 2021
Author

I will give this a try and report back with an update. Thank you!

eschmidbauer Sep 17, 2021
Author

I was able to put together a test script using the links you shared as a reference. Thank you for the guidance!

Answer selected by titu1994
