Crash during alignment of near-30-sec audio sources #429
Comments
- fixed `nonspeech_skip` causing alignment to skip sections of speech
- fixed "'last_ts' referenced before assignment" error for alignment (#429)
Thanks for the detailed report.
Thank you @jianfch for the quick fix!
Dear Friend,
Good call. Alignment is a major feature so this fix justifies a minor version bump.
Hi - Thanks for the great library and work - You are awesome.
I would like to point out a crash I stumbled upon, with as much detail as I can provide.
Version - 2.18.1
The repro is not platform-, model-, or engine-specific, so it should be easy to reproduce.
(Attached a sample audio file)
The repro is quite simple from user-land perspective - start with a simple align call:
In some specific scenarios this can cause a crash, which I will do my best to describe below.
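For reference, a call of this kind might look like the sketch below. It assumes the stable-ts Python API (`stable_whisper.load_model` and `model.align`). The helper name `run_align`, the file path, and the text are placeholders for illustration, not the exact code from this report.

```python
def run_align(audio_path, text, language="he"):
    """Align `text` to the audio at `audio_path` using default settings."""
    # Imported lazily so this sketch can be loaded without stable-ts installed.
    import stable_whisper

    # "tiny" model on CPU, matching the setup described in this report.
    model = stable_whisper.load_model("tiny", device="cpu")
    # With an affected near-30-second clip on 2.18.1, the crash surfaces in this call.
    return model.align(audio_path, text, language=language)

# Example (placeholder path and text):
# result = run_align("sample.wav", "text to align")
```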
First - Call Stack would go to
and return on:
stable-ts/stable_whisper/non_whisper/alignment.py, line 905 in e7ff3dd
There the code detects a final (and only!) "non-speech" timing which, once the min-word duration is taken into account, ends after the end of the audio.
Code would look like this at this point:
Now, for an audio source which is 30s, this is the first iteration of the main loop in `Aligner.align`. The loop will `continue` on that first iteration, since the `_skip_nonspeech` method above returns None.
stable-ts/stable_whisper/non_whisper/alignment.py, line 298 in e7ff3dd
At this point the loop re-initializes, but `_seek_sample` has been pushed by `_skip_nonspeech` to the end of the audio source. Thus `_time_offset` is at the end, and no more chunks are available to process. The loop will break.
stable-ts/stable_whisper/non_whisper/alignment.py, line 292 in e7ff3dd
Finally, the pbar update outside the loop references `last_ts`, which was never initialized, and crashes the align method. Throws:
UnboundLocalError: local variable 'last_ts' referenced before assignment
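The control flow described above can be illustrated with a small standalone sketch. This is not the stable-ts source: `loop_sketch`, `loop_sketch_guarded`, `CHUNK`, and the `skip_to_end` flag are hypothetical stand-ins that only mimic the "skip to the end, `continue`, then use `last_ts` after the loop" pattern.

```python
CHUNK = 30  # hypothetical chunk length, in seconds


def loop_sketch(total, skip_to_end):
    """Mimics the reported pattern: `last_ts` is only assigned inside the loop."""
    seek = 0
    while seek < total:
        if skip_to_end:
            # Stand-in for the skip step pushing the seek position to the end
            # and returning None, so the loop body is never completed.
            seek = total
            continue
        last_ts = min(seek + CHUNK, total)
        seek = last_ts
    # Stand-in for the progress-bar update after the loop.
    return last_ts


def loop_sketch_guarded(total, skip_to_end):
    """One defensive option: give `last_ts` a value before the loop starts."""
    seek = 0
    last_ts = None
    while seek < total:
        if skip_to_end:
            seek = total
            continue
        last_ts = min(seek + CHUNK, total)
        seek = last_ts
    # Fall back to the end of the audio if no chunk was ever processed.
    return total if last_ts is None else last_ts
```

Calling `loop_sketch(30, True)` raises `UnboundLocalError`, while the guarded variant returns normally. The guard only avoids the crash; it does not address whether skipping the whole clip was correct in the first place.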
That is my analysis of the crash so far. My speculation is that the logic that pushes the seek to the end might be flawed, since the audio sources where this happens are indeed short but do contain speech. So there should be no reason to skip the first (and only) 30s.
The logic looks for the "first" non-speech section to see how much to "skip ahead", but when that section is the only one and sits at the end of the audio, something does not make sense.
I am not sure, since this part is not documented - but perhaps you will see it clearly.
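To make the speculation concrete, here is a minimal sketch of the behaviour I would expect from a skip-ahead step. Everything in it is an assumption for illustration: `next_seek_time`, its parameters, and the `(start, end)` representation of non-speech sections are hypothetical and are not taken from the stable-ts code.

```python
def next_seek_time(nonspeech, offset, audio_end, min_word_dur=0.1):
    """Return the time to seek to, or `offset` if no skip is warranted.

    `nonspeech` is a list of (start, end) times in seconds, sorted by start.
    """
    for start, end in nonspeech:
        if end <= offset:
            # This section is already behind the current position.
            continue
        if start - offset >= min_word_dur:
            # There is room for at least one word before the silence begins,
            # so the current chunk should be processed rather than skipped.
            return offset
        # The silence starts (almost) at the current position: skip past it,
        # but never beyond the end of the audio.
        return min(end, audio_end)
    return offset
```

With a single non-speech section at the very end, for example `next_seek_time([(29.5, 30.0)], 0.0, 30.0)`, this returns `0.0`, so the first (and only) 30s would still be processed.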
I am attaching a sample audio file (in Hebrew) that reproduces this (regardless of the text to be aligned). I call align with default settings and the "tiny" OpenAI model, on a CPU.
https://drive.google.com/file/d/11ZcSy0AUSZGZ3qQqbBEoqYVcwO6uY9lZ/view?usp=sharing