Perhaps this is already possible, but I'm struggling to get 'neater' and more 'logical' splits/merges in an automatic way for generated subtitles.
When I generate subtitles with faster-whisper, the result mostly looks great, but there are usually quite a few places where the lines aren't split or merged as well as they could be. Here are a few examples:
Bad splitting of sentences.
74
00:05:43,370 --> 00:05:47,069
You take two hydrogen atoms, you ram them
together, and what's left over is a helium
75
00:05:47,070 --> 00:05:48,070
atom.
Ideally, something like this would work out better:
74
00:05:43,370 --> 00:05:45,851
You take two hydrogen atoms,
you ram them together
75
00:05:45,876 --> 00:05:48,070
and what's left over is a helium atom.
Perhaps an 'automated' way to do this would be a function that does the following (although I'm sure there's a simpler way to do this!):
Check whether any sections have fewer than 3 words in them.
If a section meets that condition then...
Check if the section before it is less than 50ms away.
If that is true then...
Check if merging them would exceed the maximum line length. If it would, split off some of the previous, longer section and merge it with the shorter section. Otherwise, just merge the shorter section into the longer section.
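To make the steps above concrete, here is a rough Python sketch of the idea. It is only an illustration under several assumptions, not how any existing tool works: the `Cue` class, the `merge_short_cues` and `_best_split` names, and the 86-character limit (two lines of 43) are all made up for this example. Since no word-level timing is available here, the new boundary time is simply estimated in proportion to character count.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    """Hypothetical minimal subtitle cue; times are in milliseconds."""
    start_ms: int
    end_ms: int
    text: str


def _best_split(words):
    """Return i so that words[:i] / words[i:] is the most balanced split,
    preferring a split right after a punctuation mark if one exists."""
    positions = list(range(1, len(words)))
    after_punct = [i for i in positions if words[i - 1][-1] in ",.;:!?"]
    candidates = after_punct or positions

    def imbalance(i):
        return abs(len(" ".join(words[:i])) - len(" ".join(words[i:])))

    return min(candidates, key=imbalance)


def merge_short_cues(cues, min_words=3, max_gap_ms=50, max_chars=86):
    """Merge cues with fewer than min_words into the previous cue when the
    gap between them is under max_gap_ms; rebalance if the result is too long."""
    result = []
    for cue in cues:
        text = " ".join(cue.text.split())  # flatten existing line breaks
        prev = result[-1] if result else None
        is_short = len(text.split()) < min_words
        if prev is None or not is_short or cue.start_ms - prev.end_ms >= max_gap_ms:
            result.append(Cue(cue.start_ms, cue.end_ms, text))
            continue

        words = f"{prev.text} {text}".split()
        combined = " ".join(words)
        if len(combined) <= max_chars or len(words) < 2:
            # Fits within the limit: plain merge into the previous cue.
            result[-1] = Cue(prev.start_ms, cue.end_ms, combined)
            continue

        # Too long: re-split the combined text at a more natural point.
        i = _best_split(words)
        left, right = " ".join(words[:i]), " ".join(words[i:])
        span = cue.end_ms - prev.start_ms
        # Assumption: estimate the boundary time from character proportions.
        split_ms = prev.start_ms + round(span * len(left) / (len(left) + len(right)))
        result[-1] = Cue(prev.start_ms, split_ms, left)
        result.append(Cue(split_ms, cue.end_ms, right))
    return result
```

Run on cues 74 and 75 from the example above, this moves the boundary to just after "together," so that "atom." is no longer left on its own. Wrapping each cue into two display lines would be a separate step, and with real word timestamps the boundary time could be taken from the words themselves instead of being estimated.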
——————————————
Sections with long gaps between lines (kind of the opposite issue of the above)
71
00:05:25,800 --> 00:05:32,440
And it's created by one of the most violent
reactions in the universe... nuclear fusion.
Ideally should be more like this:
71
00:05:25,800 --> 00:05:30,253
And it's created by one of the most
violent reactions in the universe...
72
00:05:31,186 --> 00:05:33,113
nuclear fusion.
Perhaps this is more of an issue with Whisper (maybe there's a setting to fix it?), but it'd be great to be able to automatically check if there are sections with gaps in dialogue longer than 1 second, and if so, split the part that comes after the gap off into its own section.
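The gap check described above could look something like the following Python sketch. It assumes word-level timestamps are available (faster-whisper can return these when `transcribe` is called with `word_timestamps=True`); the `Word` class and `split_on_gaps` function are hypothetical names for this example, not part of any existing tool.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """Hypothetical word with timing; times are in seconds."""
    start: float
    end: float
    text: str


def split_on_gaps(words, max_gap=1.0):
    """Group words into cues, starting a new cue whenever the silence
    before a word is longer than max_gap seconds.

    Returns a list of (start, end, text) tuples."""
    groups = []
    current = []
    for word in words:
        if current and word.start - current[-1].end > max_gap:
            groups.append(current)
            current = []
        current.append(word)
    if current:
        groups.append(current)
    return [
        (group[0].start, group[-1].end, " ".join(w.text.strip() for w in group))
        for group in groups
    ]
```

Note that in the "ideal" example above, the pause between 00:05:30,253 and 00:05:31,186 is 933 ms, so a threshold slightly below 1 second (for example `max_gap=0.9`) would be needed to split that particular cue.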
As mentioned above, there may already be automatic fixes for this, but I haven't managed to find one using 'fix common errors' and similar tools. If an automatic solution already exists, please let me know.