integrate srt reading for diarization, splitting and speech recognition #18
Hello,

For some movies, subtitles are already available, often with the main speakers already marked. Here is a link to how one of the German broadcasters does it: https://www.ard.de/die-ard/EBU-TT-D-Basic-DE-XML-Format-fuer-die-Distribution-von-Untertiteln-in-den-ARD-Mediatheken-102.pdf Based on that, you can easily get a .srt file for the video (this tool, for instance, already does it for you: https://mediathekview.de/). Conversion from this format to the one referred to here is easy; I'd add a Python script in a separate PR if this functionality is accepted in general. The main advantage I see over automated segmentation, diarization and speech recognition is the amount of text. These subtitles are not word-for-word but somewhat condensed. This makes it much easier to fit the synthesized target-language audio into the intended time slot, so audio speed-ups are needed less often or are less drastic.
This PR adds support for specifying a speaker-annotated .srt file as input to the dubbing process.
Audio chunking, speaker diarization and speech-to-text are not performed on the audio; the information from the .srt file is used instead.
The process relies on a .srt file with the following properties:
Example:
5
00:00:13,480 --> 00:00:17,920
[SPEAKER_01]: Deswegen ist er der Kapitän der englischen Nationalmannschaft.

6
00:00:18,039 --> 00:00:21,320
[SPEAKER_01]: Er ist als Spieler sehr gereift und dominiert das Spielgeschehen.
The code uses pysrt for reading the subtitle file.
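To illustrate what gets extracted from such a file, here is a minimal sketch that turns cues in the `[SPEAKER_xx]: text` layout from the example into start time, end time, speaker and text. It deliberately uses only the standard library so it runs without dependencies; it is not the pysrt-based code in this PR. The names `Utterance` and `parse_speaker_srt` are hypothetical, and the sketch assumes cues are separated by blank lines, as in a standard .srt file.

```python
import re
from dataclasses import dataclass

# Assumed cue text layout, taken from the example: "[SPEAKER_01]: some text"
SPEAKER_RE = re.compile(r"^\[(?P<speaker>[^\]]+)\]:\s*(?P<text>.*)$", re.DOTALL)
# Standard .srt timing line: "HH:MM:SS,mmm --> HH:MM:SS,mmm"
TIME_RE = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


@dataclass
class Utterance:  # hypothetical container, not the project's data model
    start_ms: int
    end_ms: int
    speaker: str
    text: str


def _to_ms(h, m, s, ms):
    """Convert the four timestamp fields to integer milliseconds."""
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def parse_speaker_srt(srt_text):
    """Return one Utterance per cue that carries a speaker tag."""
    utterances = []
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.strip().splitlines()
        if len(lines) < 3:  # need index, timing line, and at least one text line
            continue
        timing = TIME_RE.search(lines[1])
        body = SPEAKER_RE.match("\n".join(lines[2:]).strip())
        if timing is None or body is None:
            continue  # skip cues without valid timing or speaker tag
        fields = timing.groups()
        utterances.append(
            Utterance(
                start_ms=_to_ms(*fields[:4]),
                end_ms=_to_ms(*fields[4:]),
                speaker=body["speaker"],
                text=body["text"].strip(),
            )
        )
    return utterances


SAMPLE = """5
00:00:13,480 --> 00:00:17,920
[SPEAKER_01]: Deswegen ist er der Kapitän der englischen Nationalmannschaft.

6
00:00:18,039 --> 00:00:21,320
[SPEAKER_01]: Er ist als Spieler sehr gereift und dominiert das Spielgeschehen.
"""

if __name__ == "__main__":
    for utt in parse_speaker_srt(SAMPLE):
        print(utt.start_ms, utt.end_ms, utt.speaker, utt.text)
```

Cues without a speaker tag are skipped in this sketch; whether the real pipeline should skip them, raise an error, or assign a default speaker is a design choice left open here.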
Please let me know what needs to be changed to have this merged.