integrate srt reading for diarization, splitting and speech recognition #18

Draft

zobadaniel wants to merge 1 commit into Softcatala:main from ZalozbaDev:use_srt_as_input_for_annotate_diarization_stt

Conversation

@zobadaniel

This PR adds support for specifying a speaker-annotated .srt file as input to the dubbing process.

The steps of audio chunking, speaker diarization and speech-to-text will not be performed on the audio; instead, the information from the .srt file will be used.

The process relies on a .srt file with the following properties:

  • only one line of text per subtitle entry
  • a speaker annotation at the beginning of each line

Example:

5
00:00:13,480 --> 00:00:17,920
[SPEAKER_01]: Deswegen ist er der Kapitän der englischen Nationalmannschaft.

6
00:00:18,039 --> 00:00:21,320
[SPEAKER_01]: Er ist als Spieler sehr gereift und dominiert das Spielgeschehen.

The code uses pysrt for reading the subtitle file.
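
Below is a minimal sketch (not the PR's actual code) of how such a file could be read with pysrt and turned into (start, end, speaker, text) segments; the regular expression and helper names are illustrative assumptions, not part of this change.

import re
import pysrt

# Matches lines of the form "[SPEAKER_01]: some text".
SPEAKER_RE = re.compile(r"^\[(?P<speaker>[^\]]+)\]:\s*(?P<text>.*)$")

def to_seconds(t):
    # pysrt's SubRipTime exposes hours/minutes/seconds/milliseconds.
    return t.hours * 3600 + t.minutes * 60 + t.seconds + t.milliseconds / 1000.0

def read_annotated_srt(path):
    # Yield (start, end, speaker, text) for every speaker-annotated entry.
    for item in pysrt.open(path):
        match = SPEAKER_RE.match(item.text.strip())
        if not match:
            continue  # entry without a speaker prefix, skip it
        yield (to_seconds(item.start), to_seconds(item.end),
               match.group("speaker"), match.group("text"))

For the first example entry above, this would yield (13.48, 17.92, "SPEAKER_01", "Deswegen ist er der Kapitän der englischen Nationalmannschaft.").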

Please let me know what needs to be changed to have this merged.

@jordimas
Collaborator

Hello,
Could you please explain the use case for this? In what scenario would this workflow be useful for the end user?
Thanks!

@zobadaniel
Author

For some movies, subtitles are already available, often with the main speakers already marked. Here is a link describing how one of the German broadcasters does it: https://www.ard.de/die-ard/EBU-TT-D-Basic-DE-XML-Format-fuer-die-Distribution-von-Untertiteln-in-den-ARD-Mediatheken-102.pdf

Based on that, you can easily obtain an .srt file for the video (this tool, for instance, already does it for you: https://mediathekview.de/). Conversion from that format to the one described here is straightforward; I'd add a Python script in a separate PR if this functionality is accepted in general.

The main advantage I see over automated segmentation, diarization and speech recognition is the amount of text. These subtitles are not word-for-word transcripts but somewhat condensed. This makes it much easier to fit the synthesized target-language audio into the foreseen timeslot, and thus requires fewer or less drastic audio speed-ups.
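
As a purely illustrative example of the effect on speed-ups (the slot is taken from the subtitle example above; the synthesized duration is made up):

slot_seconds = 17.92 - 13.48                    # 4.44 s available for this entry
synthesized_seconds = 5.2                       # hypothetical length of the TTS output
speedup = synthesized_seconds / slot_seconds    # ~1.17, i.e. play ~17% faster

Shorter, condensed subtitle text typically yields shorter synthesized audio, pushing this factor closer to 1.0.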
