A visual speaker-activity extractor for multi-avatar videos. This tool detects when each on-screen avatar is "speaking" based on changes in brightness or motion within defined regions of interest (ROIs). It generates an .ass (Advanced Substation Alpha) subtitle file where the active segments are marked for each speaker.
- Python 3.7+
- opencv-python-headless
- numpy
Install the required Python packages:

```
pip install opencv-python-headless numpy
```

Create a JSON file (e.g., config.json) defining the speakers and their positions (ROIs) in the video.
Configuration Structure:
```json
{
  "source_video": "path/to/video.mp4",
  "output_ass": "output.ass",
  "threshold_brightness": 50,
  "threshold_motion": 1000,
  "min_duration_frames": 5,
  "merge_gap_frames": 10,
  "speakers": [
    {
      "name": "Alice",
      "rect": [100, 100, 150, 150],
      "idle_time": 0.5
    },
    {
      "name": "Bob",
      "rect": [400, 100, 150, 150],
      "idle_time": 2.0
    }
  ]
}
```

Parameters:
- `source_video`: (Optional) Path to the input video. Can be overridden by a CLI argument.
- `output_ass`: (Optional) Path for the output ASS file. Can be overridden by a CLI argument.
- `threshold_brightness`: Minimum increase in average brightness, relative to the idle state, for the ROI to be considered "active".
- `threshold_motion`: Minimum sum of absolute differences between frames for the ROI to be considered "moving".
- `min_duration_frames`: Minimum number of consecutive active frames required to create a segment.
- `merge_gap_frames`: Maximum number of inactive frames between segments for them to be merged into a single continuous segment.
- `speakers`: A list of speaker objects:
  - `name`: The name of the speaker (used for the subtitle style and event).
  - `rect`: The ROI as `[x, y, width, height]` in pixels.
  - `idle_time`: A time in seconds at which this speaker is known to be idle (dark/static). The frame at this time is used as the reference for the baseline brightness.
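To make the interplay of these parameters concrete, here is a minimal sketch of the detection pipeline. The function names, the exact activity rule (brightness and motion combined with AND here), and the order of the merge and minimum-duration passes are illustrative assumptions, not necessarily what the tool does internally:

```python
import numpy as np

def is_active(roi, prev_roi, baseline, t_bright=50, t_motion=1000):
    """Per-frame test: the ROI must be brighter than its idle baseline
    and show enough frame-to-frame motion (assumed AND; could be OR)."""
    brightness_delta = roi.mean() - baseline
    motion = np.abs(roi.astype(np.int16) - prev_roi.astype(np.int16)).sum()
    return bool(brightness_delta >= t_bright and motion >= t_motion)

def frames_to_segments(active, min_duration=5, merge_gap=10):
    """Turn a per-frame boolean sequence into (start, end) frame segments:
    merge segments separated by <= merge_gap inactive frames, then drop
    segments shorter than min_duration frames."""
    segments, start = [], None
    for i, a in enumerate(active):
        if a and start is None:
            start = i
        elif not a and start is not None:
            segments.append((start, i - 1))
            start = None
    if start is not None:
        segments.append((start, len(active) - 1))
    merged = []
    for seg in segments:
        if merged and seg[0] - merged[-1][1] - 1 <= merge_gap:
            merged[-1] = (merged[-1][0], seg[1])  # close the short gap
        else:
            merged.append(seg)
    return [(s, e) for s, e in merged if e - s + 1 >= min_duration]

# Two early bursts separated by a 3-frame gap merge into one segment;
# the trailing 2-frame blip is shorter than min_duration and is dropped.
active = [True]*6 + [False]*3 + [True]*6 + [False]*20 + [True]*2
print(frames_to_segments(active))  # → [(0, 14)]
```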
Run the script with the path to your configuration file:

```
python main.py config.json
```

Optional arguments:

- `--video`: Override the video path specified in the config.
- `--output`: Override the output path specified in the config.

Example:

```
python main.py config.json --video my_video.mp4 --output subtitles.ass
```

You can generate a synthetic test video to verify the tool's functionality:

```
python generate_test_video.py
```

This will create test_video.mp4 and test_config.json. You can then run the main tool against these:

```
python main.py test_config.json --output test.ass
```

The tool generates an .ass file containing:
- Styles: one style per speaker, positioned at that speaker's configured ROI.
- Events: subtitle events whose start and end times correspond to the intervals in which the speaker was detected as active. The text content of the events is empty.
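As a sketch of how the event timing translates into the output, here is a small helper that formats a detected segment as an ASS Dialogue line with an empty text field. ASS timestamps have centisecond precision (`h:mm:ss.cc`); the function names are illustrative, not the tool's actual API:

```python
def ass_time(seconds):
    """Format a time in seconds as ASS h:mm:ss.cc (centisecond precision)."""
    cs = int(round(seconds * 100))
    h, rem = divmod(cs, 360000)
    m, rem = divmod(rem, 6000)
    s, c = divmod(rem, 100)
    return f"{h}:{m:02d}:{s:02d}.{c:02d}"

def dialogue_line(style, start_s, end_s):
    """Emit an ASS Dialogue event referencing the speaker's style,
    with the Text field left empty, as described above."""
    return (f"Dialogue: 0,{ass_time(start_s)},{ass_time(end_s)},"
            f"{style},,0,0,0,,")

print(dialogue_line("Alice", 1.2, 3.84))
# Dialogue: 0,0:00:01.20,0:00:03.84,Alice,,0,0,0,,
```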