Skip to content

push_to_hub does not upload videos #7493

@DominikVincent

Description

@DominikVincent

Describe the bug

Hello,

I would like to upload a video dataset (some .mp4 files and some segments within them), i.e. rows correspond to subsequences from videos. Videos might be referenced by several rows.

I created a dataset locally and it references the videos and the video readers can read them correctly. I use push_to_hub() to upload the dataset to the hub.

Expectation: A user uses load_dataset and can load the videos.

However, the videos seem to be just referenced via paths on the computer and not uploaded to the hub. Therefore a target user cannot load the videos in the dataset.

Steps to reproduce the bug

  1. create a video dataset with paths e.g. { ["videos"]: [path1, path2, ...]}
  2. dataset.push_to_hub
  3. on a different computer (or same pc if relative paths are used in a different folder):
dataset = load_dataset("siplab/egosim", split="train")
video = dataset[0]["video_head"]
  1. will fail

Expected behavior

Expectation: A user uses load_dataset and can load the videos.

Environment info

datasets 3.1.0
Python 3.8.18

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions