How to download the SBU caption dataset? #71

@Jingchensun

Hi,

I’m trying to replicate the experiments and have successfully downloaded all the datasets except for the SBU Caption dataset.

The SBU Captioned Dataset is described as a collection of 1 million image URLs with associated captions sourced from Flickr.

Unfortunately:

The download link provided in this repo has expired.

The dataset's official website (https://www.cs.rice.edu/~vo9/sbucaptions/) is still online, but the image URLs listed there (e.g., http://static.flickr.com/2723/4385058960_b0f291553e.jpg) are no longer accessible: most return 403 errors or are permanently unavailable because the images have been deleted or access-restricted on Flickr.
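
For anyone who wants to reproduce the availability check, here is a minimal sketch of how the listed URLs could be probed and the response codes tallied. It uses only the Python standard library. The function names `probe` and `summarize` are hypothetical helpers written for this illustration and are not part of this repo or of the dataset's tooling.

```python
import urllib.error
import urllib.request
from collections import Counter


def probe(url, timeout=10.0):
    """Return the HTTP status code for url, or None on a network-level failure."""
    # A HEAD request avoids downloading the image body.
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses (e.g. 403, 404) are raised as HTTPError.
        return err.code
    except OSError:
        # URLError and timeouts are OSError subclasses: DNS failure, refused connection, etc.
        return None


def summarize(urls, fetch=probe):
    """Count how many URLs produced each status code (None = unreachable)."""
    return Counter(fetch(u) for u in urls)
```

Calling `summarize(urls)` on a sample of the URL list gives a rough picture of how many images are still retrievable. The `fetch` parameter is there only so the tally can be exercised without network access.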

I would like to ask:

How did you download this dataset for the Stage-2 multi-task fine-tuning in your recent experiments?

Would it be possible to share a copy of the preprocessed dataset (images) in a ZIP or other archive format to facilitate reproducibility?

Thank you very much for your help!
