
Clarification on experimental setup for DetailCLIP-style comparison in NeurIPS 2025 rebuttal #4

@showstarpro


Hi SuperCLIP team,

Thank you for the insightful work and for the additional comparisons in the NeurIPS 2025 rebuttal. I’m particularly interested in the table that compares SuperCLIP with SLIP, MaskCLIP, A-CLIP, and DetailCLIP under the DetailCLIP-style pretraining setup (15M samples, batch size 4K, 25 epochs).
To better understand and possibly reproduce these results, I would like to clarify a few experimental details about this table:

Training data

Is the 15M-sample pretraining set YFCC15M, or a different subset or filtered version of YFCC?

Were the same images and captions used across all compared methods?

Optimization details

What learning rate and schedule were used for SuperCLIP in this setting (e.g., 1e-3 or 3e-3)?

Were other hyperparameters (optimizer type, weight decay, warmup strategy) aligned with DetailCLIP’s official setup?

These details would help in interpreting the reported gains fairly and in setting up future comparisons. Thanks again for the clear rebuttal and the strong results.

Best regards
