Hi SuperCLIP team,
Thank you for the insightful work and for the additional comparisons in the NeurIPS 2025 rebuttal. I’m particularly interested in the table comparing SuperCLIP with SLIP, MaskCLIP, A-CLIP, and DetailCLIP under the DetailCLIP-style pretraining setup (15M samples, batch size 4K, 25 epochs).
To better understand and potentially reproduce these results, I have a few questions about the experimental details behind this table:
Training data
Is the 15M-sample pretraining set YFCC15M, or a different subset or filtering of YFCC100M?
Were the same images and captions used across all compared methods?
Optimization details
What learning rate (and schedule) was used for SuperCLIP in this setting (1e-3 or 3e-3)?
Were the other hyperparameters (optimizer type, weight decay, warmup strategy) aligned with DetailCLIP’s official setup? For reference, the configuration I am currently assuming is sketched below.
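For concreteness, here is a minimal sketch of the optimizer and schedule I am assuming for this setting: AdamW with linear warmup followed by cosine decay, which is the common CLIP-style recipe. The betas, eps, weight decay, and warmup length below are my guesses (as is reading "4K" as 4096), not values from the paper or rebuttal; please correct anything that differs from your actual runs.

```python
import math
import torch

# Toy stand-in for the actual SuperCLIP model (illustration only).
model = torch.nn.Linear(512, 512)

# AdamW with CLIP-style betas/eps -- these specific values are my
# assumptions, not numbers from the paper or rebuttal.
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=1e-3,            # or 3e-3 -- exactly the value I am asking about
    betas=(0.9, 0.98),
    eps=1e-6,
    weight_decay=0.2,
)

# 25 epochs over 15M samples, assuming batch size 4K means 4096.
total_steps = 25 * (15_000_000 // 4096)
warmup_steps = 2_000  # hypothetical warmup length

def lr_lambda(step: int) -> float:
    """Linear warmup then cosine decay -- the schedule I am assuming."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
```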
These details would be very helpful for interpreting the reported gains fairly and for enabling future comparisons. Thanks again for the clear rebuttal and strong results.
Best regards,