This is excellent work, but I have some questions that I hope can be answered.
- According to the README, Stage 1 training should be run first with its configuration file, followed by Stage 2 with the corresponding config. In the Stage 2 configuration file, however, the gram loss is only enabled after 1M iterations, whereas the paper indicates that the gram loss should be applied immediately once Stage 1 is complete and the next phase begins. With the current procedure, Stage 2 effectively re-trains for 1M iterations from scratch before the gram loss is activated. Why is this the case? (A sketch of the gating behavior I am describing follows this list.)
- The paper shows that the dense features at 200K iterations (early in training) are better than those at 1M iterations, yet the current practice uses the checkpoint from the end of Stage 1 (1M iterations) as the gram teacher for Stage 2. This seems to contradict the findings presented in the paper.
- If that is the case, can I assume that Stage 1 (as defined in the configuration file) only serves the purpose of producing a Gram Teacher? And does Stage 2, as launched with the current configuration file, actually cover both the pre-training stage (Stage 1 in the paper) and the Gram Anchor stage (Stage 2 in the paper) at the same time?
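
To make the first question concrete, here is a minimal Python sketch of the schedule I am describing, not the repository's actual code: the gram term is simply zeroed out until the iteration counter passes a configured threshold. The names `GRAM_START_ITER`, `gram_loss_weight`, and `total_loss` are hypothetical and only illustrate my reading of the Stage 2 config.

```python
# Hypothetical sketch of an iteration-gated gram loss (illustrative names only).

GRAM_START_ITER = 1_000_000  # assumed value taken from the Stage 2 config


def gram_loss_weight(iteration: int, base_weight: float = 1.0) -> float:
    """Return the gram loss weight at a given iteration (0 before the start)."""
    return base_weight if iteration >= GRAM_START_ITER else 0.0


def total_loss(main_loss: float, gram_loss: float, iteration: int) -> float:
    """Combine the main objective with the iteration-gated gram loss term."""
    return main_loss + gram_loss_weight(iteration) * gram_loss


if __name__ == "__main__":
    # Before 1M iterations the gram term is ignored; after that it is applied.
    print(total_loss(main_loss=2.0, gram_loss=0.5, iteration=200_000))    # 2.0
    print(total_loss(main_loss=2.0, gram_loss=0.5, iteration=1_200_000))  # 2.5
```

If this matches the intended behavior, then the first 1M iterations of Stage 2 run without any gram supervision, which is the part I find surprising given the paper.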
I look forward to your reply.