Description
The model's predict_step stores reconstructions as the video is processed. For long videos this quickly leads to OOM errors, and it is unnecessary when only the CLS token is needed.
Similarly, the ViT stores the patch embeddings in predict_step, and I'm not sure that is necessary either.
A cleaner approach would be:
- save reconstructions only when this is explicitly requested from the CLI
- do not save patch embeddings, only CLS tokens
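A minimal, framework-agnostic sketch of the proposal, assuming the model's forward pass yields a CLS token, patch embeddings, and a reconstruction for each batch. `VideoPredictor`, `EncoderFn`, `run`, and the `--save-reconstructions` flag are hypothetical names, not part of this project. If the model is a PyTorch Lightning module (the predict_step name suggests so, but the issue does not say), whatever predict_step returns is collected by `Trainer.predict` when `return_predictions` is enabled, so trimming the return value is what limits how much memory accumulates over a long video.

```python
import argparse
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional, Tuple

# Hypothetical encoder signature: one batch -> (cls_token, patch_embeddings, reconstruction).
EncoderFn = Callable[[Any], Tuple[Any, Any, Any]]


@dataclass
class VideoPredictor:
    """Hypothetical predictor whose predict_step returns only what was asked for."""

    encoder: EncoderFn
    save_reconstructions: bool = False  # hypothetical flag, off by default

    def predict_step(self, batch: Any, batch_idx: int) -> Dict[str, Any]:
        cls_token, _patch_embeddings, reconstruction = self.encoder(batch)
        # Patch embeddings are never returned, so nothing keeps them alive
        # after this step.
        output: Dict[str, Any] = {"cls": cls_token}
        if self.save_reconstructions:
            # Reconstructions are kept only when explicitly requested.
            output["reconstruction"] = reconstruction
        return output


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Extract CLS tokens from a video.")
    parser.add_argument(
        "--save-reconstructions",
        action="store_true",
        help="also keep per-batch reconstructions (memory-heavy for long videos)",
    )
    return parser


def run(
    batches: List[Any], encoder: EncoderFn, argv: Optional[List[str]] = None
) -> List[Dict[str, Any]]:
    # Parse the (hypothetical) CLI flag and run predict_step over every batch.
    args = build_parser().parse_args(argv)
    predictor = VideoPredictor(encoder, save_reconstructions=args.save_reconstructions)
    return [predictor.predict_step(batch, i) for i, batch in enumerate(batches)]


if __name__ == "__main__":
    # Toy encoder: each "batch" is an int and the outputs are tagged strings.
    def toy_encoder(batch: int) -> Tuple[Any, Any, Any]:
        return (f"cls{batch}", [f"patch{batch}_{k}" for k in range(4)], f"recon{batch}")

    print(run([0, 1], toy_encoder, argv=[]))
    print(run([0, 1], toy_encoder, argv=["--save-reconstructions"]))
```

With no flag, each step returns only `{"cls": ...}`. Passing `--save-reconstructions` adds the reconstruction, so the memory cost is opt-in rather than the default.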