Allow to warm up a dataset to avoid re-reading (validation) batches #40

Merged
ravwojdyla merged 1 commit into main from rav-prefill-val-ds
Jul 16, 2025

@ravwojdyla
Contributor

In our experiments we often run validation on a subset of the validation dataset (e.g. 32 batches) relatively frequently (e.g. every 32 steps), as well as a longer validation at the end of training. As currently implemented in Lightning, the validation dataset is re-read on every validation run, which creates a gap in GPU utilization:

image

To avoid that, we can keep the intermediate validation batches in memory.
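
As a rough illustration of the idea (not the code merged in this PR), the sketch below wraps any iterable of batches and keeps the first `num_batches` of them in memory after the first read. The names `InMemoryBatches` and `warm_up` are hypothetical. It assumes the cached batches fit in host memory and that the validation subset is meant to be identical on every run.

```python
from itertools import islice
from typing import Generic, Iterable, Iterator, List, Optional, TypeVar

T = TypeVar("T")


class InMemoryBatches(Generic[T]):
    """Hypothetical wrapper: reads the first `num_batches` batches once, then replays them."""

    def __init__(self, source: Iterable[T], num_batches: int) -> None:
        self._source = source
        self._num_batches = num_batches
        self._cache: Optional[List[T]] = None

    def warm_up(self) -> None:
        # Read from the underlying source only on the first call.
        if self._cache is None:
            self._cache = list(islice(iter(self._source), self._num_batches))

    def __iter__(self) -> Iterator[T]:
        self.warm_up()
        assert self._cache is not None
        return iter(self._cache)

    def __len__(self) -> int:
        self.warm_up()
        assert self._cache is not None
        return len(self._cache)


if __name__ == "__main__":
    # Stand-in for an expensive validation reader (assumption for the demo).
    def slow_reader() -> Iterator[int]:
        for i in range(1000):
            yield i

    val_batches = InMemoryBatches(slow_reader(), num_batches=32)
    val_batches.warm_up()  # pay the read cost once, e.g. before training starts
    for _ in range(3):  # each later validation run replays the cached batches
        assert len(list(val_batches)) == 32
```

The trade-off is memory for I/O: the cached subset stays resident for the whole run, so this only suits the short, frequent validation passes, while the longer end-of-training validation would presumably still read from the dataset directly.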

@ravwojdyla ravwojdyla force-pushed the rav-skip-prob-batch branch from 4158d42 to 3848616 Compare July 16, 2025 01:21
@ravwojdyla ravwojdyla force-pushed the rav-prefill-val-ds branch from 0d599fb to 8b2e4cc Compare July 16, 2025 01:23
@ravwojdyla ravwojdyla requested a review from yonromai July 16, 2025 01:23
Base automatically changed from rav-skip-prob-batch to main July 16, 2025 01:23
@ravwojdyla ravwojdyla force-pushed the rav-prefill-val-ds branch from 8b2e4cc to 5e69ba6 Compare July 16, 2025 01:23
@ravwojdyla ravwojdyla merged commit c2e6434 into main Jul 16, 2025
2 checks passed
@ravwojdyla ravwojdyla deleted the rav-prefill-val-ds branch July 16, 2025 01:54

2 participants