How does the validation loop work? #4133
Unanswered
Honzys
asked this question in DDP / multi-GPU / multi-node
Replies: 1 comment 1 reply
-
TL;DR - assuming you are only looking at the progress bar, this is the correct behavior; training and validation do not actually run in parallel. In more detail - AFAIK the progress bar shows the status of a full training + validation cycle, so when both sets have the same size it looks as if validation starts in the middle of an epoch and as if the validation set is half the size. Moreover, in the case of 2 GPUs the number of steps for the full cycle will equal the number of samples in your dataset, but only half of those steps are needed for the full training epoch and the rest are for validation.
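A back-of-the-envelope sketch of that arithmetic, assuming a dummy dataset of 1_000 samples, identical train/val sets, batch size 1, and 2 GPUs (all of these numbers are made up for illustration):

```python
# Hypothetical numbers: 1_000 samples, batch size 1, identical train/val sets.
dataset_size = 1_000
num_gpus = 2

# DDP shards the data, so each GPU sees half of each set per epoch.
train_batches_per_gpu = dataset_size // num_gpus  # 500 training steps
val_batches_per_gpu = dataset_size // num_gpus    # 500 validation steps

# The progress bar counts the full train + val cycle, so its total equals the
# dataset size even though only the first half of it is training.
progress_bar_total = train_batches_per_gpu + val_batches_per_gpu
print(progress_bar_total)  # 1000
```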
-
Hi,
I am using the latest PyTorch Lightning 1.0.0 and I have run into a problem with validation.
I moved from version 0.7.5. Now validation somehow appears to run in parallel with training: when I am in the middle of an epoch, validation starts. Can I disable this behaviour somehow? I don't want to run those two stages in parallel because of resources (I want to fully utilize the GPUs for training first and then for validation).
I also noticed that the validation dataset appears to be only half the size of the training dataset (it's a dummy case where the training and validation datasets are identical).
I am using a LightningDataModule to provide the dataloaders and running the test on 2 GPUs in DDP mode, roughly as in the sketch below.
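A minimal reproduction sketch of that setup (class name, tensor shapes, and sizes are made up for illustration):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl

class DummyDataModule(pl.LightningDataModule):
    """Dummy DataModule whose train and val sets are the same 1_000 samples."""

    def setup(self, stage=None):
        self.dataset = TensorDataset(torch.randn(1_000, 32),
                                     torch.randint(0, 2, (1_000,)))

    def train_dataloader(self):
        return DataLoader(self.dataset, batch_size=1)

    def val_dataloader(self):
        # Same data as the training set, so both loaders have equal length.
        return DataLoader(self.dataset, batch_size=1)
```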
I also have another question:
Is it possible to run the training not epoch-wise but step-wise? For example, I want to do 10_000 steps (simply `max_steps=10_000`) and I don't care how many epochs that corresponds to. Furthermore, I would like to run validation at every 1_000th global step (not at a step count inside an epoch), so there would be only 10 validations over the whole training. And what if the dataset has only 900 samples? I can't use `val_check_interval`, because it only counts iterations inside each epoch and I would never reach 1_000 steps within one epoch. Is there a way to use the global step counter instead of the per-epoch step counter for this? I know I can use an IterableDataset, but the downside is that I cannot be sure the whole dataset is iterated through in every epoch. A rough sketch of what I mean follows below.
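Roughly what I am trying to express, as a sketch against the 1.0.x Trainer flags I know of (model and datamodule are omitted; the commented-out line is the piece I cannot find a flag for):

```python
import pytorch_lightning as pl

# Sketch of the desired setup: step-wise training with validation every
# 1_000 *global* steps.
trainer = pl.Trainer(
    gpus=2,
    distributed_backend="ddp",
    max_steps=10_000,  # stop after 10_000 global steps, however many epochs that is
    # val_check_interval=1_000,  # counts batches *within* one epoch, so with a
    #                            # 900-sample dataset this can never be reached
)
# trainer.fit(model, datamodule=dm)
```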
Thanks for your great work!