Scaling validation to large datasets #436

jn2clark · 2023-02-19T10:26:12Z

Hi, I have a quick question about validation. I seem to be running into memory issues with my validation set. After checking the code, this seems to be expected (https://github.com/mlfoundations/open_clip/blob/main/src/training/train.py#L251). Has any work been done on this before I try to implement something? What would the ideal approach be? I was thinking of batching the validation data and reporting the metrics from each batch.
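A rough sketch of one way to do the batching, assuming the memory pressure comes from building the full N x N image-text similarity matrix and that row i of each feature array belongs to the same pair. Rather than reporting metrics per batch (which would rank each query only against candidates in its own batch), it computes the similarity matrix one block of rows at a time and keeps only the rank of the true match, so the result equals the full-matrix recall. The function name `chunked_recall_at_k` and the `chunk_size` parameter are made up for illustration and are not part of open_clip; NumPy stands in for the training framework.

```python
import numpy as np


def chunked_recall_at_k(image_features, text_features, ks=(1, 5, 10), chunk_size=1024):
    """Image-to-text recall@k computed in row blocks (hypothetical helper).

    Assumes image_features[i] and text_features[i] form a matching pair.
    Peak memory for similarities is chunk_size x N instead of N x N.
    """
    n = image_features.shape[0]
    ranks = np.empty(n, dtype=np.int64)
    for start in range(0, n, chunk_size):
        stop = min(start + chunk_size, n)
        # Similarities of this block of queries against all candidates.
        sims = image_features[start:stop] @ text_features.T
        # Similarity of each query with its own ground-truth match.
        true_sims = sims[np.arange(stop - start), np.arange(start, stop)]
        # Rank = number of candidates scoring strictly higher than the match.
        ranks[start:stop] = (sims > true_sims[:, None]).sum(axis=1)
    return {f"R@{k}": float(np.mean(ranks < k)) for k in ks}
```

Text-to-image recall would follow by swapping the two arguments. The feature arrays themselves (N x D) still have to fit in memory, so this only helps when the similarity matrix, not the features, is the bottleneck.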

mehdidc · 2023-02-19T10:49:19Z

@jn2clark there is a PR on distributed validation that could help (#176), but it is not finished yet.

jn2clark · 2023-02-20T01:18:45Z

Thanks @mehdidc! I will take a look.
