Sequentially load / unload train datasets to GPU #20676
Unanswered
meilame-tayebjee asked this question in Lightning Trainer API: Trainer, LightningModule, LightningDataModule
Hi,
I have 100 subsamples of a huge dataset, each identified by an index idx.
Let's say I want to use 80 subsamples as the training set (idx 1 to 80), 10 as validation and 10 as test. Each subsample takes approximately 6 GB of GPU memory.
Note that I have two H100 GPUs, each with 95 GB of memory. My model is a GPT-like model with 31 million parameters.
I want to use Lightning to train over several epochs, sequentially loading / unloading the datasets onto the GPUs, without having them all in memory at once. I do not even want to initialize the datasets beforehand (I also need to initialize them sequentially).
Basically, during one epoch, I want to load the first train dataset onto the GPU, train on it, unload it and load the next one, and so on until the last training dataset. Then restart for another epoch.
I started using the `DataModule` class, with something like the following. However, when calling `self.trainer.datamodule.next_train_subsample()`, the dataset is indeed updated as I want, but I am not sure whether the `data_loader` takes that update into account. Happy to have any insights on how to do this the right way! Thank you very much.
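One pattern that matches this description is a streaming-style dataset that loads one subsample at a time and releases it before loading the next. The sketch below shows the idea with plain Python generators; `load_subsample` is a hypothetical placeholder for whatever actually reads one ~6 GB chunk (e.g. `torch.load` on one file), and the indices and chunk contents are made up for illustration. In Lightning, a generator like this would typically back an `IterableDataset` returned from the DataModule's `train_dataloader()`, optionally combined with `Trainer(reload_dataloaders_every_n_epochs=1)` so the stream is rebuilt at the start of each epoch.

```python
# Sketch only: load_subsample and the sample contents are hypothetical
# stand-ins for the real per-chunk loading code.

def load_subsample(idx):
    # Placeholder loader: in practice this would read subsample `idx`
    # from disk and move the tensors to the GPU.
    return [(idx, i) for i in range(4)]  # fake "samples" for illustration

def sequential_samples(indices):
    """Yield samples one subsample at a time, keeping only one in memory."""
    for idx in indices:
        chunk = load_subsample(idx)  # load the current subsample
        yield from chunk             # stream its samples to the training loop
        del chunk                    # drop the reference so it can be freed

if __name__ == "__main__":
    train_indices = range(1, 4)      # stand-in for idx 1..80
    for sample in sequential_samples(train_indices):
        pass                         # training step would go here
```

Because the generator only ever holds one chunk, peak host/GPU memory stays at roughly one subsample plus the model, which is the behavior described above.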