📚 The doc issue or request
For small models, my understanding is that torchforge supports the following colocation strategy:
- offload torchtitan's weights to torchstore
- run the generator on the GPU cards
- release the generator's GPU memory
- reload torchtitan's weights onto the GPU and continue training
However, I can't find an example of this workflow in the demos.
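For reference, the workflow I have in mind looks roughly like the sketch below. This is plain PyTorch with a dict standing in for torchstore and small `nn.Linear` modules standing in for the torchtitan model and the generator; the actual torchforge/torchstore API is presumably different, which is exactly why a demo would help:

```python
import torch
import torch.nn as nn

# Hypothetical sketch of the colocation loop described above.
# `store` is a stand-in for torchstore; the real torchforge API may differ.
device = "cuda" if torch.cuda.is_available() else "cpu"

trainer = nn.Linear(8, 8).to(device)   # stand-in for the torchtitan model
generator = nn.Linear(8, 8)            # stand-in for the generator

# 1. Offload the trainer's weights to a host-side store.
store = {k: v.detach().to("cpu").clone() for k, v in trainer.state_dict().items()}
trainer.to("cpu")                      # free the trainer's GPU memory

# 2. Run the generator on the now-free GPU.
generator.to(device)
with torch.no_grad():
    out = generator(torch.randn(2, 8, device=device))

# 3. Release the generator's GPU memory.
generator.to("cpu")
if device == "cuda":
    torch.cuda.empty_cache()

# 4. Reload the trainer's weights and continue training on the GPU.
trainer.load_state_dict(store)
trainer.to(device)
```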
Suggest a potential alternative/fix
No response