This repository has been archived by the owner on Nov 3, 2023. It is now read-only.

[Code] best_model_path in ModelCheckpointCallback (rank 0 and driver node) #202

Open
chongxiaoc opened this issue Aug 26, 2022 · 0 comments


chongxiaoc commented Aug 26, 2022

The driver node and rank 0 use the same path to save and load weights in ModelCheckpointCallback.

However, the driver node and rank 0 may not be on the same machine, and they may not even share a file system.

In that case, sending rank 0's local best_model_path back to the driver node is meaningless, because the driver cannot read that path.

Rank 0 probably has to push the best model weights to persistent storage in a custom callback on train_end_stage.
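A minimal sketch of such a callback, under several assumptions: "train_end_stage" is taken to mean PyTorch Lightning's `on_train_end` hook, the class name `PushBestCheckpoint` and its `persistent_dir` parameter are hypothetical, and a plain directory copy stands in for whatever persistent storage (shared mount, object store) both nodes can actually reach.

```python
import os
import shutil

try:
    from pytorch_lightning import Callback
except ImportError:
    # Fallback so the sketch still runs where Lightning is not installed.
    Callback = object


class PushBestCheckpoint(Callback):
    """Hypothetical callback: rank 0 copies its best checkpoint to shared storage."""

    def __init__(self, persistent_dir):
        super().__init__()
        # Assumed to be a location the driver node can also read.
        self.persistent_dir = persistent_dir
        self.persistent_path = None

    def on_train_end(self, trainer, pl_module):
        # Only rank 0 holds the checkpoint file worth publishing.
        if trainer.global_rank != 0:
            return
        ckpt_cb = trainer.checkpoint_callback
        local_path = getattr(ckpt_cb, "best_model_path", "")
        # Nothing to push if no checkpoint was written on this node.
        if not local_path or not os.path.isfile(local_path):
            return
        os.makedirs(self.persistent_dir, exist_ok=True)
        # copy2 returns the destination file path when given a directory.
        self.persistent_path = shutil.copy2(local_path, self.persistent_dir)
```

With something like this, the driver would read the weights from `persistent_dir`, which it already knows, instead of relying on the path string reported by rank 0.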

