Test train save load, with and without fsdp #148

Merged: 1 commit merged into mlfoundations:main from the test_resume branch on Dec 14, 2023

Conversation

jmercat (Collaborator) commented Dec 11, 2023

This should help address #145. Please tell me whether I should actually call main as suggested in the issue, or whether this is fine.

achalddave self-requested a review December 11, 2023 20:39

achalddave (Collaborator) left a comment

Looks great! A couple of nits; I left some small comments.

Review threads (outdated, resolved):
open_lm/main.py (1 thread)
tests/shared.py (2 threads)
tests/test_save_load.py (2 threads)

jmercat (Collaborator, Author) commented Dec 11, 2023

Thanks for the review, Achal. I actually added a test checking that the parameters are different after training for one epoch, but it fails, so I'll solve that and push the fix.
Should I also add a test for loading from S3?

jmercat force-pushed the test_resume branch 2 times, most recently from c8ee305 to 7086109, on December 12, 2023 01:37

achalddave (Collaborator) commented

I think we can skip the S3 test. It'll be annoying with the CI, and we use --remote-sync pretty commonly anyway.

jmercat requested a review from achalddave December 12, 2023 02:15
jmercat force-pushed the test_resume branch 2 times, most recently from 6bf34a8 to 5fadff4, on December 13, 2023 02:01
achalddave merged commit 813d501 into mlfoundations:main on Dec 14, 2023
2 checks passed
jmercat deleted the test_resume branch December 14, 2023 19:33
2 participants