revamped val (bootstrap ci to support seq and tok intervals, many perplexity evals) #142
Conversation
Left a couple of comments, not blocking.
parser.add_argument("--eps", type=float, default=None, help="Adam epsilon.") | ||
parser.add_argument("--lr", type=float, default=5.0e-4, help="Learning rate.") | ||
parser.add_argument("--beta1", type=float, default=0.9, help="Adam beta 1.") | ||
parser.add_argument("--beta2", type=float, default=0.95, help="Adam beta 2.") |
We're changing the default beta2 here compared to before, right?
Yeah, but a beta2 of 0.95 has been pretty standard for most open_lm runs so far.
"loss_upper_95": upper, | ||
"loss_sequences_lower_95": lower_seq, | ||
"loss_sequences_upper_95": upper_seq, | ||
"loss_tokens_lower_95": lower_tok, |
Should we also save the loss tokens mean?
Given that we never see partial sequences, I think this will just be loss.
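A minimal sketch of how the sequence-level vs. token-level bootstrap intervals from the diff above could be computed; the function and variable names here are hypothetical and not taken from the PR:

```python
import numpy as np

def bootstrap_loss_cis(per_seq_loss_sum, per_seq_tok_count, n_boot=1000, alpha=0.05, seed=0):
    """Bootstrap 95% CIs for the mean loss, resampling whole sequences.

    Sequence-level interval: mean of per-sequence mean losses.
    Token-level interval: total loss / total tokens (token-weighted mean).
    """
    rng = np.random.default_rng(seed)
    loss_sum = np.asarray(per_seq_loss_sum, dtype=float)    # summed loss per sequence
    tok_count = np.asarray(per_seq_tok_count, dtype=float)  # token count per sequence
    n = len(loss_sum)

    seq_means, tok_means = [], []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample sequences with replacement
        seq_means.append(np.mean(loss_sum[idx] / tok_count[idx]))
        tok_means.append(loss_sum[idx].sum() / tok_count[idx].sum())

    lo, hi = 100 * alpha / 2, 100 * (1 - alpha / 2)
    lower_seq, upper_seq = np.percentile(seq_means, [lo, hi])  # loss_sequences_{lower,upper}_95
    lower_tok, upper_tok = np.percentile(tok_means, [lo, hi])  # loss_tokens_{lower,upper}_95
    return (lower_seq, upper_seq), (lower_tok, upper_tok)
```

When there are no partial sequences, every tok_count entry is identical, so the token-weighted mean coincides with the plain per-sequence mean, which is why a separate "loss tokens mean" would just duplicate loss.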
Minor nit, but LGTM.
open_lm/main.py
if is_master(args):
    with fsspec.open(os.path.join(args.checkpoint_path, "results.jsonl"), "a") as f:
        f.write(json.dumps(evaluation_metrics))
It might be good to do this as one write: since we're going over fsspec, a network error could leave us with a corrupt jsonl or something (without the newline).
As in, do the dumps + \n at once so it's idempotent and keeps the jsonl consistent. In general, append mode is scary.
sg
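A minimal sketch of the single-write suggestion, reusing is_master, args, and evaluation_metrics from the diff hunk above rather than defining them here:

```python
import json
import os

import fsspec

if is_master(args):
    # Build the complete record (JSON plus trailing newline) before opening the file,
    # so the append is a single write and an interrupted transfer cannot leave a
    # half-written line without its newline.
    record = json.dumps(evaluation_metrics) + "\n"
    with fsspec.open(os.path.join(args.checkpoint_path, "results.jsonl"), "a") as f:
        f.write(record)
```

Whether an fsspec backend can still truncate a single write mid-record depends on the filesystem, but keeping the newline in the same write at least avoids the missing-newline case raised above.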