Hi, thanks for the great work! I have two questions:
1. In https://github.com/yizhongw/Tk-Instruct/blob/main/scripts/train_tk_instruct.sh#L34, the learning rate is 5e-5. Why so small? The BigScience T0 paper (https://arxiv.org/pdf/2110.08207.pdf) uses a learning rate of 1e-3. Why is there such a big difference? Have you tried a larger learning rate? And if I collect 40 million labeled examples and continue multitask finetuning from Tk-Instruct, what learning rate should I use?
2. `lr_scheduler_type` is set to constant. Why not use linear (ramp from min lr to max lr over `warmup_num_steps` steps, then decay at a linear rate over the remaining training steps)?
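For concreteness, the linear schedule I mean can be sketched as a plain lr-multiplier function; the `warmup_num_steps` and `total_steps` values below are hypothetical, and the peak lr is the 5e-5 from `train_tk_instruct.sh`:

```python
def linear_schedule(step, warmup_num_steps=100, total_steps=1000):
    """Return the lr multiplier at a given step: ramp 0 -> 1 over the
    warmup, then decay linearly back to 0 by total_steps."""
    if step < warmup_num_steps:
        return step / max(1, warmup_num_steps)
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_num_steps))

# The multiplier scales the peak learning rate (5e-5 in the script):
peak_lr = 5e-5
lrs = [peak_lr * linear_schedule(s) for s in (0, 50, 100, 550, 1000)]
```

The same shape is what e.g. a `LambdaLR` wrapper around the optimizer would apply, as opposed to the constant schedule that keeps the multiplier at 1 after warmup.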