Hi, thanks for the great work! I have two questions:
1. In https://github.com/yizhongw/Tk-Instruct/blob/main/scripts/train_tk_instruct.sh#L34, the learning rate is 5e-5. Why so small? The BigScience T0 paper (https://arxiv.org/pdf/2110.08207.pdf) uses a learning rate of 1e-3. Why is there such a big difference? Have you tried a larger learning rate? And if I collect 40 million labeled examples and continue multitask finetuning from Tk-Instruct, what learning rate should I use?
2. `lr_scheduler_type` is set to constant. Why not use linear (ramp from min lr to max lr over `warmup_num_steps` steps, then decay at a linear rate over the remaining training steps)?
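For concreteness, the linear schedule I mean can be sketched as a plain lr-multiplier function; the `warmup_num_steps` and `total_steps` values below are hypothetical, and the peak lr is the 5e-5 from `train_tk_instruct.sh`:

```python
def linear_schedule(step, warmup_num_steps=100, total_steps=1000):
    """Return the lr multiplier at a given step: ramp 0 -> 1 over the
    warmup, then decay linearly back to 0 by total_steps."""
    if step < warmup_num_steps:
        return step / max(1, warmup_num_steps)
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_num_steps))

# The multiplier scales the peak learning rate (5e-5 in the script):
peak_lr = 5e-5
lrs = [peak_lr * linear_schedule(s) for s in (0, 50, 100, 550, 1000)]
```

The same shape is what e.g. a `LambdaLR` wrapper around the optimizer would apply, as opposed to the constant schedule that keeps the multiplier at 1 after warmup.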