question about the train learning rate #24

Open
miaodog opened this issue Apr 11, 2023 · 0 comments
miaodog commented Apr 11, 2023

Hi, thanks for the work. I have two questions:

  1. In https://github.com/yizhongw/Tk-Instruct/blob/main/scripts/train_tk_instruct.sh#L34, the learning rate is 5e-05. Why so small? BigScience T0 (https://arxiv.org/pdf/2110.08207.pdf) uses a learning rate of 1e-3. Why is there such a big difference? Have you tried a larger learning rate? If I collect 40 million labeled examples and continue multitask finetuning from Tk-Instruct, what learning rate should I use?

  2. The lr_scheduler_type is set to constant. Why not use linear (increase from min lr to max lr over warmup_num_steps steps, then decay linearly over the remaining training steps)?
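
To make question 2 concrete, here is a minimal sketch of the two schedules being compared. It is plain Python and is not taken from the Tk-Instruct code. The function names, the `min_lr` default of 0, and the choice to decay back down to `min_lr` are assumptions for illustration only. The step counts in the usage note are also made up.

```python
def constant_lr(step: int, peak_lr: float) -> float:
    """Constant schedule: the same learning rate at every step."""
    return peak_lr


def linear_warmup_decay_lr(
    step: int,
    peak_lr: float,
    warmup_steps: int,
    total_steps: int,
    min_lr: float = 0.0,  # assumed floor; not a value from the training script
) -> float:
    """Linear warmup from min_lr to peak_lr, then linear decay back to min_lr."""
    if total_steps <= warmup_steps:
        raise ValueError("total_steps must be greater than warmup_steps")
    if step < warmup_steps:
        # Warmup phase: interpolate from min_lr up to peak_lr.
        fraction = step / warmup_steps
    else:
        # Decay phase: interpolate from peak_lr down to min_lr.
        remaining = max(0, total_steps - step)
        fraction = remaining / (total_steps - warmup_steps)
    return min_lr + (peak_lr - min_lr) * fraction
```

For example, with `peak_lr=5e-5`, `warmup_steps=100`, and `total_steps=1000`, the linear schedule starts at 0, reaches 5e-5 at step 100, and returns to 0 at step 1000, while `constant_lr` stays at 5e-5 throughout.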
