[Train-on-what; human simulator] #85

ruiqi-zhong · 2025-11-11T04:24:38Z

Adding options to train on user-specified messages.
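To make the idea concrete, here is a minimal sketch of what "train on user-specified messages" can mean: each message carries a flag, and only tokens from flagged messages receive a nonzero loss weight. This is not the cookbook's implementation. The `Message` class, the `trainable` field, `render_with_weights`, and the whitespace "tokenizer" are all hypothetical stand-ins for illustration.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str
    content: str
    trainable: bool = False  # hypothetical per-message flag, not the cookbook's API


def render_with_weights(messages):
    """Return (tokens, weights); weight is 1.0 only on content tokens of trainable messages.

    Whitespace splitting stands in for a real tokenizer, and a "role:" header
    stands in for a real renderer's formatting. Headers always get weight 0.0.
    """
    tokens, weights = [], []
    for msg in messages:
        tokens.append(f"{msg.role}:")
        weights.append(0.0)
        for word in msg.content.split():
            tokens.append(word)
            weights.append(1.0 if msg.trainable else 0.0)
    return tokens, weights


# Human-simulator style example: train on the user turn instead of the assistant turn.
conversation = [
    Message("assistant", "How can I help?"),
    Message("user", "Summarize this paper", trainable=True),
]
tokens, weights = render_with_weights(conversation)
```

In this sketch, flagging the user turn instead of the assistant turn is all it takes to switch from an ordinary assistant objective to a human-simulator objective.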

Ran

python -m tinker_cookbook.recipes.chat_sl.train \
    model_name=Qwen/Qwen3-8B-Base \
    dataset=no_robots \
    learning_rate=5e-4 \
    batch_size=64 \
    lora_rank=64 \
    eval_every=20 \
    save_every=20 \
    wandb_project=cookbook_sl

and I can reproduce the performance.

joschu

This looks good. In case you haven't yet, could you run viz_sft_dataset with a couple of different renderers and train_on_what settings, and check that the colorized output looks right?

joschu · 2025-11-13T21:50:11Z

In particular, try both the role_colon renderer (nontrivial ac_tail) and another special token renderer.

ruiqi-zhong · 2025-11-14T03:10:14Z

viz_sft_dataset

Thanks for checking! Yeah, it works.

adding a renderer for human simulator

c0ca34e

ruiqi-zhong requested a review from joschu November 11, 2025 04:24

ruiqi-zhong added 3 commits November 12, 2025 18:04

more general customization on what to train on

c274d2a

n

3ba9f23

b

408236c

joschu approved these changes Nov 13, 2025


ruiqi-zhong added 2 commits November 14, 2025 03:13

b

dfe7f64

b

d56ef1e

ruiqi-zhong merged commit 6c9f7a4 into main Nov 14, 2025
2 checks passed
