
fix model naming schemes for weight decay #2

@rhubarbwu

Several of our model naming schemes are confusing. The worst is currently the suffix that encodes the weight decay $\beta$:

  • $\beta = 0$: TinyStories-01x0064_01n
  • $\beta = 0.0005$: TinyStories-01x0064_01d
  • $\beta = 0.1$: TinyStories-01x0064_01L

The suffixes follow no consistent rule across $\beta$ values, and the letters $d$ and $L$ collide with the symbols we use for model scale.

We will likely need a script that automatically clones every model under a corrected name. After that, we will have to update our experiment scripts and the artifacts themselves.
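As a starting point for that script, here is a minimal sketch of the renaming step only. The suffix-to-$\beta$ mapping comes from the list above; the target scheme (`_wd<value>`) and the names `LEGACY_SUFFIX_TO_WEIGHT_DECAY`, `LEGACY_NAME`, and `propose_name` are hypothetical placeholders, not a decided convention. The actual cloning of the models is left out.

```python
import re

# Legacy suffix letter -> weight decay, as listed in this issue.
LEGACY_SUFFIX_TO_WEIGHT_DECAY = {"n": 0.0, "d": 0.0005, "L": 0.1}

# Matches legacy names such as "TinyStories-01x0064_01d":
# a stem ending in "_<digits>", followed by one suffix letter.
LEGACY_NAME = re.compile(r"^(?P<stem>.+_\d+)(?P<suffix>[ndL])$")


def propose_name(old_name: str) -> str:
    """Return a candidate new name that spells out the weight decay.

    The "_wd<value>" format is an assumption made for illustration only.
    """
    match = LEGACY_NAME.match(old_name)
    if match is None:
        raise ValueError(f"not a legacy weight-decay name: {old_name!r}")
    weight_decay = LEGACY_SUFFIX_TO_WEIGHT_DECAY[match["suffix"]]
    # ":g" drops trailing zeros, so 0.0 is written as "0".
    return f"{match['stem']}_wd{weight_decay:g}"


if __name__ == "__main__":
    for name in (
        "TinyStories-01x0064_01n",
        "TinyStories-01x0064_01d",
        "TinyStories-01x0064_01L",
    ):
        print(name, "->", propose_name(name))
```

Keeping the mapping in one dictionary means the same table could drive both the clone step and the later fixes to the experiment scripts.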
