cooogus/commitgen-model

Commitgen-Model

commitgen_dataset.zip contains the data needed to train the model. Unzip it before training; training will not work properly otherwise.
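If you would rather unpack the archive from Python than with a command-line tool, a minimal sketch using the standard-library zipfile module follows. The helper name extract_dataset and the destination directory data are assumptions made for illustration; check where the preparation and training scripts expect the files before relying on it.

```python
import zipfile
from pathlib import Path


def extract_dataset(zip_path="commitgen_dataset.zip", dest_dir="data"):
    """Extract the archive into dest_dir and return the member names.

    dest_dir is an assumed location, not one confirmed by the repository.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        return archive.namelist()


# Example call (run from the repository root):
# extract_dataset("commitgen_dataset.zip", "data")
```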

To gather data, run:

python scripts/harvest_commits.py

This collects commits from the GitHub repos listed in REPO_LIST within that file, either creating the dataset or adding to the existing one. You can modify REPO_LIST at will.
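As an illustration of editing REPO_LIST, the sketch below assumes the entries are "owner/name" strings. The actual format in scripts/harvest_commits.py may differ (for example, full URLs), so check the file first. The helper add_repos is hypothetical and not part of the script; the repos named are only examples.

```python
# Hypothetical example: the real REPO_LIST lives in scripts/harvest_commits.py
# and may use a different entry format.
REPO_LIST = [
    "karpathy/nanoGPT",
]


def add_repos(repo_list, new_repos):
    """Return repo_list plus new entries, skipping duplicates and malformed names."""
    merged = list(repo_list)
    for repo in new_repos:
        repo = repo.strip()
        owner, sep, name = repo.partition("/")
        if not (owner and sep and name) or "/" in name:
            continue  # not in the assumed "owner/name" form
        if repo not in merged:
            merged.append(repo)
    return merged


REPO_LIST = add_repos(REPO_LIST, ["pallets/flask", "karpathy/nanoGPT", "not-a-repo"])
print(REPO_LIST)
```

Filtering duplicates up front avoids harvesting the same repository twice when the list grows over time.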

After gathering data, run:

python data/prepare_bpe.py

to regenerate the meta, val, and bin files (tokenizer + detokenizer) if you want to run the model at the BPE level, or run data/prepare_char.py instead if you want to run the model at the character level.
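For orientation, character-level preparation in nanoGPT-style scripts typically builds a character vocabulary, encodes the text as integer ids, and splits the ids into training and validation portions, storing the mappings so that output can be decoded later. The sketch below illustrates that idea in plain Python. It is not the code from data/prepare_char.py, and the 90/10 split, the function names, and the sample text are assumptions.

```python
def build_vocab(text):
    """Map each distinct character to an integer id and back."""
    chars = sorted(set(text))
    stoi = {ch: i for i, ch in enumerate(chars)}
    itos = {i: ch for ch, i in stoi.items()}
    return stoi, itos


def encode(text, stoi):
    """Tokenizer: string -> list of integer ids."""
    return [stoi[ch] for ch in text]


def decode(ids, itos):
    """Detokenizer: list of integer ids -> string."""
    return "".join(itos[i] for i in ids)


def train_val_split(ids, train_fraction=0.9):
    """Split ids into a leading training part and a trailing validation part."""
    cut = int(len(ids) * train_fraction)
    return ids[:cut], ids[cut:]


# Made-up sample text standing in for the harvested commit data.
sample = "fix typo in README\nadd unit tests\n"
stoi, itos = build_vocab(sample)
ids = encode(sample, stoi)
train_ids, val_ids = train_val_split(ids)
print(len(stoi), len(train_ids), len(val_ids))
```

A BPE-level script follows the same overall shape but replaces the per-character mapping with a subword tokenizer.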

To train the model, run:

python train.py config/train_vm_bpe.py

or substitute any other training config from the config folder.
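In upstream nanoGPT, a config file is an ordinary Python file whose top-level assignments override the defaults defined in train.py. Assuming this repository keeps that mechanism, a custom config might look like the sketch below. The file name and every value are illustrative assumptions, not the settings in config/train_vm_bpe.py; the variable names are those used by upstream nanoGPT.

```python
# config/train_custom.py  (hypothetical file name; all values are illustrative)

out_dir = "out-commitgen-custom"  # where checkpoints are written

# evaluation and logging cadence
eval_interval = 250
eval_iters = 100
log_interval = 10

# batch shape: tokens per step = batch_size * block_size * gradient_accumulation_steps
batch_size = 16
block_size = 512
gradient_accumulation_steps = 4

# model size; n_embd must be divisible by n_head
n_layer = 12
n_head = 12
n_embd = 768
dropout = 0.1

# optimizer schedule
learning_rate = 3e-4
max_iters = 5000
lr_decay_iters = 5000  # commonly set equal to max_iters
min_lr = 3e-5
warmup_iters = 100

# hardware-related switches
dtype = "float16"  # the Tesla T4 has no native bfloat16 support
compile = False    # skip torch.compile for a quicker start
```

If the mechanism matches upstream, such a file would be passed the same way as the existing configs, for example python train.py config/train_custom.py.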

To test the model with git diffs, run:

python generate_commit.py

You will be prompted to enter a diff, and the script will return a commit message. Alternatively, you can add a hardcoded diff and test with that.
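If you need a small diff to try at the prompt or to hardcode, one option is to build it with the standard-library difflib module, as sketched below. The helper make_small_diff and the sample file contents are assumptions for illustration. Note that difflib output lacks the "diff --git" and "index" header lines that git itself prints, and whether the model expects those lines depends on how the training data was harvested.

```python
import difflib


def make_small_diff(before, after, path="example.py"):
    """Return a unified diff between two versions of a file's text."""
    lines = difflib.unified_diff(
        before.splitlines(keepends=True),
        after.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    return "".join(lines)


# Made-up before/after contents for a one-line change.
before = "def greet(name):\n    print('hello ' + name)\n"
after = "def greet(name):\n    print(f'hello {name}')\n"

diff_text = make_small_diff(before, after)
print(diff_text)
```

Keeping the change to a line or two matches the kind of short diff the model is intended for.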

  • note: the model only works for fairly small diffs, since it has only around 110M parameters when run on the commitgen dataset. It trains in about 2.5 hours on one Tesla T4 GPU.

  • credit: model.py and train.py come from the nanoGPT repo: https://github.com/karpathy/nanoGPT/tree/master

About

A small nanoGPT-based model that generates commit messages for short diffs.
