commitgen_dataset.zip contains the data needed to train the model. unzip it before training, or training will not work properly
run:
python scripts/harvest_commits.py
to gather data (or add to the dataset) from the github repos listed in REPO_LIST inside that file; you can modify the list at will
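A harvesting script like this typically walks each repo's history and splits the raw log into per-commit (message + diff) chunks. The helper below is a hypothetical sketch of that splitting step, assuming the script parses `git log -p` output; the actual scripts/harvest_commits.py may work differently.

```python
import re

# Hypothetical helper: split raw `git log -p` output into per-commit chunks
# (commit message followed by its diff). Each commit in the default log
# format starts with "commit <40-hex-sha>" on its own line.
def split_commits(log_text):
    chunks = re.split(r"(?m)^commit [0-9a-f]{40}\n", log_text)
    return [c.strip() for c in chunks if c.strip()]
```

Each returned chunk still contains the author header, the indented message, and the `diff --git ...` body, which the harvester can then separate into a (diff, message) training pair.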
after data gathering run:
python data/prepare_bpe.py
to regenerate the meta and train/val bin files (tokenizer + detokenizer) if you want to run the model at the BPE level, or run data/prepare_char.py if you want to run the model at the char level
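In nanoGPT-style projects, a char-level prepare step builds a character vocabulary, encodes the corpus as uint16 ids, and splits it into train/val portions (the bin files), saving the encode/decode tables in a meta file. The sketch below shows that idea; the function name and the 90/10 split are assumptions, not necessarily what data/prepare_char.py does.

```python
from array import array  # stdlib stand-in for the uint16 arrays written to .bin files

# Sketch of a char-level prepare step (hypothetical; the real script would
# also write train.bin / val.bin to disk and pickle the meta dict).
def prepare_char(text):
    chars = sorted(set(text))                     # character vocabulary
    stoi = {ch: i for i, ch in enumerate(chars)}  # encoder: char -> id
    itos = {i: ch for ch, i in stoi.items()}      # decoder: id -> char
    ids = array("H", (stoi[c] for c in text))     # uint16 token ids
    n = int(0.9 * len(ids))                       # assumed 90/10 train/val split
    meta = {"vocab_size": len(chars), "stoi": stoi, "itos": itos}
    return ids[:n], ids[n:], meta
```

The meta dict is what lets generation decode sampled ids back into text, which is why resetting it together with the bin files keeps the tokenizer and detokenizer in sync.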
to run training run:
python train.py config/train_vm_bpe.py
or any other training config in the config folder that you want to use
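nanoGPT configs are plain Python files of top-level assignments that override the defaults in train.py. The fragment below is an illustrative sketch only; the actual hyperparameters in config/train_vm_bpe.py may differ.

```python
# Hypothetical config sketch in nanoGPT's override style.
# Every value here is an assumption, not the project's real settings.
out_dir = "out-commitgen-bpe"  # where checkpoints are written
dataset = "commitgen"          # points train.py at data/commitgen
batch_size = 12
block_size = 1024              # max context length in tokens
n_layer = 12                   # GPT-2-small-ish scale (~100M+ params)
n_head = 12
n_embd = 768
learning_rate = 6e-4
max_iters = 5000
```

Because train.py executes the config to override its defaults, swapping configs is just a matter of passing a different file path on the command line.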
to test model with git diffs run:
python generate_commit.py
and you will be prompted to enter a diff; it will give you back a commit message. you can also hardcode a diff in the script and test that instead
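Generation scripts like this usually wrap the entered diff in the same format the model saw at training time before sampling. The sketch below illustrates that step; the delimiter strings and the truncation limit are assumptions, not the actual format used by generate_commit.py.

```python
# Hypothetical prompt builder; the real generate_commit.py loads the trained
# checkpoint and samples a message after a prompt like this.
def build_prompt(diff, max_chars=2000):
    diff = diff[:max_chars]  # small model, small context: truncate long diffs
    return f"<diff>\n{diff}\n</diff>\n<msg>\n"
```

Truncating the diff up front matters here because, as noted below, the model only handles fairly small diffs within its context window.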
-
note: the model only works for fairly small diffs, since it only has around 110M parameters when run on the commitgen dataset; it trains in about 2.5 hours on a single Tesla T4 GPU
-
credit: model.py and train.py come from the nanoGPT repo: https://github.com/karpathy/nanoGPT/tree/master