
Will we ever get training tools like axolotl for gpt-oss-type models? Because 8 B60s RIP through distill #181

@cdscustoms-coder

Description

I ran a distill off gpt-oss-120b today; at 32 concurrent requests I was getting over 750 t/s at roughly 85 watts per GPU.
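For context, the distill loop is roughly the shape below. This is a minimal sketch, not my exact setup: the endpoint URL, teacher model name, prompt file, and output path are placeholders for whatever OpenAI-compatible server (vLLM or similar) is actually fronting gpt-oss-120b.

```python
import asyncio
import json

from openai import AsyncOpenAI

# Local OpenAI-compatible endpoint serving gpt-oss-120b (placeholder URL).
client = AsyncOpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

CONCURRENCY = 32  # matches the 32 concurrent requests mentioned above
semaphore = asyncio.Semaphore(CONCURRENCY)


async def distill_one(prompt: str) -> dict:
    """Ask the 120b teacher for a completion and keep the prompt/response pair."""
    async with semaphore:
        resp = await client.chat.completions.create(
            model="gpt-oss-120b",
            messages=[{"role": "user", "content": prompt}],
            max_tokens=1024,
        )
    return {"prompt": prompt, "response": resp.choices[0].message.content}


async def main() -> None:
    # prompts.jsonl: one {"prompt": "..."} object per line (placeholder path).
    with open("prompts.jsonl") as f:
        prompts = [json.loads(line)["prompt"] for line in f]

    rows = await asyncio.gather(*(distill_one(p) for p in prompts))

    with open("distill_dataset.jsonl", "w") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")


if __name__ == "__main__":
    asyncio.run(main())
```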

Then disappointment set in: to train gpt-oss-20b I have to dequantize the model and run on CPU and system RAM.

I am fine-tuning the smaller model for my business, on my own domain data.
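For reference, the training side I'm stuck with looks roughly like the sketch below (transformers + peft). The LoRA settings are illustrative, not a tuned recipe, and whether the MXFP4 weights get dequantized automatically when loading in bf16 depends on the transformers version.

```python
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openai/gpt-oss-20b"  # public HF checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)

# Loading in bf16 on CPU is the "dequant" step described above: the quantized
# checkpoint gets expanded and ends up living in system RAM for training.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="cpu",
)

lora = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules="all-linear",  # illustrative; narrower targets also work
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()

# From here a standard Trainer / SFT loop over distill_dataset.jsonl would run,
# just painfully slowly compared to what the B60s manage on inference.
```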

The distill run makes for nice screenshots, though; see attached.

[Two screenshots attached]
