I ran a distill off gpt-oss-120b today; at 32 concurrent requests I was getting over 750 t/s at around 85 watts per GPU. Then disappointment set in: to train gpt-oss-20b I have to dequantize the model and run on CPU and system RAM. I'm training the smaller model on my business domains. The distill makes for nice screenshots, though. See attached.

<img width="1017" height="1179" alt="Image" src="https://github.com/user-attachments/assets/97a69a5a-9624-4e96-b86a-dad3d67afc86" />
<img width="2040" height="1435" alt="Image" src="https://github.com/user-attachments/assets/f9021e72-51c5-4c2c-bd7e-b7bc005effbd" />
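
For anyone wanting to reproduce the concurrency side of this, here's a minimal sketch of sampling a teacher model at 32 in-flight requests. It assumes an OpenAI-compatible local endpoint (e.g. vLLM or a llama.cpp server); the URL, model name, and `prompts.txt` corpus are placeholders, not what the author actually used:

```python
# Minimal sketch: collect teacher completions for distillation at a
# fixed concurrency of 32. Endpoint, model name, and prompt file are
# assumptions for illustration.
import asyncio
import json

from openai import AsyncOpenAI

client = AsyncOpenAI(base_url="http://localhost:8000/v1", api_key="none")
CONCURRENCY = 32  # same depth as the throughput numbers above
sem = asyncio.Semaphore(CONCURRENCY)


async def sample(prompt: str) -> dict:
    # The semaphore keeps a steady 32-deep request queue at the server
    # instead of firing the whole corpus at once.
    async with sem:
        resp = await client.chat.completions.create(
            model="gpt-oss-120b",
            messages=[{"role": "user", "content": prompt}],
            max_tokens=1024,
        )
        return {"prompt": prompt, "completion": resp.choices[0].message.content}


async def main() -> None:
    with open("prompts.txt") as f:  # placeholder prompt corpus
        prompts = [line.strip() for line in f if line.strip()]
    rows = await asyncio.gather(*(sample(p) for p in prompts))
    with open("distill_data.jsonl", "w") as out:
        for row in rows:
            out.write(json.dumps(row) + "\n")


asyncio.run(main())
```

The resulting JSONL is the usual prompt/completion format most SFT trainers accept for distilling into a smaller student like gpt-oss-20b.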