I ran a distill off gpt-oss-120b today; at 32 concurrent requests I was getting over 750 t/s at around 85 watts per GPU. Then disappointment set in: to train gpt-oss-20b I have to dequantize the model and run on CPU and system RAM. I'm training the smaller model on my business domains. The distill makes for nice screenshots, though. See attached.

<img width="1017" height="1179" alt="Image" src="https://github.com/user-attachments/assets/97a69a5a-9624-4e96-b86a-dad3d67afc86" />
<img width="2040" height="1435" alt="Image" src="https://github.com/user-attachments/assets/f9021e72-51c5-4c2c-bd7e-b7bc005effbd" />
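
For anyone wanting to reproduce the concurrency side of this, here's a minimal sketch of sampling a teacher model at 32 in-flight requests. It assumes an OpenAI-compatible local endpoint (e.g. vLLM or a llama.cpp server); the URL, model name, and `prompts.txt` corpus are placeholders, not what the author actually used:

```python
# Minimal sketch: collect teacher completions for distillation at a
# fixed concurrency of 32. Endpoint, model name, and prompt file are
# assumptions for illustration.
import asyncio
import json

from openai import AsyncOpenAI

client = AsyncOpenAI(base_url="http://localhost:8000/v1", api_key="none")
CONCURRENCY = 32  # same depth as the throughput numbers above
sem = asyncio.Semaphore(CONCURRENCY)


async def sample(prompt: str) -> dict:
    # The semaphore keeps a steady 32-deep request queue at the server
    # instead of firing the whole corpus at once.
    async with sem:
        resp = await client.chat.completions.create(
            model="gpt-oss-120b",
            messages=[{"role": "user", "content": prompt}],
            max_tokens=1024,
        )
        return {"prompt": prompt, "completion": resp.choices[0].message.content}


async def main() -> None:
    with open("prompts.txt") as f:  # placeholder prompt corpus
        prompts = [line.strip() for line in f if line.strip()]
    rows = await asyncio.gather(*(sample(p) for p in prompts))
    with open("distill_data.jsonl", "w") as out:
        for row in rows:
            out.write(json.dumps(row) + "\n")


asyncio.run(main())
```

The resulting JSONL is the usual prompt/completion format most SFT trainers accept for distilling into a smaller student like gpt-oss-20b.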