Replies: 1 comment
Thanks for the kind words. I'm glad to hear you're enjoying the book overall. Unfortunately, this discrepancy is expected, which is why I didn't recommend MPS in the main chapters. It was really bad when I started working on the book (PyTorch 2.0 back then) but has incrementally improved in newer PyTorch versions. Like you said, it's mostly fine in newer PyTorch versions now, except for chapter 7. There were also some relevant discussions about this in the forum.
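To make the discrepancy concrete, here is a small sketch (not from the book) that measures the numerical gap between CPU and MPS for a single matrix multiplication; tiny elementwise differences like these can compound over thousands of training steps into diverging loss curves:

```python
import torch

# Sketch: measure the numerical gap between CPU and MPS for one matmul.
# (Assumes an Apple Silicon machine; falls back to a CPU-only check otherwise.)
torch.manual_seed(123)
a = torch.randn(256, 256)
b = torch.randn(256, 256)

ref = a @ b  # float32 CPU reference

if torch.backends.mps.is_available():
    out = (a.to("mps") @ b.to("mps")).to("cpu")
else:
    out = ref.clone()  # no MPS device available: gap is trivially zero

max_gap = (ref - out).abs().max().item()
print(f"max abs difference: {max_gap:.2e}")
```

On MPS hardware the printed gap is typically nonzero but small; it is the accumulation of such differences, not any single op, that makes long training runs drift apart.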
-
I've been enjoying the book immensely and have been learning a lot. I'm running on an M1 Max using the MPS backend. Most of the time, my code produces exactly the same output as the book's. However, during some training runs, things go off the rails.
Specifically, looking at section 7.6, Fine-tuning the LLM on instruction data, when I run the following code:
I get:
whether I'm running on CPU or MPS. However, the output of the next bit of code, which fine-tunes the model, differs greatly.
CPU:
On MPS, it starts off similarly but goes off the rails around step 30:
Is there anything I can do about this? I understand there may be some differences, but in theory I should be able to fine-tune the model under some conditions using MPS, right (even if the output won't match the book's)? Would it be a matter of changing the learning rate or weight decay?
Any thoughts on how to make it work using MPS?
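For what it's worth, a minimal sketch of the usual stabilizers (seeding before weight initialization, a lower learning rate, and gradient clipping) is below; the tiny `Linear` model and the specific hyperparameter values are hypothetical stand-ins, not the book's chapter 7 code:

```python
import torch

# Sketch of common mitigations for unstable fine-tuning on MPS
# (hypothetical model and settings; adapt to your chapter 7 setup).
torch.manual_seed(123)  # seed before creating the model, for reproducibility

device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

model = torch.nn.Linear(8, 8).to(device)  # stand-in for the LLM

# A lower learning rate plus gradient clipping often tames runs that
# diverge mid-training, regardless of backend.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5, weight_decay=0.1)

x = torch.randn(4, 8, device=device)
loss = model(x).pow(2).mean()
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
optimizer.zero_grad()
print(f"loss is finite: {torch.isfinite(loss).item()}")
```

None of this makes MPS bit-identical to CPU; it only reduces the chance that small backend-level numerical differences snowball into a diverging loss.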