Jit'ing the grad function #1958

srush · 2022-03-03T02:41:43Z

I have an inner function in my network where calling jit(grad(fn)) seems to yield big memory savings. However, in the standard Flax training pattern I don't really see how to do this. Since we call .apply each time, I would need to run jit in the inner loop, which seems bad.

Is there a way to do this that I'm missing?

Answered by marcvanzee

marcvanzee · 2022-03-03T07:42:37Z

Maintainer

(I am not sure if I fully understand your question, so please clarify if my answer below is not what you were asking.)

Do you mean you are calling grad inside your Module's apply function? If you jit something inside another jit block, then the inner jit should be a no-op. We usually jit the entire train function, which calls the grad function, so grad gets jitted as part of this bigger block. We actually prefer jitting bigger blocks because it gives XLA more opportunity to optimize things, at the cost of longer compile time.
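To make the "jit the entire train function" pattern concrete, here is a minimal sketch in plain JAX. It is not taken from the thread: `apply_fn` is a hypothetical stand-in for a Flax `model.apply(params, x)` call, and the squared-error loss, the SGD update, and the learning rate are assumptions chosen only to keep the example self-contained.

```python
import jax
import jax.numpy as jnp


# Hypothetical stand-in for `model.apply(params, x)`: a single linear layer.
def apply_fn(params, x):
    return x @ params["w"] + params["b"]


def loss_fn(params, x, y):
    preds = apply_fn(params, x)
    return jnp.mean((preds - y) ** 2)


@jax.jit  # the whole step is compiled as one block, including the grad call
def train_step(params, x, y):
    lr = 0.1  # assumed learning rate, for illustration only
    loss, grads = jax.value_and_grad(loss_fn)(params, x, y)
    # Plain SGD update applied leaf by leaf over the parameter pytree.
    new_params = jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)
    return new_params, loss


params = {"w": jnp.zeros((3, 1)), "b": jnp.zeros((1,))}
x = jnp.ones((8, 3))
y = jnp.ones((8, 1))

losses = []
for _ in range(5):
    # jit is applied once, outside the loop; each iteration reuses the compiled step.
    params, loss = train_step(params, x, y)
    losses.append(float(loss))
```

Because `jax.value_and_grad` is called inside `train_step`, there is no need for a separate `jit(grad(fn))` in the training loop.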

> Since we call .apply each time, I would need to run jit in the inner loop, which seems bad.

The jitted function will be compiled only once for each combination of input shapes (and dtypes) it is called with, so if the shapes stay the same across loop iterations, this is not bad.
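One way to check the compile-once behavior yourself is to count how often the Python body of the function actually runs, since under jit it only runs while JAX traces it. The sketch below is an illustration under assumptions: the toy `loss_fn`, the `trace_count` counter, and the array shapes are all made up for the demo.

```python
import jax
import jax.numpy as jnp

trace_count = 0  # bumped only when JAX re-traces the Python function


def loss_fn(w, x):
    global trace_count
    trace_count += 1  # Python side effect: runs at trace time, not on cached calls
    return jnp.sum((x @ w) ** 2)


grad_fn = jax.jit(jax.grad(loss_fn))

w = jnp.ones((3,))
x_small = jnp.ones((5, 3))
x_big = jnp.ones((7, 3))

grad_fn(w, x_small)  # first call: traced and compiled
count_after_first = trace_count

for _ in range(3):
    grad_fn(w, x_small)  # same shapes: served from the compilation cache
count_after_repeats = trace_count

grad_fn(w, x_big)  # new input shape: triggers another trace
count_after_new_shape = trace_count
```

If the counter stays flat across the repeated calls and only moves when the shape changes, the loop is reusing the compiled function rather than recompiling on every step.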

1 reply

srush Mar 3, 2022
Author

Thanks! I missed that the whole train step was jit'd.
