In the Flax LSTMCell implementation we use 8 separate Dense layers. Using a single Dense layer and splitting its output (as dm-haiku does, for example) is significantly faster: I got a ~1.7x speed-up. Is there any reason we don't do this here?
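To make the comparison concrete, here is a minimal sketch in plain JAX of the two ways to compute the four LSTM gate pre-activations. It is not the Flax or dm-haiku code: the function names `gates_separate` and `gates_fused`, the shapes, and the random weights are all made up for illustration, and it only shows that the two formulations give the same numbers, not how fast either one is.

```python
import jax
import jax.numpy as jnp


def gates_separate(x, h, w_in, w_hid, b):
    # Eight projections: one input matmul and one hidden matmul per gate.
    return [x @ w_in[k] + h @ w_hid[k] + b[k] for k in range(4)]


def gates_fused(x, h, w_in, w_hid, b):
    # Stack the per-gate weights so one matmul produces all four gates.
    w = jnp.concatenate(
        [jnp.concatenate(w_in, axis=1), jnp.concatenate(w_hid, axis=1)],
        axis=0,
    )  # shape: (in_dim + hid_dim, 4 * hid_dim)
    fused = jnp.concatenate([x, h], axis=-1) @ w + jnp.concatenate(b)
    # Split the fused output back into the four gate pre-activations.
    return jnp.split(fused, 4, axis=-1)


# Hypothetical sizes and random parameters, for illustration only.
batch, in_dim, hid_dim = 2, 3, 5
keys = jax.random.split(jax.random.PRNGKey(0), 14)
x = jax.random.normal(keys[0], (batch, in_dim))
h = jax.random.normal(keys[1], (batch, hid_dim))
w_in = [jax.random.normal(k, (in_dim, hid_dim)) for k in keys[2:6]]
w_hid = [jax.random.normal(k, (hid_dim, hid_dim)) for k in keys[6:10]]
b = [jax.random.normal(k, (hid_dim,)) for k in keys[10:14]]

sep = gates_separate(x, h, w_in, w_hid, b)
fus = gates_fused(x, h, w_in, w_hid, b)
```

Both functions return a list of four `(batch, hid_dim)` arrays that agree up to floating-point error, so any speed difference comes from issuing one large matmul instead of eight small ones.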
Answered by cgarciae, Dec 19, 2023
Have you checked OptimizedLSTMCell?
Answer selected by leonl42