Hi, excellent work!
I noticed that the MLP layer here is implemented a little differently from the vanilla LLaMA MLP layer.
In this paper's code, feedforward_channels is rescaled as follows:

    feedforward_channels = int(feedforward_channels * 8 / 3)

whereas the vanilla LLaMA MLP uses feedforward_channels as given, without rescaling.
What is the reasoning behind this modification?
The line in question is VisionLLaMA/mmpretrain/mmpretrain/models/utils/swiglu_ffn.py, line 118 at commit 33fa561.
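
For context, here is a minimal sketch of the arithmetic behind my question. It is not taken from the repository: the helper names and the value `embed_dims = 768` are hypothetical, and I am assuming a bias-free SwiGLU block with three projections (gate, up, down) compared against a plain two-layer FFN with a 4x hidden width. Under those assumptions the 8/3 factor gives both blocks the same number of weights, which may or may not be the intended reason.

```python
def scaled_hidden(feedforward_channels: int) -> int:
    """Mirror the quoted line: rescale the configured width by 8/3."""
    return int(feedforward_channels * 8 / 3)


def swiglu_params(embed_dims: int, hidden: int) -> int:
    """Weights in a bias-free SwiGLU FFN: gate, up and down projections."""
    return 3 * embed_dims * hidden


def plain_ffn_params(embed_dims: int, hidden: int) -> int:
    """Weights in a bias-free two-layer FFN: up and down projections."""
    return 2 * embed_dims * hidden


# Hypothetical width, chosen only for illustration.
embed_dims = 768

# Assumption: feedforward_channels is passed in equal to embed_dims.
swiglu_hidden = scaled_hidden(embed_dims)

# Plain FFN baseline with the common 4x expansion.
plain_hidden = 4 * embed_dims

print("SwiGLU hidden width:", swiglu_hidden)
print("SwiGLU weights:     ", swiglu_params(embed_dims, swiglu_hidden))
print("Plain FFN weights:  ", plain_ffn_params(embed_dims, plain_hidden))
```

If the value passed in is already a 4x-expanded width, the same line would instead produce a considerably larger hidden layer, so it would help to know which case is intended.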