I think they quietly dropped it lol. The Day 6 article suggests that the actual inference deployment uses neither TP nor SP, only EP and DP, which makes more sense. SP confused me a lot, but I've decided to let it go :)
-
The open-source projects released over the past five days are amazing! I'm a little confused about the parallelism strategy used during inference.
The technical report states that the attention part uses 4-way Tensor Parallelism (TP4) with Sequence Parallelism (SP). What are the implementation details of SP in the prefilling and decoding stages? Do you plan to open-source the code?
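For context, "TP with SP" usually refers to the Megatron-LM pattern. Outside the tensor-parallel regions (e.g. normalization), each of the 4 ranks holds a quarter of the tokens. An all-gather along the sequence precedes attention, each rank computes only its own subset of heads, and a reduce-scatter after the output projection sums the partial results and hands each rank back its token slice. The sketch below simulates that pattern in a single process in pure Python. It is an illustration under stated assumptions, not DeepSeek's implementation: it uses plain causal multi-head attention instead of MLA, an RMSNorm without learned weights, toy sizes, and hypothetical helper names (`tp_sp_attention`, `all_gather`, `reduce_scatter`).

```python
import math
import random

TP = 4                  # tensor/sequence-parallel degree (the "4" in TP4)
SEQ = 8                 # tokens in the sequence; must be divisible by TP
HEADS = 8               # attention heads; must be divisible by TP
HEAD_DIM = 2
D = HEADS * HEAD_DIM    # model width


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def cols(w, lo, hi):
    """Column slice w[:, lo:hi] of a row-major matrix."""
    return [row[lo:hi] for row in w]


def rms_norm(x, eps=1e-6):
    """Per-token RMSNorm (no learned scale), so it can run on any token shard."""
    out = []
    for row in x:
        inv = 1.0 / math.sqrt(sum(v * v for v in row) / len(row) + eps)
        out.append([v * inv for v in row])
    return out


def softmax(row):
    m = max(row)
    e = [math.exp(v - m) for v in row]
    s = sum(e)
    return [v / s for v in e]


def attention_heads(x, wq, wk, wv, n_heads):
    """Causal attention over all tokens of x, for the heads whose projection
    columns are in wq/wk/wv; returns the concatenated per-head outputs."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    out = [[] for _ in x]
    for h in range(n_heads):
        hs = slice(h * HEAD_DIM, (h + 1) * HEAD_DIM)
        for i in range(len(x)):
            scores = [sum(a * b for a, b in zip(q[i][hs], k[j][hs])) / math.sqrt(HEAD_DIM)
                      for j in range(i + 1)]
            p = softmax(scores)
            out[i].extend(sum(p[j] * v[j][hs][d] for j in range(i + 1)) for d in range(HEAD_DIM))
    return out


def all_gather(shards):
    """Simulated all-gather along the sequence: every rank gets all tokens."""
    return [row for shard in shards for row in shard]


def reduce_scatter(partials, world):
    """Simulated reduce-scatter: sum partial outputs, give rank r its token slice."""
    total = [[sum(vals) for vals in zip(*rows)] for rows in zip(*partials)]
    chunk = len(total) // world
    return [total[r * chunk:(r + 1) * chunk] for r in range(world)]


def tp_sp_attention(x, wq, wk, wv, wo, world=TP):
    chunk = len(x) // world
    # SP region: rank r owns tokens [r*chunk, (r+1)*chunk) and normalizes them locally.
    normed_shards = [rms_norm(x[r * chunk:(r + 1) * chunk]) for r in range(world)]
    gathered = all_gather(normed_shards)
    width = (HEADS // world) * HEAD_DIM
    partials = []
    for r in range(world):  # in a real system these run concurrently on separate GPUs
        lo, hi = r * width, (r + 1) * width
        # TP region: rank r computes only its heads (column slices of Wq/Wk/Wv) ...
        ctx = attention_heads(gathered, cols(wq, lo, hi), cols(wk, lo, hi),
                              cols(wv, lo, hi), HEADS // world)
        # ... then multiplies by the matching rows of Wo, giving a partial sum.
        partials.append(matmul(ctx, wo[lo:hi]))
    return reduce_scatter(partials, world)


def reference_attention(x, wq, wk, wv, wo):
    """Single-device computation for comparison."""
    return matmul(attention_heads(rms_norm(x), wq, wk, wv, HEADS), wo)


def random_matrix(n_rows, n_cols, rng):
    return [[rng.uniform(-1, 1) for _ in range(n_cols)] for _ in range(n_rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    x = random_matrix(SEQ, D, rng)
    wq, wk, wv, wo = (random_matrix(D, D, rng) for _ in range(4))
    shards = tp_sp_attention(x, wq, wk, wv, wo)
    ref = reference_attention(x, wq, wk, wv, wo)
    err = max(abs(a - b) for ra, rb in zip(all_gather(shards), ref) for a, b in zip(ra, rb))
    print(f"{len(shards)} shards of {len(shards[0])} tokens, max |diff| = {err:.2e}")
```

The sharded result should match the single-device reference up to floating-point rounding. The design point of this pattern is that it does not add communication volume over plain TP: the all-reduce TP would need is split into an all-gather before attention and a reduce-scatter after it, while norms and other per-token work run on only a quarter of the tokens per rank.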