In speculative decoding, the draft model commonly handles only a small number of tokens per step. When q_seq_len * head_group_size is small, enabling SWAP AB provides a considerable performance improvement.
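As a rough illustration of why a small q_seq_len * head_group_size can favor SWAP AB, the sketch below compares how much of a padded matrix-multiply tile holds real data with and without the swap. It assumes that swapping the A and B operands moves the query dimension from a coarse tile dimension to a finer-grained one. The granularities `TILE_M = 64` and `TILE_N_STEP = 8` and the helpers `row_utilization` and `should_swap_ab` are hypothetical and only for illustration; they are not taken from this PR's kernel code.

```python
import math

# Assumed tile granularities, for illustration only (hypothetical values,
# not taken from this PR).
TILE_M = 64       # assumed fixed row count of one MMA tile
TILE_N_STEP = 8   # assumed granularity of the MMA tile's column count


def padded(size: int, step: int) -> int:
    """Round size up to the next multiple of step."""
    return math.ceil(size / step) * step


def row_utilization(q_seq_len: int, head_group_size: int, swap_ab: bool) -> float:
    """Fraction of the padded query dimension that holds real data."""
    m = q_seq_len * head_group_size
    # With the swap, the query dimension is padded to the finer granularity.
    step = TILE_N_STEP if swap_ab else TILE_M
    return m / padded(m, step)


def should_swap_ab(q_seq_len: int, head_group_size: int) -> bool:
    """Hypothetical heuristic: swap only when it wastes less of the tile."""
    return (row_utilization(q_seq_len, head_group_size, True)
            > row_utilization(q_seq_len, head_group_size, False))


if __name__ == "__main__":
    for q_seq_len in (1, 4, 16):
        no_swap = row_utilization(q_seq_len, 4, swap_ab=False)
        swapped = row_utilization(q_seq_len, 4, swap_ab=True)
        print(q_seq_len, no_swap, swapped, should_swap_ab(q_seq_len, 4))
```

Under these assumed granularities, the swap stops helping once q_seq_len * head_group_size is a multiple of `TILE_M`, which is why a heuristic of this kind would key on the product rather than on either factor alone.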
The performance results on H20 are shown in the chart below.