
When the attn kernel reads the KV cache, prefill uses LinearIter while decode uses BlockIter. What is the reasoning behind this design? #2518

Time-Limit started this conversation in General

Replies: 1 comment
