Fix: Discard KV cache for last token before reusing prompt cache for prompt + suffix #79

sannat17 · 2024-10-21T05:11:57Z

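Fix: Discard KV cache for last token before reusing prompt cache with suffix added to prompt

Show the difference between naively reusing prompt cache and using the fixed version with the kv cache for last token being discarded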

sannat17 · 2024-10-22T02:03:21Z

I must note that a more robust way to handle this would have been to compare the tokens of the INITIAL_PROMPT with the first few tokens of new_inputs, and discard the cache from the first mismatched token onward.
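
A minimal sketch of that prefix comparison in plain Python, working on lists of token IDs rather than a real model cache. The helper reusable_prefix_len and the token IDs below are hypothetical; they are not part of the prompt_reuse recipe or the transformers API.

```python
from typing import Sequence


def reusable_prefix_len(cached_ids: Sequence[int], new_ids: Sequence[int]) -> int:
    """Return how many cached positions can be reused for new_ids.

    Hypothetical helper: cached KV entries are valid only up to (not
    including) the first position where the cached prompt tokens and the
    new tokens disagree.
    """
    keep = 0
    for cached_tok, new_tok in zip(cached_ids, new_ids):
        if cached_tok != new_tok:
            break  # first mismatch: this entry and every later one are stale
        keep += 1
    # Leave at least one new token uncached so the forward pass still has an
    # input from which to compute next-token logits.
    return min(keep, max(len(new_ids) - 1, 0))


# Made-up token IDs: tokenizing prompt + suffix together turned the prompt's
# last token (12) into a different token (13).
prompt_ids = [1, 10, 11, 12]
full_ids = [1, 10, 11, 13, 14]
print(reusable_prefix_len(prompt_ids, full_ids))  # 3
```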

However, with Llama tokenizers the first mismatched token can only be the prompt's last token, so simply discarding that token's cache as a rule seems like an easier fix.
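
The simpler rule can be sketched the same way. This assumes a hypothetical cache stored as one (keys, values) pair of Python lists per layer, with one entry per token position; a real transformers cache holds tensors and would be sliced along its sequence dimension instead. drop_last_cached_token is an illustrative name, not a library function.

```python
from typing import List, Sequence, Tuple

# Hypothetical cache layout for illustration only: one (keys, values) pair per
# layer, each a list with one entry per cached token position.
LayerCache = Tuple[List[object], List[object]]


def drop_last_cached_token(cache: Sequence[LayerCache]) -> List[LayerCache]:
    """Always discard the KV entries of the last cached prompt token.

    Assumes, as described above for Llama tokenizers, that only the prompt's
    last token can change when the suffix is tokenized together with it.
    """
    return [(keys[:-1], values[:-1]) for keys, values in cache]


prompt_ids = [1, 10, 11, 12]  # made-up IDs for the cached prompt
cache = [(["k1", "k10", "k11", "k12"], ["v1", "v10", "v11", "v12"])] * 2  # two layers
trimmed = drop_last_cached_token(cache)

full_ids = [1, 10, 11, 13, 14]  # made-up IDs for prompt + suffix
start = len(trimmed[0][0])  # first position the model has to recompute
print(start, full_ids[start:])  # 3 [13, 14]
```

Compared with the prefix comparison, this rule may recompute one token whose cache was still valid, but it needs no token-by-token comparison.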

sannat17 · 2024-10-26T06:16:35Z

@ArthurZucker I'm wondering what your thoughts are on this fix, since you introduced the prompt_reuse recipe.

sannat17 added 2 commits October 21, 2024 01:10

Fix: Discard KV cache for last token before reusing prompt cache with suffix added to prompt

6606949

Show the difference between naively reusing prompt cache and using the fixed version with the kv cache for last token being discarded

399268b

sannat17 marked this pull request as draft October 26, 2024 06:12

sannat17 marked this pull request as ready for review October 26, 2024 06:12
