In performance_optimization/prompt_reuse.py, the current method of storing the cached prompt does not correctly discard the KV cache for the last token (it instead follows the same caching recipe as is required for model.generate()).

For context, look at these comments and discussions:
generate()
transformers#24841 (comment)

After running some preliminary tests, the current prompt_reuse.py recipe consistently generates different outputs than the non-cached generation, while the method from the linked GitHub issue produces consistent generations.
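For reference, here is a minimal sketch of the approach from transformers#24841, not the recipe's current code: the prefix cache is cropped by one token before it is reused, so the boundary token is recomputed together with the suffix. The checkpoint name and prompts are placeholders, and it assumes a transformers release where DynamicCache.crop is available.

```python
# Minimal sketch: reuse a cached prompt but drop the KV entries of its last
# token, so that token is recomputed together with whatever suffix follows.
# Checkpoint and prompt strings are placeholders.
import copy

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, DynamicCache

ckpt = "meta-llama/Llama-3.2-1B"  # placeholder checkpoint
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForCausalLM.from_pretrained(ckpt, torch_dtype="auto").to(device)

INITIAL_PROMPT = "You are a helpful assistant. Answer the question below concisely."

# One forward pass over the shared prompt fills the cache.
prompt_inputs = tokenizer(INITIAL_PROMPT, return_tensors="pt").to(device)
with torch.no_grad():
    prompt_cache = model(
        **prompt_inputs, past_key_values=DynamicCache()
    ).past_key_values

# Discard the cache entry for the last prompt token: once a suffix is appended,
# that position may tokenize differently, so it must not be served from cache.
prompt_cache.crop(prompt_inputs["input_ids"].shape[1] - 1)

# Reuse the cropped cache for prompt + suffix. deepcopy keeps the stored cache
# pristine so it can be reused again with other suffixes.
full_prompt = INITIAL_PROMPT + " What is the capital of France?"
new_inputs = tokenizer(full_prompt, return_tensors="pt").to(device)
outputs = model.generate(
    **new_inputs,
    past_key_values=copy.deepcopy(prompt_cache),
    max_new_tokens=20,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

With this change, only the tokens before the boundary are served from the cache, so a boundary token whose tokenization changes after the suffix is appended gets recomputed instead of being silently reused.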
The problem does not present itself if your INITIAL_PROMPT ends in special tokens (e.g. when generating from a chat_template, where the last token may be a role-based token). It only occurs when the tokenizer's handling of the last token of INITIAL_PROMPT changes once a suffix is appended, i.e. when the first few characters of the suffix end up merging into the last token of INITIAL_PROMPT, which is not that uncommon.
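To make the boundary condition concrete, here is a small illustration (the tokenizer and strings are my own, not from the recipe) of how the last token of INITIAL_PROMPT can change once a suffix is appended:

```python
# Illustration of the boundary effect with a BPE tokenizer (GPT-2 here; the
# tokenizer and strings are illustrative, not taken from the recipe).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

INITIAL_PROMPT = "The quick brown fox jumped over the lazy d"
SUFFIX = "og and kept running."

prefix_ids = tokenizer(INITIAL_PROMPT)["input_ids"]
full_ids = tokenizer(INITIAL_PROMPT + SUFFIX)["input_ids"]

# Count how many leading tokens the two encodings actually share.
n_shared = 0
for a, b in zip(prefix_ids, full_ids):
    if a != b:
        break
    n_shared += 1

# With GPT-2's BPE, the last prefix token " d" merges with the start of the
# suffix into " dog", so the shared prefix is one token shorter than prefix_ids
# and the cached KV entry for that last token is stale.
print(f"prefix tokens: {len(prefix_ids)}, shared with full encoding: {n_shared}")
print(
    "boundary token:",
    tokenizer.convert_ids_to_tokens(prefix_ids[-1]),
    "->",
    tokenizer.convert_ids_to_tokens(full_ids[n_shared]),
)
```

This is exactly the case where reusing the full prefix cache (including its last token) diverges from non-cached generation, and why dropping that last entry before reuse restores matching outputs.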