Does INITIAL_PROMPT have a meaning, or is it just a placeholder for prompt re-use? #75
Comments
It seems like the purpose of the INITIAL_PROMPT tokens in subsequent calls is just so that model.generate returns the correct output text for those positions when you decode it (not really useful if you are only interested in the newly generated tokens and never need to decode the INITIAL_PROMPT tokens again). You can verify this yourself: fill the first few tokens of new_inputs with random input_ids, set do_sample=False (to make generation deterministic), and compare the outputs of the regular input with the randomized input. Doing so yielded matching newly generated tokens, which confirms the intuition that they are only placeholders.
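A minimal sketch of that experiment (assuming a recent transformers version whose generate() accepts past_key_values; the model name, prompts, and variable names are illustrative, not taken from the original code):

```python
# Sketch of the placeholder check: generate once with the real prompt ids and
# once with the cached region overwritten by random ids, reusing the prompt's
# KV cache, then compare only the newly generated tokens.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")            # illustrative model
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

initial_ids = tokenizer("You are a helpful assistant.", return_tensors="pt").input_ids
suffix_ids = tokenizer(" Summarize this text:", return_tensors="pt").input_ids
new_inputs = torch.cat([initial_ids, suffix_ids], dim=-1)

# Corrupted copy: the positions already covered by the cache get random ids.
randomized = new_inputs.clone()
randomized[:, : initial_ids.shape[-1]] = torch.randint(
    0, model.config.vocab_size, initial_ids.shape
)

def prompt_cache():
    # Recompute the prompt's KV cache for each run, since generate() may
    # extend the cache object in place.
    with torch.no_grad():
        return model(initial_ids, use_cache=True).past_key_values

out_regular = model.generate(
    new_inputs, past_key_values=prompt_cache(), do_sample=False, max_new_tokens=20
)
out_random = model.generate(
    randomized, past_key_values=prompt_cache(), do_sample=False, max_new_tokens=20
)

# The cached positions are read from past_key_values rather than re-embedded
# from the input ids, so the newly generated tokens should be identical.
print(torch.equal(out_regular[:, new_inputs.shape[-1]:],
                  out_random[:, new_inputs.shape[-1]:]))
```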
Thanks! That does seem to be the case. If so, I think the current API is a bit confusing. It should be changed to feeding only the new input plus the cache (past_key_values), IMHO.
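For a single forward pass (as opposed to generate()), that call shape already works: with past_key_values supplied, most decoder models infer the new tokens' positions from the cache length, so the cached prompt ids don't need to be passed again. A minimal sketch, with illustrative names:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

prompt_ids = tokenizer("You are a helpful assistant.", return_tensors="pt").input_ids
suffix_ids = tokenizer(" Summarize this text:", return_tensors="pt").input_ids

with torch.no_grad():
    cache = model(prompt_ids, use_cache=True).past_key_values       # cache the prompt
    out = model(suffix_ids, past_key_values=cache, use_cache=True)  # feed only new ids

# Greedy next token after the full (prompt + suffix) context.
print(tokenizer.decode(out.logits[:, -1].argmax(dim=-1)))
```

generate(), by contrast, is still called with the full sequence of ids alongside the cache, which is the source of the confusion discussed above.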
The current API provided by the transformers library is geared towards KV caching when generating multiple tokens from a single prompt, and since the model itself isn't stateful, you have to retain all of the generated inputs between subsequent calls (so that the final output decodes to the correct text). There have been discussions in the past about adding prompt caching to the library as a built-in feature with a cleaner API (which already led to a simpler interface for accessing and reusing the KV cache than what we had to do before), but until that lands we are stuck with the current approach. Also, the current prompt_reuse logic has a small edge-case bug; see issue #78, which I raised recently.
Thanks for your great work!
I looked into the prompt_reuse script.
It basically first feeds the "INITIAL_PROMPT" through the model:
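A sketch of that step (with an illustrative model and variable names, not a verbatim excerpt from prompt_reuse):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

INITIAL_PROMPT = "You are a helpful assistant."
initial_ids = tokenizer(INITIAL_PROMPT, return_tensors="pt").input_ids

with torch.no_grad():
    # Run the prompt once with use_cache=True and keep its KV cache.
    prompt_cache = model(initial_ids, use_cache=True).past_key_values
```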
Then, to use this cache with another prompt suffix, they use:
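Again a sketch rather than the script's exact code, continuing from the snippet above:

```python
# Pass the full sequence (INITIAL_PROMPT + suffix) so the decoded output is
# complete, plus the cache so the INITIAL_PROMPT positions are not recomputed.
suffix_ids = tokenizer(" Summarize this text:", return_tensors="pt").input_ids
new_inputs = torch.cat([initial_ids, suffix_ids], dim=-1)

generated = model.generate(
    new_inputs,
    past_key_values=prompt_cache,
    do_sample=False,
    max_new_tokens=20,
)
print(tokenizer.decode(generated[0]))
```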
However, I am trying to understand whether the INITIAL_PROMPT tokens have any meaning beyond serving as a "position_ids" placeholder. For example, suppose we changed INITIAL_PROMPT to some random prompt with the same token length. Would we expect different results? I assume that since the KV entries for these tokens are taken from the cache, they are only placeholders.
Thanks!