Does INITIAL_PROMPT have a meaning, or is it just a placeholder for prompt re-use? #75

Open
OmriKaduri opened this issue Oct 16, 2024 · 3 comments

Comments

@OmriKaduri

Thanks for your great work!

I looked into the prompt_reuse script.

It basically first feeds the "INITIAL_PROMPT" through the model:

# prompt_cache is the KV cache object (a transformers Cache, e.g. DynamicCache)
# initialized earlier in the script
inputs = tokenizer(INITIAL_PROMPT, return_tensors="pt").to("cuda")
with torch.no_grad():
    prompt_cache = model(**inputs, past_key_values=prompt_cache).past_key_values

Then, to use this cache with another prompt suffix, they use:

prompt = "Why are french people obsessed with french?"
new_inputs = tokenizer(INITIAL_PROMPT + prompt, return_tensors="pt").to("cuda")
past_key_values = copy.deepcopy(prompt_cache)
outputs = model.generate(**new_inputs, past_key_values=past_key_values,max_new_tokens=20) 

However, I am trying to understand whether the INITIAL_PROMPT tokens have any meaning beyond acting as a "position_ids" placeholder. For example, if we changed INITIAL_PROMPT to some random prompt with the same token length, would we expect different results? I assume that since the KV states for these tokens are taken from the cache, they are only placeholders.

Thanks!

sannat17 commented Oct 22, 2024

It seems like the only purpose of the INITIAL_PROMPT tokens in subsequent calls is so that the output of model.generate decodes to the correct text for those positions (not really useful if you are only interested in the newly generated tokens and never need to decode the INITIAL_PROMPT tokens again).

You can verify that for yourself by creating an example scenario where you fill the first few tokens of new_inputs with random input_ids, set do_sample=False (to make the model's generation deterministic), and compare the outputs of the regular input with the randomized input. Doing so yielded matching newly generated tokens, which supports the intuition that they are only placeholders.
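
Something along these lines (a rough sketch of that check, assuming the model, tokenizer, INITIAL_PROMPT, prompt and prompt_cache from the snippets above):

import copy
import torch

new_inputs = tokenizer(INITIAL_PROMPT + prompt, return_tensors="pt").to("cuda")
prefix_len = tokenizer(INITIAL_PROMPT, return_tensors="pt").input_ids.shape[1]

# Regular run: real INITIAL_PROMPT token ids in front of the cached prefix
regular = model.generate(
    **new_inputs,
    past_key_values=copy.deepcopy(prompt_cache),
    do_sample=False,
    max_new_tokens=20,
)

# Randomized run: overwrite the prefix token ids, keep the same cached KV states
randomized_inputs = {k: v.clone() for k, v in new_inputs.items()}
randomized_inputs["input_ids"][:, :prefix_len] = torch.randint(
    0, model.config.vocab_size, (1, prefix_len), device="cuda"
)
randomized = model.generate(
    **randomized_inputs,
    past_key_values=copy.deepcopy(prompt_cache),
    do_sample=False,
    max_new_tokens=20,
)

# Compare only the continuation; the prefix tokens in the output simply echo
# whatever input_ids were passed in
input_len = new_inputs["input_ids"].shape[1]
print(torch.equal(regular[:, input_len:], randomized[:, input_len:]))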

@OmriKaduri (Author)

Thanks! It does seem to be the case. If so, I think the current API is a bit confusing. IMHO it should be changed to feeding only the new input + the cache (past_key_values).

sannat17 commented Oct 22, 2024

The current API provided by the transformers library is geared towards KV caching when generating multiple tokens from a prompt, and since the model itself isn't stateful you have to pass in all the previous tokens on subsequent calls (so that the final output decodes to the correct text you need).
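
For example (a sketch, reusing outputs and new_inputs from the snippet above):

# The returned sequence is the passed-in prompt ids followed by the new ids,
# so decoding the full output depends on the prompt ids being correct
full_text = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]

# If only the continuation matters, slice the prompt off first
new_token_ids = outputs[:, new_inputs["input_ids"].shape[1]:]
continuation = tokenizer.batch_decode(new_token_ids, skip_special_tokens=True)[0]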

There have been discussions in the past about the library adding prompt caching as a built-in feature with a cleaner API (which already led to a simpler interface for accessing and reusing the KV cache than what we had to do before), but until they add that we are stuck with the current approach.

Also, the current prompt_reuse logic has a small edge-case bug. See issue #78, which I raised recently.
