-
Hello,
Thanks in advance.
-
As I am using Ollama, I tried to check what the context length is, and this is what I found:
-
As per the Ollama docs:
-
Interesting, it seems like Ollama doesn't error out and just truncates the input, without throwing an exception like other model providers do. Here is an Ollama issue other folks raised: ollama/ollama#4967

To get around this, I would run the tokenizer on the inputs of your prompts and add a buffer to account for the prompt instructions themselves. You can manually copy-paste the prompt instructions from the playground prompt preview and run them through the tokenizer for the model you're using. We have a "show tokens" checkbox that also does this, but it uses the OpenAI tokenizer.

At the moment, the only way to do this is in Python / TS (or whichever language you're using), before you call the BAML function.
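Here is a minimal sketch of that workaround, not part of BAML's API. It uses `tiktoken` (the OpenAI tokenizer mentioned above) as an approximation; the context window size, buffer, and the BAML function name `b.ExtractData` are all assumptions you'd replace with your own values.

```python
# Sketch: check token count before calling a BAML function, since Ollama
# silently truncates input that exceeds the context window.
import tiktoken

CONTEXT_WINDOW = 8192   # assumed context length of your Ollama model
PROMPT_BUFFER = 500     # rough allowance for the prompt instructions themselves

def fits_in_context(user_input: str) -> bool:
    """Return True if the input plus the instruction buffer fits the window."""
    enc = tiktoken.get_encoding("cl100k_base")  # OpenAI tokenizer, used as a proxy
    input_tokens = len(enc.encode(user_input))
    return input_tokens + PROMPT_BUFFER <= CONTEXT_WINDOW

text = "...your document or user input here..."
if fits_in_context(text):
    pass  # call your BAML function here, e.g. b.ExtractData(text)  (hypothetical name)
else:
    raise ValueError("Input would exceed the context window; chunk or summarize it first.")
```

Since Ollama's tokenizer may differ from OpenAI's, keep the buffer generous rather than cutting it close to the limit.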