This is an issue that stems from Ollama's own behaviour (see e.g. ollama/ollama#8099 and ollama/ollama#7043, among others), but it inevitably impacts mall users as well.
In brief, by default ollama truncates the input at 2048 tokens, even if the input is longer and the model itself would support a much larger context.
The workaround outlined in the relevant issue works, but the misbehaviour is likely to remain annoyingly invisible to users who don't watch the logs (ollama serve logs lines such as msg="truncating input prompt" limit=2048 prompt=8923 keep=5 new=2048).
In fact, the issue can ultimately be noticed by the user, as the response is not consistent with the prompt: the prompt itself gets cut off.
This recent blog post inadvertently shows this issue, as it processes texts that are longer than 2048 tokens.
It first asks Ollama to summarise, but even if it had asked it to extract contents, the response would still be a summary (as somewhat implied at the beginning of the text): the request is simply ignored, and even if it were processed, it would be applied to a truncated input.
All of this would remain invisible to mall users, who would simply not get appropriate responses.
Until ollama introduces a better mechanism to manage this, I suppose that including a warning when the input hits the 2048-token limit, or at least pointing at this issue in the documentation, would make things easier to troubleshoot.
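To illustrate, here is a minimal sketch (not mall's actual internals; the helper name and the hard-coded limit are my own) of the kind of check that could raise such a warning, based on the prompt_eval_count field returned by the Ollama API:

# hypothetical helper: warn if Ollama reports having evaluated exactly the
# context limit, which suggests the prompt was truncated (2048 by default)
warn_if_truncated <- function(resp_json, limit = 2048) {
  n <- resp_json$prompt_eval_count
  if (!is.null(n) && n >= limit) {
    warning(
      "Ollama evaluated ", n, " prompt tokens, matching the context limit: ",
      "the input was likely truncated.",
      call. = FALSE
    )
  }
  invisible(resp_json)
}
# e.g. warn_if_truncated(resp) after the httr2 calls shown below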
Below are some examples that show this misbehaviour (which also emerges when running the code included in the above-mentioned blog post), as well as calls to the Ollama API made with httr2 that make it possible to see that the evaluated input is truncated at 2048 tokens when the input is longer.
# read example dataset used in this post
# https://posit.co/blog/mall-ai-powered-text-analysis/
cop_data <- readr::read_csv("https://posit.co/wp-content/uploads/2025/03/cop_data2.csv")
library(mall)

llm_use("ollama", "llama3.2", seed = 100, .cache = tempdir())

cop_electricity <- llm_extract(
  cop_data |> dplyr::slice(1), # data
  CleanedText,                 # column
  label = "electricity keywords"
)
## whatever you ask, you still get a summary, not the keywords
cop_electricity$.extract
## process directly with httr2 to see that the prompt_eval_count in the response is truncated at 2048
req <- httr2::request("http://localhost:11434") |>
  httr2::req_url_path("api/generate") |>
  httr2::req_headers("Content-Type" = "application/json") |>
  httr2::req_body_json(
    list(
      model = "llama3.2",
      prompt = paste(
        cop_data$CleanedText[1],
        "What are the very first ten words of this text?"
      ),
      system = "You respond with the first ten words of the input you receive.",
      stream = FALSE
    )
  )
resp <- req |>
  httr2::req_perform() |>
  httr2::resp_body_json()

resp$prompt_eval_count # capped at 2048: the prompt has been truncated
resp$response
### changing model, preventing truncation
# new model - llama3.2-32k - created as described here:
# https://github.com/ollama/ollama/issues/8099#issuecomment-2543316682
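# (for reference, that workaround amounts to building a new model from a
#  Modelfile with a larger context window, roughly:
#      FROM llama3.2
#      PARAMETER num_ctx 32768
#  and then: ollama create llama3.2-32k -f Modelfile
#  the exact num_ctx value here is my own choice)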
req_32 <- httr2::request("http://localhost:11434") |>
  httr2::req_url_path("api/generate") |>
  httr2::req_headers("Content-Type" = "application/json") |>
  httr2::req_body_json(
    list(
      model = "llama3.2-32k",
      prompt = paste(
        cop_data$CleanedText[1],
        "What are the very first ten words of this text?"
      ),
      system = "You respond with the first ten words of the input you receive.",
      stream = FALSE
    )
  )
resp_32 <- req_32 |>
  httr2::req_perform() |>
  httr2::resp_body_json()

resp_32$prompt_eval_count # now reflects the full prompt, no truncation
resp_32$response
You will see that the second reply is correct, while the first is not. mall users have no way of knowing that their request is effectively truncated, which leads to the kind of inadvertent use found in that blog post (just one example, of course).
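As a side note, and unless I am mistaken about the API, the same effect can also be obtained per request by passing num_ctx in the options field of the Ollama API call, without creating a new model; mall does not currently expose this, so the following is only a sketch of where a fix or warning could hook in (the num_ctx value is again my own choice):

req_ctx <- httr2::request("http://localhost:11434") |>
  httr2::req_url_path("api/generate") |>
  httr2::req_headers("Content-Type" = "application/json") |>
  httr2::req_body_json(
    list(
      model = "llama3.2",
      prompt = paste(
        cop_data$CleanedText[1],
        "What are the very first ten words of this text?"
      ),
      system = "You respond with the first ten words of the input you receive.",
      stream = FALSE,
      # raise the context window for this request only
      options = list(num_ctx = 32768)
    )
  )

resp_ctx <- req_ctx |>
  httr2::req_perform() |>
  httr2::resp_body_json()

resp_ctx$prompt_eval_count
resp_ctx$response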