
Users not warned when input is longer than supported context #43

@giocomai

Description

This is an issue that stems from Ollama's own behaviour (see e.g. issues 8099 and 7043, among others), but it inevitably also affects mall users.

In brief, by default, ollama truncates input at 2048 tokens, even if the input is longer and the model itself would support much larger contexts.

The workaround outlined in the relevant issue works, but the misbehaviour is likely to remain annoyingly invisible to users who don't watch the logs (ollama serve logs lines such as msg="truncating input prompt" limit=2048 prompt=8923 keep=5 new=2048).

In practice, the user can eventually notice the issue, because the response is not consistent with the prompt: the prompt itself gets cut off.

This recent blog post inadvertently shows this issue, as it processes texts that are longer than 2048 tokens.

The post first asks Ollama to summarise, but even if it asked for content extraction instead, the response would still be a summary (as somewhat implied at the beginning of the text): the request is simply ignored, and even if it were processed, it would be applied to a truncated input.

All of this would remain invisible to mall users, who simply wouldn't get appropriate responses.

Until Ollama introduces a better mechanism to manage this, I suppose that including a warning when the input hits the 2048-token limit, or at least pointing at this issue in the documentation, would make things easier to troubleshoot.
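To make the idea concrete, below is a minimal sketch of such a warning. It assumes a crude characters/4 token estimate and a hard-coded 2048 default; neither reflects what mall or Ollama actually do, it is only meant to illustrate the kind of check I have in mind.

# hypothetical helper: warn when a prompt is likely to exceed Ollama's
# default 2048-token context window (rough estimate: ~4 characters per token)
warn_if_likely_truncated <- function(prompt, ctx_limit = 2048) {
  est_tokens <- ceiling(nchar(prompt) / 4)
  if (est_tokens > ctx_limit) {
    warning(
      "Input is roughly ", est_tokens, " tokens, above the ", ctx_limit,
      "-token context window; Ollama will silently truncate it.",
      call. = FALSE
    )
  }
  invisible(est_tokens)
}

# e.g. warn_if_likely_truncated(cop_data$CleanedText[1])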

Below are some examples that show this misbehaviour (which also emerges by simply running the code included in the above-mentioned blog post), as well as calls to the Ollama API made with httr2 that show that the evaluated input is truncated at 2048 tokens when the input is longer.

# read example dataset used in this post
# https://posit.co/blog/mall-ai-powered-text-analysis/
cop_data <- readr::read_csv("https://posit.co/wp-content/uploads/2025/03/cop_data2.csv")

library(mall)

llm_use("ollama", "llama3.2", seed = 100, .cache = tempdir())

cop_electricity <- llm_extract(
  cop_data |> dplyr::slice(1), #data
  CleanedText, #column
  label = "electricity keywords"
)

## whatever you ask, you still get a summary, not the keywords
cop_electricity$.extract




## process directly with httr2 to see that the prompt_eval_count in the response is truncated at 2048

req <- httr2::request("http://localhost:11434") |>
  httr2::req_url_path("api/generate") |>
  httr2::req_headers("Content-Type" = "application/json") |> 
  httr2::req_body_json(
    list(
      model = "llama3.2",
      prompt = paste(cop_data$CleanedText[1], 
                     "What are the very first ten words of this text?"),
      system = "You respond with the first ten words of the input you receive.",
      stream = FALSE
    )
  )

resp <- req |>
  httr2::req_perform()  |> 
  httr2::resp_body_json()

resp$prompt_eval_count
resp$response

### changing model, preventing truncation
# new model - llama3.2-32k - created as described here: 
# https://github.com/ollama/ollama/issues/8099#issuecomment-2543316682
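# roughly (sketch of that comment's approach): write a Modelfile containing
#   FROM llama3.2
#   PARAMETER num_ctx 32768
# and register it with `ollama create llama3.2-32k -f Modelfile`
# (32768 is an assumption, mirroring the "-32k" name)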

req_32 <- httr2::request("http://localhost:11434") |>
  httr2::req_url_path("api/generate") |>
  httr2::req_headers("Content-Type" = "application/json") |> 
  httr2::req_body_json(
    list(
      model = "llama3.2-32k",
      prompt = paste(cop_data$CleanedText[1], 
                     "What are the very first ten words of this text?"),
      system = "You respond with the first ten words of the input you receive.",
      stream = FALSE
    )
  )

resp_32 <- req_32 |>
  httr2::req_perform()  |> 
  httr2::resp_body_json()

resp_32$prompt_eval_count
resp_32$response


You will see that the second reply is correct, while the first is not. mall users have no way of knowing that their request is effectively being truncated, which leads to the kind of inadvertent use found in that blog post (of course, just one example).
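For completeness, the same effect should also be achievable per request through the API's options field, without creating a new model. The following sketch mirrors the call above (not re-run here, and num_ctx = 32768 simply reuses the value from the llama3.2-32k workaround):

req_opt <- httr2::request("http://localhost:11434") |>
  httr2::req_url_path("api/generate") |>
  httr2::req_headers("Content-Type" = "application/json") |>
  httr2::req_body_json(
    list(
      model = "llama3.2",
      prompt = paste(cop_data$CleanedText[1],
                     "What are the very first ten words of this text?"),
      system = "You respond with the first ten words of the input you receive.",
      # raise the context window for this call only
      options = list(num_ctx = 32768),
      stream = FALSE
    )
  )

resp_opt <- req_opt |>
  httr2::req_perform() |>
  httr2::resp_body_json()

resp_opt$prompt_eval_count
resp_opt$response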
