
Direct UTF-8 Cyrillic prompts are overcounted to 262147 tokens across multiple 262144-context models #365

@mikhailsal

Summary

Direct UTF-8 Cyrillic requests appear to be overestimated by OpenRouter's context-length check. They can fail with the exact same estimate (262147 requested tokens, 262146 of them text input) across multiple different 262144-context models.

This does not appear to be SDK-specific and does not appear to be caused by JSON \u escaping.

I can reproduce it with a raw HTTPS POST to https://openrouter.ai/api/v1/chat/completions using explicit UTF-8 bytes:

json.dumps(payload, ensure_ascii=False).encode("utf-8")

The outgoing request body contains 0 \u escape sequences.
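
A minimal sketch of the check behind that claim, using only the standard library. The short sample string is an illustrative assumption, not the actual payload; it only shows how `ensure_ascii` changes the bytes that go on the wire.

```python
import json

# Illustrative sample only; the real reproducer sends a much longer body.
payload = {"messages": [{"role": "user", "content": "вечер утро"}]}

# Default behaviour (ensure_ascii=True): every Cyrillic letter becomes \uXXXX.
escaped = json.dumps(payload).encode("utf-8")
# Direct UTF-8: Cyrillic letters are sent as their 2-byte UTF-8 encoding.
direct = json.dumps(payload, ensure_ascii=False).encode("utf-8")

escaped_count = escaped.count(b"\\u")
direct_count = direct.count(b"\\u")
print("escape sequences:", escaped_count, "vs", direct_count)
print("body bytes:", len(escaped), "vs", len(direct))
```

Both bodies decode to the same JSON value, so any difference in how they are counted would come from the byte-level representation alone.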

Why this looks like an OpenRouter-side counting bug

I did not start with a single failing payload. I gradually increased the amount
of Cyrillic context until the request flipped from success to failure.

For the same direct UTF-8 Cyrillic threshold search:

  • google/gemma-4-31b-it:free
    • accepted at 73043 prompt tokens
    • then rejected at the next probe as 262147 requested tokens
  • qwen/qwen3-235b-a22b-2507
    • accepted at 106397 prompt tokens
    • then rejected at the next probe as 262147 requested tokens
  • bytedance-seed/seed-2.0-mini
    • accepted at 88124 prompt tokens
    • then rejected at the next probe as 262147 requested tokens

The accepted provider-counted prompt sizes differ a lot by model, but the rejected UTF-8 Cyrillic body gets the same OpenRouter estimate across different model routes.

That suggests a shared pre-routing estimator or context-check problem rather than a model-specific tokenizer issue.

It also fails far earlier than the advertised 262144 context limit would
suggest. On Gemma-4, the last accepted direct UTF-8 probe was only 73043
provider-counted prompt tokens, and the very next probe was rejected by
OpenRouter as 262147 requested tokens.
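
To put a number on the gap, the sketch below divides OpenRouter's text estimate on the first rejected probe by the provider-counted prompt tokens of the last accepted probe. All figures are copied from this issue. The two probes differ by 12 words, so the ratios are approximate, and treating them as an "overcount factor" is my interpretation rather than anything OpenRouter documents.

```python
# Figures copied from the probes reported in this issue.
OPENROUTER_TEXT_ESTIMATE = 262146  # text-input estimate on the first rejected probe
LAST_ACCEPTED_PROMPT_TOKENS = {
    "google/gemma-4-31b-it:free": 73043,
    "qwen/qwen3-235b-a22b-2507": 106397,
    "bytedance-seed/seed-2.0-mini": 88124,
}

def estimate_ratio(prompt_tokens: int) -> float:
    """OpenRouter estimate divided by the provider-counted prompt size."""
    return OPENROUTER_TEXT_ESTIMATE / prompt_tokens

ratios = {
    model: estimate_ratio(tokens)
    for model, tokens in LAST_ACCEPTED_PROMPT_TOKENS.items()
}
for model, ratio in ratios.items():
    print(f"{model}: estimate is about {ratio:.2f}x the provider count")
```

The ratios come out at roughly 3.6, 2.5 and 3.0, which is what one would expect if a single model-independent estimate is being compared against three different tokenizers.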

Reproduction

Use a direct request to the public OpenRouter endpoint. The important points are:

  • raw HTTPS request to https://openrouter.ai/api/v1/chat/completions
  • Content-Type: application/json; charset=utf-8
  • body encoded with ensure_ascii=False and .encode("utf-8")
  • no local proxy involved
  • max_tokens = 1
  • a long Russian-only text body

Example reproducer:

import json
import httpx

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"
OPENROUTER_KEY = "<your key>"

RU_WORDS = [
    "вечер","утро","полдень","сумерки","рассвет","весна","лето","осень",
    "зима","ветер","дождь","снег","туман","гроза","небо","облако",
    "солнце","луна","звезда","река","озеро","море","берег","остров",
    "поле","лес","сад","тропа","дорога","мост","город","деревня",
    "площадь","улица","двор","окно","дверь","крыша","стена","комната",
    "стол","стул","книга","письмо","карта","свеча","лампа","зеркало",
    "чашка","хлеб","сыр","яблоко","груша","ягода","чай","кофе",
    "вода","молоко","музыка","песня","голос","смех","шепот","тишина",
    "разговор","история","память","мысль","сон","надежда","тревога","радость",
    "печаль","улыбка","взгляд","жест","шаг","встреча","путь","поездка",
    "работа","отдых","дело","помощь","вопрос","ответ","выбор","решение",
    "пример","случай","привычка","правило","начало","конец","момент","час",
    "минута","день","неделя","месяц","год","человек","женщина","мужчина",
    "ребенок","друг","сосед","мастер","учитель","врач","художник","писатель",
    "путешественник","сторож","садовник","почтальон","рыбак","пекарь","музыкант",
]
RHYTHM = [12, 16, 11, 19, 13, 17, 10, 21]
TIME_RU = ["Однажды", "Позже", "Утром", "Вечером", "Весной", "Зимой"]

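def build_text(word_count: int) -> str:
    """Build deterministic Russian-only filler text.

    Emits word_count words drawn from RU_WORDS, plus an occasional TIME_RU
    sentence opener that is not counted towards word_count.
    """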
    parts = []
    sentence = 0
    index = 0
    while index < word_count:
        target = RHYTHM[sentence % len(RHYTHM)]
        length = min(target, word_count - index)
        sent = []
        if sentence % 11 == 0:
            sent.append(TIME_RU[sentence % 6])
        for offset in range(length):
            word = RU_WORDS[(17 * sentence + offset * 3 + 5) % len(RU_WORDS)]
            if not sent:
                word = word.capitalize()
            sent.append(word)
        punct = [".", ".", ",", ";", ".", ":", ".", "."][sentence % 8]
        parts.append(" ".join(sent) + punct)
        sentence += 1
        index += length
        parts.append("\n\n" if sentence % 5 == 0 else " ")
    return "".join(parts).strip()

def send(model: str, word_count: int):
    """POST one probe as raw UTF-8 JSON bytes and print the status code and response body."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": build_text(word_count)}],
        "stream": False,
        "max_tokens": 1,
        "temperature": 0,
    }
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    assert b"\\u" not in body
    r = httpx.post(
        OPENROUTER_URL,
        headers={
            "Authorization": f"Bearer {OPENROUTER_KEY}",
            "Content-Type": "application/json; charset=utf-8",
        },
        content=body,
        timeout=180,
    )
    print(r.status_code)
    print(r.text)

# Increase gradually until it flips from success to failure.
for words in [44747, 45058, 45435, 45560, 45572]:
    print('WORDS', words)
    send("google/gemma-4-31b-it:free", words)
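
The gradual increase can also be automated as a bisection between a known-good and a known-bad word count. This is a sketch under assumptions: `find_flip` and `fake_probe` are hypothetical helpers, not part of the script I actually ran, and the search assumes acceptance is monotonic in the word count. A real probe would send the request and return whether the status code was 200.

```python
from typing import Callable

def find_flip(accepts: Callable[[int], bool], lo: int, hi: int) -> int:
    """Return the largest accepted word count in [lo, hi).

    Assumes accepts(lo) is True, accepts(hi) is False, and that acceptance
    is monotonic in the word count.
    """
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if accepts(mid):
            lo = mid   # mid still passes: the cliff is above it
        else:
            hi = mid   # mid fails: the cliff is at or below it
    return lo

# Stand-in probe for illustration only. The 45565 cutoff is a hypothetical
# number; the issue only establishes that the flip lies between 45560 and 45572.
def fake_probe(words: int) -> bool:
    return words <= 45565

last_ok = find_flip(fake_probe, 45560, 45572)
print("last accepted word count:", last_ok)
```

Bisection needs only about log2(hi - lo) requests, which matters on free-tier routes with tight rate limits.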

Concrete results

Direct UTF-8 path, no proxy

For google/gemma-4-31b-it:free, using explicit UTF-8 JSON bytes with 0
\u escape sequences:

Gradual increase (each target is the intended prompt-token count for the probe):

  • intended 72,000 target -> 200, prompt_tokens = 71,848
  • intended 72,500 target -> 200, prompt_tokens = 72,345
  • intended 73,000 target -> 200, prompt_tokens = 72,844
  • intended 73,200 target -> 200, prompt_tokens = 73,043
  • intended 73,220 target -> 400, OpenRouter says 262,147 requested
  • intended 73,240 target -> 400, OpenRouter says 262,224 requested
  • intended 73,400 target -> 400, OpenRouter says 262,796 requested

That is the key symptom: the request starts failing just after a successful
probe at 73,043 provider-counted prompt tokens, which is nowhere near the
advertised 262,144 context limit.
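
The rejected probes also show how fast the estimate grows. The sketch below computes the change in OpenRouter's requested-token figure per unit of target, using only the three rejected probes listed above. It assumes the target is the intended provider-side prompt-token count, which matches the accepted probes but is my own bookkeeping, not an OpenRouter quantity.

```python
# (target, OpenRouter "requested" estimate) for the rejected probes listed above.
rejected = [(73220, 262147), (73240, 262224), (73400, 262796)]

slopes = []
for (t0, e0), (t1, e1) in zip(rejected, rejected[1:]):
    # Estimated tokens added per extra target token between consecutive probes.
    slopes.append((e1 - e0) / (t1 - t0))

for (t0, _), (t1, _), slope in zip(rejected, rejected[1:], slopes):
    print(f"{t0} -> {t1}: {slope:.3f} estimated tokens per target token")
```

The estimate grows by roughly 3.6 to 3.9 tokens per target token, so it does not look like a sudden jump at the cliff. It looks like a count that runs several times higher than the Gemma provider count throughout and simply crosses 262144 at this point.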

Selected raw probes:

  • probe A
    • target: 73200
    • words: 45560
    • status: 200
    • outgoing body had 0 \u escapes
    • prompt_tokens = 73043
  • probe B
    • target: 73220
    • words: 45572
    • status: 400
    • outgoing body had 0 \u escapes
    • error: This endpoint's maximum context length is 262144 tokens. However, you requested about 262147 tokens (262146 of text input, 1 in the output).
  • probe C
    • target: 73400
    • status: 400
    • error: requested about 262796 tokens (262795 of text input, 1 in the output)

Raw direct OpenRouter error body for the failing 73,220 probe:

{
  "error": {
    "message": "This endpoint's maximum context length is 262144 tokens. However, you requested about 262147 tokens (262146 of text input, 1 in the output). Please reduce the length of either one, or use the context-compression plugin to compress your prompt automatically.",
    "code": 400,
    "metadata": {
      "provider_name": null
    }
  },
  "user_id": "..."
}

The important part is "provider_name": null, which makes it look like the
request is being rejected by OpenRouter before routing to a provider.
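
For anyone collecting these rejections in bulk, here is a small sketch that pulls the numbers out of the error body. The regular expression is written against the message wording shown in this issue and is an assumption about its stability; `parse_context_error` is a hypothetical helper name.

```python
import json
import re

# Error body copied from the failing probe above, message shortened to the relevant part.
raw = '''{"error": {"message": "This endpoint's maximum context length is 262144 tokens. However, you requested about 262147 tokens (262146 of text input, 1 in the output).", "code": 400, "metadata": {"provider_name": null}}}'''

PATTERN = re.compile(
    r"maximum context length is (\d+) tokens"
    r".*?requested about (\d+) tokens \((\d+) of text input"
)

def parse_context_error(body: str) -> dict:
    """Extract limit, requested total, text-input estimate and provider name."""
    err = json.loads(body)["error"]
    match = PATTERN.search(err["message"])
    if match is None:
        raise ValueError("not a context-length error")
    limit, requested, text_input = map(int, match.groups())
    return {
        "limit": limit,
        "requested": requested,
        "text_input": text_input,
        "provider_name": err.get("metadata", {}).get("provider_name"),
    }

info = parse_context_error(raw)
print(info)
```

A `provider_name` of `None` alongside a `requested` value just above `limit` is the signature described in this issue.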

For comparison, the original imported-chat failure I was debugging also had the
same OpenRouter-side shape:

{
  "error": {
    "code": 400,
    "message": "This endpoint's maximum context length is 262144 tokens. However, you requested about 271558 tokens (269727 of text input, 1831 of tool input). Please reduce the length of either one, or use the context-compression plugin to compress your prompt automatically.",
    "metadata": {
      "provider_name": null
    }
  },
  "user_id": "..."
}

Same direct UTF-8 body on other 262144-context models

Over-threshold probe (45572 Russian words, same body shape):

  • qwen/qwen3-235b-a22b-2507
    • status: 400
    • OpenRouter estimate: 262147 requested / 262146 text
  • nvidia/nemotron-3-super-120b-a12b:free
    • status: 400
    • OpenRouter estimate: 262147 requested / 262146 text
  • inclusionai/ling-2.6-1t:free
    • status: 400
    • OpenRouter estimate: 262147 requested / 262146 text
  • bytedance-seed/seed-2.0-mini
    • status: 400
    • OpenRouter estimate: 262147 requested / 262146 text

Near-threshold controls:

  • qwen/qwen3-235b-a22b-2507
    • 45560 words -> 200, prompt_tokens = 106397
    • 45572 words -> 400, 262147 requested
  • bytedance-seed/seed-2.0-mini
    • 45560 words -> 200, prompt_tokens = 88124
    • 45572 words -> 400, 262147 requested

Again, these accepted prompt sizes are well below 262144, yet the next small
Cyrillic increase fails with the exact same OpenRouter-side 262147 estimate.
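
Since the estimate is identical across models, it may be derived from the request body alone. What the estimator actually measures is unknown to me; the sketch below only computes a few simple local size measures that could be compared against it. `size_metrics` is a hypothetical helper, and the sample string stands in for the real body.

```python
def size_metrics(text: str) -> dict:
    """Simple local size measures for a prompt body."""
    return {
        "words": len(text.split()),               # whitespace-separated words
        "chars": len(text),                       # Unicode code points
        "utf8_bytes": len(text.encode("utf-8")),  # bytes on the wire
    }

# Short illustrative sample; in practice pass the full prompt text.
sample = "вечер утро"
metrics = size_metrics(sample)
print(metrics)

# From the figures in this issue: estimated text tokens per word on the
# first rejected probe (262146 estimated tokens for 45572 words).
estimate_per_word = 262146 / 45572
print(f"{estimate_per_word:.2f} estimated tokens per word")
```

The rejected body works out to about 5.75 estimated tokens per Russian word. Running `size_metrics` on the exact prompt would show whether that figure tracks characters, bytes, or neither.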

Expected behavior

One of the following should happen:

  • the direct UTF-8 request should be counted correctly and either pass or fail according to the model's actual tokenizer/context accounting, or
  • if a shared pre-routing estimator is used, it should not map materially different model/tokenizer outcomes to the same exact 262147 estimate for the same body.

Actual behavior

OpenRouter rejects the direct UTF-8 Cyrillic body with the exact same 262147 estimate across multiple different 262144-context models, even though accepted provider-counted prompt sizes for the nearby control probe differ widely by model.

The rejection appears to happen on the OpenRouter side before provider routing,
based on the raw error payload containing "metadata": {"provider_name": null}.

Notes

  • This was originally discovered while debugging imported chat history, but the reproducer above does not require any import logic, reasoning fields, or local proxy.
  • The same cliff was also reproduced through a local proxy, but the direct UTF-8 public API reproduction rules the proxy out.
  • If useful, I can provide the exact full script and the raw outputs for all tested models.
