Summary
Direct UTF-8 Cyrillic requests appear to be overestimated by OpenRouter's context-length check. They can fail with the exact same estimate (262147 requested / 262146 text tokens) across multiple different models that all have a 262144-token context.
This does not appear to be SDK-specific and does not appear to be caused by JSON \u escaping.
I can reproduce it with a raw HTTPS POST to https://openrouter.ai/api/v1/chat/completions using explicit UTF-8 bytes:
json.dumps(payload, ensure_ascii=False).encode("utf-8")
The outgoing request body contains 0 \u escape sequences.
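For reference, here is a standard-library sketch of why the encoding choice matters. By default `json.dumps` escapes every non-ASCII character, which adds `\u` sequences and inflates the byte size of the body. The one-word payload is illustrative only and is not taken from a failing request:

```python
import json

payload = {"messages": [{"role": "user", "content": "вечер"}]}

# Default json.dumps escapes each non-ASCII character as \uXXXX (6 ASCII bytes each).
escaped = json.dumps(payload).encode("utf-8")

# ensure_ascii=False keeps the Cyrillic letters; UTF-8 then uses 2 bytes per letter.
raw = json.dumps(payload, ensure_ascii=False).encode("utf-8")

print(b"\\u" in escaped, b"\\u" in raw)  # True False
print(len(escaped) - len(raw))  # 20: each of the 5 letters shrinks from 6 bytes to 2
```

Both bodies decode to the same JSON value, so a server that counts decoded text should see identical content either way. The reproducer below uses the second form.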
Why this looks like an OpenRouter-side counting bug
I did not start with a single failing payload. I gradually increased the amount
of Cyrillic context until the request flipped from success to failure.
For the same direct UTF-8 Cyrillic threshold search:
google/gemma-4-31b-it:free
- accepted at 73043 prompt tokens
- then rejected at the next probe as 262147 requested tokens
qwen/qwen3-235b-a22b-2507
- accepted at 106397 prompt tokens
- then rejected at the next probe as 262147 requested tokens
bytedance-seed/seed-2.0-mini
- accepted at 88124 prompt tokens
- then rejected at the next probe as 262147 requested tokens
The accepted provider-counted prompt sizes differ a lot by model, but the rejected UTF-8 Cyrillic body gets the same OpenRouter estimate across different model routes.
That suggests a shared pre-routing estimator or context-check problem rather than a model-specific tokenizer issue.
It also fails far earlier than the advertised 262144 context limit would
suggest. On Gemma-4, the last accepted direct UTF-8 probe was only 73043
provider-counted prompt tokens, and the very next probe was rejected by
OpenRouter as 262147 requested tokens.
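The following is a quick arithmetic check of that argument. It uses only the numbers reported here: the last accepted 45560-word probe for each model, compared against the shared estimate for the 45572-word probe. The 12-word difference between the two probes is ignored:

```python
# Figures copied from this report: provider-counted prompt_tokens for the last
# accepted probe (45560 words) and OpenRouter's text estimate for the next probe
# (45572 words), which was identical for every model.
accepted = {
    "google/gemma-4-31b-it:free": 73043,
    "qwen/qwen3-235b-a22b-2507": 106397,
    "bytedance-seed/seed-2.0-mini": 88124,
}
rejected_text_estimate = 262146

for model, prompt_tokens in accepted.items():
    ratio = rejected_text_estimate / prompt_tokens
    print(f"{model}: estimate is {ratio:.2f}x the provider count")
```

The single estimate works out to roughly 3.59x, 2.46x, and 2.97x the respective provider counts. One number cannot track all three tokenizers, and it exceeds each of them by more than 2x.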
Reproduction
Use a direct request to the public OpenRouter endpoint. The important points are:
- raw HTTPS request to https://openrouter.ai/api/v1/chat/completions
- Content-Type: application/json; charset=utf-8
- body encoded with ensure_ascii=False and .encode("utf-8")
- no local proxy involved
- max_tokens = 1
- a long Russian-only text body
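To cross-check that these properties do not depend on httpx, the sketch below builds the same request with only the standard library. `build_request` and `send_without_proxy` are hypothetical helper names. `ProxyHandler({})` is used so that any `HTTP(S)_PROXY` environment variables are ignored, which matches the no-proxy condition:

```python
import json
import urllib.error
import urllib.request

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"


def build_request(api_key: str, model: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a raw UTF-8 POST with the properties listed above."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": text}],
        "stream": False,
        "max_tokens": 1,
        "temperature": 0,
    }
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    assert b"\\u" not in body  # no JSON \u escapes in the outgoing bytes
    return urllib.request.Request(
        OPENROUTER_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json; charset=utf-8",
        },
        method="POST",
    )


def send_without_proxy(request: urllib.request.Request, timeout: float = 180):
    """Send with proxies disabled and return (status, body text)."""
    # An empty ProxyHandler mapping disables proxies, including environment-configured ones.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(request, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses (such as the 400 context-length error) raise HTTPError.
        return err.code, err.read().decode("utf-8")
```

For example, `send_without_proxy(build_request(key, "google/gemma-4-31b-it:free", text))` returns the status code and the raw response body, including the error body on a 400.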
Example reproducer:
import json

import httpx

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"
OPENROUTER_KEY = "<your key>"

# Fixed Russian vocabulary; build_text() cycles through it deterministically.
RU_WORDS = [
    "вечер","утро","полдень","сумерки","рассвет","весна","лето","осень",
    "зима","ветер","дождь","снег","туман","гроза","небо","облако",
    "солнце","луна","звезда","река","озеро","море","берег","остров",
    "поле","лес","сад","тропа","дорога","мост","город","деревня",
    "площадь","улица","двор","окно","дверь","крыша","стена","комната",
    "стол","стул","книга","письмо","карта","свеча","лампа","зеркало",
    "чашка","хлеб","сыр","яблоко","груша","ягода","чай","кофе",
    "вода","молоко","музыка","песня","голос","смех","шепот","тишина",
    "разговор","история","память","мысль","сон","надежда","тревога","радость",
    "печаль","улыбка","взгляд","жест","шаг","встреча","путь","поездка",
    "работа","отдых","дело","помощь","вопрос","ответ","выбор","решение",
    "пример","случай","привычка","правило","начало","конец","момент","час",
    "минута","день","неделя","месяц","год","человек","женщина","мужчина",
    "ребенок","друг","сосед","мастер","учитель","врач","художник","писатель",
    "путешественник","сторож","садовник","почтальон","рыбак","пекарь","музыкант",
]
RHYTHM = [12, 16, 11, 19, 13, 17, 10, 21]  # words per sentence, cycled
TIME_RU = ["Однажды", "Позже", "Утром", "Вечером", "Весной", "Зимой"]


def build_text(word_count: int) -> str:
    """Deterministically generate about word_count words of Russian-only text."""
    parts = []
    sentence = 0
    index = 0
    while index < word_count:
        target = RHYTHM[sentence % len(RHYTHM)]
        length = min(target, word_count - index)
        sent = []
        if sentence % 11 == 0:
            # Occasional leading time word; it is not counted toward word_count.
            sent.append(TIME_RU[sentence % 6])
        for offset in range(length):
            word = RU_WORDS[(17 * sentence + offset * 3 + 5) % len(RU_WORDS)]
            if not sent:
                word = word.capitalize()
            sent.append(word)
        punct = [".", ".", ",", ";", ".", ":", ".", "."][sentence % 8]
        parts.append(" ".join(sent) + punct)
        sentence += 1
        index += length
        parts.append("\n\n" if sentence % 5 == 0 else " ")
    return "".join(parts).strip()


def send(model: str, word_count: int):
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": build_text(word_count)}],
        "stream": False,
        "max_tokens": 1,
        "temperature": 0,
    }
    # Explicit UTF-8 bytes: Cyrillic stays raw instead of being \u-escaped.
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    assert b"\\u" not in body
    r = httpx.post(
        OPENROUTER_URL,
        headers={
            "Authorization": f"Bearer {OPENROUTER_KEY}",
            "Content-Type": "application/json; charset=utf-8",
        },
        content=body,
        timeout=180,
    )
    print(r.status_code)
    print(r.text)  # a 200 response includes usage.prompt_tokens


# Increase gradually until it flips from success to failure.
for words in [44747, 45058, 45435, 45560, 45572]:
    print("WORDS", words)
    send("google/gemma-4-31b-it:free", words)
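The manual list of word counts above can be replaced by a bisection. This is a minimal sketch that assumes acceptance is monotonic in prompt length. `accepts` is a hypothetical callback, for example a variant of `send` that returns `r.status_code == 200` instead of printing. It is not part of any SDK:

```python
from typing import Callable


def find_flip(accepts: Callable[[int], bool], low: int, high: int) -> int:
    """Return the smallest word count in (low, high] that is rejected.

    Assumes accepts(low) is True, accepts(high) is False, and that a longer
    prompt is never accepted after a shorter one was rejected.
    """
    while high - low > 1:
        mid = (low + high) // 2
        if accepts(mid):
            low = mid
        else:
            high = mid
    return high
```

With the bracket from this report, `find_flip(accepts, 45560, 45572)` needs at most four additional requests to locate the exact word count at which the request starts failing.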
Concrete results
Direct UTF-8 path, no proxy
For google/gemma-4-31b-it:free, using explicit UTF-8 JSON bytes with 0 \u escape sequences:
Gradual increase (intended prompt-token target -> result):
- 72,000 target -> 200, prompt_tokens = 71,848
- 72,500 target -> 200, prompt_tokens = 72,345
- 73,000 target -> 200, prompt_tokens = 72,844
- 73,200 target -> 200, prompt_tokens = 73,043
- 73,220 target -> 400, OpenRouter says 262,147 requested
- 73,240 target -> 400, OpenRouter says 262,224 requested
- 73,400 target -> 400, OpenRouter says 262,796 requested
That is the key symptom: the request starts failing just after a successful
probe at 73,043 provider-counted prompt tokens, which is nowhere near the
advertised 262,144 context limit.
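The following is a rough arithmetic check on the rejected Gemma-4 probes, using only the numbers above. The intended targets only approximate the provider count (the 73,200 target produced 73,043), so the resulting factor is approximate:

```python
# (intended target, OpenRouter "requested" total) for the rejected Gemma-4 probes above.
rejected = [(73_220, 262_147), (73_240, 262_224), (73_400, 262_796)]
LIMIT = 262_144

ratios = [requested / target for target, requested in rejected]
factor = sum(ratios) / len(ratios)

for (target, requested), ratio in zip(rejected, ratios):
    print(f"target {target:,}: {requested:,} requested, {ratio:.2f}x")

# Largest target that would pass if the estimate really is about factor x the target.
print(f"implied cutoff: ~{LIMIT / factor:,.0f} target tokens")
```

All three rejected probes come out at about 3.58x the intended target. Dividing the 262,144 limit by that factor lands between the 73,200 and 73,220 targets, which is where the flip was observed. This is consistent with a roughly linear overestimate of about 3.6x, though it does not show how the estimate is computed.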
Selected raw probes:
- probe A
  - target: 73200
  - words: 45560
  - status: 200
  - outgoing body had 0 \u escapes
  - prompt_tokens = 73043
- probe B
  - target: 73220
  - words: 45572
  - status: 400
  - outgoing body had 0 \u escapes
  - error: This endpoint's maximum context length is 262144 tokens. However, you requested about 262147 tokens (262146 of text input, 1 in the output).
- probe C
  - target: 73400
  - status: 400
  - error: requested about 262796 tokens (262795 of text input, 1 in the output)
Raw direct OpenRouter error body for the failing 73,220 probe:
{
  "error": {
    "message": "This endpoint's maximum context length is 262144 tokens. However, you requested about 262147 tokens (262146 of text input, 1 in the output). Please reduce the length of either one, or use the context-compression plugin to compress your prompt automatically.",
    "code": 400,
    "metadata": {
      "provider_name": null
    }
  },
  "user_id": "..."
}
The important part is "provider_name": null, which makes it look like the
request is being rejected by OpenRouter before routing to a provider.
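When running many probes, it can help to classify error bodies automatically. The sketch below is minimal. The regular expression is written against the two error messages quoted in this report, so it is an assumption about OpenRouter's wording. Treating `"provider_name": null` as a pre-routing rejection is the same heuristic used above:

```python
import json
import re

# Pattern based on the error messages quoted in this report (assumed wording).
ESTIMATE_RE = re.compile(
    r"maximum context length is (?P<limit>\d+) tokens\. "
    r"However, you requested about (?P<requested>\d+) tokens "
    r"\((?P<text>\d+) of text input"
)


def classify(body: str) -> dict:
    """Summarize an error body: rejected before routing, and with what estimate?"""
    error = json.loads(body).get("error", {})
    provider = (error.get("metadata") or {}).get("provider_name")
    match = ESTIMATE_RE.search(error.get("message", ""))
    return {
        # Heuristic: no provider named means the request never left OpenRouter.
        "pre_routing": provider is None,
        "estimate": {k: int(v) for k, v in match.groupdict().items()} if match else None,
    }
```

Applied to the 73,220 error body above, this yields `pre_routing` True and an estimate of 262144 limit, 262147 requested, and 262146 text tokens.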
For comparison, the original imported-chat failure I was debugging also had the
same OpenRouter-side shape:
{
  "error": {
    "code": 400,
    "message": "This endpoint's maximum context length is 262144 tokens. However, you requested about 271558 tokens (269727 of text input, 1831 of tool input). Please reduce the length of either one, or use the context-compression plugin to compress your prompt automatically.",
    "metadata": {
      "provider_name": null
    }
  },
  "user_id": "..."
}
Same direct UTF-8 body on other 262144-context models
Over-threshold probe (45572 Russian words, same body shape):
- qwen/qwen3-235b-a22b-2507: status 400, OpenRouter estimate 262147 requested / 262146 text
- nvidia/nemotron-3-super-120b-a12b:free: status 400, OpenRouter estimate 262147 requested / 262146 text
- inclusionai/ling-2.6-1t:free: status 400, OpenRouter estimate 262147 requested / 262146 text
- bytedance-seed/seed-2.0-mini: status 400, OpenRouter estimate 262147 requested / 262146 text
Near-threshold controls:
- qwen/qwen3-235b-a22b-2507
  - 45560 words -> 200, prompt_tokens = 106397
  - 45572 words -> 400, 262147 requested
- bytedance-seed/seed-2.0-mini
  - 45560 words -> 200, prompt_tokens = 88124
  - 45572 words -> 400, 262147 requested
Again, these accepted prompt sizes are well below 262144, yet the next small
Cyrillic increase fails with the exact same OpenRouter-side 262147 estimate.
Expected behavior
One of the following should happen:
- the direct UTF-8 request should be counted correctly and either pass or fail according to the model's actual tokenizer/context accounting, or
- if a shared pre-routing estimator is used, it should not reject, with one identical 262147 estimate, a body that the individual models' tokenizers count very differently and well below 262144 tokens.
Actual behavior
OpenRouter rejects the direct UTF-8 Cyrillic body with the exact same 262147 estimate across multiple different 262144-context models, even though accepted provider-counted prompt sizes for the nearby control probe differ widely by model.
The rejection appears to happen on the OpenRouter side before provider routing,
based on the raw error payload containing "metadata": {"provider_name": null}.
Notes
- This was originally discovered while debugging imported chat history, but the reproducer above does not require any import logic, reasoning fields, or local proxy.
- The same cliff was also reproduced through a local proxy, but the direct UTF-8 public API reproduction rules the proxy out.
- If useful, I can provide the exact full script and the raw outputs for all tested models.