Direct UTF-8 Cyrillic prompts are overcounted to 262147 tokens across multiple 262144-context models

## Summary

Direct UTF-8 Cyrillic requests appear to be overestimated by OpenRouter's context-length check and can fail with the exact same `262147` / `262146 text` estimate across multiple different `262144`-context models.

This does **not** appear to be SDK-specific and does **not** appear to be caused by JSON `\u` escaping.

I can reproduce it with a raw HTTPS `POST` to `https://openrouter.ai/api/v1/chat/completions` using explicit UTF-8 bytes:

```python
json.dumps(payload, ensure_ascii=False).encode("utf-8")
```

The outgoing request body contains no `\u` escape sequences (count: `0`).
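A minimal sketch of how the two encodings differ, for anyone who wants to check their own client. The short sample payload here is illustrative only and is not the real probe body.

```python
import json

# Illustrative payload; the real probes use a much longer Russian text.
payload = {"messages": [{"role": "user", "content": "вечер утро"}]}

# Default json.dumps escapes every non-ASCII character as \uXXXX.
escaped = json.dumps(payload).encode("utf-8")
# ensure_ascii=False keeps Cyrillic as literal characters (2 UTF-8 bytes each).
direct = json.dumps(payload, ensure_ascii=False).encode("utf-8")

escaped_count = escaped.count(b"\\u")
direct_count = direct.count(b"\\u")
print("escaped:", len(escaped), "bytes,", escaped_count, "escapes")
print("direct: ", len(direct), "bytes,", direct_count, "escapes")

# Both bodies decode to the same payload, so only the wire form differs.
assert json.loads(escaped) == json.loads(direct) == payload
```

The probes in this report all use the second form, so the escaped form is ruled out as the cause.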

## Why this looks like an OpenRouter-side counting bug

I did not start with a single failing payload. I gradually increased the amount
of Cyrillic context until the request flipped from success to failure.

Running the same direct UTF-8 Cyrillic threshold search against three models gave:

- `google/gemma-4-31b-it:free`
  - accepted at `73043` prompt tokens
  - then rejected at the next probe as `262147` requested tokens
- `qwen/qwen3-235b-a22b-2507`
  - accepted at `106397` prompt tokens
  - then rejected at the next probe as `262147` requested tokens
- `bytedance-seed/seed-2.0-mini`
  - accepted at `88124` prompt tokens
  - then rejected at the next probe as `262147` requested tokens

The accepted provider-counted prompt sizes differ a lot by model, but the rejected UTF-8 Cyrillic body gets the **same OpenRouter estimate** across different model routes.

That suggests a shared pre-routing estimator or context-check problem rather than a model-specific tokenizer issue.

It also fails **far earlier** than the advertised `262144` context limit would
suggest. On Gemma-4, the last accepted direct UTF-8 probe was only `73043`
provider-counted prompt tokens, and the very next probe was rejected by
OpenRouter as `262147` requested tokens.
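To put a number on the gap, the sketch below divides OpenRouter's text estimate for the rejected probe by the provider-counted `prompt_tokens` of the last accepted probe. The figures are copied from this report. The rejected body is slightly longer than the accepted one, so the ratios are approximate.

```python
# OpenRouter's text-input estimate from the rejected probe.
OPENROUTER_TEXT_ESTIMATE = 262146

# Provider-counted prompt_tokens from the last accepted probe per model.
ACCEPTED_PROMPT_TOKENS = {
    "google/gemma-4-31b-it:free": 73043,
    "qwen/qwen3-235b-a22b-2507": 106397,
    "bytedance-seed/seed-2.0-mini": 88124,
}

# Ratio of the shared estimate to each model's own count.
ratios = {
    model: OPENROUTER_TEXT_ESTIMATE / tokens
    for model, tokens in ACCEPTED_PROMPT_TOKENS.items()
}

for model, ratio in ratios.items():
    print(f"{model}: estimate is about {ratio:.2f}x the provider count")
```

The estimate is roughly 2.5x to 3.6x the provider-counted size, depending on the model.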

## Reproduction

Use a direct request to the public OpenRouter endpoint. The important points are:

- raw HTTPS request to `https://openrouter.ai/api/v1/chat/completions`
- `Content-Type: application/json; charset=utf-8`
- body encoded with `ensure_ascii=False` and `.encode("utf-8")`
- no local proxy involved
- `max_tokens = 1`
- a long Russian-only text body

Example reproducer:

```python
import json
import httpx

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"
OPENROUTER_KEY = "<your key>"

RU_WORDS = [
    "вечер","утро","полдень","сумерки","рассвет","весна","лето","осень",
    "зима","ветер","дождь","снег","туман","гроза","небо","облако",
    "солнце","луна","звезда","река","озеро","море","берег","остров",
    "поле","лес","сад","тропа","дорога","мост","город","деревня",
    "площадь","улица","двор","окно","дверь","крыша","стена","комната",
    "стол","стул","книга","письмо","карта","свеча","лампа","зеркало",
    "чашка","хлеб","сыр","яблоко","груша","ягода","чай","кофе",
    "вода","молоко","музыка","песня","голос","смех","шепот","тишина",
    "разговор","история","память","мысль","сон","надежда","тревога","радость",
    "печаль","улыбка","взгляд","жест","шаг","встреча","путь","поездка",
    "работа","отдых","дело","помощь","вопрос","ответ","выбор","решение",
    "пример","случай","привычка","правило","начало","конец","момент","час",
    "минута","день","неделя","месяц","год","человек","женщина","мужчина",
    "ребенок","друг","сосед","мастер","учитель","врач","художник","писатель",
    "путешественник","сторож","садовник","почтальон","рыбак","пекарь","музыкант",
]
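# Sentence lengths in words, cycled to vary the rhythm of the text.
RHYTHM = [12, 16, 11, 19, 13, 17, 10, 21]
# Sentence openers, prepended to every 11th sentence (not counted in word_count).
TIME_RU = ["Однажды", "Позже", "Утром", "Вечером", "Весной", "Зимой"]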

def build_text(word_count: int) -> str:
    """Build deterministic Russian-only filler text of about word_count words."""
    parts = []
    sentence = 0
    index = 0
    while index < word_count:
        target = RHYTHM[sentence % len(RHYTHM)]
        length = min(target, word_count - index)
        sent = []
        if sentence % 11 == 0:
            sent.append(TIME_RU[sentence % 6])
        for offset in range(length):
            word = RU_WORDS[(17 * sentence + offset * 3 + 5) % len(RU_WORDS)]
            if not sent:
                word = word.capitalize()
            sent.append(word)
        punct = [".", ".", ",", ";", ".", ":", ".", "."][sentence % 8]
        parts.append(" ".join(sent) + punct)
        sentence += 1
        index += length
        parts.append("\n\n" if sentence % 5 == 0 else " ")
    return "".join(parts).strip()

def send(model: str, word_count: int):
    """POST one probe as explicit UTF-8 bytes and print the raw response."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": build_text(word_count)}],
        "stream": False,
        "max_tokens": 1,
        "temperature": 0,
    }
    # Serialize to raw UTF-8 bytes and confirm no \u escapes were emitted.
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    assert b"\\u" not in body
    r = httpx.post(
        OPENROUTER_URL,
        headers={
            "Authorization": f"Bearer {OPENROUTER_KEY}",
            "Content-Type": "application/json; charset=utf-8",
        },
        content=body,
        timeout=180,
    )
    print(r.status_code)
    print(r.text)

# Increase gradually until it flips from success to failure.
for words in [44747, 45058, 45435, 45560, 45572]:
    print("WORDS", words)
    send("google/gemma-4-31b-it:free", words)
```
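The manual stepping above can also be automated with a bisection. This is a minimal sketch under two assumptions: acceptance is monotonic in word count, and `is_accepted` is a hypothetical predicate you supply (for example, a variant of `send` that returns `r.status_code == 200`). A stand-in predicate is used here so the sketch runs without network access.

```python
from typing import Callable


def find_flip(is_accepted: Callable[[int], bool], lo: int, hi: int) -> int:
    """Return the largest accepted word count, given lo accepted and hi rejected."""
    if not is_accepted(lo) or is_accepted(hi):
        raise ValueError("need an accepted lower bound and a rejected upper bound")
    # Invariant: lo is accepted, hi is rejected.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_accepted(mid):
            lo = mid
        else:
            hi = mid
    return lo


# Stand-in predicate with a made-up threshold, used only to exercise the search.
FAKE_THRESHOLD = 45565
last_ok = find_flip(lambda words: words < FAKE_THRESHOLD, 44747, 45572)
print("last accepted word count:", last_ok)
```

Each real probe sends a large body, so bisection keeps the number of requests to about `log2(hi - lo)`.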

## Concrete results

### Direct UTF-8 path, no proxy

For `google/gemma-4-31b-it:free`, using explicit UTF-8 JSON bytes with `0`
`\u` escape sequences:

Gradual increase (each probe was sized for an intended prompt-token target):

- intended `72,000` target -> `200`, `prompt_tokens = 71,848`
- intended `72,500` target -> `200`, `prompt_tokens = 72,345`
- intended `73,000` target -> `200`, `prompt_tokens = 72,844`
- intended `73,200` target -> `200`, `prompt_tokens = 73,043`
- intended `73,220` target -> `400`, OpenRouter says `262,147` requested
- intended `73,240` target -> `400`, OpenRouter says `262,224` requested
- intended `73,400` target -> `400`, OpenRouter says `262,796` requested

That is the key symptom: the request starts failing just after a successful
probe at `73,043` provider-counted prompt tokens, which is nowhere near the
advertised `262,144` context limit.
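The probe list also shows how fast each count grows as the body gets longer. The sketch below compares the growth of the provider-counted `prompt_tokens` with the growth of OpenRouter's estimate per unit of target. It assumes the "target" values are intended prompt-token counts, which is how they are used in this report.

```python
# (target, provider-counted prompt_tokens) for the accepted probes.
accepted = [(72000, 71848), (72500, 72345), (73000, 72844), (73200, 73043)]
# (target, OpenRouter requested estimate) for the rejected probes.
rejected = [(73220, 262147), (73240, 262224), (73400, 262796)]


def slope(first: tuple[int, int], last: tuple[int, int]) -> float:
    """Growth in counted tokens per unit of target between two probes."""
    return (last[1] - first[1]) / (last[0] - first[0])


provider_slope = slope(accepted[0], accepted[-1])
estimate_slopes = [slope(rejected[0], probe) for probe in rejected[1:]]

print(f"provider count grows by {provider_slope:.2f} per target token")
for value in estimate_slopes:
    print(f"OpenRouter estimate grows by {value:.2f} per target token")
```

The provider count tracks the target almost one to one, while the estimate grows several times faster, which is consistent with the estimator overcounting this text by a roughly constant factor.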

Selected raw probes:

- probe A
  - target: `73200`
  - words: `45560`
  - status: `200`
  - outgoing body had `0` `\u` escapes
  - `prompt_tokens = 73043`
- probe B
  - target: `73220`
  - words: `45572`
  - status: `400`
  - outgoing body had `0` `\u` escapes
  - error: `This endpoint's maximum context length is 262144 tokens. However, you requested about 262147 tokens (262146 of text input, 1 in the output).`
- probe C
  - target: `73400`
  - status: `400`
  - error: `requested about 262796 tokens (262795 of text input, 1 in the output)`

Raw direct OpenRouter error body for the failing `73,220` probe:

```json
{
  "error": {
    "message": "This endpoint's maximum context length is 262144 tokens. However, you requested about 262147 tokens (262146 of text input, 1 in the output). Please reduce the length of either one, or use the context-compression plugin to compress your prompt automatically.",
    "code": 400,
    "metadata": {
      "provider_name": null
    }
  },
  "user_id": "..."
}
```

The important part is `"provider_name": null`, which makes it look like the
request is being rejected by OpenRouter before routing to a provider.
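For collecting results across many probes, the numbers can be pulled out of the error body programmatically. This is a minimal sketch; `parse_context_error` is a hypothetical helper, and the regular expressions are written against the message wording shown in this report, which is not a documented format and may change.

```python
import json
import re

RAW_ERROR = """
{
  "error": {
    "message": "This endpoint's maximum context length is 262144 tokens. However, you requested about 262147 tokens (262146 of text input, 1 in the output). Please reduce the length of either one, or use the context-compression plugin to compress your prompt automatically.",
    "code": 400,
    "metadata": {"provider_name": null}
  },
  "user_id": "..."
}
"""


def parse_context_error(raw: str) -> dict:
    """Extract the requested total, its breakdown, and provider_name."""
    error = json.loads(raw)["error"]
    match = re.search(r"requested about (\d+) tokens \(([^)]*)\)", error["message"])
    if match is None:
        raise ValueError("not a context-length error message")
    parts = {}
    # Breakdown items look like "262146 of text input" or "1 in the output".
    for item in match.group(2).split(", "):
        item_match = re.fullmatch(r"(\d+) (?:of|in the) (.+)", item)
        if item_match is None:
            raise ValueError(f"unrecognized breakdown item: {item!r}")
        parts[item_match.group(2)] = int(item_match.group(1))
    return {
        "requested": int(match.group(1)),
        "parts": parts,
        "provider_name": error["metadata"]["provider_name"],
    }


parsed = parse_context_error(RAW_ERROR)
print(parsed)
```

A `provider_name` of `None` in the parsed result corresponds to the `null` in the raw payload.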

For comparison, the original imported-chat failure I was debugging also had the
same OpenRouter-side shape:

```json
{
  "error": {
    "code": 400,
    "message": "This endpoint's maximum context length is 262144 tokens. However, you requested about 271558 tokens (269727 of text input, 1831 of tool input). Please reduce the length of either one, or use the context-compression plugin to compress your prompt automatically.",
    "metadata": {
      "provider_name": null
    }
  },
  "user_id": "..."
}
```

### Same direct UTF-8 body on other `262144`-context models

Over-threshold probe (`45572` Russian words, same body shape):

- `qwen/qwen3-235b-a22b-2507`
  - status: `400`
  - OpenRouter estimate: `262147` requested / `262146` text
- `nvidia/nemotron-3-super-120b-a12b:free`
  - status: `400`
  - OpenRouter estimate: `262147` requested / `262146` text
- `inclusionai/ling-2.6-1t:free`
  - status: `400`
  - OpenRouter estimate: `262147` requested / `262146` text
- `bytedance-seed/seed-2.0-mini`
  - status: `400`
  - OpenRouter estimate: `262147` requested / `262146` text

Near-threshold controls:

- `qwen/qwen3-235b-a22b-2507`
  - `45560` words -> `200`, `prompt_tokens = 106397`
  - `45572` words -> `400`, `262147` requested
- `bytedance-seed/seed-2.0-mini`
  - `45560` words -> `200`, `prompt_tokens = 88124`
  - `45572` words -> `400`, `262147` requested

Again, these accepted prompt sizes are well below `262144`, yet the next small
Cyrillic increase fails with the exact same OpenRouter-side `262147` estimate.

## Expected behavior

One of the following should happen:

- the direct UTF-8 request should be counted correctly and either pass or fail according to the model's actual tokenizer/context accounting, or
- if a shared pre-routing estimator is used, it should not map materially different model/tokenizer outcomes to the same exact `262147` estimate for the same body.

## Actual behavior

OpenRouter rejects the direct UTF-8 Cyrillic body with the exact same `262147` estimate across multiple different `262144`-context models, even though accepted provider-counted prompt sizes for the nearby control probe differ widely by model.

The rejection appears to happen on the OpenRouter side before provider routing,
based on the raw error payload containing `"metadata": {"provider_name": null}`.

## Notes

- This was originally discovered while debugging imported chat history, but the reproducer above does **not** require any import logic, reasoning fields, or local proxy.
- The same cliff was also reproduced through a local proxy, but the direct UTF-8 public API reproduction rules the proxy out.
- If useful, I can provide the exact full script and the raw outputs for all tested models.

