[BOT ISSUE] OpenAI: add missing gpt-realtime-translate and gpt-realtime-whisper models #564

@github-actions

Gap

Two new OpenAI realtime models are missing from packages/proxy/schema/model_list.json:

  1. gpt-realtime-translate — a live translation model that translates speech from 70+ input languages into 13 output languages
  2. gpt-realtime-whisper — a streaming speech-to-text model for low-latency realtime transcription

Both were announced on May 7, 2026 alongside gpt-realtime-2. The catalog already includes other realtime models (gpt-realtime-mini, gpt-realtime-1.5), and gpt-realtime-2 is tracked separately in #543.

Verified fields

gpt-realtime-translate

| Field | Value | Source |
| --- | --- | --- |
| Model ID | gpt-realtime-translate | Model page |
| Format | openai | Catalog convention for OpenAI models |
| Flavor | chat | Catalog convention |
| Max input tokens | 16,000 | Model page |
| Max output tokens | 2,000 | Model page |
| Available providers | ["openai"] | Model page |

gpt-realtime-whisper

| Field | Value | Source |
| --- | --- | --- |
| Model ID | gpt-realtime-whisper | Model page |
| Format | openai | Catalog convention for OpenAI models |
| Flavor | chat | Catalog convention |
| Max input tokens | 16,000 | Model page |
| Max output tokens | 2,000 | Model page |
| Available providers | ["openai"] | Model page |

Verification note

  • Model existence: Both models confirmed on the OpenAI models listing, their dedicated model pages (translate, whisper), and the launch announcement (three independent official signals).
  • Pricing: Both models use per-minute audio pricing ($0.034/min for translate, $0.017/min for whisper), which does not map cleanly to the per-token input_cost_per_mil_tokens/output_cost_per_mil_tokens schema fields. Per-token pricing is not published, so the pricing fields are omitted rather than guessed.
  • Token limits: 16,000 context / 2,000 max output confirmed on each model's detail page.
  • displayName: Not published by OpenAI; omitted rather than guessed.
  • parent: Not applicable — both are standalone models with no snapshots.
  • multimodal: Not set. These are audio-focused models; the multimodal field is typically used for vision (image input) support in the catalog.
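
To make the pricing point concrete, here is a minimal sketch of how cost would be computed from the per-minute rates quoted above. The helper name `audio_cost_usd` and the `PER_MINUTE_USD` table are hypothetical and not part of the catalog schema; the point is only that the cost depends on audio duration, so there is no token count to plug into the per-token fields.

```python
from decimal import Decimal

# Per-minute audio prices quoted in this issue (USD per minute of audio).
PER_MINUTE_USD = {
    "gpt-realtime-translate": Decimal("0.034"),
    "gpt-realtime-whisper": Decimal("0.017"),
}


def audio_cost_usd(model: str, minutes: float) -> Decimal:
    """Hypothetical helper: cost scales with audio duration, not with tokens."""
    # Convert via str() so float inputs such as 2.5 do not carry binary noise.
    return PER_MINUTE_USD[model] * Decimal(str(minutes))
```

For example, `audio_cost_usd("gpt-realtime-translate", 10)` evaluates to 0.34 USD, with no token count involved anywhere in the calculation.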

Local files inspected

  • packages/proxy/schema/model_list.json — grep confirms neither gpt-realtime-translate nor gpt-realtime-whisper exists
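
The same absence check can be sketched in code rather than with grep. This assumes the catalog file is a JSON object keyed by model ID, which is an assumption about the file layout and not something this issue states; `missing_model_ids` is a hypothetical helper name.

```python
import json


def missing_model_ids(catalog_text: str, model_ids: list[str]) -> list[str]:
    """Return the IDs from model_ids that are not top-level keys of the catalog.

    Assumes the catalog is a JSON object keyed by model ID (unverified here).
    """
    catalog = json.loads(catalog_text)
    return [model_id for model_id in model_ids if model_id not in catalog]
```

In practice the text would be read from packages/proxy/schema/model_list.json and checked against the two model IDs named in this issue.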

Source URLs

{
  "kind": "missing_model",
  "provider": "openai",
  "models": ["gpt-realtime-translate", "gpt-realtime-whisper"],
  "status": "active",
  "model_specs": {
    "gpt-realtime-translate": {
      "format": "openai",
      "flavor": "chat",
      "max_input_tokens": 16000,
      "max_output_tokens": 2000,
      "available_providers": ["openai"]
    },
    "gpt-realtime-whisper": {
      "format": "openai",
      "flavor": "chat",
      "max_input_tokens": 16000,
      "max_output_tokens": 2000,
      "available_providers": ["openai"]
    }
  },
  "source_urls": [
    "https://developers.openai.com/api/docs/models/gpt-realtime-translate",
    "https://developers.openai.com/api/docs/models/gpt-realtime-whisper",
    "https://openai.com/index/advancing-voice-intelligence-with-new-models-in-the-api/"
  ]
}
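
A possible way to apply the payload above to the catalog is sketched below. It assumes, as an unverified convention, that the catalog is a dict keyed by model ID and that each `model_specs` entry can be copied in as is; `apply_missing_models` is a hypothetical name, not an existing script in the repository.

```python
def apply_missing_models(catalog: dict, payload: dict) -> dict:
    """Return a copy of the catalog with the payload's missing models added.

    Existing entries are never overwritten; only absent model IDs are inserted.
    Pricing and displayName stay absent because the payload omits them.
    """
    updated = dict(catalog)
    for model_id in payload["models"]:
        if model_id in updated:
            continue  # already cataloged, leave the existing entry untouched
        updated[model_id] = dict(payload["model_specs"][model_id])
    return updated
```

Returning a new dict rather than mutating the input keeps the original catalog available for diffing before anything is written back to disk.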
