7 changes: 7 additions & 0 deletions .changeset/openrouter-video-adapter.md
@@ -0,0 +1,7 @@
---
'@tanstack/ai-openrouter': minor
---

Add `openRouterVideo`, a video generation adapter for OpenRouter's dedicated async API (`POST /api/v1/videos`) — Seedance, Veo 3.1, Wan, Kling, and Sora 2 Pro through one API key. Follows the jobs/polling architecture (`generateVideo()` → `getVideoJobStatus()`), with per-model `size` / `duration` / provider-option types generated from OpenRouter's `GET /api/v1/videos/models` metadata and validated before submit. `duration` is typed per model on the shared typed-duration contract — the adapter implements `availableDurations()` and `snapDuration(seconds)` (matching the Veo adapter) to enumerate the valid set and coerce raw UI seconds to the closest supported value. Image-conditioned prompts map `metadata.role` onto the wire: `start_frame` / `end_frame` → `frame_images[]` (`first_frame` / `last_frame`), `reference` / `character` → `input_references[]`; frame roles are validated against each model's `supported_frame_images`. Completed videos are downloaded server-side and returned as `data:` URLs (OpenRouter's download URLs require the API key), and the gateway-reported cost is surfaced as `usage.cost`.

Image adapter fixes from the #624 review: requested `size` is now validated (the `WIDTHxHEIGHT` union previously used a Unicode `×`, so every size except `1024x1024` silently dropped its aspect ratio; unsupported sizes now throw with the supported list), `numberOfImages > 1` throws instead of silently returning one image (verified live: the gateway ignores all count keys in `image_config`), and `image_config.strength` (0.0–1.0 image-to-image influence) is exposed via `modelOptions.strength`.
5 changes: 5 additions & 0 deletions .changeset/video-adapter-duration-constraint.md
@@ -0,0 +1,5 @@
---
'@tanstack/ai': patch
---

Fix `generateVideo()` (and the other `generateVideo` activity entry points) rejecting video adapters that declare per-model typed durations. The activity's `TAdapter extends VideoAdapter<string, any, any, any>` bound let the sixth `TModelDurationByName` generic fall back to its `Record<string, number>` default; because `createVideoJob` is a contravariant function-valued property, a concrete adapter whose `duration` is narrowed to a literal union (e.g. Veo's `4 | 6 | 8`, OpenRouter Seedance's `4..15`) failed the bound, so the documented `generateVideo({ adapter: geminiVideo('veo-3.1-generate-preview'), ... })` pattern did not type-check. The constraint now leaves the size and duration generics unpinned (`VideoAdapter<string, any, any, any, any, any>`); the real per-model types are still recovered by inference (`VideoSizeForAdapter` / `VideoDurationForAdapter`).
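
The variance issue described above can be reproduced in a few lines. The sketch below uses hypothetical, heavily simplified stand-ins (`MiniAdapter`, `DurationOf`, `submitJob`) rather than the real `VideoAdapter` generics, and it models the duration as a single type parameter defaulting to `number` instead of the per-model record; it only illustrates why an unpinned bound plus inference accepts a narrowed adapter.

```typescript
// Hypothetical, simplified stand-in for the real adapter interface.
interface MiniAdapter<TDuration = number> {
  // A function-valued *property* (not a method), so its parameter is checked
  // contravariantly under `strictFunctionTypes`.
  createJob: (options: { duration: TDuration }) => string;
}

// Recover the adapter's real duration type by inference.
type DurationOf<T> = T extends MiniAdapter<infer D> ? D : never;

// Unpinned bound: `MiniAdapter<any>` accepts narrowed adapters.
// A bound of plain `MiniAdapter` (duration = number) would reject `veoStyle`
// below, because a function that only accepts 4 | 6 | 8 cannot stand in for
// one that must accept any number.
function submitJob<T extends MiniAdapter<any>>(
  adapter: T,
  duration: DurationOf<T>,
): string {
  return adapter.createJob({ duration });
}

const veoStyle: MiniAdapter<4 | 6 | 8> = {
  createJob: ({ duration }) => `job-${duration}s`,
};

const exampleJobId = submitJob(veoStyle, 6); // type-checks; 5 would not
```

The call site keeps full type safety: the `duration` argument is still limited to the adapter's own literal union, even though the bound itself no longer pins it.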
87 changes: 87 additions & 0 deletions docs/adapters/openrouter.md
@@ -247,6 +247,93 @@ fields are simply absent and the stream completes normally. Both
`openRouterText` and `openRouterResponsesText` populate cost when OpenRouter
returns it.

## Image Generation

`openRouterImage` routes image generation through OpenRouter's
chat-completions surface (`modalities: ['image']`). Multimodal prompts are
supported — text and image parts are forwarded in order for
image-conditioned generation:

```typescript
import { generateImage } from "@tanstack/ai";
import { openRouterImage } from "@tanstack/ai-openrouter";

const result = await generateImage({
adapter: openRouterImage("google/gemini-2.5-flash-image"),
prompt: "A watercolor lighthouse at dusk",
size: "1344x768", // mapped to image_config.aspect_ratio ('16:9')
modelOptions: {
image_size: "2K", // resolution (Gemini models)
strength: 0.35, // image-to-image influence, i2i-capable models only
},
});
```

Notes:

- The pathway returns **exactly one image per request** — `numberOfImages > 1`
throws instead of silently under-delivering. Make multiple requests if you
need multiple candidates.
- `size` must be one of the ten supported `WIDTHxHEIGHT` values (it is
converted to `image_config.aspect_ratio`); anything else throws with the
supported list.
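
As a rough illustration of the size check described in these notes, the sketch below maps a `WIDTHxHEIGHT` string to an aspect ratio and throws with the supported list otherwise. `sizeToAspectRatio` is a hypothetical helper, not the adapter's code, and the table holds only the two sizes mentioned in this document rather than all ten.

```typescript
// Illustrative subset: only the two sizes that appear in this document.
// The remaining supported sizes are deliberately not reproduced here.
const ASPECT_RATIO_BY_SIZE = new Map<string, string>([
  ["1024x1024", "1:1"],
  ["1344x768", "16:9"],
]);

// Hypothetical helper, not part of @tanstack/ai-openrouter.
function sizeToAspectRatio(size: string): string {
  const ratio = ASPECT_RATIO_BY_SIZE.get(size);
  if (ratio === undefined) {
    const supported = Array.from(ASPECT_RATIO_BY_SIZE.keys()).join(", ");
    throw new Error(`Unsupported size "${size}". Supported sizes: ${supported}`);
  }
  return ratio;
}

sizeToAspectRatio("1344x768"); // "16:9"
// sizeToAspectRatio("1344×768") throws: the Unicode "×" is not the ASCII "x".
```

An exact-match lookup like this makes a look-alike separator fail loudly instead of silently dropping the aspect ratio.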

## Video Generation (Experimental)

`openRouterVideo` targets OpenRouter's dedicated **async video API**
(`POST /api/v1/videos`) — Seedance, Veo 3.1, Wan, Kling, and Sora 2 Pro
through a single OpenRouter key. It follows the jobs/polling architecture
shared by all TanStack AI video adapters:

```typescript
// Server: create the job, then poll
import { generateVideo, getVideoJobStatus } from "@tanstack/ai";
import { openRouterVideo } from "@tanstack/ai-openrouter";

const adapter = openRouterVideo("bytedance/seedance-2.0");

const { jobId } = await generateVideo({
adapter,
prompt: [
{ type: "text", content: "Animate this product shot, slow push-in" },
{
type: "image",
source: { type: "url", value: "https://your-cdn.com/product.png" },
metadata: { role: "start_frame" },
},
],
size: "1280x720",
// `duration` is typed per model from the published metadata; coerce raw
// seconds with adapter.snapDuration() or enumerate via adapter.availableDurations().
duration: 8,
});

let status = await getVideoJobStatus({ adapter, jobId });
while (status.status !== "completed" && status.status !== "failed") {
await new Promise((r) => setTimeout(r, 5000));
status = await getVideoJobStatus({ adapter, jobId });
}
// status.url is a data: URL (OpenRouter download URLs require the API key,
// so the adapter downloads server-side); status.usage?.cost is the real
// billed cost reported by the gateway.
```
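
The loop above polls until the job settles, however long that takes. In production you may want an upper bound. The following is a minimal sketch of one way to add it; `pollUntilSettled` is a hypothetical helper, and `check` stands in for a call such as `() => getVideoJobStatus({ adapter, jobId })`.

```typescript
type JobStatus = { status: string };

// Hypothetical helper: polls `check` until the job reaches a terminal state,
// or throws once another wait would exceed `timeoutMs`.
async function pollUntilSettled<T extends JobStatus>(
  check: () => Promise<T>,
  options: { intervalMs: number; timeoutMs: number },
): Promise<T> {
  const deadline = Date.now() + options.timeoutMs;
  for (;;) {
    const current = await check();
    if (current.status === "completed" || current.status === "failed") {
      return current;
    }
    if (Date.now() + options.intervalMs > deadline) {
      throw new Error(`Job did not settle within ${options.timeoutMs} ms`);
    }
    await new Promise<void>((resolve) => setTimeout(resolve, options.intervalMs));
  }
}
```

The helper is independent of any adapter, so the same wrapper could be reused for other job-based APIs.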

```tsx
// Client: track the job with the useGenerateVideo hook
import { useGenerateVideo, fetchServerSentEvents } from "@tanstack/ai-react";

const { generate, result, videoStatus, isLoading } = useGenerateVideo({
connection: fetchServerSentEvents("/api/generate/video"),
});
// result?.url renders directly: <video src={result.url} controls />
```

Sizes, durations, and per-model options (`resolution`, `aspectRatio`,
`generateAudio`, `seed`, …) are typed and validated per model from
OpenRouter's video model metadata. See
[Video Generation](../media/video-generation.md) for the full lifecycle,
streaming mode, and the image-to-video role-mapping table.

## Next Steps

- [Getting Started](../getting-started/quick-start) - Learn the basics
7 changes: 4 additions & 3 deletions docs/config.json
@@ -256,13 +256,13 @@
"label": "Image Generation",
"to": "media/image-generation",
"addedAt": "2026-04-15",
-"updatedAt": "2026-06-08"
+"updatedAt": "2026-06-10"
},
Comment on lines +259 to 260


📐 Maintainability & Code Quality | 🟠 Major | ⚡ Quick win

Refresh media/image-generation updatedAt to this PR date.

docs/media/image-generation.md is changed in this PR, but its entry still shows "updatedAt": "2026-06-10" instead of today (2026-06-24), so docs freshness metadata is inconsistent.

As per coding guidelines, “Update updatedAt timestamp in docs/config.json when making content changes to a documentation page.”

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/config.json` around lines 259 - 260, The docs freshness metadata for the
media/image-generation page is stale because the config entry still uses an
older updatedAt value even though docs/media/image-generation.md was modified in
this PR. Update the corresponding entry in docs/config.json for
media/image-generation so its updatedAt matches the PR date, using the existing
docs metadata entry structure as the anchor.

Source: Coding guidelines

{
"label": "Video Generation",
"to": "media/video-generation",
"addedAt": "2026-04-15",
-"updatedAt": "2026-06-08"
+"updatedAt": "2026-06-24"
},
{
"label": "Generation Hooks",
@@ -454,7 +454,8 @@
{
"label": "OpenRouter Adapter",
"to": "adapters/openrouter",
-"addedAt": "2026-04-15"
+"addedAt": "2026-04-15",
+"updatedAt": "2026-06-24"
},
{
"label": "OpenAI-Compatible",
2 changes: 1 addition & 1 deletion docs/media/image-generation.md
@@ -301,7 +301,7 @@ await generateImage({
| **Gemini** | Native models (`gemini-*-flash-image`, "nano-banana", etc.) → prompt parts map 1:1 onto multimodal `contents`, preserving interleaved order. Up to ~14 input images (provider limit, not enforced by the SDK).<br>Imagen models → throws (text-to-image only). |
| **fal.ai** | Field names resolve per endpoint from a map generated from the fal SDK's endpoint types (e.g. nano-banana edit gets `image_urls`, Fooocus masks get `mask_image_url`). Defaults for unknown endpoints: 1 input → `image_url`; multiple → `image_urls`; `role: 'mask'` → `mask_url`; `role: 'control'` → `control_image_url`; `role: 'reference'` / `'character'` → `reference_image_urls`. Override with `modelOptions` for endpoint-specific fields. |
| **Grok** | grok-imagine models → xAI's `/v1/images/edits` (up to 3 source images, addressed by xAI in request order; prompt sent verbatim). `role: 'mask'` / `'control'` throw (no Imagine API equivalent). `grok-2-image-1212` throws (text-to-image only). |
-| **OpenRouter** | Prompt parts map 1:1 onto multimodal `image_url` / `text` content parts, preserving interleaved order, and are forwarded to the underlying image model. |
+| **OpenRouter** | Prompt parts map 1:1 onto multimodal `image_url` / `text` content parts, preserving interleaved order, and are forwarded to the underlying image model. `modelOptions.strength` (0.0–1.0) controls image-to-image influence on models that document it (e.g. Recraft). One image per request — `numberOfImages > 1` throws (the gateway ignores count keys). |
| **Anthropic** | n/a — no image generation API. |

Adapters that don't support image-conditioned generation throw a clear
89 changes: 80 additions & 9 deletions docs/media/video-generation.md
@@ -2,17 +2,21 @@
title: Video Generation
id: video-generation
order: 6
-description: "Generate video from text prompts with OpenAI Sora or Google Veo using TanStack AI's experimental generateVideo() jobs/polling API."
+description: "Generate video from text prompts with OpenAI Sora, Google Veo, fal.ai, or OpenRouter (Seedance, Veo, Wan) using TanStack AI's experimental generateVideo() jobs/polling API."
keywords:
- tanstack ai
- video generation
- sora
- veo
- gemini
- openrouter
- seedance
- fal
- generateVideo
- jobs api
- experimental
- text-to-video
- image-to-video
---

# Video Generation (Experimental)
@@ -39,6 +43,8 @@ TanStack AI provides experimental support for video generation through dedicated
Currently supported:
- **OpenAI**: Sora-2 and Sora-2-Pro models (when available)
- **Google Gemini**: Veo 3.1, Veo 3, and Veo 2 models (via the long-running operations API)
- **fal.ai**: Kling, MiniMax, Hunyuan, and other fal-hosted video endpoints
- **OpenRouter**: Seedance, Veo 3.1, Wan, Kling, Sora 2 Pro and others via the dedicated async video API (`POST /api/v1/videos`)

## Basic Usage

@@ -427,12 +433,12 @@ for the per-provider table.
Each `ImagePart` can carry an optional `metadata.role` hint that the
adapter uses to route the input to the provider-specific field:

-| Role | Maps to |
-| --------------- | ------------------------------------------------------------- |
-| `'start_frame'` | fal `start_image_url`, Veo input `image` (positional default for the first input) |
-| `'end_frame'` | fal `end_image_url`, Veo `lastFrame` |
-| `'reference'` | fal `reference_image_urls`, Veo `referenceImages` |
-| `'character'` | Same as `'reference'` — character consistency images |
+| Role | Maps to |
+| --------------- | --------------------------------------------------------------------------------------------------------- |
+| `'start_frame'` | fal `start_image_url`; Veo input `image`; OpenRouter `frame_images[]` with `frame_type: 'first_frame'` (positional default for the first input) |
+| `'end_frame'` | fal `end_image_url`; Veo `lastFrame`; OpenRouter `frame_images[]` with `frame_type: 'last_frame'` |
+| `'reference'` | fal `reference_image_urls`; Veo `referenceImages`; OpenRouter `input_references[]` |
+| `'character'` | Same as `'reference'` — character consistency images |

```typescript
import { generateVideo } from '@tanstack/ai'
@@ -460,6 +466,7 @@ await generateVideo({
| **OpenAI** | Sora-2 / Sora-2-Pro → the image part goes to `input_reference`; flattened text is the prompt. Single image only — throws if more than one. |
| **fal.ai** | Field names resolve per endpoint from a map generated from the fal SDK's endpoint types — e.g. `role: 'start_frame'` lands on `image_url` for Kling/Veo image-to-video, `first_frame_url` for first-last-frame endpoints, and `start_image_url` otherwise. Defaults: single input → `image_url` (start frame); `role: 'end_frame'` → `end_image_url`; `role: 'reference'` / `'character'` → `reference_image_urls`. Override per-endpoint via `modelOptions` — the media-conditioning fields are typed optional there (even when the endpoint requires them) since they usually arrive as prompt parts. |
| **Gemini** | Veo → the first un-roled / `'start_frame'` image becomes the input image; `'end_frame'` → `lastFrame`; `'reference'` / `'character'` → `referenceImages` (asset references, Veo 3.1). Throws on multiple starting images. |
| **OpenRouter** | `role: 'start_frame'` / `'end_frame'` → `frame_images[]` with `frame_type: 'first_frame'` / `'last_frame'`; `role: 'reference'` / `'character'` → `input_references[]`; an unroled image defaults to the start frame. At most one start and one end frame; frame roles are validated against the model's `supported_frame_images` metadata (e.g. Hailuo only takes a first frame). When both frame images and references are present, OpenRouter treats the request as image-to-video and references take lower priority. URL image sources pass through verbatim and `data` sources become data URIs — OpenRouter does not fetch URLs behind redirects or bot checks, so use directly accessible URLs. |

Adapters whose underlying API can't accept image inputs throw a clear
runtime error so calls fail fast.
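
To make the OpenRouter row concrete, here is a minimal sketch of the role routing it describes. It is not the adapter's implementation: `mapRolesToWire` and `RoledImage` are hypothetical, and only `frame_images`, `input_references`, and `frame_type` come from the table above. The `url` field name on each element is a placeholder assumption.

```typescript
type ImageRole = "start_frame" | "end_frame" | "reference" | "character";
type FrameType = "first_frame" | "last_frame";

interface RoledImage {
  url: string;
  role?: ImageRole; // an unroled image defaults to the start frame
}

// Assumed element shapes; the `url` field name is a placeholder.
interface WireImages {
  frame_images: { frame_type: FrameType; url: string }[];
  input_references: { url: string }[];
}

function mapRolesToWire(
  images: RoledImage[],
  supportedFrameImages: FrameType[],
): WireImages {
  const wire: WireImages = { frame_images: [], input_references: [] };
  for (const image of images) {
    const role = image.role ?? "start_frame";
    if (role === "reference" || role === "character") {
      wire.input_references.push({ url: image.url });
      continue;
    }
    const frameType: FrameType = role === "start_frame" ? "first_frame" : "last_frame";
    if (!supportedFrameImages.includes(frameType)) {
      throw new Error(`Model does not accept a ${frameType} image`);
    }
    if (wire.frame_images.some((frame) => frame.frame_type === frameType)) {
      throw new Error(`At most one ${frameType} image is allowed`);
    }
    wire.frame_images.push({ frame_type: frameType, url: image.url });
  }
  return wire;
}
```

Validating against the model's supported frame types before building the payload keeps the failure local and descriptive, instead of surfacing as a gateway error after submission.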
@@ -567,6 +574,68 @@ Adapters that haven't declared a per-model duration map keep the plain
> Files API and requires your API key to download (send it as an
> `x-goog-api-key` header or `key` query parameter).

### OpenRouter Model Options

OpenRouter's [video generation API](https://openrouter.ai/docs/guides/overview/multimodal/video-generation)
runs Seedance, Veo, Wan, Kling, Sora 2 Pro and others behind one async jobs
API. `size`, `duration`, and the per-model options below are typed **and
validated per model** from OpenRouter's published model capabilities (a size
or duration the model doesn't support throws before the request is sent):

```typescript
import { generateVideo } from '@tanstack/ai'
import { openRouterVideo } from '@tanstack/ai-openrouter'

const { jobId } = await generateVideo({
adapter: openRouterVideo('bytedance/seedance-2.0'),
prompt: 'A beautiful sunset over the ocean',
size: '1280x720', // per-model union from OpenRouter's model metadata
duration: 8, // validated against the model's supported durations
modelOptions: {
resolution: '720p', // alternative to size: resolution + aspectRatio
aspectRatio: '16:9',
generateAudio: true, // omitted from the type for models that can't generate audio
seed: 42, // omitted from the type for models that don't accept a seed
callbackUrl: 'https://your-app.com/webhooks/openrouter-video',
provider: { options: { bytedance: { watermark: false } } }, // passthrough
},
})
```
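
The "validated before the request is sent" behavior can be pictured as a plain capability check. The sketch below is an assumption-heavy illustration: `VideoModelCapabilities`, its field names, and `assertSupported` are hypothetical and do not mirror OpenRouter's actual metadata schema, and the sample capability values are illustrative only.

```typescript
// Hypothetical capability record; field names are placeholders.
interface VideoModelCapabilities {
  sizes: readonly string[];
  durations: readonly number[];
}

// Throws before any network call if the request is outside the model's limits.
function assertSupported(
  model: string,
  caps: VideoModelCapabilities,
  request: { size?: string; duration?: number },
): void {
  if (request.size !== undefined && !caps.sizes.includes(request.size)) {
    throw new Error(
      `${model} does not support size ${request.size}; supported: ${caps.sizes.join(", ")}`,
    );
  }
  if (request.duration !== undefined && !caps.durations.includes(request.duration)) {
    throw new Error(
      `${model} does not support duration ${request.duration}s; supported: ${caps.durations.join(", ")}`,
    );
  }
}

// Illustrative values only, not the model's published metadata.
const sampleCaps: VideoModelCapabilities = {
  sizes: ["1280x720"],
  durations: [4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15],
};

assertSupported("bytedance/seedance-2.0", sampleCaps, { size: "1280x720", duration: 8 });
```

Listing the supported values in the error message lets the caller correct the request without consulting the model metadata separately.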

Like the Veo adapter, OpenRouter's `duration` is **typed per model** — each
model narrows `duration` to the whole-second union published in its metadata,
and the adapter implements the same `availableDurations()` / `snapDuration()`
introspection helpers:

```typescript
import { generateVideo } from '@tanstack/ai'
import { openRouterVideo } from '@tanstack/ai-openrouter'

const adapter = openRouterVideo('bytedance/seedance-2.0')

adapter.availableDurations()
// { kind: 'discrete', values: [4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] }
adapter.snapDuration(7.4) // 7 — closest valid duration

const sliderSeconds = 7 // raw seconds from a UI control
await generateVideo({
adapter,
prompt: 'A timelapse of clouds',
duration: adapter.snapDuration(sliderSeconds), // coerce to a valid duration
})
```
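
For intuition, snapping to a discrete set can be as simple as a nearest-value search. The sketch below is not the adapter's implementation: `snapToNearest` is a hypothetical helper, and its tie-breaking rule (prefer the shorter duration) is an assumption, since nothing above says how ties are resolved.

```typescript
// Hypothetical helper, not part of @tanstack/ai-openrouter.
// Returns the supported duration closest to `seconds`.
// Assumption: on a tie, the shorter duration wins.
function snapToNearest(supported: readonly number[], seconds: number): number {
  if (supported.length === 0) {
    throw new Error("No supported durations to snap to");
  }
  let best = supported[0];
  for (const candidate of supported) {
    const candidateGap = Math.abs(candidate - seconds);
    const bestGap = Math.abs(best - seconds);
    if (candidateGap < bestGap || (candidateGap === bestGap && candidate < best)) {
      best = candidate;
    }
  }
  return best;
}

const seedanceDurations = [4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15];
const veoDurations = [4, 6, 8];

snapToNearest(seedanceDurations, 7.4); // 7
snapToNearest(veoDurations, 7.4); // 8
snapToNearest(veoDurations, 30); // 8, out-of-range input lands on the longest value
```

Out-of-range input needs no special case here, because the nearest supported value is always one of the endpoints.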

Two OpenRouter-specific behaviors to know about:

- **The completed video arrives as a `data:` URL.** OpenRouter's download
URLs require your API key in an `Authorization` header, so the adapter
downloads the content server-side and returns a base64 data URL that can
be handed straight to a `<video>` tag. Videos over ~10 MiB log a warning —
prefer re-uploading to your own storage/CDN over passing large data URLs
around.
- **Cost is reported on completion.** The gateway reports the real billed
cost for the job; it's surfaced as `usage.cost` on the completed result.

## Response Types

> **Note:** The interfaces below are the underlying adapter-level types. The `getVideoJobStatus()` helper returns a single merged object, `{ status, progress?, url?, error?, usage? }` — it does not return `jobId` or `expiresAt`.
@@ -675,8 +744,10 @@ Check the [OpenAI documentation](https://platform.openai.com/docs) for current l
The video adapters use the same environment variables as the other adapters
for their provider:

-- `OPENAI_API_KEY`: Your OpenAI API key (Sora)
-- `GOOGLE_API_KEY` or `GEMINI_API_KEY`: Your Google API key (Veo)
+- `OPENAI_API_KEY`: Your OpenAI API key (`openaiVideo`, Sora)
+- `GOOGLE_API_KEY` or `GEMINI_API_KEY`: Your Google API key (`geminiVideo`, Veo)
+- `OPENROUTER_API_KEY`: Your OpenRouter API key (`openRouterVideo`)
+- `FAL_KEY`: Your fal.ai API key (`falVideo`)

## Explicit API Keys

1 change: 1 addition & 0 deletions examples/ts-react-media/package.json
@@ -14,6 +14,7 @@
"@tanstack/ai": "workspace:*",
"@tanstack/ai-fal": "workspace:*",
"@tanstack/ai-gemini": "workspace:*",
"@tanstack/ai-openrouter": "workspace:*",
"@tanstack/react-router": "^1.158.4",
"@tanstack/react-start": "^1.159.0",
"@tanstack/router-plugin": "^1.158.4",
14 changes: 14 additions & 0 deletions examples/ts-react-media/src/lib/models.ts
@@ -122,6 +122,20 @@ export const VIDEO_MODELS = [
description: 'Fast image-to-video animation',
mode: 'image-to-video' as const,
},
{
id: 'bytedance/seedance-2.0',
name: 'Seedance 2.0 (Text-to-Video, OpenRouter)',
description:
"OpenRouter's async video API; duration typed 4–15s with snapDuration()",
mode: 'text-to-video' as const,
},
{
id: 'google/veo-3.1',
name: 'Veo 3.1 (Image-to-Video, OpenRouter)',
description:
'OpenRouter async video; duration snaps to the nearest of 4/6/8s',
mode: 'image-to-video' as const,
},
] as const

export type ImageModel = (typeof IMAGE_MODELS)[number]