feat: add dspy.Video type adapter by ilia-iliev · Pull Request #9443 · stanfordnlp/dspy

ilia-iliev · 2026-03-13T09:20:04Z

Summary

Adds a Video type to DSPy's adapter system, following the same pattern as the existing types.

Relevant issues #8507 #7847

Currently only Gemini works with video, so the PR is google-specific. Video adapter can be extended in the future when more providers allow for video.

I've been using similar adapter for video analysis for the past few months.

Supports passing video inline or referencing an already uploaded video to Google's Files API (their recommended approach)

Example:

import dspy
dspy.configure(lm=dspy.LM("gemini/gemini-3-flash-preview"))

class Describe(dspy.Signature):
    video: dspy.Video = dspy.InputField()
    description: str = dspy.OutputField()

program = dspy.Predict(Describe)
result = program(video=dspy.Video.from_file("video.mp4"))

alvinttang

Code Review: Add dspy.Video type adapter

Clean implementation that follows the existing Audio and Image patterns well. Tests are thorough. A few issues worth addressing:

1. No file size guard for inline encoding

_encode_video_from_file reads the entire file into memory:

with open(file_path, "rb") as f:
    file_bytes = f.read()

For video files this is particularly dangerous — a 1 GB video will consume 1 GB of RAM plus ~1.33 GB for the base64 string. The docstring mentions "suitable for videos under 20 MB" but there is no enforcement. Consider adding a size check:

file_size = os.path.getsize(file_path)
if file_size > MAX_INLINE_SIZE:
    raise ValueError(f"File {file_path} is {file_size} bytes, exceeding the {MAX_INLINE_SIZE} byte limit for inline encoding. Use Video.from_url() with Google's Files API instead.")

2. The `format()` output structure may not match litellm's expected schema

The format method returns:

{"type": "file", "file": {"format": ..., "file_data": ...}}

But looking at litellm's Gemini integration and the OpenAI multimodal spec, video content parts typically use either:

{"type": "video_url", "video_url": {"url": ...}} (OpenAI-style), or
{"type": "image_url", ...} with a data URI (litellm converts these)

Has this been tested end-to-end with litellm/Gemini? The "type": "file" format may not be recognized by the chat adapter's message formatting. I'd suggest adding an integration test note or verifying against the actual litellm format.

3. String-based file path detection is fragile

In validate_input:

if isinstance(values, str):
    if os.path.isfile(values):
        return _encode_video_from_file(values)
    return {"url": values}

This means if a file happens to exist at a path that looks like a URL (e.g., someone has a file named https: in their cwd), it would be treated as a file. More critically, os.path.isfile() performs a filesystem call on every string input, including URLs. This is a side effect during validation that could be slow on network-mounted filesystems. Consider checking for URL schemes first:

if values.startswith(("http://", "https://", "gs://")):
    return {"url": values}
if os.path.isfile(values):
    return _encode_video_from_file(values)

4. `validate_source` allows both `None`

The validator:

if has_url == has_inline_data:
    raise ValueError("Exactly one of 'url' or 'inline_data' must be provided.")

This correctly rejects both-set and neither-set cases. But the error message for neither-set ("Exactly one of...") could be clearer — when a user passes Video(mime_type="video/mp4") they get this message, which doesn't tell them what to do. The test test_neither_url_nor_inline_data_raises expects match="url.*inline_data" which passes, but a user-facing message like "You must provide either 'url' or 'inline_data'" would be friendlier.

5. Missing `eq` and `hash`

The model is frozen=True which makes it hashable via pydantic's default, but equality comparison uses pydantic's default which compares all fields. For inline videos with large base64 strings, this means equality checks will compare megabytes of string data. If Videos are ever used in sets or as dict keys (e.g., for caching), consider implementing __hash__ based on a content digest.

6. Test imports coupling

from tests.signatures.test_adapter_file import count_messages_with_file_pattern

Importing from another test file creates a fragile coupling. If test_adapter_file.py is refactored or the function is renamed, video tests break. Consider moving count_messages_with_file_pattern to a shared test utility module.

Overall, solid work. The main risk is point 1 (unbounded memory usage for large videos) and point 2 (format compatibility with actual LLM providers).

ilia-iliev · 2026-03-16T09:04:44Z

Thanks, those are sound comments. Will address them

1. Added max size for inline 2. Not doing filesystem call for strings that look like URL 3. Improved error message 4. Designated count_messages_with_file_pattern to own module

ilia-iliev · 2026-03-16T10:22:09Z

2 Integration tests confirmed the formatting - both inline and Google Files API is handled correctly with {"type": "file", "file": {...}}

5 Didn't implement custom __hash__ - looks like it is already implemented by default in dspy/clients/cache.py:78-100

Addressed all other comments. Thanks!

alvinttang

Well-structured PR — the Video type follows the established patterns from Image, Audio, and File closely. Thorough test coverage too. Some observations:

1. The format() method output structure

The method returns [{"type": "file", "file": {"file_data": "data:..."}}] for inline data and [{"type": "file", "file": {"file_id": url, "format": mime_type}}] for URL references. Two questions:

Is file_id the correct key for litellm's Gemini integration? The litellm docs for Gemini file references typically expect file_uri (matching the Gemini API's fileData.fileUri field). If file_id is used, litellm may not route it correctly to the Gemini Files API.
For inline data, using file_data with a data URI works for litellm, but it's worth verifying that videos encoded this way actually work end-to-end with Gemini, since the 20MB base64 limit maps to ~15MB of raw video (base64 has ~33% overhead), which is quite small for video.

2. The validate_input string handling has an implicit assumption

if values.startswith(("http://", "https://", "gs://")):
    return {"url": values}
if os.path.isfile(values):
    return _encode_video_from_file(values)
return {"url": values}  # fallback: treat as URL

The fallback return {"url": values} for strings that aren't URLs and aren't existing files silently treats any string as a URL. This could mask typos in file paths — e.g., Video("./videos/demo.mp4") where the path doesn't exist would silently become a URL reference instead of raising an error. Consider raising a ValueError for non-URL strings that don't point to an existing file, or at least logging a warning.

3. The MAX_INLINE_SIZE constant

20MB is documented as the limit, but the Gemini API's inline data limit is actually 20MB of base64-encoded data (not raw file size). Since base64 encoding increases size by ~33%, a 20MB raw file becomes ~27MB of base64, which may exceed the API limit. Consider either:

Lowering MAX_INLINE_SIZE to ~15MB (to stay under 20MB after encoding), or
Documenting that the limit refers to raw file size and the effective encoded size may be larger.

4. Extracting count_messages_with_file_pattern to tests/test_utils/messages.py is a good refactor

This avoids duplication between the file and video test modules. Clean.

5. Missing __init__.py in tests/test_utils/?

The new tests/test_utils/messages.py module is imported by both test files. Does tests/test_utils/ have an __init__.py? If not, the import may fail depending on how pytest is configured (though pytest usually handles this via rootdir discovery).

6. Test coverage is excellent

The test suite covers URL construction, file encoding, bytes, validation errors, frozen model, repr, signature integration, list fields, optional fields, and save/load. Very thorough.

Overall this is a solid addition. The main concerns are the format() output structure (verify file_id vs file_uri for litellm/Gemini compatibility) and the silent URL fallback for invalid file paths.

alvinttang

Well-structured PR — the Video type follows the established patterns from Image, Audio, and File closely. Thorough test coverage too. Some observations:

1. The format() method output — verify file_id vs file_uri

The method returns {"type": "file", "file": {"file_id": url, "format": mime_type}} for URL references. Two questions:

Is file_id the correct key for litellm's Gemini integration? The litellm docs for Gemini file references typically expect file_uri (matching the Gemini API's fileData.fileUri field). If file_id is wrong, litellm may not route the video correctly to Gemini. Worth verifying against the existing Audio type's format() method to ensure consistency.
For inline data, the file_data key with a data URI (data:video/mp4;base64,...) matches litellm's expected format. This path should work.

2. Silent URL fallback for invalid file paths

In validate_input:

if values.startswith(("http://", "https://", "gs://")):
    return {"url": values}
if os.path.isfile(values):
    return _encode_video_from_file(values)
return {"url": values}  # fallback

The final fallback silently treats any non-URL, non-existing-file string as a URL. This could mask typos in file paths — e.g., Video("./videos/demo.mp4") where the path doesn't exist would silently become a URL reference rather than raising an error. Consider raising a ValueError for strings that don't match URL patterns and aren't existing files, or at least logging a warning. The Image type may have the same pattern, but it's worth reconsidering here since video URLs are less common than image URLs.

3. MAX_INLINE_SIZE may undercount

The 20MB limit checks raw file size, but base64 encoding increases size by ~33%. If the Gemini API's inline limit is 20MB of encoded data, a 20MB raw file becomes ~27MB encoded, which would exceed the limit. Consider either lowering to ~15MB or documenting that the limit refers to raw file size.

4. Missing __init__.py in tests/test_utils/

The new tests/test_utils/messages.py is imported by both test_adapter_file.py and test_adapter_video.py. If tests/test_utils/ doesn't have an __init__.py, the import may fail depending on pytest configuration. I don't see one being added in this diff.

5. validate_source error message could be clearer

if has_url == has_inline_data:
    raise ValueError("You must provide either 'url' or 'inline_data'.")

This fires when both are present AND when neither is present, but the error message is the same for both cases. Consider distinguishing: "Both 'url' and 'inline_data' were provided; use only one" vs "Neither 'url' nor 'inline_data' was provided; one is required."

6. Extracting count_messages_with_file_pattern to a shared utility is a good refactor

This avoids duplication between the file and video test modules.

7. Test coverage is excellent

URL construction, file encoding, bytes input, validation errors, frozen model enforcement, repr, signature integration, list fields, optional fields, and save/load are all covered. Very thorough.

Overall this is a solid addition to the type system. The main concerns are verifying the format() output matches what litellm/Gemini actually expects, and the silent URL fallback behavior.

ilia-iliev · 2026-03-17T09:47:17Z

file_id is correct - confirmed by uploading a file to Files API and referencing it. This is also what I see in LilteLLm's docs -

Added a comment to clarify the 20MB limit. I'm not sure DSPy is the right place to check video sizes though I agree some limit is necessary to prevent accidental loading of huge video. Also, it appears that Gemini has recently increased their inline to 100MB.

Updated the error messages

ilia and others added 2 commits March 13, 2026 10:44

feat: add dspy.Video type for Gemini video inputs

5839910

Merge branch 'stanfordnlp:main' into main

41c2917

ilia-iliev changed the title ~~feat: add dspy.Video type for Gemini video inputs~~ feat: add dspy.Video type adapter Mar 13, 2026

alvinttang reviewed Mar 14, 2026

View reviewed changes

Addressed PR comments:

560bc49

1. Added max size for inline 2. Not doing filesystem call for strings that look like URL 3. Improved error message 4. Designated count_messages_with_file_pattern to own module

alvinttang reviewed Mar 16, 2026

View reviewed changes

ilia added 2 commits March 17, 2026 10:25

improved error messages

917b7bd

added comment

54fb240

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: add dspy.Video type adapter#9443

feat: add dspy.Video type adapter#9443
ilia-iliev wants to merge 5 commits intostanfordnlp:mainfrom
ilia-iliev:main

ilia-iliev commented Mar 13, 2026 •

edited

Loading

Uh oh!

alvinttang left a comment

Uh oh!

ilia-iliev commented Mar 16, 2026

Uh oh!

ilia-iliev commented Mar 16, 2026 •

edited

Loading

Uh oh!

alvinttang left a comment

Uh oh!

alvinttang left a comment

Uh oh!

ilia-iliev commented Mar 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

ilia-iliev commented Mar 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Uh oh!

alvinttang left a comment

Choose a reason for hiding this comment

Code Review: Add dspy.Video type adapter

1. No file size guard for inline encoding

2. The format() output structure may not match litellm's expected schema

3. String-based file path detection is fragile

4. validate_source allows both None

5. Missing __eq__ and __hash__

6. Test imports coupling

Uh oh!

ilia-iliev commented Mar 16, 2026

Uh oh!

ilia-iliev commented Mar 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

alvinttang left a comment

Choose a reason for hiding this comment

Uh oh!

alvinttang left a comment

Choose a reason for hiding this comment

Uh oh!

ilia-iliev commented Mar 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

ilia-iliev commented Mar 13, 2026 •

edited

Loading

2. The `format()` output structure may not match litellm's expected schema

4. `validate_source` allows both `None`

5. Missing `eq` and `hash`

ilia-iliev commented Mar 16, 2026 •

edited

Loading