fix: route PDFs per provider format and stop dropping media in generate/run#54
Merged
Conversation
…te/run
Two related media-handling fixes:
1. PDF attachments were silently encoded as images on 3 of 4 providers.
build_openai_compatible_message_content (OpenAI + Grok) emitted every
attachment as an image_url part and build_anthropic_message_content
emitted every attachment as an image block, regardless of MIME type.
Now the builders branch on mime_type:
- OpenAI: inline PDFs become the documented file content part
({"type":"file","file":{"filename","file_data"}}); URL-based PDFs
return a clear error (chat completions does not accept file URLs).
- Anthropic: PDFs become document blocks with base64 or url sources
per the PDF-support docs.
- Grok: xAI chat completions only documents text and image parts, so
PDFs return a clear "not supported" error instead of a mislabeled
image_url part.
- Non-image, non-PDF MIME types error with the offending type named.
- Gemini already passed MIME types through and is unchanged; image
handling is byte-for-byte unchanged on all providers.
2. generate and tool run silently dropped attached media. Request::generate
and Request::run ignored Request::media entirely. Added
LLMClient::generate_with_media (default: error rather than drop) with
implementations for all four providers via message-based generate
internals, threaded media through ToolRunner::run_tool_loop and the
three tool-loop drivers using the same content builders, and routed
Request::generate/run through them. MockClient records the new
GenerateWithMedia request kind and tool-loop media for assertions.
Verified shapes against provider docs (Anthropic PDF support, OpenAI PDF
files guide, xAI image-understanding guide). Unit tests cover the
serialized part/block shapes per provider and mockito tests assert the
request bodies for generate/run now carry media.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Two related media-handling bugs, both fixed:
1. PDFs were silently encoded as images on 3 of 4 providers
src/backend/media.rsencoded every attachment as animage_urlpart (OpenAI + Grok) or animageblock (Anthropic), regardless of MIME type. Attaching a PDF (application/pdf) produced a broken/rejected request. Only Gemini passed the MIME type through.The content builders now branch on
mime_type:from_bytes)new){"type":"file","file":{"filename":"document.pdf","file_data":"data:application/pdf;base64,..."}}BadRequesterror{"type":"document","source":{"type":"base64","media_type":"application/pdf","data":...}}{"type":"document","source":{"type":"url","url":...}}inlineData/fileDataalready carry the MIME type)Image handling (
image/*) is byte-for-byte unchanged on all providers (covered by existing + new tests).Per-provider decisions (verified against current provider docs before implementing):
documentblock withbase64andurlsource variants exactly as implemented.filecontent part withfilename+ base64file_data. The docs explicitly state "Chat Completions does not support file URLs. Use the Responses API for this option." — so URL-based PDFs return an actionableBadRequesterror telling the user to attach the bytes inline withMediaFile::from_bytes, rather than sending a request the API will reject. SinceMediaFilecarries no filename, a stabledocument.pdfplaceholder is used (OpenAI uses the extension for type detection).image_urlmislabeling.audio/mpeg) error on OpenAI, Grok, and Anthropic with the offending MIME type named; Gemini continues to pass MIME types through as before.2.
generate/ toolrunsilently dropped attached mediaRequest::generateandRequest::runignoredRequest::mediaentirely —client.with_media(&media).generate(...)sent a text-only request.LLMClient::generate_with_media(prompt, media)with a default that errors (Unsupported) rather than dropping media, mirroringmaterialize_with_media. Implemented for all four providers by refactoring eachgenerate_with_metadatainto a message-basedgenerate_internalthat reuses the same content builders as the materialize path (so PDF rules above apply identically).AnyClientdispatches it;MockClientrecords it as a newRequestKind::GenerateWithMedia.media: &[MediaFile]throughToolRunner::run_tool_loopand the three tool-loop drivers (run_openai_compatible_tools,run_anthropic_tools,run_gemini_tools); the initial user turn is now built with the same content builders, so media is carried — or rejected with a clear error — per provider rules. With no media attached the serialized bodies are unchanged (untaggedTextcontent).Request::generateandRequest::runnow route throughgenerate_with_media/the media-aware tool loop when media is attached.Tests (no live API calls)
src/backend/media.rsunit tests on serialized content: OpenAI inline-PDFfilepart shape, Anthropic base64/urldocumentblock shapes, Grok PDF errors, URL-PDF error on OpenAI, unsupported-MIME errors, and images still producing the old shapes.tests/http_mock_tests.rs(mockito, realOpenAIClient): request body forwith_media(..).generate(..)carries theimage_urlpart;generate_with_mediawith an inline PDF carries the documentedfilepart; URL-based PDF errors before any HTTP request (expect(0));with_tools(..).media(..).run(..)carries the media part in the tool loop's initial user turn.tests/mock_edge_tests.rs: builder routing —generate→GenerateWithMedia,run(no tools) →GenerateWithMedia,run(with tools) records media in the tool loop.Verification
cargo fmtcleancargo clippy --all-targets(default features, as CI runs) and--all-features: 0 warningscargo testandcargo test --all-features: 610 passed, 0 failed (including doctests)🤖 Generated with Claude Code