
fix: route PDFs per provider format and stop dropping media in generate/run #54

Merged
clifton merged 1 commit into main from fix/pdf-media-handling
Jun 10, 2026
Conversation

@clifton commented Jun 9, 2026

Owner

Summary

Two related media-handling bugs, both fixed:

1. PDFs were silently encoded as images on 3 of 4 providers

src/backend/media.rs encoded every attachment as an image_url part (OpenAI + Grok) or an image block (Anthropic), regardless of MIME type. Attaching a PDF (application/pdf) produced a broken/rejected request. Only Gemini passed the MIME type through.

The content builders now branch on mime_type:

| Provider | Inline PDF (from_bytes) | URL-based PDF (new) | Non-image/non-PDF |
| --- | --- | --- | --- |
| OpenAI | {"type":"file","file":{"filename":"document.pdf","file_data":"data:application/pdf;base64,..."}} | clear BadRequest error | clear error naming the MIME type |
| Anthropic | {"type":"document","source":{"type":"base64","media_type":"application/pdf","data":...}} | {"type":"document","source":{"type":"url","url":...}} | clear error |
| Grok (xAI) | clear "PDF attachments are not supported for Grok" error | same error | clear error |
| Gemini | unchanged (inlineData/fileData already carry the MIME type) | unchanged | unchanged |

Image handling (image/*) is byte-for-byte unchanged on all providers (covered by existing + new tests).

Per-provider decisions (verified against current provider docs before implementing):

  • Anthropic (PDF support docs): documents the document block with base64 and url source variants exactly as implemented.
  • OpenAI (PDF files guide): Chat Completions accepts the file content part with filename + base64 file_data. The docs explicitly state "Chat Completions does not support file URLs. Use the Responses API for this option." — so URL-based PDFs return an actionable BadRequest error telling the user to attach the bytes inline with MediaFile::from_bytes, rather than sending a request the API will reject. Since MediaFile carries no filename, a stable document.pdf placeholder is used (OpenAI uses the extension for type detection).
  • Grok / xAI (image-understanding guide and main docs): the chat API documents only text and image content parts — no file/document part exists. Since PDF support could not be confirmed, PDFs now return a clear "PDF attachments are not supported for Grok: ... Extract the PDF's text or render its pages to images" error instead of the previous silent image_url mislabeling.
  • Non-image, non-PDF MIME types (e.g. audio/mpeg) error on OpenAI, Grok, and Anthropic with the offending MIME type named; Gemini continues to pass MIME types through as before.
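
The branching described above can be sketched roughly as follows. This is an illustration only, not the code in src/backend/media.rs: `Media`, `MediaSource`, `Provider`, `Part`, `BadRequest`, and `build_part` are hypothetical names, the image path is reduced to a placeholder variant, Gemini is left out because it is unchanged, and the error strings are abbreviated.

```rust
/// Where the attachment lives (hypothetical stand-in for the crate's MediaFile).
#[derive(Debug, Clone)]
enum MediaSource {
    Inline { base64: String },
    Url(String),
}

#[derive(Debug, Clone)]
struct Media {
    mime_type: String,
    source: MediaSource,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Provider {
    OpenAi,
    Anthropic,
    Grok,
}

/// Simplified error type; the PR describes a BadRequest error.
#[derive(Debug, PartialEq)]
struct BadRequest(String);

/// Content parts, modeled as an enum instead of serialized JSON.
#[derive(Debug, PartialEq)]
enum Part {
    /// Existing image encoding (unchanged by the PR, omitted here).
    Image,
    /// OpenAI `file` part: filename plus a base64 data URL.
    OpenAiFile { filename: String, file_data: String },
    /// Anthropic `document` block with a base64 source.
    AnthropicDocumentBase64 { media_type: String, data: String },
    /// Anthropic `document` block with a url source.
    AnthropicDocumentUrl { url: String },
}

fn build_part(provider: Provider, media: &Media) -> Result<Part, BadRequest> {
    let mime = media.mime_type.as_str();
    // Images keep their previous encoding on every provider.
    if mime.starts_with("image/") {
        return Ok(Part::Image);
    }
    // Anything that is neither an image nor a PDF is rejected, naming the type.
    if mime != "application/pdf" {
        return Err(BadRequest(format!(
            "unsupported media type for {:?}: {}",
            provider, mime
        )));
    }
    match (provider, &media.source) {
        (Provider::OpenAi, MediaSource::Inline { base64 }) => Ok(Part::OpenAiFile {
            // MediaFile carries no filename, so a stable placeholder is used.
            filename: "document.pdf".to_string(),
            file_data: format!("data:application/pdf;base64,{}", base64),
        }),
        (Provider::OpenAi, MediaSource::Url(_)) => Err(BadRequest(
            "Chat Completions does not support file URLs; attach the bytes inline"
                .to_string(),
        )),
        (Provider::Anthropic, MediaSource::Inline { base64 }) => {
            Ok(Part::AnthropicDocumentBase64 {
                media_type: mime.to_string(),
                data: base64.clone(),
            })
        }
        (Provider::Anthropic, MediaSource::Url(url)) => {
            Ok(Part::AnthropicDocumentUrl { url: url.clone() })
        }
        (Provider::Grok, _) => Err(BadRequest(
            "PDF attachments are not supported for Grok".to_string(),
        )),
    }
}
```

Returning an error for the unsupported combinations, rather than falling back to the image encoding, is the point of the fix: the caller learns about the problem before any request is sent.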

2. generate / tool run silently dropped attached media

Request::generate and Request::run ignored Request::media entirely — client.with_media(&media).generate(...) sent a text-only request.

  • Added LLMClient::generate_with_media(prompt, media) with a default that errors (Unsupported) rather than dropping media, mirroring materialize_with_media. Implemented for all four providers by refactoring each generate_with_metadata into a message-based generate_internal that reuses the same content builders as the materialize path (so PDF rules above apply identically). AnyClient dispatches it; MockClient records it as a new RequestKind::GenerateWithMedia.
  • Threaded media: &[MediaFile] through ToolRunner::run_tool_loop and the three tool-loop drivers (run_openai_compatible_tools, run_anthropic_tools, run_gemini_tools); the initial user turn is now built with the same content builders, so media is carried — or rejected with a clear error — per provider rules. With no media attached the serialized bodies are unchanged (untagged Text content).
  • Request::generate and Request::run now route through generate_with_media/the media-aware tool loop when media is attached.
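
A minimal sketch of the "default errors rather than drops" pattern from the first bullet. The trait, error type, and client structs here are simplified stand-ins (hypothetical signatures, synchronous for brevity), not the crate's actual LLMClient definition.

```rust
/// Simplified error type; the PR names an Unsupported error for the default.
#[derive(Debug, PartialEq)]
enum LlmError {
    Unsupported(String),
}

/// Hypothetical stand-in for the crate's MediaFile.
#[derive(Debug, Clone, PartialEq)]
struct MediaFile {
    mime_type: String,
}

trait LlmClient {
    fn generate(&self, prompt: &str) -> Result<String, LlmError>;

    /// Default implementation: refuse loudly instead of silently sending
    /// a text-only request. Providers that support media override this.
    fn generate_with_media(
        &self,
        _prompt: &str,
        _media: &[MediaFile],
    ) -> Result<String, LlmError> {
        Err(LlmError::Unsupported(
            "generate_with_media is not implemented for this client".to_string(),
        ))
    }
}

/// A client that never overrides the default, so media is rejected.
struct TextOnlyClient;

impl LlmClient for TextOnlyClient {
    fn generate(&self, prompt: &str) -> Result<String, LlmError> {
        Ok(format!("text:{}", prompt))
    }
}

/// A client that opts in by overriding the media-aware method.
struct MediaAwareClient;

impl LlmClient for MediaAwareClient {
    fn generate(&self, prompt: &str) -> Result<String, LlmError> {
        Ok(format!("text:{}", prompt))
    }

    fn generate_with_media(
        &self,
        prompt: &str,
        media: &[MediaFile],
    ) -> Result<String, LlmError> {
        Ok(format!("media[{}]:{}", media.len(), prompt))
    }
}
```

With this shape, a new client type that forgets to implement the media path fails with an explicit error rather than reintroducing the dropped-media bug.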

Tests (no live API calls)

  • src/backend/media.rs unit tests on serialized content: OpenAI inline-PDF file part shape, Anthropic base64/url document block shapes, Grok PDF errors, URL-PDF error on OpenAI, unsupported-MIME errors, and images still producing the old shapes.
  • tests/http_mock_tests.rs (mockito, real OpenAIClient): request body for with_media(..).generate(..) carries the image_url part; generate_with_media with an inline PDF carries the documented file part; URL-based PDF errors before any HTTP request (expect(0)); with_tools(..).media(..).run(..) carries the media part in the tool loop's initial user turn.
  • tests/mock_edge_tests.rs: builder routing, where generate → GenerateWithMedia, run (no tools) → GenerateWithMedia, and run (with tools) records media in the tool loop.
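
The builder-routing tests can be pictured with a small recording mock. This is a hedged sketch, not the crate's MockClient: `RecordingMock`, the `Request` builder, and the `Generate` variant are assumptions made for illustration, and only the `GenerateWithMedia` name comes from the PR.

```rust
use std::cell::RefCell;

#[derive(Debug, Clone, PartialEq)]
enum RequestKind {
    Generate,          // assumed name for the text-only path
    GenerateWithMedia, // request kind named in the PR
}

/// Records which entry point each request reached.
#[derive(Default)]
struct RecordingMock {
    calls: RefCell<Vec<RequestKind>>,
}

impl RecordingMock {
    fn generate(&self, _prompt: &str) -> String {
        self.calls.borrow_mut().push(RequestKind::Generate);
        "ok".to_string()
    }

    fn generate_with_media(&self, _prompt: &str, _media: &[String]) -> String {
        self.calls.borrow_mut().push(RequestKind::GenerateWithMedia);
        "ok".to_string()
    }

    fn recorded(&self) -> Vec<RequestKind> {
        self.calls.borrow().clone()
    }
}

/// Hypothetical request builder; media is tracked as MIME-type strings only.
struct Request<'a> {
    client: &'a RecordingMock,
    media: Vec<String>,
}

impl<'a> Request<'a> {
    fn new(client: &'a RecordingMock) -> Self {
        Request { client, media: Vec::new() }
    }

    fn media(mut self, mime_type: &str) -> Self {
        self.media.push(mime_type.to_string());
        self
    }

    /// Route to the media-aware path only when media is attached.
    fn generate(&self, prompt: &str) -> String {
        if self.media.is_empty() {
            self.client.generate(prompt)
        } else {
            self.client.generate_with_media(prompt, &self.media)
        }
    }
}
```

A test then issues one request with media and one without, and asserts on `recorded()` to confirm that each reached the expected entry point.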

Verification

  • cargo fmt clean
  • cargo clippy --all-targets (default features, as CI runs) and --all-features: 0 warnings
  • cargo test and cargo test --all-features: 610 passed, 0 failed (including doctests)

🤖 Generated with Claude Code

…te/run

Two related media-handling fixes:

1. PDF attachments were silently encoded as images on 3 of 4 providers.
   build_openai_compatible_message_content (OpenAI + Grok) emitted every
   attachment as an image_url part and build_anthropic_message_content
   emitted every attachment as an image block, regardless of MIME type.
   Now the builders branch on mime_type:
   - OpenAI: inline PDFs become the documented file content part
     ({"type":"file","file":{"filename","file_data"}}); URL-based PDFs
     return a clear error (chat completions does not accept file URLs).
   - Anthropic: PDFs become document blocks with base64 or url sources
     per the PDF-support docs.
   - Grok: xAI chat completions only documents text and image parts, so
     PDFs return a clear "not supported" error instead of a mislabeled
     image_url part.
   - Non-image, non-PDF MIME types error with the offending type named.
   - Gemini already passed MIME types through and is unchanged; image
     handling is byte-for-byte unchanged on all providers.

2. generate and tool run silently dropped attached media. Request::generate
   and Request::run ignored Request::media entirely. Added
   LLMClient::generate_with_media (default: error rather than drop) with
   implementations for all four providers via message-based generate
   internals, threaded media through ToolRunner::run_tool_loop and the
   three tool-loop drivers using the same content builders, and routed
   Request::generate/run through them. MockClient records the new
   GenerateWithMedia request kind and tool-loop media for assertions.

Verified shapes against provider docs (Anthropic PDF support, OpenAI PDF
files guide, xAI image-understanding guide). Unit tests cover the
serialized part/block shapes per provider and mockito tests assert the
request bodies for generate/run now carry media.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@clifton clifton merged commit c1f15cb into main Jun 10, 2026
9 checks passed
@clifton clifton deleted the fix/pdf-media-handling branch June 10, 2026 04:31