
feat(core): add PDF text extraction to read_file tool#3202

Closed
scrollDynasty wants to merge 3 commits into QwenLM:main from scrollDynasty:feat/pdf-reading-support

Conversation

@scrollDynasty

Summary

Implements PDF text extraction support for the read_file tool.

Closes #1149

Problem

Previously, attempting to read a .pdf file returned binary content or an error about unsupported modality, which made PDFs unusable in Qwen Code.

Solution

  • Detect .pdf extension in processSingleFileContent
  • Use pdf-parse to extract text content from the binary file
  • Remove PDF from mediaModalityKey() — text extraction works regardless of model modality
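The detection and extraction steps above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: `isPdfPath`, `readPdfAsText`, `PdfTextParser`, and `FileReadResult` are hypothetical names, and the parser is passed in as a function (standing in for pdf-parse) so the sketch does not depend on that package's exact API.

```typescript
// Hypothetical parser shape: raw bytes in, extracted text out.
// In the PR this role is played by pdf-parse; it is injected here as an
// assumption so the example stays self-contained.
type PdfTextParser = (data: Uint8Array) => Promise<{ text: string }>;

// Hypothetical result type for this sketch only.
interface FileReadResult {
  kind: 'text' | 'error';
  content: string;
}

// Case-insensitive check on the file extension.
function isPdfPath(filePath: string): boolean {
  return filePath.toLowerCase().endsWith('.pdf');
}

// Run the injected parser and turn any thrown error into an error result
// instead of letting it propagate to the caller.
async function readPdfAsText(
  data: Uint8Array,
  parse: PdfTextParser,
): Promise<FileReadResult> {
  try {
    const { text } = await parse(data);
    return { kind: 'text', content: text };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { kind: 'error', content: `Failed to extract PDF text: ${message}` };
  }
}
```

Because the result is plain text, a caller would not need to consult the model's supported modalities before returning it, which is presumably why the PDF entry could be dropped from mediaModalityKey().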

Edge Cases Handled

  • Normal PDF with text → returns extracted text
  • Encrypted/password-protected PDF → clear error message
  • Image-only/scanned PDF (no text layer) → suggests using OCR
  • Corrupted or missing file → returns actual error
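One way to map these four cases onto distinct outcomes is sketched below. The names (`PdfReadOutcome`, `looksEncrypted`, `classifyPdfResult`) and the message strings are hypothetical, and detecting encryption by matching "password" or "encrypt" in the error is an assumed heuristic; the real signal depends on what the underlying parser throws.

```typescript
// Hypothetical outcome type covering the four cases listed above.
type PdfReadOutcome =
  | { ok: true; text: string }
  | {
      ok: false;
      reason: 'encrypted' | 'no_text_layer' | 'read_error';
      message: string;
    };

// Assumed heuristic: any error mentioning a password or encryption is
// treated as the encrypted-PDF case.
function looksEncrypted(error: Error): boolean {
  return /password|encrypt/i.test(`${error.name} ${error.message}`);
}

function classifyPdfResult(text: string | undefined, error?: Error): PdfReadOutcome {
  if (error) {
    if (looksEncrypted(error)) {
      return {
        ok: false,
        reason: 'encrypted',
        message: 'This PDF is password-protected and cannot be read.',
      };
    }
    // Corrupted or missing file: surface the underlying error unchanged.
    return { ok: false, reason: 'read_error', message: error.message };
  }
  const raw = text ?? '';
  if (raw.trim().length === 0) {
    // Parsing succeeded but produced no text: likely a scanned document.
    return {
      ok: false,
      reason: 'no_text_layer',
      message: 'No extractable text found; the PDF may be scanned. Consider OCR.',
    };
  }
  return { ok: true, text: raw };
}
```

Keeping the classification in a pure function like this makes each edge case straightforward to cover with a unit test, without needing real PDF fixtures.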

Testing

Manually verified before/after:

  • Before: "I cannot directly read PDF files"
  • After: successfully extracts and returns text content

Unit tests added:

  • Successful PDF text extraction
  • Encrypted PDF handling
  • Image-only PDF handling
  • Non-existent file handling

All checks pass:

  • npm run test (113 tests pass)
  • npm run lint:ci
  • npm run build
  • npm run typecheck

- Use pdf-parse to extract text from PDF files in read_file tool
- Handle encrypted/password-protected PDFs with clear error message
- Handle image-only/scanned PDFs with OCR suggestion
- Handle corrupted files gracefully
- Remove PDF from media modality check (text extraction works universally)
- Add unit tests for PDF reading scenarios

Closes QwenLM#1149
@tanzhenxin
Collaborator

Thanks for working on this, @scrollDynasty! PDF support is definitely a gap we need to fill.

We actually have an overlapping PR in #3160 that covers the same problem — making PDFs readable for text-only models. That one takes a slightly different approach (system pdftotext with fallback, preserving native PDF pass-through for multimodal models like Claude/Gemini, page range support, etc.), so we're going to move forward with that implementation.

Appreciate you taking the time to put this together — the error handling for encrypted and image-only PDFs is a nice touch. Going to close this one in favor of #3160.

@tanzhenxin tanzhenxin closed this Apr 13, 2026

Development

Successfully merging this pull request may close these issues.

How to READ PDF file ?!
