feat(core): add PDF text extraction to read_file tool by scrollDynasty · Pull Request #3202 · QwenLM/qwen-code

scrollDynasty · 2026-04-13T08:05:31Z

Summary

Implements PDF text extraction support for the read_file tool.

Problem

Previously, reading a .pdf file either returned raw binary content or failed with an unsupported-modality error, so PDFs were unusable in Qwen Code.

Solution

Detect .pdf extension in processSingleFileContent
Use pdf-parse to extract text content from the binary file
Remove PDF from mediaModalityKey(), since text extraction works regardless of model modality
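
The routing described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual diff: `chooseReadStrategy`, `extensionOf`, `ReadStrategy`, and the `MEDIA_EXTENSIONS` set are hypothetical names, and the real logic lives in processSingleFileContent and mediaModalityKey(). The point is only that the .pdf check happens before any modality-dependent media handling.

```typescript
// Hypothetical sketch: none of these names come from the PR diff.
type ReadStrategy = 'pdf-text' | 'media' | 'plain-text';

// Illustrative subset of extensions whose handling depends on model modality.
const MEDIA_EXTENSIONS = new Set(['.png', '.jpg', '.jpeg', '.mp3', '.mp4']);

// Returns the lowercased extension (including the dot), or '' if there is none.
function extensionOf(filePath: string): string {
  const lastSep = Math.max(filePath.lastIndexOf('/'), filePath.lastIndexOf('\\'));
  const base = filePath.slice(lastSep + 1);
  const dot = base.lastIndexOf('.');
  // A leading dot (e.g. ".gitignore") is a hidden file, not an extension.
  return dot <= 0 ? '' : base.slice(dot).toLowerCase();
}

// PDFs are routed to text extraction first, so they never reach the
// modality check that applies to images, audio, and video.
function chooseReadStrategy(filePath: string): ReadStrategy {
  const ext = extensionOf(filePath);
  if (ext === '.pdf') return 'pdf-text';
  if (MEDIA_EXTENSIONS.has(ext)) return 'media';
  return 'plain-text';
}
```

Because the PDF branch is checked first, a text-only model gets extracted text instead of an unsupported-modality error.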

Edge Cases Handled

Normal PDF with text → returns extracted text
Encrypted/password-protected PDF → clear error message
Image-only/scanned PDF (no text layer) → suggests using OCR
Corrupted or missing file → returns actual error
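
A rough sketch of how these four outcomes might be told apart is shown below. It makes several assumptions: `interpretPdfExtraction` and `PdfReadResult` are hypothetical names, the PDF library call is replaced by an injected `extract` callback so the sketch has no dependency (a real pdf-parse call would be asynchronous), and detecting encryption by matching the error message is a guess at one possible heuristic, not the PR's confirmed logic.

```typescript
// Hypothetical sketch: names and error-matching heuristic are assumptions.
type PdfReadResult =
  | { ok: true; text: string }
  | { ok: false; message: string };

// `extract` stands in for the actual PDF library call.
function interpretPdfExtraction(extract: () => string): PdfReadResult {
  let text: string;
  try {
    text = extract();
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    // Assumed heuristic: encrypted PDFs surface an error mentioning a password or encryption.
    if (/password|encrypt/i.test(reason)) {
      return { ok: false, message: 'PDF is encrypted or password-protected and cannot be read.' };
    }
    // Corrupted or missing file: pass the underlying error through.
    return { ok: false, message: `Failed to read PDF: ${reason}` };
  }
  // Parsing succeeded but produced no text: likely a scanned or image-only PDF.
  if (text.trim().length === 0) {
    return {
      ok: false,
      message: 'No extractable text found; the PDF may be scanned or image-only. Consider using OCR.',
    };
  }
  return { ok: true, text };
}
```

For example, `interpretPdfExtraction(() => '   ')` yields the OCR suggestion, while a callback that throws `new Error('ENOENT: no such file')` yields a message carrying that original error.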

Testing

Manually verified before/after:

Before: "I cannot directly read PDF files"
After: successfully extracts and returns text content

Unit tests added:

Successful PDF text extraction
Encrypted PDF handling
Image-only PDF handling
Non-existent file handling

All checks pass:

npm run test (113 tests pass)
npm run lint:ci
npm run build
npm run typecheck

- Use pdf-parse to extract text from PDF files in read_file tool
- Handle encrypted/password-protected PDFs with clear error message
- Handle image-only/scanned PDFs with OCR suggestion
- Handle corrupted files gracefully
- Remove PDF from media modality check (text extraction works universally)
- Add unit tests for PDF reading scenarios

Closes QwenLM#1149

tanzhenxin · 2026-04-13T09:08:09Z

Thanks for working on this, @scrollDynasty! PDF support is definitely a gap we need to fill.

We actually have an overlapping PR in #3160 that covers the same problem — making PDFs readable for text-only models. That one takes a slightly different approach (system pdftotext with fallback, preserving native PDF pass-through for multimodal models like Claude/Gemini, page range support, etc.), so we're going to move forward with that implementation.

Appreciate you taking the time to put this together — the error handling for encrypted and image-only PDFs is a nice touch. Going to close this one in favor of #3160.

scrollDynasty added 3 commits April 13, 2026 12:39

feat: add PDF reading support with error handling for password-protected and image-only PDFs

c649581

test: add debug logs for PDFParse results in fileUtils tests

7c83154

scrollDynasty mentioned this pull request Apr 13, 2026

How to READ PDF file ?! #1149

Open

tanzhenxin closed this Apr 13, 2026

scrollDynasty wants to merge 3 commits into QwenLM:main from scrollDynasty:feat/pdf-reading-support
