fix: Secure Resume Upload Processing Against Malicious PDF and DOCX Files #5685
Krishnx21 wants to merge 7 commits into
@Krishnx21 is attempting to deploy a commit to the jhasourav07's projects Team on Vercel. A member of the Team first needs to authorize it.
@Aamod007 @JhaSourav07 check it out.
Aamod007
left a comment
Strong security hardening. app/api/student/resume/upload/route.ts now sanitizes filenames, validates extension vs MIME type, and blocks encrypted or malformed inputs before parsing, while lib/resume-parser.ts adds worker isolation, timeout handling, and ZIP/PDF structure checks to keep untrusted documents from destabilizing the app.
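The review credits route.ts with filename sanitization and an extension-versus-MIME check. Below is a minimal TypeScript sketch of that idea, not the PR's code: sanitizeFilename, validateUploadType and hasExpectedSignature are hypothetical names, and the 100-character cap is an assumed value. Because the browser-supplied MIME type is attacker-controlled, the sketch also compares the file's leading magic bytes.

```typescript
// Hypothetical allow-list: extension -> the only MIME type accepted for it.
const ALLOWED_TYPES: Record<string, string> = {
  ".pdf": "application/pdf",
  ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
};

export type ResumeExtension = ".pdf" | ".docx";

// Keep only the last path segment, replace characters outside a safe set,
// strip leading dots, and cap the length (100 is an assumed limit).
export function sanitizeFilename(name: string): string {
  const base = name.split(/[\\/]/).pop() ?? "";
  const cleaned = base
    .replace(/[^A-Za-z0-9._-]/g, "_")
    .replace(/^\.+/, "")
    .slice(0, 100);
  return cleaned || "resume";
}

function extensionOf(name: string): string {
  const dot = name.lastIndexOf(".");
  return dot >= 0 ? name.slice(dot).toLowerCase() : "";
}

// Reject unknown extensions and any MIME type that does not match the extension.
export function validateUploadType(name: string, mimeType: string): ResumeExtension {
  const ext = extensionOf(name);
  const expected = ALLOWED_TYPES[ext];
  if (!expected) throw new Error(`unsupported file extension "${ext}"`);
  if (mimeType !== expected) throw new Error(`MIME type "${mimeType}" does not match ${ext}`);
  return ext as ResumeExtension;
}

// The declared MIME type can be forged, so also check magic bytes:
// PDFs start with "%PDF-" and DOCX files (ZIP archives) with "PK\x03\x04".
export function hasExpectedSignature(bytes: Uint8Array, ext: ResumeExtension): boolean {
  const signature = ext === ".pdf" ? [0x25, 0x50, 0x44, 0x46, 0x2d] : [0x50, 0x4b, 0x03, 0x04];
  return signature.every((byte, i) => bytes[i] === byte);
}
```

Running all three checks before the buffer reaches a parser means a renamed executable or a mislabeled archive is rejected at the route boundary.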
Aamod007
left a comment
Difficulty: critical – Secures Resume Upload Processing Against Malicious PDF and DOCX Files.
Quality: clean – Comprehensive security.
Type: security – File upload hardening.
Excellent security work!
Fixes #5489
Description
This PR addresses a critical security and reliability issue in the resume upload pipeline where untrusted PDF and DOCX files are parsed directly within the application process.
The current implementation reads uploaded files into memory and processes them using document parsing libraries (pdf-parse and mammoth) without isolation, resource constraints, or decompression safeguards. Since DOCX files are ZIP-based archives and PDFs can contain complex structures, specially crafted files can consume excessive CPU and memory resources, potentially causing denial-of-service conditions.
Affected Components
app/api/student/resume/upload/route.ts
lib/resume-parser.ts
Changes Made
1. Added Worker Isolation
Moved document parsing out of the request handler and into a dedicated worker process.
Benefits:
Prevents parser failures from affecting API availability
Isolates resource-intensive operations
Improves application stability
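A minimal sketch of the isolation idea using Node's worker_threads. The PR text says "worker process"; this sketch uses a worker thread instead, and child_process.fork() would be the process-based equivalent. runInWorker() is a hypothetical helper, and for a self-contained example it evaluates inline JavaScript with eval: true, whereas a real setup would point the Worker at a separate parser module that calls pdf-parse or mammoth. A parser exception surfaces as an 'error' event and rejects the promise instead of crashing the request handler.

```typescript
import { Worker } from "node:worker_threads";

// Hypothetical helper: runs `workerSource` (JavaScript evaluated in a fresh
// worker thread, with `input` available as workerData) and resolves with the
// first message the worker posts back.
export function runInWorker<T>(workerSource: string, input: unknown): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: input });

    worker.once("message", (result: T) => {
      resolve(result);
      void worker.terminate(); // the parse is done; free the thread
    });

    // An exception thrown inside the parser lands here; the API process keeps running.
    worker.once("error", (err) => reject(err));

    // A worker that exits without replying (e.g. killed) also rejects.
    // A promise settles only once, so this is a no-op after a successful reply.
    worker.once("exit", (code) =>
      reject(new Error(`parser worker exited with code ${code} before replying`)),
    );
  });
}
```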
2. Added Parsing Timeouts
Introduced execution time limits for document processing.
Features:
Automatic termination of long-running parsing tasks
Protection against parser hangs
Reduced risk of CPU exhaustion attacks
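A timeout can be layered on top of any worker call. The sketch below is an assumption about how this might look, not the PR's code: withTimeout() and ParseTimeoutError are hypothetical names. The onTimeout callback is where a caller would pass a function that calls worker.terminate(), so a hung parser is actually stopped rather than merely abandoned.

```typescript
export class ParseTimeoutError extends Error {
  constructor(ms: number) {
    super(`document parsing exceeded ${ms} ms`);
    this.name = "ParseTimeoutError";
  }
}

// Races `task` against a deadline. If the deadline wins, `onTimeout` runs
// (e.g. to terminate the worker) and the returned promise rejects.
export function withTimeout<T>(task: Promise<T>, ms: number, onTimeout: () => void): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(() => {
      onTimeout();
      reject(new ParseTimeoutError(ms));
    }, ms);
  });
  // Clear the timer either way so a finished parse does not keep the event loop alive.
  return Promise.race([task, deadline]).finally(() => clearTimeout(timer));
}
```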
3. Added Memory Usage Controls
Implemented safeguards to prevent excessive memory allocation during parsing.
Protections include:
Maximum parser memory limits
Large-buffer rejection
Controlled extraction operations
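Node exposes per-worker V8 heap caps through the resourceLimits option of the Worker constructor; a worker that exceeds them is terminated and reports an ERR_WORKER_OUT_OF_MEMORY error instead of exhausting the main process. The sketch pairs such limits with an up-front size check. The numbers and the names MAX_UPLOAD_BYTES, PARSER_RESOURCE_LIMITS and assertWithinSizeLimit are assumptions for illustration. In a route handler, the same comparison can run against File.size before the body is read with arrayBuffer().

```typescript
// Assumed values; tune them against real resume sizes.
export const MAX_UPLOAD_BYTES = 5 * 1024 * 1024; // 5 MiB

// Intended usage: new Worker(parserFile, { resourceLimits: PARSER_RESOURCE_LIMITS }).
// A worker that grows past these V8 heap sizes is terminated by Node.
export const PARSER_RESOURCE_LIMITS = {
  maxOldGenerationSizeMb: 128,
  maxYoungGenerationSizeMb: 32,
};

export class UploadTooLargeError extends Error {
  constructor(actual: number, limit: number) {
    super(`upload is ${actual} bytes; the limit is ${limit} bytes`);
    this.name = "UploadTooLargeError";
  }
}

// Reject oversized buffers before any parser sees them.
export function assertWithinSizeLimit(bytes: Uint8Array, maxBytes: number = MAX_UPLOAD_BYTES): void {
  if (bytes.byteLength > maxBytes) {
    throw new UploadTooLargeError(bytes.byteLength, maxBytes);
  }
}
```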
4. Added DOCX Archive Validation
Introduced ZIP archive inspection before DOCX processing.
Validation includes:
Maximum file count checks
Maximum extracted size checks
Compression ratio validation
ZIP bomb detection
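A DOCX file is a ZIP archive, and each entry's compressed and uncompressed sizes are listed in the archive's central directory, so the checks above can run before anything is decompressed. The sketch below reads that directory straight from the buffer; inspectZipCentralDirectory and the limit values are hypothetical. Declared sizes are written by whoever built the archive and can lie, so this is only a cheap pre-filter: the extraction step should still stop once the real inflated byte count passes the same budget. ZIP64 archives and encrypted entries are simply rejected.

```typescript
export interface ZipLimits {
  maxEntries: number;
  maxTotalUncompressedBytes: number;
  maxCompressionRatio: number;
}

// Assumed defaults for a resume-sized DOCX.
export const DEFAULT_ZIP_LIMITS: ZipLimits = {
  maxEntries: 500,
  maxTotalUncompressedBytes: 50 * 1024 * 1024,
  maxCompressionRatio: 100,
};

const EOCD_SIGNATURE = 0x06054b50; // end-of-central-directory record
const CDH_SIGNATURE = 0x02014b50; // central directory file header
const EOCD_MIN_SIZE = 22;
const CDH_FIXED_SIZE = 46;

export function inspectZipCentralDirectory(
  buf: Buffer,
  limits: ZipLimits = DEFAULT_ZIP_LIMITS,
): { entries: number; totalUncompressedBytes: number } {
  if (buf.length < EOCD_MIN_SIZE) throw new Error("not a ZIP archive");

  // The EOCD record is at the end, followed by a comment of at most 65535 bytes.
  let eocd = -1;
  const lowest = Math.max(0, buf.length - EOCD_MIN_SIZE - 0xffff);
  for (let i = buf.length - EOCD_MIN_SIZE; i >= lowest; i--) {
    if (buf.readUInt32LE(i) === EOCD_SIGNATURE) {
      eocd = i;
      break;
    }
  }
  if (eocd < 0) throw new Error("ZIP end-of-central-directory record not found");

  const entries = buf.readUInt16LE(eocd + 10);
  const cdSize = buf.readUInt32LE(eocd + 12);
  const cdOffset = buf.readUInt32LE(eocd + 16);
  if (entries === 0xffff || cdOffset === 0xffffffff) throw new Error("ZIP64 archives are not accepted");
  if (entries > limits.maxEntries) throw new Error(`too many entries: ${entries}`);
  if (cdOffset + cdSize > eocd) throw new Error("central directory is out of bounds");

  let pos = cdOffset;
  let total = 0;
  for (let n = 0; n < entries; n++) {
    if (pos + CDH_FIXED_SIZE > cdOffset + cdSize || buf.readUInt32LE(pos) !== CDH_SIGNATURE) {
      throw new Error("malformed central directory");
    }
    if (buf.readUInt16LE(pos + 8) & 0x1) throw new Error("encrypted entries are not accepted");
    const compressed = buf.readUInt32LE(pos + 20);
    const uncompressed = buf.readUInt32LE(pos + 24);
    if (uncompressed / Math.max(compressed, 1) > limits.maxCompressionRatio) {
      throw new Error("suspicious compression ratio (possible ZIP bomb)");
    }
    total += uncompressed;
    if (total > limits.maxTotalUncompressedBytes) throw new Error("declared uncompressed size is too large");
    // Skip the fixed header plus the variable-length name, extra field and comment.
    pos += CDH_FIXED_SIZE + buf.readUInt16LE(pos + 28) + buf.readUInt16LE(pos + 30) + buf.readUInt16LE(pos + 32);
  }
  return { entries, totalUncompressedBytes: total };
}
```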
5. Added PDF Safety Checks
Implemented PDF validation before parsing.
Checks include:
Maximum page limits
Maximum object limits
File structure validation
Rejection of suspicious PDFs
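A rough pre-parse screen for PDFs can be done by scanning the raw bytes. This sketch is heuristic and hypothetical (inspectPdf and the limits are assumed names and values): it counts indirect-object headers ("12 0 obj") and /Type /Page dictionaries, and rejects files with no %PDF- header at byte 0, no %%EOF near the end, or an /Encrypt entry. Objects stored inside compressed object streams (PDF 1.5 and later) are invisible to a byte scan, so this only filters obvious abuse and does not replace limits enforced during parsing.

```typescript
export interface PdfLimits {
  maxPages: number;
  maxObjects: number;
}

// Assumed limits for a resume.
export const DEFAULT_PDF_LIMITS: PdfLimits = { maxPages: 20, maxObjects: 5_000 };

export function inspectPdf(
  buf: Buffer,
  limits: PdfLimits = DEFAULT_PDF_LIMITS,
): { pages: number; objects: number } {
  // latin1 maps every byte to exactly one character, so binary streams decode safely.
  const text = buf.toString("latin1");
  if (!text.startsWith("%PDF-")) throw new Error("missing %PDF- header");
  if (!text.slice(-1024).includes("%%EOF")) throw new Error("missing %%EOF marker (truncated or malformed file)");
  if (/\/Encrypt\b/.test(text)) throw new Error("encrypted PDFs are not accepted");

  // Indirect object headers look like "12 0 obj"; page dictionaries contain "/Type /Page"
  // (the negative lookahead excludes "/Type /Pages", the page-tree node).
  const objects = (text.match(/\b\d+\s+\d+\s+obj\b/g) ?? []).length;
  const pages = (text.match(/\/Type\s*\/Page(?![A-Za-z])/g) ?? []).length;

  if (objects === 0) throw new Error("no PDF objects found");
  if (objects > limits.maxObjects) throw new Error(`too many objects: ${objects}`);
  if (pages > limits.maxPages) throw new Error(`too many pages: ${pages}`);
  return { pages, objects };
}
```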
6. Added Extracted Content Limits
Restricted the amount of text that can be extracted from uploaded documents.
Benefits:
Prevents oversized extraction payloads
Reduces memory pressure
Improves parser performance
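Capping extracted text is a small post-processing step; combined with the worker memory limits, it keeps both extraction and everything downstream of it bounded. The sketch below is an assumption (capExtractedText and the 50,000-character default are hypothetical). It avoids cutting a UTF-16 surrogate pair in half, so the truncated string stays valid Unicode, and it reports whether anything was dropped.

```typescript
export const MAX_EXTRACTED_CHARS = 50_000; // assumed cap

export function capExtractedText(
  text: string,
  maxChars: number = MAX_EXTRACTED_CHARS,
): { text: string; truncated: boolean } {
  if (text.length <= maxChars) return { text, truncated: false };
  let end = maxChars;
  // Do not end on the high half of a surrogate pair (e.g. an emoji).
  const last = text.charCodeAt(end - 1);
  if (last >= 0xd800 && last <= 0xdbff) end -= 1;
  return { text: text.slice(0, end), truncated: true };
}
```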
7. Improved Error Handling
Added safer failure paths for malformed documents.
The system now:
Rejects invalid files gracefully
Returns meaningful validation errors
Avoids application crashes caused by parser exceptions
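One way to get graceful rejection is to separate expected validation failures from unexpected parser failures at the route boundary. A hypothetical sketch: ResumeValidationError, toErrorPayload and the 422/500 status choice are assumptions, and the returned object would be handed to whatever response helper the route uses (for example NextResponse.json in a Next.js route handler).

```typescript
// Expected, user-facing rejection (bad type, ZIP bomb, timeout, ...).
export class ResumeValidationError extends Error {
  readonly status: number;
  constructor(message: string, status = 422) {
    super(message);
    this.name = "ResumeValidationError";
    this.status = status;
    // Keeps instanceof working if the code is compiled to an ES5 target.
    Object.setPrototypeOf(this, ResumeValidationError.prototype);
  }
}

export interface ErrorPayload {
  status: number;
  body: { error: string };
}

export function toErrorPayload(
  err: unknown,
  log: (...args: unknown[]) => void = console.error,
): ErrorPayload {
  if (err instanceof ResumeValidationError) {
    return { status: err.status, body: { error: err.message } };
  }
  // Unexpected failures (parser exceptions, worker crashes) are logged server-side
  // and reported generically so parser internals never reach the client.
  log("resume parsing failed:", err);
  return { status: 500, body: { error: "We could not process this resume. Please try another file." } };
}
```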
Security Impact
This PR mitigates:
PDF parser denial-of-service attacks
DOCX ZIP bomb attacks
CPU exhaustion attacks
Memory exhaustion attacks
Long-running parser abuse
Worker instability caused by malformed documents
Testing
Verified Scenarios
✅ Valid PDF resumes parse successfully
✅ Valid DOCX resumes parse successfully
✅ Malformed PDFs are rejected safely
✅ Malformed DOCX files are rejected safely
✅ Oversized extraction attempts are blocked
✅ Timeout protection terminates long-running parses
✅ ZIP bomb detection prevents excessive expansion
✅ Application remains responsive during parsing failures
Manual Testing
Upload a valid PDF resume.
Verify successful parsing.
Upload a valid DOCX resume.
Verify successful parsing.
Upload malformed documents.
Confirm safe rejection.
Upload highly compressed test archives.
Verify decompression limits trigger correctly.
Confirm API responsiveness remains unaffected.
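For the "highly compressed test archives" step, a fixture can be generated instead of downloaded. The sketch below is hypothetical test tooling, not part of the PR: makeHighlyCompressedZip() writes a single-entry ZIP whose member is `size` zero bytes compressed with DEFLATE via node:zlib, so the archive is orders of magnitude smaller than its declared content. It is not a valid DOCX (there is no [Content_Types].xml), which is fine for exercising size and ratio checks that should fire before parsing. The zero buffer is allocated in memory, so keep `size` moderate.

```typescript
import { deflateRawSync } from "node:zlib";

// Table-driven CRC-32 (reflected polynomial 0xEDB88320), as required by ZIP headers.
const CRC_TABLE = (() => {
  const table = new Uint32Array(256);
  for (let n = 0; n < 256; n++) {
    let c = n;
    for (let k = 0; k < 8; k++) c = c & 1 ? 0xedb88320 ^ (c >>> 1) : c >>> 1;
    table[n] = c >>> 0;
  }
  return table;
})();

export function crc32(data: Uint8Array): number {
  let crc = 0xffffffff;
  for (let i = 0; i < data.length; i++) crc = CRC_TABLE[(crc ^ data[i]) & 0xff] ^ (crc >>> 8);
  return (crc ^ 0xffffffff) >>> 0;
}

// Hypothetical fixture builder: one DEFLATE-compressed entry made of `size` zero bytes.
export function makeHighlyCompressedZip(size: number, name = "word/document.xml"): Buffer {
  const raw = Buffer.alloc(size); // all zeros, which compress extremely well
  const data = deflateRawSync(raw, { level: 9 });
  const fileName = Buffer.from(name, "utf8");
  const crc = crc32(raw);

  const local = Buffer.alloc(30);
  local.writeUInt32LE(0x04034b50, 0); // local file header signature
  local.writeUInt16LE(20, 4); // version needed to extract (2.0)
  local.writeUInt16LE(8, 8); // compression method: deflate
  local.writeUInt32LE(crc, 14);
  local.writeUInt32LE(data.length, 18); // compressed size
  local.writeUInt32LE(size, 22); // uncompressed size
  local.writeUInt16LE(fileName.length, 26);

  const central = Buffer.alloc(46);
  central.writeUInt32LE(0x02014b50, 0); // central directory header signature
  central.writeUInt16LE(20, 4); // version made by
  central.writeUInt16LE(20, 6); // version needed to extract
  central.writeUInt16LE(8, 10); // compression method: deflate
  central.writeUInt32LE(crc, 16);
  central.writeUInt32LE(data.length, 20);
  central.writeUInt32LE(size, 24);
  central.writeUInt16LE(fileName.length, 28);
  central.writeUInt32LE(0, 42); // offset of the local header

  const eocd = Buffer.alloc(22);
  eocd.writeUInt32LE(0x06054b50, 0); // end-of-central-directory signature
  eocd.writeUInt16LE(1, 8); // entries on this disk
  eocd.writeUInt16LE(1, 10); // total entries
  eocd.writeUInt32LE(central.length + fileName.length, 12); // central directory size
  eocd.writeUInt32LE(local.length + fileName.length + data.length, 16); // central directory offset

  return Buffer.concat([local, fileName, data, central, fileName, eocd]);
}
```

For example, passing makeHighlyCompressedZip(200 * 1024 * 1024) to writeFileSync from node:fs produces a small upload whose declared size the decompression limits should reject.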
Performance Benefits
Improved service stability
Better resource management
Reduced risk of worker crashes
Safer handling of user-supplied files
GSSoC 2026
Program: GSSoC 2026
Type: Security Fix