fix: Secure Resume Upload Processing Against Malicious PDF and DOCX Files#5685

Open
Krishnx21 wants to merge 7 commits into JhaSourav07:main from Krishnx21:fix/5489-resume-parser-isolation

@Krishnx21

Contributor

Fixes #5489

Description

This PR addresses a critical security and reliability issue in the resume upload pipeline where untrusted PDF and DOCX files are parsed directly within the application process.

The current implementation reads uploaded files into memory and processes them using document parsing libraries (pdf-parse and mammoth) without isolation, resource constraints, or decompression safeguards. Since DOCX files are ZIP-based archives and PDFs can contain complex structures, specially crafted files can consume excessive CPU and memory resources, potentially causing denial-of-service conditions.

Affected Components
app/api/student/resume/upload/route.ts
lib/resume-parser.ts
Changes Made

1. Isolated Document Processing

Moved document parsing into a dedicated worker process instead of executing directly within the request handler.

Benefits:

Prevents parser failures from affecting API availability
Isolates resource-intensive operations
Improves application stability
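
A minimal sketch of the isolation idea, assuming a Node.js runtime. The name `parseInChild` and the inline child script are hypothetical and are not taken from this PR; the child here only reports the byte length it received, as a stand-in for the real pdf-parse or mammoth call. The point is that the upload crosses a process boundary, so a crash in the parser surfaces as a rejected promise rather than taking down the request handler.

```typescript
import { spawn } from "node:child_process";

// Hypothetical child script. A real child would run the document parser here;
// this stand-in just reads stdin and reports how many bytes it received.
const CHILD_SOURCE = `
  const chunks = [];
  process.stdin.on("data", (c) => chunks.push(c));
  process.stdin.on("end", () => {
    const input = Buffer.concat(chunks);
    process.stdout.write(JSON.stringify({ bytes: input.length }));
  });
`;

// Sends the uploaded bytes to a separate Node process and resolves with its
// JSON reply. A non-zero exit code becomes a rejected promise.
export function parseInChild(file: Uint8Array): Promise<{ bytes: number }> {
  return new Promise((resolve, reject) => {
    const child = spawn(process.execPath, ["-e", CHILD_SOURCE]);
    let out = "";
    child.stdout.on("data", (chunk: Buffer) => {
      out += chunk.toString("utf8");
    });
    child.on("error", reject);
    child.on("close", (code) => {
      if (code !== 0) {
        reject(new Error(`parser exited with code ${code}`));
        return;
      }
      try {
        resolve(JSON.parse(out));
      } catch (err) {
        reject(err);
      }
    });
    // Ignore EPIPE if the child dies before reading all of its input.
    child.stdin.on("error", () => {});
    child.stdin.end(file);
  });
}
```

Spawning a process per upload has a startup cost; a pooled worker would amortize it, at the price of more bookkeeping.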
2. Added Parsing Timeouts

Introduced execution time limits for document processing.

Features:

Automatic termination of long-running parsing tasks
Protection against parser hangs
Reduced risk of CPU exhaustion attacks
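
One way to enforce such a limit, sketched under the assumption that parsing runs in a child Node process. `runNodeWithTimeout` is a hypothetical helper, not code from this PR. A timer around an in-process promise would not stop a CPU-bound parser, so the sketch kills the child process instead.

```typescript
import { spawn } from "node:child_process";

// Runs a Node script in a child process and kills it if it exceeds timeoutMs.
// Resolves with the child's stdout; rejects on timeout or non-zero exit.
export function runNodeWithTimeout(script: string, timeoutMs: number): Promise<string> {
  return new Promise((resolve, reject) => {
    const child = spawn(process.execPath, ["-e", script]);
    let out = "";
    let timedOut = false;

    const timer = setTimeout(() => {
      timedOut = true;
      child.kill("SIGKILL"); // forceful: a hung parser will not honor a polite signal
    }, timeoutMs);

    child.stdout.on("data", (chunk: Buffer) => {
      out += chunk.toString("utf8");
    });
    child.on("error", (err) => {
      clearTimeout(timer);
      reject(err);
    });
    child.on("close", (code) => {
      clearTimeout(timer);
      if (timedOut) reject(new Error(`parser exceeded ${timeoutMs} ms`));
      else if (code !== 0) reject(new Error(`parser exited with code ${code}`));
      else resolve(out);
    });
  });
}
```

For example, `runNodeWithTimeout("while (true) {}", 300)` rejects after roughly 300 ms, while a script that finishes promptly resolves with whatever it wrote to stdout.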
3. Added Memory Usage Controls

Implemented safeguards to prevent excessive memory allocation during parsing.

Protections include:

Maximum parser memory limits
Large-buffer rejection
Controlled extraction operations
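
A rough sketch of two of these controls. The limits and the helper names `assertUploadSize` and `parserNodeArgs` are assumptions for illustration; the PR does not state the actual values. `--max-old-space-size` is a standard V8 flag accepted by Node. It caps the JavaScript heap but not `Buffer` memory, which is allocated outside that heap, so rejecting oversized uploads up front remains necessary.

```typescript
// Hypothetical limits; tune to the deployment.
export const MAX_UPLOAD_BYTES = 5 * 1024 * 1024;
export const MAX_PARSER_HEAP_MB = 128;

// Rejects empty or oversized uploads before any parser sees them.
export function assertUploadSize(byteLength: number, maxBytes: number = MAX_UPLOAD_BYTES): void {
  if (!Number.isInteger(byteLength) || byteLength <= 0) {
    throw new RangeError("upload is empty or has an invalid size");
  }
  if (byteLength > maxBytes) {
    throw new RangeError(`upload of ${byteLength} bytes exceeds the ${maxBytes}-byte limit`);
  }
}

// Builds the argument list for a parser child process with a capped V8 heap.
// scriptPath is whatever worker script the application uses.
export function parserNodeArgs(scriptPath: string, maxHeapMb: number = MAX_PARSER_HEAP_MB): string[] {
  return [`--max-old-space-size=${maxHeapMb}`, scriptPath];
}
```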
4. Added DOCX Archive Validation

Introduced ZIP archive inspection before DOCX processing.

Validation includes:

Maximum file count checks
Maximum extracted size checks
Compression ratio validation
ZIP bomb detection
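
The checks above could be approximated by reading the ZIP central directory before any decompression happens. The following is a sketch only: `inspectZip`, `ZipLimits`, and the default limits are hypothetical and not taken from this PR. The sizes it reads are declared by the archive itself and can be falsified, so a byte counter during actual extraction is still needed as a second line of defense.

```typescript
export interface ZipLimits {
  maxEntries: number;
  maxTotalUncompressed: number; // bytes
  maxRatio: number; // uncompressed / compressed
}

// Hypothetical defaults.
export const DEFAULT_ZIP_LIMITS: ZipLimits = {
  maxEntries: 200,
  maxTotalUncompressed: 50 * 1024 * 1024,
  maxRatio: 100,
};

const EOCD_SIGNATURE = 0x06054b50; // end-of-central-directory record
const CDH_SIGNATURE = 0x02014b50; // central directory file header

export function inspectZip(
  bytes: Uint8Array,
  limits: ZipLimits = DEFAULT_ZIP_LIMITS
): { entries: number; totalUncompressed: number } {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);

  // The EOCD record is 22 bytes plus an optional comment of up to 65535 bytes,
  // so scan backwards from the end of the file.
  let eocd = -1;
  const lowest = Math.max(0, bytes.length - 22 - 0xffff);
  for (let i = bytes.length - 22; i >= lowest; i--) {
    if (view.getUint32(i, true) === EOCD_SIGNATURE) {
      eocd = i;
      break;
    }
  }
  if (eocd < 0) throw new Error("not a ZIP archive: end-of-central-directory record missing");

  const entries = view.getUint16(eocd + 10, true);
  if (entries > limits.maxEntries) throw new Error(`archive declares ${entries} entries`);

  let offset = view.getUint32(eocd + 16, true);
  let totalCompressed = 0;
  let totalUncompressed = 0;
  for (let n = 0; n < entries; n++) {
    if (offset + 46 > bytes.length || view.getUint32(offset, true) !== CDH_SIGNATURE) {
      throw new Error("corrupt central directory");
    }
    const compressed = view.getUint32(offset + 20, true);
    const uncompressed = view.getUint32(offset + 24, true);
    if (compressed === 0xffffffff || uncompressed === 0xffffffff) {
      throw new Error("ZIP64 entries are not accepted");
    }
    totalCompressed += compressed;
    totalUncompressed += uncompressed;
    if (totalUncompressed > limits.maxTotalUncompressed) {
      throw new Error("declared uncompressed size exceeds the limit");
    }
    // Skip the fixed header plus the variable-length name, extra field, and comment.
    offset += 46 + view.getUint16(offset + 28, true) + view.getUint16(offset + 30, true) + view.getUint16(offset + 32, true);
  }

  if (totalUncompressed > limits.maxRatio * Math.max(totalCompressed, 1)) {
    throw new Error("compression ratio exceeds the limit");
  }
  return { entries, totalUncompressed };
}
```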
5. Added PDF Safety Checks

Implemented PDF validation before parsing.

Checks include:

Maximum page limits
Maximum object limits
File structure validation
Rejection of suspicious PDFs
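
As an illustration, a cheap pre-parse screen could look for the `%PDF-` header, the `%%EOF` trailer, an `/Encrypt` entry, and rough object and page counts. `checkPdf`, `PdfLimits`, and the default limits are hypothetical names and values, not the PR's code. The counts are byte-pattern heuristics: pages stored in compressed object streams are invisible to them, so this is a first filter rather than a substitute for limits inside the parser.

```typescript
export interface PdfLimits {
  maxPages: number;
  maxObjects: number;
}

const ascii = (s: string): Uint8Array => Uint8Array.from(s, (c) => c.charCodeAt(0));

// Returns the start index of every occurrence of needle in hay.
function findAll(hay: Uint8Array, needle: Uint8Array): number[] {
  const hits: number[] = [];
  outer: for (let i = 0; i + needle.length <= hay.length; i++) {
    for (let j = 0; j < needle.length; j++) {
      if (hay[i + j] !== needle[j]) continue outer;
    }
    hits.push(i);
  }
  return hits;
}

export function checkPdf(
  bytes: Uint8Array,
  limits: PdfLimits = { maxPages: 50, maxObjects: 20000 } // hypothetical defaults
): { pages: number; objects: number } {
  if (findAll(bytes.subarray(0, 5), ascii("%PDF-")).length !== 1) {
    throw new Error("missing %PDF- header");
  }
  const tail = bytes.subarray(Math.max(0, bytes.length - 1024));
  if (findAll(tail, ascii("%%EOF")).length === 0) {
    throw new Error("missing %%EOF trailer");
  }
  if (findAll(bytes, ascii("/Encrypt")).length > 0) {
    throw new Error("encrypted PDFs are not accepted");
  }

  // "1 0 obj" matches " obj"; "endobj" does not, because of the leading space.
  const objects = findAll(bytes, ascii(" obj")).length;
  if (objects > limits.maxObjects) throw new Error(`too many objects: ${objects}`);

  // Count page objects, skipping "/Pages" tree nodes (next byte is "s").
  let pages = 0;
  for (const pattern of ["/Type /Page", "/Type/Page"]) {
    for (const hit of findAll(bytes, ascii(pattern))) {
      if (bytes[hit + pattern.length] !== 0x73) pages++;
    }
  }
  if (pages > limits.maxPages) throw new Error(`too many pages: ${pages}`);

  return { pages, objects };
}
```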
6. Added Extracted Content Limits

Restricted the amount of text that can be extracted from uploaded documents.

Benefits:

Prevents oversized extraction payloads
Reduces memory pressure
Improves parser performance
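
A small sketch of such a cap. `limitExtractedText` and its default of 50000 characters are assumptions for illustration, not values from this PR. It truncates the extracted text and reports whether truncation happened, taking care not to cut a UTF-16 surrogate pair in half.

```typescript
// Caps extracted text at maxChars UTF-16 code units.
export function limitExtractedText(
  text: string,
  maxChars: number = 50000 // hypothetical default
): { text: string; truncated: boolean } {
  if (text.length <= maxChars) return { text, truncated: false };

  let end = maxChars;
  // If the last kept unit is a high surrogate, drop it so the result
  // does not end in half of a character.
  const last = text.charCodeAt(end - 1);
  if (last >= 0xd800 && last <= 0xdbff) end -= 1;

  return { text: text.slice(0, end), truncated: true };
}
```

Applying the cap inside the parsing worker, rather than after the text has been sent back, keeps an oversized string from ever reaching the API process.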
7. Improved Error Handling

Added safer failure paths for malformed documents.

The system now:

Rejects invalid files gracefully
Returns meaningful validation errors
Avoids application crashes caused by parser exceptions
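
One possible shape for these failure paths, sketched with hypothetical names (`safeParse`, `ParseResult`) and a hypothetical convention that validation errors carry a `code` property such as `TOO_LARGE` or `TIMEOUT`. Parser exceptions are converted into a result value with an HTTP status and a fixed message, so internal error text is never echoed to the client.

```typescript
export type ParseResult =
  | { ok: true; text: string }
  | { ok: false; status: number; error: string };

// Hypothetical error codes and the responses they map to.
const KNOWN_FAILURES = new Map<string, { status: number; error: string }>([
  ["TOO_LARGE", { status: 413, error: "The uploaded file is too large." }],
  ["TIMEOUT", { status: 422, error: "The document took too long to process." }],
]);

const GENERIC_FAILURE = { status: 422, error: "The document could not be parsed." };

// Runs a parse function and converts any thrown error into a ParseResult.
export async function safeParse(parse: () => Promise<string>): Promise<ParseResult> {
  try {
    return { ok: true, text: await parse() };
  } catch (err) {
    const code =
      typeof err === "object" && err !== null && "code" in err
        ? String((err as { code: unknown }).code)
        : "";
    const failure = KNOWN_FAILURES.get(code) ?? GENERIC_FAILURE;
    return { ok: false, status: failure.status, error: failure.error };
  }
}
```

A route handler could then return `result.status` and `result.error` directly when `result.ok` is false, and log the original exception server-side.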
Security Impact

This PR mitigates:

PDF parser denial-of-service attacks
DOCX ZIP bomb attacks
CPU exhaustion attacks
Memory exhaustion attacks
Long-running parser abuse
Worker instability caused by malformed documents

Testing
Verified Scenarios
✅ Valid PDF resumes parse successfully
✅ Valid DOCX resumes parse successfully
✅ Malformed PDFs are rejected safely
✅ Malformed DOCX files are rejected safely
✅ Oversized extraction attempts are blocked
✅ Timeout protection terminates long-running parses
✅ ZIP bomb detection prevents excessive expansion
✅ Application remains responsive during parsing failures
Manual Testing
Upload a valid PDF resume.
Verify successful parsing.
Upload a valid DOCX resume.
Verify successful parsing.
Upload malformed documents.
Confirm safe rejection.
Upload highly compressed test archives.
Verify decompression limits trigger correctly.
Confirm API responsiveness remains unaffected.
Performance Benefits
Improved service stability
Better resource management
Reduced risk of worker crashes
Safer handling of user-supplied files

GSSoC 2026
Program: GSSoC 2026
Type: Security Fix

@vercel

vercel Bot commented Jun 14, 2026

Contributor

@Krishnx21 is attempting to deploy a commit to the jhasourav07's projects Team on Vercel.

A member of the Team first needs to authorize it.

@Krishnx21

Contributor Author

@Aamod007 @JhaSourav07 check it out ..

@Aamod007 Aamod007 added mentor:Aamod007 gssoc:approved PR has been reviewed and accepted for valid contribution points labels Jun 15, 2026

@Aamod007 Aamod007 left a comment

Collaborator

Strong security hardening. app/api/student/resume/upload/route.ts now sanitizes filenames, validates extension vs MIME type, and blocks encrypted or malformed inputs before parsing, while lib/resume-parser.ts adds worker isolation, timeout handling, and ZIP/PDF structure checks to keep untrusted documents from destabilizing the app.

@Aamod007 Aamod007 added level:advanced Complex contributions involving architecture, optimization, or significant feature work quality:clean PR follows clean coding practices, proper formatting, documentation, and maintainability standards. type:security Security fixes, dependency updates, or hardening type:bug Something isn't working as expected type:testing Adding, updating, or fixing tests labels Jun 15, 2026
@github-actions github-actions Bot added this to the GSSoC 2026 milestone Jun 15, 2026
@JhaSourav07 JhaSourav07 removed the gssoc:approved PR has been reviewed and accepted for valid contribution points label Jun 17, 2026
@Aamod007 Aamod007 added the level:critical High-priority or mission-critical contributions affecting core systems, security, or infrastructure label Jun 18, 2026

@Aamod007 Aamod007 left a comment

Collaborator

Difficulty: critical – Secures Resume Upload Processing Against Malicious PDF and DOCX Files.

Quality: clean – Comprehensive security.

Type: security – File upload hardening.

Excellent security work!

Labels

level:advanced (Complex contributions involving architecture, optimization, or significant feature work)
level:critical (High-priority or mission-critical contributions affecting core systems, security, or infrastructure)
mentor:Aamod007
quality:clean (PR follows clean coding practices, proper formatting, documentation, and maintainability standards.)
type:bug (Something isn't working as expected)
type:security (Security fixes, dependency updates, or hardening)
type:testing (Adding, updating, or fixing tests)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bug: Resume Upload Parses Untrusted PDF and DOCX Files In-Process

3 participants