fix(security): prevent SSRF and path traversal in URL fetching and file output #432

Open
l3tchupkt wants to merge 3 commits into google:main from l3tchupkt:security/fix-ssrf-path-traversal

@l3tchupkt

Summary

This PR fixes two security issues in langextract:

  1. SSRF via unrestricted URL fetching in extract()
  2. Path traversal in save_annotated_documents()

Details

SSRF Protection

  • Added validation to block internal and reserved IP ranges
  • Prevents access to localhost, private networks, and cloud metadata endpoints
  • Includes DNS resolution checks to mitigate DNS rebinding attacks
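
The checks listed above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the names `is_blocked_ip` and `validate_fetch_url` and the injectable `resolver` parameter are hypothetical. One caveat worth stating is that resolving a hostname and checking the result only partly mitigates DNS rebinding; the caller would also need to connect to the address that was checked, which is why the sketch returns the validated addresses.

```python
import ipaddress
import socket
from urllib.parse import urlparse


def is_blocked_ip(ip_text: str) -> bool:
    """Return True for loopback, private, link-local and other reserved addresses."""
    # Drop an IPv6 zone ID such as "%eth0" before parsing.
    ip = ipaddress.ip_address(ip_text.split("%", 1)[0])
    # Treat IPv4-mapped IPv6 (e.g. ::ffff:127.0.0.1) as the embedded IPv4 address.
    if ip.version == 6 and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped
    # is_link_local covers 169.254.169.254, a common cloud metadata endpoint.
    return (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local
        or ip.is_multicast
        or ip.is_reserved
        or ip.is_unspecified
    )


def validate_fetch_url(url: str, resolver=socket.getaddrinfo) -> list:
    """Hypothetical helper: raise ValueError unless every resolved address is allowed.

    Returns the validated addresses so a caller could pin the connection to them.
    """
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        raise ValueError(f"unsupported URL scheme: {parsed.scheme!r}")
    host = parsed.hostname
    if not host:
        raise ValueError("URL has no hostname")
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        infos = resolver(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        raise ValueError(f"could not resolve host: {host!r}") from exc
    addresses = [info[4][0] for info in infos]
    # Reject the URL if any resolved address is internal, not just the first one.
    for address in addresses:
        if is_blocked_ip(address):
            raise ValueError(f"{host!r} resolves to a blocked address: {address}")
    return addresses
```

Checking every address returned by the resolver, rather than only the first, avoids a hostname that mixes public and internal records slipping through.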

Path Traversal Protection

  • Sanitized output_name to remove directory components
  • Ensured resolved output paths remain within output_dir
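
The two steps above might look something like the sketch below. The helper name `safe_output_path` is hypothetical and is not taken from this PR; the sketch assumes Python 3.9+ for `Path.is_relative_to`.

```python
from pathlib import Path


def safe_output_path(output_dir: str, output_name: str) -> Path:
    """Hypothetical helper: build an output path that cannot leave output_dir."""
    # Keep only the final component, treating backslashes as separators too.
    name = Path(output_name.replace("\\", "/")).name
    if name in ("", ".", ".."):
        raise ValueError(f"invalid output name: {output_name!r}")
    base = Path(output_dir).resolve()
    # resolve() also follows symlinks, so a link pointing outside base is caught.
    candidate = (base / name).resolve()
    if not candidate.is_relative_to(base):
        raise ValueError(f"output path escapes {base}: {candidate}")
    return candidate
```

With this approach, a name such as `../../etc/passwd` is reduced to `passwd` inside `output_dir` instead of being rejected outright; raising an error on any name that contains a directory component would be a stricter alternative.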

Impact

These issues could allow:

  • Access to internal services or cloud metadata (credential exposure)
  • Writing files outside the intended output directory

Notes

These changes are backward-compatible for normal usage.
If internal URL access is required, an explicit allowlist can be added in future updates.

Author

Lakshmikanthan K (letchupkt)
linkedin.com/in/lakshmikanthank
