fix(security): Add URL validation and rate limiting to crawl endpoint #640

Open
joelrb wants to merge 3 commits into bettergovph:main from joelrb:fix/crawl-endpoint-security

@joelrb commented May 14, 2026

Description

Problem

The /api/crawl endpoint was vulnerable to several security threats:

  1. SSRF (Server-Side Request Forgery) - The endpoint could be used to reach internal networks and sensitive systems
  2. No rate limiting - Nothing prevented excessive API calls, which could disrupt the service
  3. No domain validation - Any domain could be crawled, potentially violating terms of service or accessing sensitive content
  4. Hardcoded credentials - Cloudflare account IDs were exposed in configuration files

Fix Applied

  1. URL Validation - Restricted crawl access to .gov.ph domains only to prevent SSRF attacks and ensure mission alignment
  2. Rate Limiting - Implemented 10 requests per minute per IP using KV storage to prevent abuse
  3. Configuration Security - Removed hardcoded Cloudflare IDs and moved to environment variables
  4. Enhanced Error Handling - Added graceful error handling with informative messages
  5. CORS Security - Fixed CORS preflight response status codes and restricted allowed methods
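
For reviewers, the `.gov.ph` restriction in item 1 could look roughly like the sketch below. This is an illustration under assumptions, not the code in `functions/api/crawl.ts`: the helper name `isAllowedCrawlUrl`, the scheme check, and the rejection of embedded credentials are all hypothetical choices.

```typescript
// Hypothetical helper: the name and the exact rules are assumptions for illustration.
const ALLOWED_SUFFIX = ".gov.ph";

function isAllowedCrawlUrl(raw: string): boolean {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return false; // not a parseable absolute URL
  }

  // Only plain web schemes; rejects file:, ftp:, and similar.
  if (url.protocol !== "https:" && url.protocol !== "http:") return false;

  // Reject embedded credentials such as https://dict.gov.ph@example.com/
  if (url.username !== "" || url.password !== "") return false;

  // Normalize case and a trailing dot before comparing.
  const host = url.hostname.toLowerCase().replace(/\.$/, "");

  // The leading dot in the suffix prevents look-alikes such as "evilgov.ph".
  // localhost and IP literals fail this check because they lack the suffix.
  return host === "gov.ph" || host.endsWith(ALLOWED_SUFFIX);
}
```

A hostname allowlist of this kind does not by itself cover redirects to other hosts, so the fetch that follows validation may need its own handling.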

Framework Change

No framework changes - this is a pure security enhancement to existing Cloudflare Workers functionality.

Files Changed

  • functions/api/crawl.ts - Added URL validation, rate limiting, and improved error handling
  • functions/index.ts - Fixed CORS preflight status code
  • wrangler.jsonc - Removed hardcoded Cloudflare IDs and added environment variable placeholders
  • .env.example - Added documentation for required environment variables
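
The CORS preflight and method handling touched in `functions/index.ts` and `functions/api/crawl.ts` might be sketched as follows. The function names, the wildcard origin, and the allowed request headers are assumptions made for the example; only the 204 preflight status and the GET/OPTIONS restriction come from this PR.

```typescript
// Hypothetical sketch: names and header values are assumptions, not the PR's code.
const ALLOWED_METHODS = ["GET", "OPTIONS"];

function corsHeaders(): Record<string, string> {
  return {
    "Access-Control-Allow-Origin": "*", // assumption: public, read-only API
    "Access-Control-Allow-Methods": ALLOWED_METHODS.join(", "),
    "Access-Control-Allow-Headers": "Content-Type",
  };
}

// Returns a Response when the request should be answered immediately,
// or null when the caller should continue with normal GET handling.
function handleMethod(method: string): Response | null {
  if (method === "OPTIONS") {
    // 204 No Content: a preflight response carries headers only, no body.
    return new Response(null, { status: 204, headers: corsHeaders() });
  }
  if (!ALLOWED_METHODS.includes(method)) {
    return new Response(JSON.stringify({ error: "Method not allowed" }), {
      status: 405,
      headers: {
        ...corsHeaders(),
        Allow: ALLOWED_METHODS.join(", "),
        "Content-Type": "application/json",
      },
    });
  }
  return null;
}
```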

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Refactoring (no functional changes)
  • CI/CD changes
  • Performance improvements

Related Issues

Fixes #security-vulnerabilities-ssrf-rate-limiting
Related to #government-service-security

Testing

Describe the testing performed for this PR:

  • Unit tests pass
  • Integration tests pass
  • Manual testing performed
  • All linting passes
  • Type checking passes
  • Dead code check passes

Include specific test scenarios or commands used.

Manual test:

  • Verified URL validation blocks non-.gov.ph domains, localhost, and IP addresses
  • Verified rate limiting works (10 requests/minute per IP)
  • Verified HTTP method validation (blocks POST/PUT/DELETE, allows GET/OPTIONS)
  • Verified CORS headers are properly configured
  • Tested error handling with missing database dependencies
  • Confirmed no hardcoded IDs exposed in status endpoint
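
The rate-limit behaviour verified above (10 requests per minute per IP) can be approximated with a fixed-window counter, sketched below. The `CounterStore` interface, the key format, and the `MemoryStore` stand-in are hypothetical; the interface only mirrors the `get`/`put` shape of Workers KV so the logic can be exercised locally. The PR's actual implementation may differ.

```typescript
// Hypothetical async key-value interface, loosely following the get/put shape of Workers KV.
interface CounterStore {
  get(key: string): Promise<string | null>;
  put(key: string, value: string, options?: { expirationTtl?: number }): Promise<void>;
}

const LIMIT = 10;          // requests allowed per window (from the PR description)
const WINDOW_SECONDS = 60; // window length (from the PR description)

// Fixed-window counter: returns true when the caller has used up the current window.
async function isRateLimited(store: CounterStore, ip: string, nowMs: number): Promise<boolean> {
  const windowId = Math.floor(nowMs / (WINDOW_SECONDS * 1000));
  const key = `rate:${ip}:${windowId}`; // assumed key format
  const count = Number((await store.get(key)) ?? "0");
  if (count >= LIMIT) return true;
  // Let the counter expire on its own once the window has passed.
  await store.put(key, String(count + 1), { expirationTtl: WINDOW_SECONDS * 2 });
  return false;
}

// In-memory stand-in for local testing; it ignores expirationTtl.
class MemoryStore implements CounterStore {
  private data = new Map<string, string>();
  async get(key: string): Promise<string | null> {
    return this.data.get(key) ?? null;
  }
  async put(key: string, value: string): Promise<void> {
    this.data.set(key, value);
  }
}
```

Because the read and the write are separate operations and KV is eventually consistent, a counter like this is approximate under concurrent requests. That is usually acceptable for abuse prevention but is not a strict quota.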

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have tested my changes locally
  • I have updated AGENTS.md if needed (for architectural changes)
  • I have added appropriate labels (Priority, Type, Area)
  • No new security vulnerabilities introduced

Performance & Security

  • Performance impact considered (no significant regressions)
  • Dependencies audited (no known vulnerabilities)
  • Secrets not exposed
  • Error handling implemented appropriately

Screenshots (if applicable)

Add screenshots to help explain your changes.

Additional Notes

This security enhancement maintains the public service nature of BetterGov.ph while preventing abuse and protecting against common web vulnerabilities. The crawl endpoint now safely allows access to Philippine government websites while blocking potential security threats.

Joel Ballesteros added 3 commits May 14, 2026 19:32
- Add proper error handling for crawl endpoint with graceful failures
- Fix CORS preflight response status from 200 to 204 (standard)
- Improve error messages for database and API failures
- Add fallback behavior when cached content is not available
- Fix main index.ts CORS preflight handling
@DaijobuDes added the enhancement, low priority, and security labels May 14, 2026
@KishonShrill requested review from clrke and jasontorres May 15, 2026 10:15