A Django REST API that extracts URLs from XML sitemaps asynchronously using Celery and Redis.
- Extract URLs from XML sitemaps
- Support for sitemap indexes and nested sitemaps
- Asynchronous processing with Celery
- Redis-based task queue and caching
- Task status polling
- Guest and authenticated user support
- Pagination for large sitemap results
- Database persistence of extraction results
- Automatic expiration of stored results
- Django
- Django REST Framework
- Celery
- Redis
- SQLite (Development)
- Docker (Optional)
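To illustrate what handling sitemap indexes and nested sitemaps involves, here is a minimal sketch using only the standard library. It is not the project's actual implementation: the function names, the `fetch` callable (anything that returns the XML text for a URL), and the `max_depth` guard are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def parse_sitemap(xml_text):
    """Return (kind, locations), where kind is 'index' or 'urlset'."""
    root = ET.fromstring(xml_text)
    kind = "index" if root.tag == f"{SITEMAP_NS}sitemapindex" else "urlset"
    # Every <loc> in the sitemap namespace holds either a page URL
    # (urlset) or the URL of a child sitemap (sitemapindex).
    locations = [
        loc.text.strip()
        for loc in root.iter(f"{SITEMAP_NS}loc")
        if loc.text and loc.text.strip()
    ]
    return kind, locations


def extract_urls(fetch, sitemap_url, max_depth=3):
    """Collect page URLs, following nested sitemap indexes.

    `fetch` is a hypothetical callable returning the XML text for a URL.
    `max_depth` is an assumed guard against runaway nesting.
    """
    kind, locations = parse_sitemap(fetch(sitemap_url))
    if kind == "urlset":
        return locations
    if max_depth <= 0:
        return []
    urls = []
    for child in locations:
        urls.extend(extract_urls(fetch, child, max_depth - 1))
    return urls
```

Passing `fetch` in as a parameter keeps the parsing logic testable without network access; in a worker it could wrap whatever HTTP client the project uses.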
git clone <repository-url>
cd <project-name>
python -m venv .venv
Activate on Windows:
.venv\Scripts\activate
Activate on Linux/macOS:
source .venv/bin/activate
pip install -r requirements.txt
Create a .env file:
SECRET_KEY=your-secret-key
DEBUG=True
REDIS_URL=redis://localhost:6379/0
python manage.py makemigrations
python manage.py migrate
docker run -d --name redis -p 6379:6379 redis
Verify:
docker ps
python manage.py runserver
Application:
http://127.0.0.1:8000/
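The `SECRET_KEY`, `DEBUG`, and `REDIS_URL` values from the `.env` file have to be read by the application at startup. The sketch below shows one way this might look, assuming the variables have already been loaded into the process environment (by the shell or a dotenv loader). `load_settings` is a hypothetical helper, not a function from this project.

```python
import os


def load_settings(env=None):
    """Build a settings dict from environment-style key/value pairs."""
    env = os.environ if env is None else env
    return {
        # Fail fast with a KeyError if the secret key is missing.
        "SECRET_KEY": env["SECRET_KEY"],
        # Environment values are strings, so "True"/"False" must be parsed.
        "DEBUG": env.get("DEBUG", "False").strip().lower() in ("1", "true", "yes"),
        # Assumed fallback: the local Redis instance used in this README.
        "REDIS_URL": env.get("REDIS_URL", "redis://localhost:6379/0"),
    }
```

Parsing `DEBUG` explicitly matters because any non-empty string, including `"False"`, is truthy in Python.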
celery -A sitemap worker -l info
Alternative using the solo pool (commonly needed on Windows):
celery -A sitemap worker -l info --pool=solo
POST /api/sitemap-extractor/
Request:
{
  "url": "https://example.com"
}
Response:
{
  "task_id": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
  "status": "processing"
}
GET /api/sitemap-task/<task_id>/
Possible responses:
{
  "status": "PENDING"
}
{
  "status": "STARTED"
}
{
  "status": "FAILURE",
  "error": "Error message"
}
GET /api/sitemap-task/<task_id>/?page=1
Response:
{
  "status": "SUCCESS",
  "count": 2500,
  "page": 1,
  "page_size": 100,
  "total_pages": 25,
  "results": [
    "https://example.com/page-1",
    "https://example.com/page-2"
  ]
}
Guest users:
- Maximum URLs: 1,000
- Cached responses supported
Authenticated users:
- Maximum URLs: 100,000
- Results stored and reused until expiration
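From a client's point of view, the task endpoint is polled until the status is `SUCCESS`, and the result pages are then read one by one using `total_pages`. The following is a rough sketch of that loop rather than an official client: `collect_urls` and the `fetch_page(task_id, page)` callable (which would perform `GET /api/sitemap-task/<task_id>/?page=<page>` and return the decoded JSON) are hypothetical, and the polling interval and attempt limit are arbitrary defaults.

```python
import time


def collect_urls(fetch_page, task_id, interval=2.0, max_polls=30, sleep=time.sleep):
    """Poll a task until it finishes, then gather every page of results."""
    for _ in range(max_polls):
        first = fetch_page(task_id, 1)
        if first["status"] == "SUCCESS":
            break
        if first["status"] == "FAILURE":
            raise RuntimeError(first.get("error", "extraction failed"))
        # PENDING or STARTED: wait before asking again.
        sleep(interval)
    else:
        # The loop ended without ever seeing SUCCESS.
        raise TimeoutError(f"task {task_id} did not finish after {max_polls} polls")

    # Page 1 was already fetched by the successful poll; read the rest.
    urls = list(first["results"])
    for page in range(2, first["total_pages"] + 1):
        urls.extend(fetch_page(task_id, page)["results"])
    return urls
```

With the sample response above (`count` 2500, `page_size` 100), this would issue 25 page requests in total, including the one that first returned `SUCCESS`.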
Guest user results are cached in Redis:
sitemap:<domain>
Cache timeout:
1 hour
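As an illustration of the `sitemap:<domain>` key format, the sketch below derives a key from a submitted URL. How the project actually normalizes the domain is not stated here, so treating it as the lowercased hostname without the port is an assumption, and `guest_cache_key` is a hypothetical name.

```python
from urllib.parse import urlsplit

CACHE_TIMEOUT_SECONDS = 60 * 60  # 1 hour, as stated above


def guest_cache_key(url):
    """Build the `sitemap:<domain>` key for a submitted URL."""
    # Assumption: "domain" means the lowercased hostname without the port.
    host = urlsplit(url).hostname
    if not host:
        raise ValueError(f"cannot determine domain from {url!r}")
    return f"sitemap:{host}"
```

With Django's cache framework, such a key could be written with `cache.set(key, urls, timeout=CACHE_TIMEOUT_SECONDS)` and read back with `cache.get(key)`.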
Each extraction stores:
- User ID
- IP Address
- Domain
- Extracted URLs
- Total URL Count
- Status
- Creation Timestamp
- Expiration Timestamp
Statuses:
pending
completed
failed
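The stored fields and statuses listed above can be pictured as a single record. The sketch below uses a plain dataclass instead of a Django model so that it runs on its own. The class name, the one-day `RETENTION` period, and the use of `None` as the user ID for guests are all assumptions for illustration; the actual model may differ.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import List, Optional

# Assumption: illustrative retention period; the real value is not documented here.
RETENTION = timedelta(days=1)


def _utc_now():
    return datetime.now(timezone.utc)


@dataclass
class ExtractionRecord:
    domain: str
    ip_address: str
    user_id: Optional[int] = None  # assumed to be None for guests
    urls: List[str] = field(default_factory=list)
    status: str = "pending"  # pending, completed, or failed
    created_at: datetime = field(default_factory=_utc_now)
    expires_at: Optional[datetime] = None

    def __post_init__(self):
        # Derive the expiration timestamp from the creation timestamp.
        if self.expires_at is None:
            self.expires_at = self.created_at + RETENTION

    @property
    def total_count(self):
        return len(self.urls)

    def is_expired(self, now=None):
        now = now or _utc_now()
        return now >= self.expires_at
```

Storing `expires_at` explicitly, rather than recomputing it from `created_at`, lets a cleanup job select expired rows with a single timestamp comparison.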
docker restart <celery-container>
or
celery -A sitemap worker -P threads -c 8 -l info
celery -A sitemap purge
- Celery Beat cleanup task
- Rate limiting
- Export results as CSV
- Monitoring and metrics