Periodic updater service that automatically regenerates llms.txt files for websites on a configurable schedule. Acts as a cron-like daemon that monitors the database and triggers updates via the API.
The cron-ltx crate provides:
- Periodic polling: Checks the database at regular intervals for websites needing updates
- Automatic job creation: Submits update jobs to the API server
- Authenticated requests: Handles authentication when the API requires it
- TLS support: Makes secure HTTPS requests to the API
- Configurable scheduling: Adjustable poll intervals via environment variables
- Graceful operation: Logs failures and keeps running, retrying failed work on a later poll cycle
```
src/cron-ltx/
├── src/
│   ├── main.rs        # Service entry point, main polling loop
│   ├── lib.rs         # Library exports
│   ├── process.rs     # Core update scheduling logic
│   ├── auth_client.rs # HTTP client with authentication support
│   └── errors.rs      # Error types
└── Cargo.toml
```
The cron service operates in a continuous loop:

1. Poll database: Queries for websites that need updating
   - Checks for sites that haven't been updated recently
   - Respects configurable update intervals
   - Prioritizes older entries
2. Create update jobs: For each website needing an update:
   - Makes an authenticated POST request to `/api/jobs`
   - Provides the website URL
   - Receives job ID confirmation
3. Sleep: Waits for the configured poll interval before the next cycle
4. Repeat: Continues indefinitely until stopped
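The loop above can be sketched in plain Rust as follows. This is a hypothetical, synchronous, std-only illustration. The real service uses tokio, diesel-async and reqwest, and the names `run_cycle`, `run_loop` and `CycleReport` are invented here. The database query and the `/api/jobs` call are passed in as closures, so the control flow can be exercised without a database or API server.

```rust
use std::thread;
use std::time::Duration;

/// Tally of one poll cycle (hypothetical type, useful for logging).
#[derive(Debug, Default, PartialEq)]
pub struct CycleReport {
    pub found: usize,
    pub submitted: usize,
    pub failed: usize,
}

/// One cycle: fetch the URLs that are due, then submit one job per URL.
/// `fetch_due` stands in for the database query, `submit_job` for POST /api/jobs.
pub fn run_cycle<F, S>(fetch_due: F, mut submit_job: S) -> CycleReport
where
    F: FnOnce() -> Result<Vec<String>, String>,
    S: FnMut(&str) -> Result<String, String>,
{
    let mut report = CycleReport::default();
    let urls = match fetch_due() {
        Ok(urls) => urls,
        Err(e) => {
            // A database error skips this cycle; the next poll tries again.
            eprintln!("poll failed: {e}");
            return report;
        }
    };
    report.found = urls.len();
    for url in &urls {
        match submit_job(url.as_str()) {
            Ok(job_id) => {
                println!("created job {job_id} for {url}");
                report.submitted += 1;
            }
            Err(e) => {
                // A failed submission is logged; remaining sites are still processed.
                eprintln!("could not create job for {url}: {e}");
                report.failed += 1;
            }
        }
    }
    report
}

/// Repeats `run_cycle`, sleeping `interval` between cycles.
/// `max_cycles = None` loops forever like the service; `Some(n)` stops after n cycles.
pub fn run_loop<F, S>(interval: Duration, max_cycles: Option<u32>, mut fetch_due: F, mut submit_job: S)
where
    F: FnMut() -> Result<Vec<String>, String>,
    S: FnMut(&str) -> Result<String, String>,
{
    let mut cycles: u32 = 0;
    loop {
        let report = run_cycle(&mut fetch_due, &mut submit_job);
        println!("cycle finished: {report:?}");
        cycles += 1;
        if max_cycles.map_or(false, |max| cycles >= max) {
            break;
        }
        thread::sleep(interval);
    }
}
```

A failed query skips the whole cycle, and a failed submission is only logged. In both cases the next poll simply tries again, which matches the failure handling described for the service.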
Configure via environment variables:

- `DATABASE_URL`: PostgreSQL connection string (required)
  - Example: `postgres://ltx_user:ltx_password@postgres:5432/ltx_db`
- `HOST`: API server hostname (default: `localhost`)
- `PORT`: API server port (default: `3000`)
- `CRON_POLL_INTERVAL_S`: Polling interval in seconds (default: `300`, i.e. 5 minutes)
  - How often to check for websites needing updates
  - Adjust based on update frequency requirements
  - Lower values mean more frequent checks and higher load
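Here is a minimal sketch of reading these variables with their documented defaults. It assumes a hypothetical `CronConfig` type, because the crate's actual configuration code is not shown here. Taking a lookup closure instead of calling `std::env::var` directly keeps the parsing testable.

```rust
use std::time::Duration;

/// Hypothetical configuration struct mirroring the documented variables.
#[derive(Debug, PartialEq)]
pub struct CronConfig {
    pub database_url: String,
    pub host: String,
    pub port: u16,
    pub poll_interval: Duration,
}

impl CronConfig {
    /// Builds the config from any key -> value lookup (e.g. the process environment).
    pub fn from_lookup<F>(get: F) -> Result<Self, String>
    where
        F: Fn(&str) -> Option<String>,
    {
        // DATABASE_URL has no default, so a missing value is an error.
        let database_url = get("DATABASE_URL").ok_or("DATABASE_URL is required")?;
        let host = get("HOST").unwrap_or_else(|| "localhost".to_string());
        let port = match get("PORT") {
            Some(p) => p.parse::<u16>().map_err(|e| format!("invalid PORT {p:?}: {e}"))?,
            None => 3000,
        };
        let secs = match get("CRON_POLL_INTERVAL_S") {
            Some(s) => s
                .parse::<u64>()
                .map_err(|e| format!("invalid CRON_POLL_INTERVAL_S {s:?}: {e}"))?,
            None => 300, // 5 minutes
        };
        Ok(Self { database_url, host, port, poll_interval: Duration::from_secs(secs) })
    }

    /// Reads the real process environment.
    pub fn from_env() -> Result<Self, String> {
        Self::from_lookup(|key| std::env::var(key).ok())
    }
}
```

In a service, `CronConfig::from_env()` would be called once at startup. That way a missing `DATABASE_URL` or an unparsable number fails fast instead of surfacing on the first poll.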
When `ENABLE_AUTH=1` on the API server, the cron service must authenticate:

- `ENABLE_AUTH`: Set to `1` to enable authentication
- `AUTH_PASSWORD`: Plain-text password for authentication (required if auth is enabled)
- `AUTH_PASSWORD_HASH`: Password hash (used for verification)
- `SESSION_SECRET`: Secret for session validation
- `ACCEPT_INVALID_CERTS`: Set to `true` for development with self-signed certificates
  - Must be `false` or unset in production
  - Only use when working with locally generated certificates
- `RUST_LOG`: Logging level (default: `info`)
  - Use `debug` for detailed operational logs
  - Use `trace` for maximum verbosity
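The authentication and TLS flags above can be interpreted with a small sketch like this one. `SecuritySettings`, `Auth` and `security_settings` are hypothetical names. Two choices are also assumptions: only the exact value `1` enables auth, and only the exact value `true` disables certificate checks. The crate may be more lenient.

```rust
/// Hypothetical auth mode derived from ENABLE_AUTH / AUTH_PASSWORD.
#[derive(Debug, PartialEq)]
pub enum Auth {
    Disabled,
    Enabled { password: String },
}

#[derive(Debug, PartialEq)]
pub struct SecuritySettings {
    pub auth: Auth,
    pub accept_invalid_certs: bool,
}

pub fn security_settings<F>(get: F) -> Result<SecuritySettings, String>
where
    F: Fn(&str) -> Option<String>,
{
    // Assumption: only ENABLE_AUTH=1 turns authentication on.
    let auth = if get("ENABLE_AUTH").as_deref() == Some("1") {
        match get("AUTH_PASSWORD") {
            Some(password) if !password.is_empty() => Auth::Enabled { password },
            _ => return Err("AUTH_PASSWORD is required when ENABLE_AUTH=1".to_string()),
        }
    } else {
        Auth::Disabled
    };
    // Assumption: anything other than "true" (including unset) keeps certificate checks on.
    let accept_invalid_certs = get("ACCEPT_INVALID_CERTS").as_deref() == Some("true");
    Ok(SecuritySettings { auth, accept_invalid_certs })
}
```

With reqwest, a flag like `accept_invalid_certs` would typically be passed to `ClientBuilder::danger_accept_invalid_certs`. Whether cron-ltx wires it up exactly that way is not shown here.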
```bash
# Development build
cargo build -p cron-ltx

# Production build
cargo build -p cron-ltx --release
```

The cron service is automatically started with the full stack:

```bash
# Development mode
docker compose up
# The cron service will start polling every 5 minutes (default)
```

Configure it via environment variables in `docker-compose.yml` or a `.env` file.
Running the service manually requires PostgreSQL and the API server to be running:

```bash
# Set environment variables
export DATABASE_URL='postgresql://ltx_user:ltx_password@localhost/ltx_db'
export HOST='localhost'
export PORT='3000'
export CRON_POLL_INTERVAL_S='300'

# If authentication is enabled on the API
export ENABLE_AUTH='1'
export AUTH_PASSWORD='your_password'
source ./make_password_and_export_env.sh "$AUTH_PASSWORD"

# If using self-signed certificates (development only)
export ACCEPT_INVALID_CERTS='true'

# Run the service
cargo run -p cron-ltx
```

```bash
# Run unit tests
cargo test -p cron-ltx

# Run with coverage
just test
```

When the API server has authentication enabled:
- Login request: On startup, sends a POST to `/auth/login` with the password
- Session cookie: Receives and stores the session cookie
- Authenticated requests: Includes the session cookie in all subsequent API calls
- Session renewal: Automatically handles session expiration and re-authenticates

The `auth_client` module encapsulates this logic, providing a simple interface for authenticated HTTP requests.
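The session-renewal behavior can be sketched without any HTTP dependency by hiding the network behind a trait. This is a hypothetical, synchronous illustration, not the `auth_client` module's actual API. `Transport`, `Reply` and `AuthClient::create_job` are invented names, and a real implementation would wrap reqwest and read the cookie from the login response. The sketch logs in lazily on the first request rather than eagerly at startup, and it retries exactly once after an unauthorized reply.

```rust
/// Hypothetical outcome of POST /api/jobs, reduced to what the retry logic needs.
#[derive(Debug, PartialEq)]
pub enum Reply {
    Created(String), // job ID returned by the API
    Unauthorized,    // e.g. HTTP 401 because the session expired
    Other(u16),      // any other status code
}

/// Hypothetical network abstraction; a real one would wrap an HTTP client.
pub trait Transport {
    /// POST /auth/login with the password; returns the session cookie.
    fn login(&mut self, password: &str) -> Result<String, String>;
    /// POST /api/jobs for `site_url`, sending the session cookie.
    fn post_job(&mut self, cookie: &str, site_url: &str) -> Result<Reply, String>;
}

pub struct AuthClient<T: Transport> {
    transport: T,
    password: String,
    cookie: Option<String>, // cached session cookie, if logged in
}

impl<T: Transport> AuthClient<T> {
    pub fn new(transport: T, password: impl Into<String>) -> Self {
        Self { transport, password: password.into(), cookie: None }
    }

    pub fn transport(&self) -> &T {
        &self.transport
    }

    /// Creates a job, logging in if needed and re-authenticating once if the session is rejected.
    pub fn create_job(&mut self, site_url: &str) -> Result<String, String> {
        // Two attempts: the original request plus one retry after a fresh login.
        for _ in 0..2 {
            if self.cookie.is_none() {
                self.cookie = Some(self.transport.login(&self.password)?);
            }
            let cookie = self.cookie.as_deref().expect("cookie was set above");
            match self.transport.post_job(cookie, site_url)? {
                Reply::Created(job_id) => return Ok(job_id),
                // Session expired or was rejected: drop it so the next attempt logs in again.
                Reply::Unauthorized => self.cookie = None,
                Reply::Other(status) => return Err(format!("unexpected status {status}")),
            }
        }
        Err("still unauthorized after re-authenticating".to_string())
    }
}
```

Retrying only once means a wrong `AUTH_PASSWORD` or a mismatched `SESSION_SECRET` cannot turn into a tight login loop. The error is returned, and the next poll cycle tries again.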
The service logs important events:

```text
INFO cron_ltx: Starting cron service with poll interval: 300s
DEBUG cron_ltx: Found 3 websites needing updates
DEBUG cron_ltx: Creating update job for https://example.com
INFO cron_ltx: Successfully created job abc-123 for https://example.com
INFO cron_ltx: Sleeping for 300 seconds until next poll
```

Enable debug logging for more detail:

```bash
RUST_LOG=cron_ltx=debug cargo run -p cron-ltx
```

The service determines which websites need updating based on:
- Last update timestamp: Sites not updated recently are prioritized
- Update interval: Configurable per-site update frequency
- Job status: Only creates new jobs if no pending job exists
- Failure handling: Backs off on repeated failures
The exact scheduling logic is implemented in src/process.rs.
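The criteria above can be sketched as a pure filter over site records. Everything here is an assumption for illustration: the `Site` fields, the `sites_due` function, and especially the backoff policy, which doubles the wait per consecutive failure and caps it at 32x. The document only says that the service backs off, so consult `src/process.rs` for the real rules.

```rust
use std::time::{Duration, SystemTime};

/// Hypothetical per-site scheduling state; the real fields come from the database models.
#[derive(Debug, Clone)]
pub struct Site {
    pub url: String,
    pub last_updated: SystemTime,
    pub update_interval: Duration, // per-site update frequency
    pub has_pending_job: bool,     // a job is already queued or running
    pub consecutive_failures: u32,
}

/// Assumed backoff policy: double the wait for each consecutive failure, capped at 32x.
fn backoff_multiplier(failures: u32) -> u32 {
    1u32 << failures.min(5)
}

/// Returns the sites that are due for an update, oldest `last_updated` first.
pub fn sites_due(sites: &[Site], now: SystemTime) -> Vec<&Site> {
    let mut due: Vec<&Site> = sites
        .iter()
        .filter(|s| {
            if s.has_pending_job {
                return false; // never queue a second job for the same site
            }
            let wait = s
                .update_interval
                .saturating_mul(backoff_multiplier(s.consecutive_failures));
            // A last_updated in the future (clock skew) is treated as not due.
            match now.duration_since(s.last_updated) {
                Ok(elapsed) => elapsed >= wait,
                Err(_) => false,
            }
        })
        .collect();
    due.sort_by_key(|s| s.last_updated);
    due
}
```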
The service handles various failure scenarios:
- Database connection failures: Retries on next poll cycle
- API unavailable: Logs error and continues to next cycle
- Authentication failures: Attempts to re-authenticate
- Network errors: Logs and retries later
- Invalid responses: Logs detailed error information
The service is designed to be resilient and continue operating despite transient failures.
Performance considerations:

- Poll interval: Balance between update freshness and system load
  - Too frequent: Unnecessary database queries and API requests
  - Too infrequent: Outdated llms.txt files
  - Recommended: 5-15 minutes for most use cases
- Database queries: Optimized with indexes on update timestamps
- Concurrent jobs: Currently sequential; could be parallelized
- Memory usage: Minimal, since sites are processed one at a time
For production deployments:

- Use a process supervisor (systemd, supervisord, Docker) to keep the service running
- Set `ACCEPT_INVALID_CERTS=false` (or leave it unset)
- Use proper CA-signed certificates for the API server
- Configure appropriate poll intervals for your scale
- Monitor logs for authentication or connection issues
- Consider alerting on repeated failures
For high-volume deployments:
- Run multiple cron service instances with distributed locking
- Implement job deduplication in the database
- Consider using a proper job queue (Redis, RabbitMQ) instead of polling
- Add metrics and monitoring (Prometheus, Grafana)
Key dependencies:

- `tokio`: Async runtime and timers
- `diesel` + `diesel-async`: Database queries
- `reqwest`: HTTP client for API requests
- `serde`: JSON serialization
- `tracing`: Structured logging
- `core-ltx`: Common utilities (auth config, TLS, logging)

See `Cargo.toml` for the complete dependency list.
If no update jobs are being created, check:

- The database connection is working (e.g. `diesel migration run` succeeds)
- The API server is running and reachable
- Authentication credentials are correct (if enabled)
- Websites in the database actually need updates
- The logs, with `RUST_LOG=debug` for more detail
If authentication fails, ensure that:

- `AUTH_PASSWORD` matches the password used by the API server
- `SESSION_SECRET` is consistent across services
- The API server has `ENABLE_AUTH=1` set
- The session hasn't expired (check `SESSION_DURATION_SECONDS`)
If you see TLS certificate errors:

For development:

- Set `ACCEPT_INVALID_CERTS=true`
- Use self-signed certificates generated by `./make_tls_cert.sh`

For production:

- Use proper CA-signed certificates
- Set `ACCEPT_INVALID_CERTS=false` or unset it
- Verify certificate paths in the API server configuration
- Project Root README - Overall project documentation
- api-ltx README - API server that cron interacts with
- worker-ltx README - Worker that processes the jobs cron creates
- core-ltx README - Common utilities used by cron
- data-model-ltx README - Database models used by cron