Newscast AI generates a personalised daily audio news briefing for each user. It pulls articles from RSS feeds and live SearxNG web searches, scores and ranks them against per-user topic preferences, summarises the top stories with a local or API-served LLM, writes a broadcast-style script, and synthesises an MP3 episode served over a private RSS feed — all on infrastructure you run yourself. No cloud AI APIs required; a single consumer GPU is enough.
Services:
| Service | Technology | Port | Role |
|---|---|---|---|
api |
FastAPI | 8000 | User preferences, episode retrieval, RSS |
worker |
Celery + APScheduler | 8001 | Episode generation orchestration |
mcp |
FastAPI | 7000 | LLM inference, TTS, summarisation |
nginx |
nginx:alpine | 8080 | Reverse proxy + audio file serving |
postgres |
postgres:16 | 5432 | Primary database |
redis |
redis:7 | 6379 | Celery broker + result backend |
searxng |
searxng/searxng | — | Web search for agentic ingestion |
minio |
minio | 9000 | S3-compatible audio object storage |
- Docker ≥ 24 and Docker Compose ≥ 2.20
- A
.envfile in the project root (copy.env.exampleand fill in values)
cp .env.example .envMinimum required values in .env:
POSTGRES_PASSWORD=changeme
REDIS_URL=redis://redis:6379/0
MCP_URL=http://mcp:7000
SECRET_KEY=<random 32-byte hex>
# LLM — pick one of the three provider modes (see Model Configuration below)
HOSTIFY_PROVIDER=api
OPENAI_API_KEY=sk-... # or "EMPTY" when pointing at a local vLLM server
HOSTIFY_MODEL=Qwen/Qwen2.5-7B-Instructdocker compose up --buildThis starts: postgres, redis, searxng, minio, api (port 8000), worker (port 8001), mcp (port 7000), and nginx (port 8080).
# Register preferences
curl -s -X POST http://localhost:8000/users \
-H "Content-Type: application/json" \
-d '{"schedule_time":"07:30","topics":["AI","Canada","Finance"],"max_duration_min":7,"voice":"en_US"}' \
| jq .
# Poll until the episode is ready (user_id returned above)
curl -s http://localhost:8000/episodes/1/latest | jq .
# Fetch the RSS feed
curl -s http://localhost:8080/feed/1.rssThe worker generates an episode immediately on user creation. Subsequent
episodes fire automatically at schedule_time each day.
docker compose down # stop containers, keep volumes
docker compose down -v # stop containers and delete volumes (wipes DB + audio)Model selection is controlled by environment variables read by services/mcp/settings.py.
| Tier | HOSTIFY_PROVIDER |
Model | VRAM | Use case |
|---|---|---|---|---|
| Production | api |
Qwen/Qwen2.5-7B-Instruct (vLLM) |
~14 GB | Server GPU, full quality |
| Large | api |
Qwen/Qwen2.5-14B-Instruct (vLLM) |
~28 GB | Two-GPU or A10G |
| Local dev | local |
unsloth/Qwen2.5-7B-Instruct-bnb-4bit |
~6 GB | Single consumer GPU |
| CPU fallback | cpu |
facebook/bart-large-cnn |
None | No GPU; pipeline smoke-tests only |
Set HOSTIFY_PROVIDER in your .env:
HOSTIFY_PROVIDER=api # OpenAI-compatible endpoint (default)
HOSTIFY_PROVIDER=local # load model in-process (needs GPU)
HOSTIFY_PROVIDER=cpu # CPU-only BART pipeline (not suitable for production)To point api mode at a local vLLM server instead of OpenAI:
OPENAI_API_KEY=EMPTY
OPENAI_BASE_URL=http://localhost:8000/v1
HOSTIFY_MODEL=Qwen/Qwen2.5-7B-InstructQwen2.5-7B-Instruct ranked first in the 7–8B weight class on the Open LLM
Leaderboard (January 2026) for instruction following and structured JSON output
— both critical for the CritiqueAgent and HumanificationAgent nodes in the
LangGraph hostify pipeline. The full rationale and benchmark citations are in
services/mcp/settings.py.
Intentional service duplication over a shared package.
base_agent.py exists in both services/mcp/ and services/worker/ with an
explicit sync contract rather than a services/common/ package. A shared
package requires coordinated Docker build changes and introduces import path
fragility for only two consumers. The refactor trigger is clear: a third
service needs BaseAgent, or the class exceeds ~50 lines.
Synchronous Celery tasks, async MCP calls.
summarizer_client.py uses blocking requests because Celery workers are
synchronous processes. agent_search.py uses aiohttp because it runs
inside an asyncio event loop. Mixing them would block the event loop or
require a full Celery async migration — neither is warranted at current load.
Single LLM round-trip for agentic search.
choose_and_summarize() collapses topic ranking and article summarisation into
one structured prompt. A two-step approach would re-send the same candidate
articles twice, doubling latency and token cost. The strict JSON output schema
makes the result directly usable downstream without a parsing step.
Progressive time-window fallback in RSS ingestion.
NewsAgent retries with expanding windows (7d → 30d → 1y) before returning
no_news_today. The narrow default catches fresh articles first; widening on
retry keeps the episode non-empty during slow news cycles without permanently
degrading freshness for active topics.
Each service can be started independently against a locally-running postgres
and redis (or use docker compose up postgres redis to start just the
infrastructure).
# Install dependencies
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -r requirements.txt
# Set environment variables
export DATABASE_URL="postgresql://user:pass@localhost:5432/newscast"
export REDIS_URL="redis://localhost:6379/0"
export MCP_URL="http://localhost:7000"
export OPENAI_API_KEY="sk-..."
# API service
uvicorn services.api.app:app --reload --port 8000
# MCP service (separate terminal)
uvicorn services.mcp.server:app --reload --port 7000
# Worker scheduler server (separate terminal)
uvicorn services.worker.server:app --port 8001
# Celery worker (separate terminal)
celery -A services.worker.tasks.celery worker --loglevel=infoblack services/ # format
isort services/ # sort imports
mypy services/ # type checkConfiguration for all three tools lives in pyproject.toml.
python -m services.worker.ingestionRuns the CLI entry point in ingestion.py and logs scored articles — useful
for verifying feed connectivity and topic scoring without starting the full
stack.
