Steward

Delegate work, not attention.

A workflow-hardening layer for Claude Code, Codex, and other CLI agents. Start with a full agent when the task is fuzzy. Get async progress reports instead of constant interruptions. As the task stabilizes, crystallize the repeatable parts into scripts — keeping LLM calls only where language and judgment are actually needed.

"Not full agent forever. Delegate first, crystallize what repeats."

What It Does

# Add a task from plain English
steward-add-task "Send me a daily digest of new papers on LLM agent eval"

# It runs on a cron loop — no human intervention needed
steward --force paper_watch

# When the workflow stabilizes, crystallize it into a script
steward-promote paper_watch --last 10

Two examples of what Steward manages daily:

Task	What it does	Mode
Paper watch	Search arXiv, score papers, deep-read top picks, update vault	Direct (scripted)
Job follow-up	Sync contacts, draft outreach, check replies, remind on stale leads	Direct (scripted)

Both started as full-agent tasks. Both were promoted to cheap scripts after their workflows stabilized. The agent runs only on failure (self-healing).

Quick Start

Let your AI agent set it up

Point your AI agent at docs/agent-setup.md — it contains step-by-step instructions to bootstrap a working instance.

Manual setup

pip install -e /path/to/steward/
steward-init ./my-tasks
cd my-tasks
steward --force hello_world

Add your own tasks

steward-add-task "Track my reading list and summarize weekly"
steward-add-task "Monitor competitor releases and alert me on changes"

Automate

# Run every hour — Steward handles scheduling internally
0 * * * * STEWARD_INSTANCE_ROOT=/path/to/my-tasks steward >> cron.log 2>&1

CLI Agent Support

Steward is CLI-agent agnostic at every layer. The same provider abstraction governs three places where LLMs run:

Task execution — when the orchestrator spawns a task in agent mode
Self-healing — when the reflector spawns a fix agent for a broken task
In-task LLM calls — when your task code does deep research, web search, scoring, or any other LLM step via steward.llm.call_llm()

Setting STEWARD_LLM_CLI=claude switches all three atomically. Your task's deep research engine starts using Claude Code; the reflector's fix agents start using Claude Code; new task spawns use Claude Code. No code changes anywhere.

Provider	Set as default
Codex (OpenAI)	`export STEWARD_LLM_CLI=codex`
Claude Code (Anthropic)	`export STEWARD_LLM_CLI=claude`
Custom (Gemini, OpenCode, ...)	`export STEWARD_LLM_CLI=my-cli` + template

Auto-detected if not set. Per-task override in charter.yaml:

execution:
  agent: "claude"    # this task always uses Claude Code, even if global default is codex

Custom CLI template via STEWARD_LLM_CMD_TEMPLATE env var — see docs/architecture.md for details.

Why this matters: as CLI tools and pricing change month-to-month, you can switch providers without rewriting your tasks. The same paper_reader that runs on Codex today can run on Claude Code Max tomorrow with one environment variable.

Why not just use Claude Code or Codex directly?

Claude Code, Codex, and Gemini CLI are interactive tools — you sit at the terminal, type a request, watch it work, type the next request. Steward turns them into delegated background workers:

You set up tasks once. They run on a schedule (hourly/daily/weekly) without you opening the terminal.
You read a daily digest, not a transcript. One email summarizing what every task produced today, with action items extracted.
It self-heals across days. When a task breaks, the reflector detects it, spawns a fix agent, re-runs the task, and verifies the output before declaring it resolved. You only see the failure if the reflector can't resolve it.
It uses Claude Code / Codex under the hood. Steward isn't a replacement — it's a thin layer that delegates to the CLI agent you already have.

If you only need to run something once, use Claude Code directly. If you need to run it every morning at 5 AM and only hear about it when something is wrong, that's what Steward is for.

Why not OpenClaw or other autonomous agents?

OpenClaw and similar autonomous agents run continuously, making decisions inside the agent's reasoning loop. Steward inverts this: the workflow is consolidated into a script you can read and edit, not buried in an agent transcript.

Direct control. When paper_reader scores candidates, the logic lives in tasks/paper_reader/scripts/score_candidates.py — open it, tweak a threshold, ship. With an autonomous agent, you adjust prompts and hope.
Amortized cost. Autonomous agents pay full LLM cost on every cycle forever. Steward tasks start in agent mode, then steward-promote crystallizes the stable parts into cheap Python — LLM only stays where judgment is actually needed.
Debugging is git diff, not transcript review. When something breaks, you read the changed code and the failed step's error — not a 200-line reasoning trace.
Changes are version-controlled. Every task lives in a git repo. Your automation history is a commit log, not an agent memory file.

If you want a fully autonomous decision-maker that handles novel situations, use OpenClaw. If you want background workers whose logic you can read, edit, grep, and version-control, use Steward.

How It Differs

Steward is not another always-on agent or cron wrapper. The key difference:

Progressive workflow hardening. Tasks start in exploratory agent mode (expensive, flexible), then promote to direct-mode scripts (cheap, editable) as they stabilize. The steward-promote command analyzes execution history and generates the script for you.

	Always-on agents	Steward
Cost	Full LLM every cycle	LLM only where needed
Control	Agent decides everything	You see exactly what runs
Editability	Modify prompts and hope	Edit Python directly
Failure	Retry the whole agent	Self-healing fixes the specific step
Attention	Watch it constantly	Read the daily digest

Workflow Promotion

# Analyze a task's execution history
steward-promote daily_planner --last 10

# Produces:
#   promotion_report.md    — what's scriptable vs. needs LLM
#   run.py.generated       — candidate direct-mode script
#   charter.promoted.yaml  — updated config

# Or let it happen via email: reply to the daily digest with
#   approve daily_planner
#   reject daily_planner: still need LLM for inbox parsing
#   pause promotion daily_planner

See docs/workflow-promotion.md for the full guide.

Task Configuration

Each task has a charter.yaml:

task_id: "my_task"
name: "Human-readable name"

schedule:
  frequency: "daily"          # hourly | daily | weekly
  max_runtime_minutes: 60

execution:
  agent: "direct"             # codex | claude | direct | custom
  entrypoint: "python run.py"

report:
  digest: true
  own_email:
    enabled: true
    prefix: "[MY-TASK]"

Execution modes

Mode	How it works	LLM cost
Agent	LLM agent runs the whole workflow	High
Direct	Python script drives, calls LLM selectively	Low
Hybrid	Direct mode + automatic self-healing on crash	Low (high on failure)

Communication

Reports delivered via email (SMTP/Gmail). Reply to steer task behavior — corrections, new priorities, pause/resume. The email layer is modular (steward/comm/email.py).

Documentation

Doc	What
guide.md	Full setup and usage guide
docs/architecture.md	System architecture and execution modes
docs/workflow-promotion.md	Agent → direct promotion guide
docs/agent-setup.md	AI agent bootstrapping instructions
docs/comparison.md	vs OpenClaw, autoresearch, Airflow

Requirements

Python 3.10+
A CLI coding agent: Codex, Claude Code, or any custom CLI
pyyaml>=6.0

pip install -e /path/to/steward/
# Installs: steward, steward-init, steward-status, steward-promote, steward-add-task

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 41 Commits
.github/workflows		.github/workflows
docs		docs
steward		steward
templates		templates
tests		tests
.gitignore		.gitignore
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
README.md		README.md
agent_loop.sh		agent_loop.sh
guide.md		guide.md
init_cmd.py		init_cmd.py
pyproject.toml		pyproject.toml
status_cmd.py		status_cmd.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Steward

What It Does

Quick Start

Let your AI agent set it up

Manual setup

Add your own tasks

Automate

CLI Agent Support

Why not just use Claude Code or Codex directly?

Why not OpenClaw or other autonomous agents?

How It Differs

Workflow Promotion

Task Configuration

Execution modes

Communication

Documentation

Requirements

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Steward

What It Does

Quick Start

Let your AI agent set it up

Manual setup

Add your own tasks

Automate

CLI Agent Support

Why not just use Claude Code or Codex directly?

Why not OpenClaw or other autonomous agents?

How It Differs

Workflow Promotion

Task Configuration

Execution modes

Communication

Documentation

Requirements

License

About

Topics

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages