agentprod

Production patterns for indie AI agents — extracted from running a multi-LLM trading agent in production.

agentprod is a small Python library of the four things you reach for once your AI agent leaves your laptop and starts charging your credit card at 3 AM:

Module	What it gives you	Why it exists
`Router`	Cost-aware model selection (cheapest model that meets the quality bar)	Burning Sonnet on "what is the price?" is how you go bankrupt
`Throttle`	Async token bucket with jitter + hard timeout	Provider rate limits don't just slow you down, they cascade
`retry_call` / `retry_async`	Pattern-based detection of transient failures	LLM SDKs change their exception classes every release; the error string is stable
`CostTracker`	Per-call USD ledger with arbitrary labels (agent, user, route)	"Which agent burned $40 last night" is a question the provider dashboard can't answer

No hard dependency on LangChain / LangGraph / OpenAI SDK. Bring your own LLM client. agentprod just gives you the production scaffolding around it.

Status

Alpha (v0.0.1). APIs may change before 1.0. Battle-tested in one production system; tests cover the core paths but the public surface is intentionally small until usage shapes it.

Install

# Pure stdlib — no required deps
pip install agentprod

# With tenacity for richer retry semantics
pip install "agentprod[retry]"

Python 3.10+.

Quickstart

The full example is in examples/quickstart.py. Skeleton:

import asyncio
from agentprod import (
    Complexity, Router, Throttle, retry_async,
    CostTracker, ModelPricing,
)

router = Router(model_for={
    Complexity.SIMPLE:   "gpt-4o-mini",
    Complexity.MODERATE: "gpt-4o",
    Complexity.COMPLEX:  "claude-sonnet-4-6",
})
throttle = Throttle(capacity=10, refill_per_sec=10)
PRICING = {
    "gpt-4o-mini": ModelPricing(input_per_1k=0.00015, output_per_1k=0.0006),
    "gpt-4o":      ModelPricing(input_per_1k=0.0025,  output_per_1k=0.01),
}
cost = CostTracker(jsonl_path=".data/cost.jsonl")

async def handle(query: str, *, agent: str) -> str:
    model = router.select(query)
    await throttle.acquire(timeout=1.0, label=f"llm:{model}")
    text, in_tok, out_tok = await retry_async(
        lambda: your_llm_call(model, query),
        max_attempts=3,
    )
    cost.record(
        model=model,
        input_tokens=in_tok, output_tokens=out_tok,
        pricing=PRICING[model],
        labels={"agent": agent},
    )
    return text

Each piece in 30 seconds

Router — cost-aware model selection

Pick the cheapest model that can handle the query:

from agentprod import Complexity, Router

router = Router(
    model_for={
        Complexity.SIMPLE:   "gpt-4o-mini",
        Complexity.MODERATE: "gpt-4o",
        Complexity.COMPLEX:  "claude-sonnet-4-6",
    },
    # Optional: bump domain terms to a higher tier
    complex_keywords=("DCF", "valuation", "portfolio"),
    simple_keywords=("price of", "ticker"),
)

router.select("what is the price of AAPL?")
# → "gpt-4o-mini"

router.select("compare AAPL and MSFT cash flow over 5 years")
# → "claude-sonnet-4-6"

Three-tier classifier (simple / moderate / complex) using:

Simple-keyword regex (wins over everything — short queries shouldn't hit the expensive model just because they happen to contain a long word)
Complex-keyword count
Word-count thresholds (CJK width-aware — works on Korean / Japanese / Chinese mixed input)

Throttle — asyncio token bucket

from agentprod import Throttle, ThrottleTimeout

bucket = Throttle(
    capacity=12,             # max burst size
    refill_per_sec=12,       # sustained rps
    jitter_ms=(5, 30),       # avoid thundering herd
    on_acquire=lambda r: log.info("throttle wait: %s", r),
)

try:
    await bucket.acquire(timeout=1.0, label="GET /quote")
    # ... make your call ...
except ThrottleTimeout:
    # bucket couldn't free a slot in time — drop and try next cycle
    return None

Why not aiolimiter / asyncio-throttle? Two things:

Hard timeout with explicit exception. Burst > timeout is usually a signal to drop the request, not to keep waiting.
Metrics callback. Sync or async, exceptions swallowed. You ship throttle waits to your observability stack without wrapping the bucket.

retry — pattern-based transient detection

from agentprod import is_retryable, retry_call, retry_async

# Decision function — drop into any retry library
if is_retryable(exc):
    ...

# Or use the wrapper (uses tenacity if installed, manual backoff otherwise)
result = retry_call(
    lambda: openai_client.chat.completions.create(...),
    max_attempts=3,
)

# Async version
result = await retry_async(
    lambda: anthropic_client.messages.create(...),
    max_attempts=3,
)

Default patterns cover OpenAI / Anthropic / Google / bare httpx error strings: rate limit, 429, 500, 502, 503, overloaded, timeout, server error, too many requests, connection reset.

Why string matching: provider SDKs reshuffle their exception classes every release. The message is the most stable contract.

CostTracker — per-call ledger with labels

from agentprod import CostTracker, ModelPricing

pricing = ModelPricing(
    input_per_1k=0.0025,
    output_per_1k=0.01,
    cached_input_per_1k=0.00125,  # optional, for providers with prompt caching
)

tracker = CostTracker(jsonl_path=".data/cost.jsonl")

tracker.record(
    model="gpt-4o",
    input_tokens=1234, output_tokens=567, cached_input_tokens=800,
    pricing=pricing,
    labels={"agent": "fundamental_analyst", "user": "u_123", "route": "/analyze"},
)

tracker.total_usd()                       # 12.4583
tracker.total_usd(where={"user": "u_123"})  # 0.42
tracker.by_label("agent")                  # {"fundamental_analyst": 0.42, ...}
tracker.by_model()                         # {"gpt-4o": 12.4583}

Why bring your own pricing: model prices change weekly. A library that ships its own catalog goes stale fast.

Why these four

These are the four pieces I rebuilt in three different agent codebases before deciding to extract them once. Every production AI agent eventually needs:

Cost discipline at the routing layer. Per-call cost discipline alone isn't enough — by the time you see a $300 bill, the spend is sunk. Routing is where the economics start.
Rate-limit resilience that doesn't cascade. A single 429 turns into 50 once your retries pile up. Token bucket + hard timeout breaks the cascade.
Retry that survives SDK upgrades. I've had three OpenAI SDK upgrades break my retry code because the exception classes moved. String matching the message has outlived all of them.
Cost attribution by label, not just total. "We spent $40 last night" is useless. "The fundamental_analyst agent spent $38 on retries against gpt-4o" is fixable.

Everything else in your agent is your business logic and shouldn't live in a library.

Non-goals

No LLM client wrapping. Use OpenAI / Anthropic / LangChain / your own. agentprod gives you the scaffolding around the call, not the call itself.
No model catalog. Prices change too fast.
No vector DB / RAG / evaluation. Different problem domain.
No multiprocessing. The Throttle is asyncio-only by design. If you need cross-process throttling, you want Redis-backed leaky bucket.

Development

git clone https://github.com/whdrnr2583-cmd/agentprod
cd agentprod
pip install -e ".[dev]"
pytest

License

MIT.

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
examples		examples
src/agentprod		src/agentprod
tests		tests
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

agentprod

Status

Install

Quickstart

Each piece in 30 seconds

Router — cost-aware model selection

Throttle — asyncio token bucket

retry — pattern-based transient detection

CostTracker — per-call ledger with labels

Why these four

Non-goals

Development

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

agentprod

Status

Install

Quickstart

Each piece in 30 seconds

Router — cost-aware model selection

Throttle — asyncio token bucket

retry — pattern-based transient detection

CostTracker — per-call ledger with labels

Why these four

Non-goals

Development

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages