47 changes: 47 additions & 0 deletions .github/workflows/update_rates.yml
@@ -0,0 +1,47 @@
name: Update Rates

on:
  schedule:
    - cron: "0 6 * * 1"  # Weekly on Mondays at 06:00 UTC
  workflow_dispatch:

permissions:
  contents: write
  pull-requests: write

jobs:
  update-rates:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Install uv
        uses: astral-sh/setup-uv@v4
        with:
          version: "latest"

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.12"

      - name: Install dependencies
        run: uv sync --dev

      - name: Update rate data
        run: uv run python scripts/update_rates.py

      - name: Format code (optional)
        run: uv run ruff format scripts/ src/

      - name: Create Pull Request
        uses: peter-evans/create-pull-request@v6
        with:
          branch: chore/update-rates
          commit-message: "chore: update bundled rate data"
          title: "chore: update bundled rate data"
          body: "Automated rate refresh via scheduled workflow."
          add-paths: |
            src/cellsem_llm_client/tracking/rates.json
            scripts/update_rates.py
3 changes: 2 additions & 1 deletion README.md
@@ -73,6 +73,7 @@ Quick links:
- [Development Guidelines](docs/contributing.md)
- [API Reference](docs/api/cellsem_llm_client/index.rst) (auto-generated)
- [Schema Enforcement](docs/schema_enforcement.md)
- [Cost Tracking](docs/cost_tracking.md)

## ✨ Current Features

@@ -93,7 +94,7 @@ STATUS - beta

- ✅ **Real-time Cost Tracking**: Direct integration with OpenAI and Anthropic usage APIs (aggregate per-key)
- ✅ **Token Usage Metrics**: Detailed tracking of input, output, cached, and thinking tokens
- ✅ **Cost Calculation**: Automated cost computation with fallback rate database (per-request precision)
- ✅ **Cost Calculation**: Automated cost computation with fallback rate database (per-request precision); enabled by default when `track_usage=True` (opt-out available)
- ✅ **Usage Analytics**: Comprehensive reporting and cost optimization insights

### JSON Schema Compliance
53 changes: 53 additions & 0 deletions docs/cost_tracking.md
@@ -0,0 +1,53 @@
# Cost Tracking and Estimation

This guide explains how to track costs using:

- **Estimated per-request costs** (default; computed locally from bundled rates, no provider usage API calls required)
- **Actual key-level usage** (provider-reported, delayed, key-wide)

## Estimated Costs (Per Request)

- Enable tracking on calls: `query_unified(..., track_usage=True)`.
- If you do **not** pass a `cost_calculator`, a `FallbackCostCalculator` with bundled rates is created by default (`auto_cost=True`).
- Opt out by setting `auto_cost=False`.
- Provide your own calculator for custom rates: `query_unified(..., track_usage=True, cost_calculator=my_calc, auto_cost=False)`.
- Rate freshness is exposed as `usage.rate_last_updated` (from the rate source access date).

Example:

```python
from cellsem_llm_client.agents import LiteLLMAgent

agent = LiteLLMAgent(model="gpt-4o", api_key="key")
result = agent.query_unified(
    message="Summarize this.",
    track_usage=True,  # auto cost estimation by default
)
print(result.usage.estimated_cost_usd, result.usage.rate_last_updated)
```
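
To opt out of automatic estimation, or to supply your own calculator, use the `auto_cost` and `cost_calculator` arguments. A minimal sketch; the import path for `FallbackCostCalculator` is an assumption here, so adjust it to wherever the class lives in the tracking package:

```python
from cellsem_llm_client.agents import LiteLLMAgent

# Assumed import path, for illustration only.
from cellsem_llm_client.tracking import FallbackCostCalculator

agent = LiteLLMAgent(model="gpt-4o", api_key="key")

# Opt out: tokens are still tracked, but no cost estimate is attached.
no_estimate = agent.query_unified(
    message="Summarize this.",
    track_usage=True,
    auto_cost=False,
)

# Explicit calculator, mirroring the custom-calculator bullet above.
calc = FallbackCostCalculator()  # bundled rates; call load_default_rates() first if the constructor does not load them
with_calc = agent.query_unified(
    message="Summarize this.",
    track_usage=True,
    cost_calculator=calc,
    auto_cost=False,
)
print(with_calc.usage.estimated_cost_usd)
```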

## Actual Usage (Key-Level)

Use `ApiCostTracker` for provider-reported usage; this is **key-wide** and delayed by a few minutes.

```python
from datetime import date, timedelta
from cellsem_llm_client.tracking.api_trackers import ApiCostTracker

tracker = ApiCostTracker(openai_api_key="sk-...", anthropic_api_key="ak-...")
end = date.today()
start = end - timedelta(days=1)

openai_usage = tracker.get_openai_usage(start_date=start, end_date=end)
print(openai_usage.total_cost, openai_usage.total_requests)
```

Notes:
- Reports aggregate all usage for the API key (not per request).
- Expect a short provider-side delay before usage appears.
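
The same call works over longer windows. A minimal sketch of a rolling seven-day report, reusing the `get_openai_usage` call and attributes shown above (the `$` formatting assumes `total_cost` is a numeric USD value):

```python
from datetime import date, timedelta

from cellsem_llm_client.tracking.api_trackers import ApiCostTracker

tracker = ApiCostTracker(openai_api_key="sk-...", anthropic_api_key="ak-...")

# Key-wide totals for the last seven days; the most recent requests may not
# have appeared yet because of the provider-side reporting delay.
end = date.today()
start = end - timedelta(days=7)
usage = tracker.get_openai_usage(start_date=start, end_date=end)
print(f"OpenAI, last 7 days: {usage.total_requests} requests, ${usage.total_cost:.2f}")
```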

## Rates and Updates

- Bundled rates live in `tracking/rates.json`; `FallbackCostCalculator.load_default_rates()` reads this file.
- `usage.rate_last_updated` shows when the rate data was last refreshed.
- A weekly GitHub Action runs `scripts/update_rates.py` to refresh rates; it opens a PR if rates change.
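
To check freshness without going through the calculator, you can read the bundled file directly. A minimal sketch; the field names follow `scripts/update_rates.py`, and reading via `importlib.resources` assumes the package-data layout declared in `pyproject.toml`:

```python
import json
from importlib import resources

# rates.json ships inside the installed package (see [tool.setuptools.package-data]).
raw = (
    resources.files("cellsem_llm_client.tracking")
    .joinpath("rates.json")
    .read_text(encoding="utf-8")
)
rates = json.loads(raw)

for entry in rates:
    source = entry["source"]
    print(
        f'{entry["provider"]}/{entry["model"]}: '
        f'in ${entry["input_cost_per_1k_tokens"]}/1k, '
        f'out ${entry["output_cost_per_1k_tokens"]}/1k, '
        f'accessed {source.get("access_date")}'
    )
```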
1 change: 1 addition & 0 deletions docs/index.md
@@ -7,6 +7,7 @@
installation
quickstart
schema_enforcement
cost_tracking
contributing
```

29 changes: 29 additions & 0 deletions planning/cost_tracking_improvements.md
@@ -0,0 +1,29 @@
# Cost Tracking Improvements Plan

## Scope
- Default cost estimation when tracking is enabled.
- Opt-out flag for auto cost estimation.
- Documentation updates for estimated vs actual costs.
- Rate updater script and weekly GitHub Action.
- Surface rate freshness date in cost metrics/reporting.

## Tasks
1) **Auto cost estimator default**
- In `query_unified` (and wrappers), when `track_usage=True` and no `cost_calculator` is provided, auto-create a `FallbackCostCalculator()` (loading the default rates) so `estimated_cost_usd` is populated by default.
- Add an opt-out flag (e.g., `auto_cost`, defaulting to `True`) so auto estimation can be disabled.
- Honor provided calculators; if `auto_cost=False`, skip auto estimation. A sketch of this selection logic appears at the end of this plan.

2) **Docs**
- README: note default estimated costs (unless opted out) and link to cost tracking doc.
- New/updated doc (`docs/cost_tracking.md`): per-request estimated costs (auto/explicit calculator, opt-out), key-level actual usage via `ApiCostTracker` with code snippets/caveats (provider delay, key-level totals), and guidance on estimates vs actuals.

3) **Rate updater + weekly Action**
- Add `scripts/update_rates.py` to fetch latest OpenAI/Anthropic pricing and update the rate database used by `FallbackCostCalculator`.
- Add a scheduled GitHub Action (weekly) to run the updater; assume no secrets are needed. If rates change, open a PR or flag a failing status.

4) **Rate freshness in metrics**
- Include rate database last-update date in cost tracking output/metrics so users see pricing freshness.

5) **Execution notes**
- Implement code changes after this plan is approved.
- Keep backward compatibility; new defaults are opt-out.
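
A rough sketch of the calculator-selection behavior described in task 1; this is illustrative only, the helper name and import path are placeholders, and the real wiring inside `query_unified` may differ:

```python
# Assumed import path; placeholder for wherever FallbackCostCalculator lives.
from cellsem_llm_client.tracking import FallbackCostCalculator


def _resolve_cost_calculator(track_usage, cost_calculator=None, auto_cost=True):
    """Hypothetical helper: pick the calculator used to fill estimated_cost_usd."""
    if not track_usage:
        return None  # no usage tracking, so nothing to estimate
    if cost_calculator is not None:
        return cost_calculator  # always honor an explicitly provided calculator
    if auto_cost:
        calc = FallbackCostCalculator()  # bundled rates
        calc.load_default_rates()  # assumed instance method per the docs
        return calc
    return None  # auto_cost=False and no calculator: skip estimation
```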
3 changes: 3 additions & 0 deletions pyproject.toml
@@ -54,6 +54,9 @@ Issues = "https://github.com/Cellular-Semantics/cellsem_llm_client/issues"
[tool.setuptools.packages.find]
where = ["src"]

[tool.setuptools.package-data]
"cellsem_llm_client.tracking" = ["rates.json"]

[tool.pytest.ini_options]
testpaths = ["tests"]
python_files = ["test_*.py"]
109 changes: 109 additions & 0 deletions scripts/update_rates.py
@@ -0,0 +1,109 @@
"""Update bundled rate data for cost estimation.

This script refreshes `src/cellsem_llm_client/tracking/rates.json` with
the latest known pricing (hard-coded here) and stamps the current UTC
access date. It is intended to be run by CI on a schedule.
"""

from __future__ import annotations

import json
from datetime import UTC, datetime
from pathlib import Path

RATE_FILE = Path("src/cellsem_llm_client/tracking/rates.json")

# Maintain current pricing here; update values as provider pricing changes.
CURRENT_RATES = [
    {
        "provider": "openai",
        "model": "gpt-4",
        "input_cost_per_1k_tokens": 0.03,
        "output_cost_per_1k_tokens": 0.06,
        "cached_cost_per_1k_tokens": None,
        "thinking_cost_per_1k_tokens": None,
        "source": {
            "name": "Provider Documentation",
            "url": "https://openai.com/pricing",
        },
    },
    {
        "provider": "openai",
        "model": "gpt-4o",
        "input_cost_per_1k_tokens": 0.005,
        "output_cost_per_1k_tokens": 0.015,
        "cached_cost_per_1k_tokens": None,
        "thinking_cost_per_1k_tokens": None,
        "source": {
            "name": "Provider Documentation",
            "url": "https://openai.com/pricing",
        },
    },
    {
        "provider": "openai",
        "model": "gpt-4o-mini",
        "input_cost_per_1k_tokens": 0.00015,
        "output_cost_per_1k_tokens": 0.0006,
        "cached_cost_per_1k_tokens": 0.000075,
        "thinking_cost_per_1k_tokens": None,
        "source": {
            "name": "Provider Documentation",
            "url": "https://openai.com/pricing",
        },
    },
    {
        "provider": "openai",
        "model": "gpt-3.5-turbo",
        "input_cost_per_1k_tokens": 0.0015,
        "output_cost_per_1k_tokens": 0.002,
        "cached_cost_per_1k_tokens": None,
        "thinking_cost_per_1k_tokens": None,
        "source": {
            "name": "Provider Documentation",
            "url": "https://openai.com/pricing",
        },
    },
    {
        "provider": "anthropic",
        "model": "claude-3-sonnet",
        "input_cost_per_1k_tokens": 0.003,
        "output_cost_per_1k_tokens": 0.015,
        "cached_cost_per_1k_tokens": None,
        "thinking_cost_per_1k_tokens": 0.006,
        "source": {
            "name": "Provider Documentation",
            "url": "https://www.anthropic.com/pricing",
        },
    },
    {
        "provider": "anthropic",
        "model": "claude-3-haiku-20240307",
        "input_cost_per_1k_tokens": 0.00025,
        "output_cost_per_1k_tokens": 0.00125,
        "cached_cost_per_1k_tokens": None,
        "thinking_cost_per_1k_tokens": 0.0005,
        "source": {
            "name": "Provider Documentation",
            "url": "https://www.anthropic.com/pricing",
        },
    },
]


def main() -> None:
    """Write updated rate data with current access_date."""
    now = datetime.now(UTC).replace(microsecond=0).isoformat()
    output = []
    for entry in CURRENT_RATES:
        entry_copy = dict(entry)
        source = dict(entry_copy["source"])
        source["access_date"] = now
        entry_copy["source"] = source
        output.append(entry_copy)

    RATE_FILE.parent.mkdir(parents=True, exist_ok=True)
    RATE_FILE.write_text(json.dumps(output, indent=2, sort_keys=True), encoding="utf-8")


if __name__ == "__main__":
    main()