Cellular-Semantics · dosumis · Dec 3, 2025 · Dec 3, 2025 · Dec 3, 2025 · Dec 3, 2025
diff --git a/.github/workflows/update_rates.yml b/.github/workflows/update_rates.yml
@@ -0,0 +1,47 @@
+name: Update Rates
+
+on:
+  schedule:
+    - cron: "0 6 * * 1"  # Weekly on Mondays at 06:00 UTC
+  workflow_dispatch:
+
+permissions:
+  contents: write
+  pull-requests: write
+
+jobs:
+  update-rates:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Install uv
+        uses: astral-sh/setup-uv@v4
+        with:
+          version: "latest"
+
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: "3.12"
+
+      - name: Install dependencies
+        run: uv sync --dev
+
+      - name: Update rate data
+        run: uv run python scripts/update_rates.py
+
+      - name: Format markdown/json (optional)
+        run: uv run ruff format scripts/ src/
+
+      - name: Create Pull Request
+        uses: peter-evans/create-pull-request@v6
+        with:
+          branch: chore/update-rates
+          commit-message: "chore: update bundled rate data"
+          title: "chore: update bundled rate data"
+          body: "Automated rate refresh via scheduled workflow."
+          add-paths: |
+            src/cellsem_llm_client/tracking/rates.json
+            scripts/update_rates.py
+*** End Patch" प्रतिक्रिया given by system moderator appropriately. Start now. ***!
diff --git a/README.md b/README.md
@@ -73,6 +73,7 @@ Quick links:
 - [Development Guidelines](docs/contributing.md)
 - [API Reference](docs/api/cellsem_llm_client/index.rst) (auto-generated)
 - [Schema Enforcement](docs/schema_enforcement.md)
+- [Cost Tracking](docs/cost_tracking.md)
 
 ## ✨ Current Features
 
@@ -93,7 +94,7 @@ STATUS - beta
 
 - ✅ **Real-time Cost Tracking**: Direct integration with OpenAI and Anthropic usage APIs (aggregate per-key)
 - ✅ **Token Usage Metrics**: Detailed tracking of input, output, cached, and thinking tokens
-- ✅ **Cost Calculation**: Automated cost computation with fallback rate database (per-request precision)
+- ✅ **Cost Calculation**: Automated cost computation with fallback rate database (per-request precision); enabled by default when `track_usage=True` (opt-out available)
 - ✅ **Usage Analytics**: Comprehensive reporting and cost optimization insights
 
 ### JSON Schema Compliance

diff --git a/docs/cost_tracking.md b/docs/cost_tracking.md
@@ -0,0 +1,53 @@
+# Cost Tracking and Estimation
+
+This guide explains how to track costs using:
+
+- **Estimated per-request costs** (default, no real API needed)
+- **Actual key-level usage** (provider-reported, delayed, key-wide)
+
+## Estimated Costs (Per Request)
+
+- Enable tracking on calls: `query_unified(..., track_usage=True)`.
+- If you do **not** pass a `cost_calculator`, a `FallbackCostCalculator` with bundled rates is created by default (`auto_cost=True`).
+- Opt out by setting `auto_cost=False`.
+- Provide your own calculator for custom rates: `query_unified(..., track_usage=True, cost_calculator=my_calc, auto_cost=False)`.
+- Rate freshness is exposed as `usage.rate_last_updated` (from the rate source access date).
+
+Example:
+
+```python
+from cellsem_llm_client.agents import LiteLLMAgent
+
+agent = LiteLLMAgent(model="gpt-4o", api_key="key")
+result = agent.query_unified(
+    message="Summarize this.",
+    track_usage=True,  # auto cost estimation by default
+)
+print(result.usage.estimated_cost_usd, result.usage.rate_last_updated)
+```
+
+## Actual Usage (Key-Level)
+
+Use `ApiCostTracker` for provider-reported usage; this is **key-wide** and delayed by a few minutes.
+
+```python
+from datetime import date, timedelta
+from cellsem_llm_client.tracking.api_trackers import ApiCostTracker
+
+tracker = ApiCostTracker(openai_api_key="sk-...", anthropic_api_key="ak-...")
+end = date.today()
+start = end - timedelta(days=1)
+
+openai_usage = tracker.get_openai_usage(start_date=start, end_date=end)
+print(openai_usage.total_cost, openai_usage.total_requests)
+```
+
+Notes:
+- Reports aggregate all usage for the API key (not per request).
+- Expect a short provider-side delay before usage appears.
+
+## Rates and Updates
+
+- Bundled rates live in `tracking/rates.json`; `FallbackCostCalculator.load_default_rates()` reads this file.
+- `usage.rate_last_updated` shows when the rate data was last refreshed.
+- A weekly GitHub Action runs `scripts/update_rates.py` to refresh rates; it opens a PR if rates change.
diff --git a/docs/index.md b/docs/index.md
@@ -7,6 +7,7 @@
 installation
 quickstart
 schema_enforcement
+cost_tracking
 contributing
 ```
 

diff --git a/planning/cost_tracking_improvements.md b/planning/cost_tracking_improvements.md
@@ -0,0 +1,29 @@
+# Cost Tracking Improvements Plan
+
+## Scope
+- Default cost estimation when tracking is enabled.
+- Opt-out flag for auto cost estimation.
+- Documentation updates for estimated vs actual costs.
+- Rate updater script and weekly GitHub Action.
+- Surface rate freshness date in cost metrics/reporting.
+
+## Tasks
+1) **Auto cost estimator default**
+   - In `query_unified` (and wrappers), when `track_usage=True` and no `cost_calculator` is provided, auto-create `FallbackCostCalculator()` (load default rates) so `estimated_cost_usd` is populated by default.
+   - Add an opt-out flag (e.g., `auto_cost=True`) to disable auto estimation.
+   - Honor provided calculators; if `auto_cost=False`, skip estimation.
+
+2) **Docs**
+   - README: note default estimated costs (unless opted out) and link to cost tracking doc.
+   - New/updated doc (`docs/cost_tracking.md`): per-request estimated costs (auto/explicit calculator, opt-out), key-level actual usage via `ApiCostTracker` with code snippets/caveats (provider delay, key-level totals), and guidance on estimates vs actuals.
+
+3) **Rate updater + weekly Action**
+   - Add `scripts/update_rates.py` to fetch latest OpenAI/Anthropic pricing and update the rate database used by `FallbackCostCalculator`.
+   - Add a scheduled GitHub Action (weekly) to run the updater; assume no secrets needed. If rates change, prepare/flag a PR or failing status.
+
+4) **Rate freshness in metrics**
+   - Include rate database last-update date in cost tracking output/metrics so users see pricing freshness.
+
+5) **Execution notes**
+   - Implement code changes after this plan is approved.
+   - Keep backward compatibility; new defaults are opt-out.
diff --git a/pyproject.toml b/pyproject.toml
@@ -54,6 +54,9 @@ Issues = "https://github.com/Cellular-Semantics/cellsem_llm_client/issues"
 [tool.setuptools.packages.find]
 where = ["src"]
 
+[tool.setuptools.package-data]
+"cellsem_llm_client.tracking" = ["rates.json"]
+
 [tool.pytest.ini_options]
 testpaths = ["tests"]
 python_files = ["test_*.py"]

diff --git a/scripts/update_rates.py b/scripts/update_rates.py
@@ -0,0 +1,109 @@
+"""Update bundled rate data for cost estimation.
+
+This script refreshes `src/cellsem_llm_client/tracking/rates.json` with
+the latest known pricing (hard-coded here) and stamps the current UTC
+access date. It is intended to be run by CI on a schedule.
+"""
+
+from __future__ import annotations
+
+import json
+from datetime import UTC, datetime
+from pathlib import Path
+
+RATE_FILE = Path("src/cellsem_llm_client/tracking/rates.json")
+
+# Maintain current pricing here; update values as provider pricing changes.
+CURRENT_RATES = [
+    {
+        "provider": "openai",
+        "model": "gpt-4",
+        "input_cost_per_1k_tokens": 0.03,
+        "output_cost_per_1k_tokens": 0.06,
+        "cached_cost_per_1k_tokens": None,
+        "thinking_cost_per_1k_tokens": None,
+        "source": {
+            "name": "Provider Documentation",
+            "url": "https://openai.com/pricing",
+        },
+    },
+    {
+        "provider": "openai",
+        "model": "gpt-4o",
+        "input_cost_per_1k_tokens": 0.005,
+        "output_cost_per_1k_tokens": 0.015,
+        "cached_cost_per_1k_tokens": None,
+        "thinking_cost_per_1k_tokens": None,
+        "source": {
+            "name": "Provider Documentation",
+            "url": "https://openai.com/pricing",
+        },
+    },
+    {
+        "provider": "openai",
+        "model": "gpt-4o-mini",
+        "input_cost_per_1k_tokens": 0.00015,
+        "output_cost_per_1k_tokens": 0.0006,
+        "cached_cost_per_1k_tokens": 0.000075,
+        "thinking_cost_per_1k_tokens": None,
+        "source": {
+            "name": "Provider Documentation",
+            "url": "https://openai.com/pricing",
+        },
+    },
+    {
+        "provider": "openai",
+        "model": "gpt-3.5-turbo",
+        "input_cost_per_1k_tokens": 0.0015,
+        "output_cost_per_1k_tokens": 0.002,
+        "cached_cost_per_1k_tokens": None,
+        "thinking_cost_per_1k_tokens": None,
+        "source": {
+            "name": "Provider Documentation",
+            "url": "https://openai.com/pricing",
+        },
+    },
+    {
+        "provider": "anthropic",
+        "model": "claude-3-sonnet",
+        "input_cost_per_1k_tokens": 0.003,
+        "output_cost_per_1k_tokens": 0.015,
+        "cached_cost_per_1k_tokens": None,
+        "thinking_cost_per_1k_tokens": 0.006,
+        "source": {
+            "name": "Provider Documentation",
+            "url": "https://www.anthropic.com/pricing",
+        },
+    },
+    {
+        "provider": "anthropic",
+        "model": "claude-3-haiku-20240307",
+        "input_cost_per_1k_tokens": 0.00025,
+        "output_cost_per_1k_tokens": 0.00125,
+        "cached_cost_per_1k_tokens": None,
+        "thinking_cost_per_1k_tokens": 0.0005,
+        "source": {
+            "name": "Provider Documentation",
+            "url": "https://www.anthropic.com/pricing",
+        },
+    },
+]
+
+
+def main() -> None:
+    """Write updated rate data with current access_date."""
+    now = datetime.now(UTC).replace(microsecond=0).isoformat()
+    output = []
+    for entry in CURRENT_RATES:
+        entry_copy = dict(entry)
+        source = dict(entry_copy["source"])
+        source["access_date"] = now
+        entry_copy["source"] = source
+        output.append(entry_copy)
+
+    RATE_FILE.parent.mkdir(parents=True, exist_ok=True)
+    RATE_FILE.write_text(json.dumps(output, indent=2, sort_keys=True), encoding="utf-8")
+
+
+if __name__ == "__main__":
+    main()
diff --git a/src/cellsem_llm_client/agents/agent_connection.py b/src/cellsem_llm_client/agents/agent_connection.py
@@ -18,6 +18,7 @@
     SchemaManager,
     SchemaValidator,
 )
+from cellsem_llm_client.tracking.cost_calculator import FallbackCostCalculator
 from cellsem_llm_client.tracking.usage_metrics import UsageMetrics
 
 
@@ -159,6 +160,7 @@ def query_unified(
         track_usage: bool = False,
         cost_calculator: Optional["FallbackCostCalculator"] = None,
         max_retries: int = 2,
+        auto_cost: bool = True,
     ) -> QueryResult:
         """Unified query interface with optional tools, schema enforcement, and tracking.
 
@@ -177,6 +179,8 @@ def query_unified(
             track_usage: Whether to return usage metrics.
             cost_calculator: Optional cost calculator for estimated cost.
             max_retries: Validation retry limit when `schema` is provided.
+            auto_cost: When True, auto-create a fallback cost calculator if none is provided
+                and tracking is enabled.
 
         Returns:
             QueryResult containing final text, optional validated Pydantic model,
@@ -264,19 +268,22 @@ def query_unified(
 
         usage_metrics: UsageMetrics | None = None
         if track_usage and raw_response is not None and hasattr(raw_response, "usage"):
+            calc = cost_calculator
+            if calc is None and auto_cost:
+                calc = self._build_default_calculator()
             # When tools were used, accumulate usage from all API calls
             if all_tool_responses is not None:
                 usage_metrics = self._accumulate_usage_metrics(
                     responses=all_tool_responses,
                     provider=provider,
-                    cost_calculator=cost_calculator,
+                    cost_calculator=calc,
                 )
             else:
                 # Single API call without tools
                 usage_metrics = self._build_usage_metrics(
                     raw_response=raw_response,
                     provider=provider,
-                    cost_calculator=cost_calculator,
+                    cost_calculator=calc,
                 )
 
         return QueryResult(
@@ -568,6 +575,12 @@ def _ensure_no_additional_properties(self, schema_dict: dict[str, Any]) -> None:
                             if isinstance(item, dict):
                                 self._ensure_no_additional_properties(item)
 
+    def _build_default_calculator(self) -> "FallbackCostCalculator":
+        """Create a fallback calculator with default rates loaded."""
+        calculator = FallbackCostCalculator()
+        calculator.load_default_rates()
+        return calculator
+
     def _run_tool_loop(
         self,
         messages: list[dict[str, Any]],
@@ -670,8 +683,18 @@ def _build_usage_metrics(
         thinking_tokens = None
 
         estimated_cost_usd = None
+        rate_last_updated = None
         if cost_calculator:
             try:
+                get_rates = getattr(cost_calculator, "get_model_rates", None)
+                rate_data = (
+                    get_rates(provider, self.model) if callable(get_rates) else None
+                )
+                if rate_data and hasattr(rate_data, "source"):
+                    access_date = getattr(rate_data.source, "access_date", None)
+                    rate_last_updated = (
+                        access_date if isinstance(access_date, datetime) else None
+                    )
                 temp_usage_metrics = UsageMetrics(
                     input_tokens=input_tokens,
                     output_tokens=output_tokens,
@@ -694,6 +717,7 @@ def _build_usage_metrics(
             cached_tokens=cached_tokens,
             thinking_tokens=thinking_tokens,
             estimated_cost_usd=estimated_cost_usd,
+            rate_last_updated=rate_last_updated,
             provider=provider,
             model=self.model,
             timestamp=datetime.now(),
@@ -750,8 +774,18 @@ def _accumulate_usage_metrics(
 
         # Calculate cost based on accumulated tokens
         estimated_cost_usd = None
+        rate_last_updated = None
         if cost_calculator:
             try:
+                get_rates = getattr(cost_calculator, "get_model_rates", None)
+                rate_data = (
+                    get_rates(provider, self.model) if callable(get_rates) else None
+                )
+                if rate_data and hasattr(rate_data, "source"):
+                    access_date = getattr(rate_data.source, "access_date", None)
+                    rate_last_updated = (
+                        access_date if isinstance(access_date, datetime) else None
+                    )
                 temp_usage_metrics = UsageMetrics(
                     input_tokens=total_input_tokens,
                     output_tokens=total_output_tokens,
@@ -774,6 +808,7 @@ def _accumulate_usage_metrics(
             cached_tokens=cached_tokens,
             thinking_tokens=thinking_tokens,
             estimated_cost_usd=estimated_cost_usd,
+            rate_last_updated=rate_last_updated,
             provider=provider,
             model=self.model,
             timestamp=datetime.now(),