Merged
47 changes: 47 additions & 0 deletions .github/workflows/update_rates.yml
@@ -0,0 +1,47 @@
name: Update Rates

on:
schedule:
- cron: "0 6 * * 1" # Weekly on Mondays at 06:00 UTC
workflow_dispatch:

permissions:
contents: write
pull-requests: write

jobs:
update-rates:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4

- name: Install uv
uses: astral-sh/setup-uv@v4
with:
version: "latest"

- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: "3.12"

- name: Install dependencies
run: uv sync --dev

- name: Update rate data
run: uv run python scripts/update_rates.py

- name: Format Python code (optional)
run: uv run ruff format scripts/ src/

- name: Create Pull Request
uses: peter-evans/create-pull-request@v6
with:
branch: chore/update-rates
commit-message: "chore: update bundled rate data"
title: "chore: update bundled rate data"
body: "Automated rate refresh via scheduled workflow."
add-paths: |
src/cellsem_llm_client/tracking/rates.json
scripts/update_rates.py
3 changes: 2 additions & 1 deletion README.md
@@ -73,6 +73,7 @@ Quick links:
- [Development Guidelines](docs/contributing.md)
- [API Reference](docs/api/cellsem_llm_client/index.rst) (auto-generated)
- [Schema Enforcement](docs/schema_enforcement.md)
- [Cost Tracking](docs/cost_tracking.md)

## ✨ Current Features

@@ -93,7 +94,7 @@ STATUS - beta

-**Real-time Cost Tracking**: Direct integration with OpenAI and Anthropic usage APIs (aggregate per-key)
-**Token Usage Metrics**: Detailed tracking of input, output, cached, and thinking tokens
-**Cost Calculation**: Automated cost computation with fallback rate database (per-request precision)
-**Cost Calculation**: Automated cost computation with fallback rate database (per-request precision); enabled by default when `track_usage=True` (opt-out available)
-**Usage Analytics**: Comprehensive reporting and cost optimization insights

### JSON Schema Compliance
53 changes: 53 additions & 0 deletions docs/cost_tracking.md
@@ -0,0 +1,53 @@
# Cost Tracking and Estimation

This guide explains how to track costs using:

- **Estimated per-request costs** (default, no real API needed)
- **Actual key-level usage** (provider-reported, delayed, key-wide)

## Estimated Costs (Per Request)

- Enable tracking on calls: `query_unified(..., track_usage=True)`.
- If you do **not** pass a `cost_calculator`, a `FallbackCostCalculator` with bundled rates is created by default (`auto_cost=True`).
- Opt out by setting `auto_cost=False`.
- Provide your own calculator for custom rates: `query_unified(..., track_usage=True, cost_calculator=my_calc, auto_cost=False)`.
- Rate freshness is exposed as `usage.rate_last_updated` (from the rate source access date).

Example:

```python
from cellsem_llm_client.agents import LiteLLMAgent

agent = LiteLLMAgent(model="gpt-4o", api_key="key")
result = agent.query_unified(
message="Summarize this.",
track_usage=True, # auto cost estimation by default
)
print(result.usage.estimated_cost_usd, result.usage.rate_last_updated)
```

## Actual Usage (Key-Level)

Use `ApiCostTracker` for provider-reported usage; this is **key-wide** and delayed by a few minutes.

```python
from datetime import date, timedelta
from cellsem_llm_client.tracking.api_trackers import ApiCostTracker

tracker = ApiCostTracker(openai_api_key="sk-...", anthropic_api_key="ak-...")
end = date.today()
start = end - timedelta(days=1)

openai_usage = tracker.get_openai_usage(start_date=start, end_date=end)
print(openai_usage.total_cost, openai_usage.total_requests)
```

Notes:
- Reports aggregate all usage for the API key (not per request).
- Expect a short provider-side delay before usage appears.

## Rates and Updates

- Bundled rates live in `tracking/rates.json`; `FallbackCostCalculator.load_default_rates()` reads this file.
- `usage.rate_last_updated` shows when the rate data was last refreshed.
- A weekly GitHub Action runs `scripts/update_rates.py` to refresh rates; it opens a PR if rates change.
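The bundled rate format and the per-request estimate above can be sketched as follows. The entry shape mirrors `scripts/update_rates.py`; the per-1k-token arithmetic is an assumption about how `FallbackCostCalculator` applies these rates, not its actual code:

```python
import json
from datetime import datetime

# Minimal example entry in the rates.json shape written by
# scripts/update_rates.py (values are illustrative, not authoritative pricing).
rates_json = """
[
  {
    "provider": "openai",
    "model": "gpt-4o",
    "input_cost_per_1k_tokens": 0.005,
    "output_cost_per_1k_tokens": 0.015,
    "source": {
      "name": "Provider Documentation",
      "url": "https://openai.com/pricing",
      "access_date": "2025-12-01T00:00:00+00:00"
    }
  }
]
"""

# Index rates by (provider, model) for lookup.
rates = {(r["provider"], r["model"]): r for r in json.loads(rates_json)}


def estimate_cost(provider: str, model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate USD cost from per-1k-token rates (assumed calculation)."""
    rate = rates[(provider, model)]
    return (
        input_tokens / 1000 * rate["input_cost_per_1k_tokens"]
        + output_tokens / 1000 * rate["output_cost_per_1k_tokens"]
    )


cost = estimate_cost("openai", "gpt-4o", 1200, 400)  # 0.006 + 0.006 = 0.012
freshness = datetime.fromisoformat(rates[("openai", "gpt-4o")]["source"]["access_date"])
print(f"{cost:.3f}", freshness.date())  # 0.012 2025-12-01
```

The `access_date` stamped by the updater is what surfaces as `usage.rate_last_updated`, so a stale date signals that the estimate may be based on outdated pricing.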
1 change: 1 addition & 0 deletions docs/index.md
@@ -7,6 +7,7 @@
installation
quickstart
schema_enforcement
cost_tracking
contributing
```

29 changes: 29 additions & 0 deletions planning/cost_tracking_improvements.md
@@ -0,0 +1,29 @@
# Cost Tracking Improvements Plan

## Scope
- Default cost estimation when tracking is enabled.
- Opt-out flag for auto cost estimation.
- Documentation updates for estimated vs actual costs.
- Rate updater script and weekly GitHub Action.
- Surface rate freshness date in cost metrics/reporting.

## Tasks
1) **Auto cost estimator default**
- In `query_unified` (and wrappers), when `track_usage=True` and no `cost_calculator` is provided, auto-create `FallbackCostCalculator()` (load default rates) so `estimated_cost_usd` is populated by default.
- Add an opt-out flag (`auto_cost`, defaulting to `True`); setting `auto_cost=False` disables auto estimation.
- Honor provided calculators; if `auto_cost=False`, skip estimation.
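The intended resolution order can be sketched as a standalone function (the name `resolve_calculator` and the `build_default` parameter are hypothetical; the real logic lives inline in `query_unified`):

```python
from typing import Callable, Optional


def resolve_calculator(
    track_usage: bool,
    cost_calculator: Optional[object],
    auto_cost: bool,
    build_default: Callable[[], object],
) -> Optional[object]:
    """Return the cost calculator to use, or None to skip estimation."""
    if not track_usage:
        return None  # no tracking requested, so no cost estimation
    if cost_calculator is not None:
        return cost_calculator  # an explicit calculator is always honored
    if auto_cost:
        return build_default()  # auto-create the fallback calculator
    return None  # auto_cost=False: tokens tracked, cost left unset
```

This keeps backward compatibility: existing callers that pass their own calculator see no change, and only callers who rely on the new default need the `auto_cost=False` opt-out.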

2) **Docs**
- README: note default estimated costs (unless opted out) and link to cost tracking doc.
- New/updated doc (`docs/cost_tracking.md`): per-request estimated costs (auto/explicit calculator, opt-out), key-level actual usage via `ApiCostTracker` with code snippets/caveats (provider delay, key-level totals), and guidance on estimates vs actuals.

3) **Rate updater + weekly Action**
- Add `scripts/update_rates.py` to fetch latest OpenAI/Anthropic pricing and update the rate database used by `FallbackCostCalculator`.
- Add a scheduled GitHub Action (weekly) to run the updater; assume no secrets are needed. If rates change, open a PR (or fail the run) to flag the update.

4) **Rate freshness in metrics**
- Include rate database last-update date in cost tracking output/metrics so users see pricing freshness.

5) **Execution notes**
- Implement code changes after this plan is approved.
- Keep backward compatibility; new defaults are opt-out.
3 changes: 3 additions & 0 deletions pyproject.toml
@@ -54,6 +54,9 @@ Issues = "https://github.com/Cellular-Semantics/cellsem_llm_client/issues"
[tool.setuptools.packages.find]
where = ["src"]

[tool.setuptools.package-data]
"cellsem_llm_client.tracking" = ["rates.json"]

[tool.pytest.ini_options]
testpaths = ["tests"]
python_files = ["test_*.py"]
109 changes: 109 additions & 0 deletions scripts/update_rates.py
@@ -0,0 +1,109 @@
"""Update bundled rate data for cost estimation.

This script refreshes `src/cellsem_llm_client/tracking/rates.json` with
the latest known pricing (hard-coded here) and stamps the current UTC
access date. It is intended to be run by CI on a schedule.
"""

from __future__ import annotations

import json
from datetime import UTC, datetime
from pathlib import Path

RATE_FILE = Path("src/cellsem_llm_client/tracking/rates.json")

# Maintain current pricing here; update values as provider pricing changes.
CURRENT_RATES = [
{
"provider": "openai",
"model": "gpt-4",
"input_cost_per_1k_tokens": 0.03,
"output_cost_per_1k_tokens": 0.06,
"cached_cost_per_1k_tokens": None,
"thinking_cost_per_1k_tokens": None,
"source": {
"name": "Provider Documentation",
"url": "https://openai.com/pricing",
},
},
{
"provider": "openai",
"model": "gpt-4o",
"input_cost_per_1k_tokens": 0.005,
"output_cost_per_1k_tokens": 0.015,
"cached_cost_per_1k_tokens": None,
"thinking_cost_per_1k_tokens": None,
"source": {
"name": "Provider Documentation",
"url": "https://openai.com/pricing",
},
},
{
"provider": "openai",
"model": "gpt-4o-mini",
"input_cost_per_1k_tokens": 0.00015,
"output_cost_per_1k_tokens": 0.0006,
"cached_cost_per_1k_tokens": 0.000075,
"thinking_cost_per_1k_tokens": None,
"source": {
"name": "Provider Documentation",
"url": "https://openai.com/pricing",
},
},
{
"provider": "openai",
"model": "gpt-3.5-turbo",
"input_cost_per_1k_tokens": 0.0015,
"output_cost_per_1k_tokens": 0.002,
"cached_cost_per_1k_tokens": None,
"thinking_cost_per_1k_tokens": None,
"source": {
"name": "Provider Documentation",
"url": "https://openai.com/pricing",
},
},
{
"provider": "anthropic",
"model": "claude-3-sonnet",
"input_cost_per_1k_tokens": 0.003,
"output_cost_per_1k_tokens": 0.015,
"cached_cost_per_1k_tokens": None,
"thinking_cost_per_1k_tokens": 0.006,
"source": {
"name": "Provider Documentation",
"url": "https://www.anthropic.com/pricing",
},
},
{
"provider": "anthropic",
"model": "claude-3-haiku-20240307",
"input_cost_per_1k_tokens": 0.00025,
"output_cost_per_1k_tokens": 0.00125,
"cached_cost_per_1k_tokens": None,
"thinking_cost_per_1k_tokens": 0.0005,
"source": {
"name": "Provider Documentation",
"url": "https://www.anthropic.com/pricing",
},
},
]


def main() -> None:
"""Write updated rate data with current access_date."""
now = datetime.now(UTC).replace(microsecond=0).isoformat()
output = []
for entry in CURRENT_RATES:
entry_copy = dict(entry)
source = dict(entry_copy["source"])
source["access_date"] = now
entry_copy["source"] = source
output.append(entry_copy)

RATE_FILE.parent.mkdir(parents=True, exist_ok=True)
RATE_FILE.write_text(json.dumps(output, indent=2, sort_keys=True), encoding="utf-8")


if __name__ == "__main__":
main()
39 changes: 37 additions & 2 deletions src/cellsem_llm_client/agents/agent_connection.py
Expand Up @@ -18,6 +18,7 @@
SchemaManager,
SchemaValidator,
)
from cellsem_llm_client.tracking.cost_calculator import FallbackCostCalculator
> **Copilot AI** (Dec 3, 2025): The FallbackCostCalculator is imported both here and in the TYPE_CHECKING block (line 36). Since it's used at runtime (line 580), the import on line 21 is correct, but the TYPE_CHECKING import on line 36 is now redundant and can be removed.
from cellsem_llm_client.tracking.usage_metrics import UsageMetrics


@@ -159,6 +160,7 @@ def query_unified(
track_usage: bool = False,
cost_calculator: Optional["FallbackCostCalculator"] = None,
max_retries: int = 2,
auto_cost: bool = True,
> **Copilot AI** (Dec 3, 2025): The new auto_cost parameter lacks test coverage. Consider adding tests that verify: (1) when auto_cost=True and cost_calculator=None, a default calculator is created and cost estimation is performed; (2) when auto_cost=False, no automatic calculator is created; (3) when both cost_calculator is provided and auto_cost=True, the provided calculator is used.
) -> QueryResult:
"""Unified query interface with optional tools, schema enforcement, and tracking.

@@ -177,6 +179,8 @@ track_usage: Whether to return usage metrics.
track_usage: Whether to return usage metrics.
cost_calculator: Optional cost calculator for estimated cost.
max_retries: Validation retry limit when `schema` is provided.
auto_cost: When True, auto-create a fallback cost calculator if none is provided
and tracking is enabled.

Returns:
QueryResult containing final text, optional validated Pydantic model,
@@ -264,19 +268,22 @@

usage_metrics: UsageMetrics | None = None
if track_usage and raw_response is not None and hasattr(raw_response, "usage"):
calc = cost_calculator
if calc is None and auto_cost:
calc = self._build_default_calculator()
# When tools were used, accumulate usage from all API calls
if all_tool_responses is not None:
usage_metrics = self._accumulate_usage_metrics(
responses=all_tool_responses,
provider=provider,
cost_calculator=cost_calculator,
cost_calculator=calc,
)
else:
# Single API call without tools
usage_metrics = self._build_usage_metrics(
raw_response=raw_response,
provider=provider,
cost_calculator=cost_calculator,
cost_calculator=calc,
)

return QueryResult(
@@ -568,6 +575,12 @@ def _ensure_no_additional_properties(self, schema_dict: dict[str, Any]) -> None:
if isinstance(item, dict):
self._ensure_no_additional_properties(item)

def _build_default_calculator(self) -> "FallbackCostCalculator":
"""Create a fallback calculator with default rates loaded."""
calculator = FallbackCostCalculator()
calculator.load_default_rates()
return calculator

def _run_tool_loop(
self,
messages: list[dict[str, Any]],
@@ -670,8 +683,18 @@ def _build_usage_metrics(
thinking_tokens = None

estimated_cost_usd = None
rate_last_updated = None
if cost_calculator:
try:
get_rates = getattr(cost_calculator, "get_model_rates", None)
rate_data = (
get_rates(provider, self.model) if callable(get_rates) else None
)
if rate_data and hasattr(rate_data, "source"):
access_date = getattr(rate_data.source, "access_date", None)
rate_last_updated = (
access_date if isinstance(access_date, datetime) else None
)
> **Copilot AI** (Dec 3, 2025) on lines +682 to +693: The new rate_last_updated field extraction logic (lines 686-697) lacks test coverage. Consider adding a test that verifies the rate_last_updated field is correctly populated in the usage metrics when a cost calculator with rate data is provided.

> **Copilot AI** (Dec 3, 2025) on lines +685 to +693: The logic for extracting rate_last_updated from the cost calculator (lines 689-697) is duplicated in _accumulate_usage_metrics (lines 780-788). Consider extracting this into a helper method to reduce duplication and improve maintainability.
temp_usage_metrics = UsageMetrics(
input_tokens=input_tokens,
output_tokens=output_tokens,
@@ -694,6 +717,7 @@
cached_tokens=cached_tokens,
thinking_tokens=thinking_tokens,
estimated_cost_usd=estimated_cost_usd,
rate_last_updated=rate_last_updated,
provider=provider,
model=self.model,
timestamp=datetime.now(),
@@ -750,8 +774,18 @@ def _accumulate_usage_metrics(

# Calculate cost based on accumulated tokens
estimated_cost_usd = None
rate_last_updated = None
if cost_calculator:
try:
get_rates = getattr(cost_calculator, "get_model_rates", None)
rate_data = (
get_rates(provider, self.model) if callable(get_rates) else None
)
if rate_data and hasattr(rate_data, "source"):
access_date = getattr(rate_data.source, "access_date", None)
rate_last_updated = (
access_date if isinstance(access_date, datetime) else None
)
temp_usage_metrics = UsageMetrics(
input_tokens=total_input_tokens,
output_tokens=total_output_tokens,
@@ -774,6 +808,7 @@
cached_tokens=cached_tokens,
thinking_tokens=thinking_tokens,
estimated_cost_usd=estimated_cost_usd,
rate_last_updated=rate_last_updated,
provider=provider,
model=self.model,
timestamp=datetime.now(),
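The duplicated `rate_last_updated` lookup flagged in the review could be factored into one shared helper; a sketch under assumed names (the helper itself does not exist in the diff):

```python
from datetime import datetime
from typing import Optional


def extract_rate_last_updated(cost_calculator, provider: str, model: str) -> Optional[datetime]:
    """Duck-typed lookup of the rate source's access_date.

    Mirrors the logic duplicated in _build_usage_metrics and
    _accumulate_usage_metrics; returns None if the calculator has no
    get_model_rates method, no rate data, or a non-datetime access_date.
    """
    get_rates = getattr(cost_calculator, "get_model_rates", None)
    rate_data = get_rates(provider, model) if callable(get_rates) else None
    if rate_data is None or not hasattr(rate_data, "source"):
        return None
    access_date = getattr(rate_data.source, "access_date", None)
    return access_date if isinstance(access_date, datetime) else None
```

Because the helper only uses `getattr`/`hasattr`, it tolerates custom calculators that do not implement `get_model_rates`, matching the defensive style of the inline code it would replace.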