feat: default cost estimates and rate updater #13
@@ -0,0 +1,47 @@

```yaml
name: Update Rates

on:
  schedule:
    - cron: "0 6 * * 1" # Weekly on Mondays at 06:00 UTC
  workflow_dispatch:

permissions:
  contents: write
  pull-requests: write

jobs:
  update-rates:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Install uv
        uses: astral-sh/setup-uv@v4
        with:
          version: "latest"

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.12"

      - name: Install dependencies
        run: uv sync --dev

      - name: Update rate data
        run: uv run python scripts/update_rates.py

      # ruff format only formats Python sources; it does not touch markdown or JSON
      - name: Format Python sources (optional)
        run: uv run ruff format scripts/ src/

      - name: Create Pull Request
        uses: peter-evans/create-pull-request@v6
        with:
          branch: chore/update-rates
          commit-message: "chore: update bundled rate data"
          title: "chore: update bundled rate data"
          body: "Automated rate refresh via scheduled workflow."
          add-paths: |
            src/cellsem_llm_client/tracking/rates.json
            scripts/update_rates.py
```
@@ -0,0 +1,53 @@

````markdown
# Cost Tracking and Estimation

This guide explains how to track costs using:

- **Estimated per-request costs** (default; computed locally from bundled rates, no provider usage API needed)
- **Actual key-level usage** (provider-reported, delayed, key-wide)

## Estimated Costs (Per Request)

- Enable tracking on calls: `query_unified(..., track_usage=True)`.
- If you do **not** pass a `cost_calculator`, a `FallbackCostCalculator` with bundled rates is created by default (`auto_cost=True`).
- Opt out by setting `auto_cost=False`.
- Provide your own calculator for custom rates: `query_unified(..., track_usage=True, cost_calculator=my_calc)`. An explicit calculator is always used; `auto_cost` only controls the automatic fallback.
- Rate freshness is exposed as `usage.rate_last_updated` (taken from the rate source's access date).

Example:

```python
from cellsem_llm_client.agents import LiteLLMAgent

agent = LiteLLMAgent(model="gpt-4o", api_key="key")
result = agent.query_unified(
    message="Summarize this.",
    track_usage=True,  # auto cost estimation by default
)
print(result.usage.estimated_cost_usd, result.usage.rate_last_updated)
```

## Actual Usage (Key-Level)

Use `ApiCostTracker` for provider-reported usage; this is **key-wide** and delayed by a few minutes.

```python
from datetime import date, timedelta
from cellsem_llm_client.tracking.api_trackers import ApiCostTracker

tracker = ApiCostTracker(openai_api_key="sk-...", anthropic_api_key="sk-ant-...")
end = date.today()
start = end - timedelta(days=1)

openai_usage = tracker.get_openai_usage(start_date=start, end_date=end)
print(openai_usage.total_cost, openai_usage.total_requests)
```

Notes:
- Reports aggregate all usage for the API key (not per request).
- Expect a short provider-side delay before usage appears.

## Rates and Updates

- Bundled rates live in `src/cellsem_llm_client/tracking/rates.json`; `FallbackCostCalculator.load_default_rates()` reads this file.
- `usage.rate_last_updated` shows when the updater last stamped the rate data (the `access_date` written by `scripts/update_rates.py`).
- A weekly GitHub Action runs `scripts/update_rates.py` and opens (or updates) a PR with the regenerated `rates.json`. Because the script re-stamps `access_date` on every run, the file changes each week even when prices do not.
````
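The bundled rates are priced per 1,000 tokens. The sketch below is a rough illustration of the arithmetic behind `estimated_cost_usd`. It assumes the cost is tokens / 1000 × rate for input and output, and it ignores the cached and thinking rates. `ModelRate` and `estimate_cost_usd` are hypothetical names, not the `FallbackCostCalculator` API, whose formula is not shown in this PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelRate:
    """Hypothetical mirror of the pricing fields of one rates.json entry."""

    input_cost_per_1k_tokens: float
    output_cost_per_1k_tokens: float


def estimate_cost_usd(rate: ModelRate, input_tokens: int, output_tokens: int) -> float:
    """Convert token counts to USD using per-1,000-token prices (cached/thinking ignored)."""
    input_cost = input_tokens / 1000 * rate.input_cost_per_1k_tokens
    output_cost = output_tokens / 1000 * rate.output_cost_per_1k_tokens
    return input_cost + output_cost


# gpt-4o prices as hard-coded in scripts/update_rates.py
gpt_4o = ModelRate(input_cost_per_1k_tokens=0.005, output_cost_per_1k_tokens=0.015)
cost = estimate_cost_usd(gpt_4o, input_tokens=1_000, output_tokens=500)
print(f"{cost:.4f}")  # 0.0125
```

For example, 1,000 input tokens cost 0.005 USD and 500 output tokens cost 0.0075 USD, giving 0.0125 USD in total.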
@@ -7,6 +7,7 @@

````diff
 installation
 quickstart
 schema_enforcement
+cost_tracking
 contributing
 ```
````
@@ -0,0 +1,29 @@

```markdown
# Cost Tracking Improvements Plan

## Scope
- Default cost estimation when tracking is enabled.
- Opt-out flag for auto cost estimation.
- Documentation updates for estimated vs actual costs.
- Rate updater script and weekly GitHub Action.
- Surface rate freshness date in cost metrics/reporting.

## Tasks
1) **Auto cost estimator default**
   - In `query_unified` (and wrappers), when `track_usage=True` and no `cost_calculator` is provided, auto-create `FallbackCostCalculator()` (load default rates) so `estimated_cost_usd` is populated by default.
   - Add an opt-out flag (e.g., `auto_cost`, default `True`; pass `False` to disable auto estimation).
   - Always honor a provided calculator; if none is provided and `auto_cost=False`, skip estimation.

2) **Docs**
   - README: note default estimated costs (unless opted out) and link to cost tracking doc.
   - New/updated doc (`docs/cost_tracking.md`): per-request estimated costs (auto/explicit calculator, opt-out), key-level actual usage via `ApiCostTracker` with code snippets/caveats (provider delay, key-level totals), and guidance on estimates vs actuals.

3) **Rate updater + weekly Action**
   - Add `scripts/update_rates.py` to fetch latest OpenAI/Anthropic pricing and update the rate database used by `FallbackCostCalculator`.
   - Add a scheduled GitHub Action (weekly) to run the updater; assume no secrets needed. If rates change, prepare/flag a PR or failing status.

4) **Rate freshness in metrics**
   - Include rate database last-update date in cost tracking output/metrics so users see pricing freshness.

5) **Execution notes**
   - Implement code changes after this plan is approved.
   - Keep backward compatibility; new defaults are opt-out.
```
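Task 1, and the `query_unified` change that implements it, reduces to a small precedence rule. The sketch below restates that rule as a standalone function. The names `resolve_calculator` and `build_default` are hypothetical. The rule only applies once `track_usage=True`, and a string stands in for a real `FallbackCostCalculator`.

```python
from typing import Callable, Optional, TypeVar

CalculatorT = TypeVar("CalculatorT")


def resolve_calculator(
    provided: Optional[CalculatorT],
    auto_cost: bool,
    build_default: Callable[[], CalculatorT],
) -> Optional[CalculatorT]:
    """Choose the calculator used for cost estimation once track_usage=True."""
    if provided is not None:
        return provided  # an explicit calculator always wins, whatever auto_cost says
    if auto_cost:
        return build_default()  # default path: bundled rates
    return None  # opted out: estimated_cost_usd stays None


def bundled() -> str:
    return "bundled-rates"  # stands in for a FallbackCostCalculator


print(resolve_calculator(None, True, bundled))  # bundled-rates
print(resolve_calculator(None, False, bundled))  # None
print(resolve_calculator("custom", False, bundled))  # custom
```

This matches `calc = cost_calculator` followed by `if calc is None and auto_cost:` in the diff. As a result, `auto_cost=False` alone never discards a calculator the caller supplied.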
@@ -0,0 +1,109 @@

```python
"""Update bundled rate data for cost estimation.

This script refreshes `src/cellsem_llm_client/tracking/rates.json` with
the latest known pricing (hard-coded here) and stamps the current UTC
access date. It is intended to be run by CI on a schedule.
"""

from __future__ import annotations

import json
from datetime import UTC, datetime
from pathlib import Path

RATE_FILE = Path("src/cellsem_llm_client/tracking/rates.json")

# Maintain current pricing here; update values as provider pricing changes.
CURRENT_RATES = [
    {
        "provider": "openai",
        "model": "gpt-4",
        "input_cost_per_1k_tokens": 0.03,
        "output_cost_per_1k_tokens": 0.06,
        "cached_cost_per_1k_tokens": None,
        "thinking_cost_per_1k_tokens": None,
        "source": {
            "name": "Provider Documentation",
            "url": "https://openai.com/pricing",
        },
    },
    {
        "provider": "openai",
        "model": "gpt-4o",
        "input_cost_per_1k_tokens": 0.005,
        "output_cost_per_1k_tokens": 0.015,
        "cached_cost_per_1k_tokens": None,
        "thinking_cost_per_1k_tokens": None,
        "source": {
            "name": "Provider Documentation",
            "url": "https://openai.com/pricing",
        },
    },
    {
        "provider": "openai",
        "model": "gpt-4o-mini",
        "input_cost_per_1k_tokens": 0.00015,
        "output_cost_per_1k_tokens": 0.0006,
        "cached_cost_per_1k_tokens": 0.000075,
        "thinking_cost_per_1k_tokens": None,
        "source": {
            "name": "Provider Documentation",
            "url": "https://openai.com/pricing",
        },
    },
    {
        "provider": "openai",
        "model": "gpt-3.5-turbo",
        "input_cost_per_1k_tokens": 0.0015,
        "output_cost_per_1k_tokens": 0.002,
        "cached_cost_per_1k_tokens": None,
        "thinking_cost_per_1k_tokens": None,
        "source": {
            "name": "Provider Documentation",
            "url": "https://openai.com/pricing",
        },
    },
    {
        "provider": "anthropic",
        "model": "claude-3-sonnet",
        "input_cost_per_1k_tokens": 0.003,
        "output_cost_per_1k_tokens": 0.015,
        "cached_cost_per_1k_tokens": None,
        "thinking_cost_per_1k_tokens": 0.006,
        "source": {
            "name": "Provider Documentation",
            "url": "https://www.anthropic.com/pricing",
        },
    },
    {
        "provider": "anthropic",
        "model": "claude-3-haiku-20240307",
        "input_cost_per_1k_tokens": 0.00025,
        "output_cost_per_1k_tokens": 0.00125,
        "cached_cost_per_1k_tokens": None,
        "thinking_cost_per_1k_tokens": 0.0005,
        "source": {
            "name": "Provider Documentation",
            "url": "https://www.anthropic.com/pricing",
        },
    },
]


def main() -> None:
    """Write updated rate data with current access_date."""
    now = datetime.now(UTC).replace(microsecond=0).isoformat()
    output = []
    for entry in CURRENT_RATES:
        entry_copy = dict(entry)
        source = dict(entry_copy["source"])
        source["access_date"] = now
        entry_copy["source"] = source
        output.append(entry_copy)

    RATE_FILE.parent.mkdir(parents=True, exist_ok=True)
    RATE_FILE.write_text(json.dumps(output, indent=2, sort_keys=True), encoding="utf-8")


if __name__ == "__main__":
    main()
```
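As written, `main()` re-stamps `access_date` on every run. The regenerated `rates.json` therefore differs from the committed copy every week, so `peter-evans/create-pull-request` opens or updates a PR even when no price changed. If only genuine pricing changes should produce a PR, one option is to compare entries with `access_date` stripped and skip the write when they match.

The sketch below shows that option as a hypothetical `stamp_if_changed` helper; it is not part of this PR. It uses `timezone.utc` rather than `UTC` so it also runs on Python versions before 3.11.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path
from typing import Any


def _pricing_only(entries: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Drop source.access_date so entries can be compared on pricing alone."""
    stripped = []
    for entry in entries:
        copy = dict(entry)
        source = dict(copy.get("source", {}))
        source.pop("access_date", None)
        copy["source"] = source
        stripped.append(copy)
    return stripped


def stamp_if_changed(current: list[dict[str, Any]], rate_file: Path) -> bool:
    """Write rate_file only when pricing differs; return True if it was written."""
    if rate_file.exists():
        existing = json.loads(rate_file.read_text(encoding="utf-8"))
        if _pricing_only(existing) == _pricing_only(current):
            return False  # prices unchanged: keep the old file and its access_date
    now = datetime.now(timezone.utc).replace(microsecond=0).isoformat()
    output = []
    for entry in current:
        copy = dict(entry)
        copy["source"] = {**entry["source"], "access_date": now}
        output.append(copy)
    rate_file.parent.mkdir(parents=True, exist_ok=True)
    rate_file.write_text(json.dumps(output, indent=2, sort_keys=True), encoding="utf-8")
    return True


rates = [
    {
        "provider": "openai",
        "model": "gpt-4o",
        "input_cost_per_1k_tokens": 0.005,
        "output_cost_per_1k_tokens": 0.015,
        "source": {"name": "Provider Documentation", "url": "https://openai.com/pricing"},
    }
]
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "rates.json"
    print(stamp_if_changed(rates, path))  # True: first write
    print(stamp_if_changed(rates, path))  # False: prices unchanged
```

The trade-off is that `access_date`, and therefore `usage.rate_last_updated`, would then record the last pricing change rather than the last time the updater ran.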
```diff
@@ -18,6 +18,7 @@
     SchemaManager,
     SchemaValidator,
 )
+from cellsem_llm_client.tracking.cost_calculator import FallbackCostCalculator
 from cellsem_llm_client.tracking.usage_metrics import UsageMetrics
 
 
@@ -159,6 +160,7 @@ def query_unified(
         track_usage: bool = False,
         cost_calculator: Optional["FallbackCostCalculator"] = None,
         max_retries: int = 2,
+        auto_cost: bool = True,
     ) -> QueryResult:
         """Unified query interface with optional tools, schema enforcement, and tracking.
 
@@ -177,6 +179,8 @@ def query_unified(
             track_usage: Whether to return usage metrics.
             cost_calculator: Optional cost calculator for estimated cost.
             max_retries: Validation retry limit when `schema` is provided.
+            auto_cost: When True, auto-create a fallback cost calculator if none is provided
+                and tracking is enabled.
 
         Returns:
             QueryResult containing final text, optional validated Pydantic model,
@@ -264,19 +268,22 @@ def query_unified(
 
         usage_metrics: UsageMetrics | None = None
         if track_usage and raw_response is not None and hasattr(raw_response, "usage"):
+            calc = cost_calculator
+            if calc is None and auto_cost:
+                calc = self._build_default_calculator()
             # When tools were used, accumulate usage from all API calls
             if all_tool_responses is not None:
                 usage_metrics = self._accumulate_usage_metrics(
                     responses=all_tool_responses,
                     provider=provider,
-                    cost_calculator=cost_calculator,
+                    cost_calculator=calc,
                 )
             else:
                 # Single API call without tools
                 usage_metrics = self._build_usage_metrics(
                     raw_response=raw_response,
                     provider=provider,
-                    cost_calculator=cost_calculator,
+                    cost_calculator=calc,
                 )
 
         return QueryResult(
@@ -568,6 +575,12 @@ def _ensure_no_additional_properties(self, schema_dict: dict[str, Any]) -> None:
                 if isinstance(item, dict):
                     self._ensure_no_additional_properties(item)
 
+    def _build_default_calculator(self) -> "FallbackCostCalculator":
+        """Create a fallback calculator with default rates loaded."""
+        calculator = FallbackCostCalculator()
+        calculator.load_default_rates()
+        return calculator
+
     def _run_tool_loop(
         self,
         messages: list[dict[str, Any]],
@@ -670,8 +683,18 @@ def _build_usage_metrics(
             thinking_tokens = None
 
         estimated_cost_usd = None
+        rate_last_updated = None
         if cost_calculator:
             try:
+                get_rates = getattr(cost_calculator, "get_model_rates", None)
+                rate_data = (
+                    get_rates(provider, self.model) if callable(get_rates) else None
+                )
+                if rate_data and hasattr(rate_data, "source"):
+                    access_date = getattr(rate_data.source, "access_date", None)
+                    rate_last_updated = (
+                        access_date if isinstance(access_date, datetime) else None
+                    )
                 temp_usage_metrics = UsageMetrics(
                     input_tokens=input_tokens,
                     output_tokens=output_tokens,
@@ -694,6 +717,7 @@ def _build_usage_metrics(
             cached_tokens=cached_tokens,
             thinking_tokens=thinking_tokens,
             estimated_cost_usd=estimated_cost_usd,
+            rate_last_updated=rate_last_updated,
             provider=provider,
             model=self.model,
             timestamp=datetime.now(),
@@ -750,8 +774,18 @@ def _accumulate_usage_metrics(
 
         # Calculate cost based on accumulated tokens
         estimated_cost_usd = None
+        rate_last_updated = None
         if cost_calculator:
             try:
+                get_rates = getattr(cost_calculator, "get_model_rates", None)
+                rate_data = (
+                    get_rates(provider, self.model) if callable(get_rates) else None
+                )
+                if rate_data and hasattr(rate_data, "source"):
+                    access_date = getattr(rate_data.source, "access_date", None)
+                    rate_last_updated = (
+                        access_date if isinstance(access_date, datetime) else None
+                    )
                 temp_usage_metrics = UsageMetrics(
                     input_tokens=total_input_tokens,
                     output_tokens=total_output_tokens,
@@ -774,6 +808,7 @@ def _accumulate_usage_metrics(
             cached_tokens=cached_tokens,
             thinking_tokens=thinking_tokens,
             estimated_cost_usd=estimated_cost_usd,
+            rate_last_updated=rate_last_updated,
             provider=provider,
             model=self.model,
             timestamp=datetime.now(),
```
The `FallbackCostCalculator` is imported both here and in the `TYPE_CHECKING` block (line 36). Since it's used at runtime (line 580), the import on line 21 is correct, but the `TYPE_CHECKING` import on line 36 is now redundant and can be removed.
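The same `get_model_rates` lookup now appears in both `_build_usage_metrics` and `_accumulate_usage_metrics`, so it could be factored into one helper.

There is also a type question. `scripts/update_rates.py` writes `access_date` as an ISO-8601 string, and this diff does not show whether `load_default_rates()` converts it to a `datetime`. If it does not, the `isinstance(access_date, datetime)` check leaves `rate_last_updated` as `None`.

The hypothetical `lookup_rate_last_updated` below accepts either form. The `get_model_rates(provider, model)` call shape is copied from the diff, and the stub calculator is only for illustration.

```python
from datetime import datetime
from types import SimpleNamespace
from typing import Any, Optional


def lookup_rate_last_updated(
    cost_calculator: Any, provider: str, model: str
) -> Optional[datetime]:
    """Return the rate source's access_date as a datetime, or None if unavailable."""
    get_rates = getattr(cost_calculator, "get_model_rates", None)
    if not callable(get_rates):
        return None
    rate_data = get_rates(provider, model)
    source = getattr(rate_data, "source", None)
    access_date = getattr(source, "access_date", None)
    if isinstance(access_date, datetime):
        return access_date
    if isinstance(access_date, str):
        # update_rates.py writes isoformat() output, e.g. "2025-01-06T06:00:00+00:00"
        try:
            return datetime.fromisoformat(access_date)
        except ValueError:
            return None
    return None


class StubCalculator:
    """Stand-in for a calculator whose rates keep access_date as a string."""

    def get_model_rates(self, provider: str, model: str) -> SimpleNamespace:
        return SimpleNamespace(
            source=SimpleNamespace(access_date="2025-01-06T06:00:00+00:00")
        )


print(lookup_rate_last_updated(StubCalculator(), "openai", "gpt-4o"))
# 2025-01-06 06:00:00+00:00
```

With such a helper, both methods would reduce to `rate_last_updated = lookup_rate_last_updated(cost_calculator, provider, self.model)` inside their existing `try:` blocks.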