feat: add 2026 provider models + fix model-name cost-matching bugs by blisspixel · Pull Request #1 · blisspixel/deepr

blisspixel · 2026-05-23T00:55:00Z

Summary

Adds the latest 2026 provider models and fixes a recurring class of silent model-name cost-matching bugs found while wiring them in.

New models

Added to the registry (plus provider mappings, benchmark tiers, docs), each validated against the live provider API and benchmarked via deepr eval new:

Gemini 3.5 Flash, Gemini 3.1 Flash-Lite (GA)
Claude Opus 4.7, Claude Sonnet 4.6
GPT-5.4-mini, GPT-5.4-nano

Cost/pricing bug fixes (with regression tests)

registry.get_cost_estimate: was first-substring-match (order-dependent). A gpt-5.4-pro-<snapshot> resolved to the cheaper gpt-5.4, an under-estimate that lets budget pre-flight approve an expensive job. Now longest-match-first plus dot/hyphen normalized (mirrors get_token_pricing).
Gemini _calculate_cost: gemini-2.5-flash-lite was billed at the gemini-2.5-flash rate (~5x overcharge). Fixed with longest-match-first.
Grok provider: registry uses hyphenated grok-4-20-* but mappings/pricing only had the dotted API form, so a routed name went unmapped (wrong API id plus ~11x undercharge). Added both forms.
api/app.py: replaced a "mini" in model-name cost heuristic, which over-estimated nano/flash-lite models and under-estimated deep-research models, with get_cost_estimate().
Benchmark: report the actual run cost instead of the merged-history total; fixed dotted grok-4.3 tier-list keys that dropped results from routing.

Model discovery tooling

deepr providers models: diffs live provider model lists against the registry, scoped by default to newer versions of families already in use, with paste-ready registry stubs.
discover_models.py: loads .env, fixes a Windows cp1252 unicode crash, and uses canonical (dot/hyphen plus date-snapshot) matching to eliminate false positives.
deepr eval preflight warns when relevant new models are missing.
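
A minimal sketch of what canonical (dot/hyphen plus date-snapshot) matching could look like when diffing a live model list against the registry. This is an assumption-laden illustration, not the logic in discover_models.py: the date-suffix formats, the helper names `canonical` and `new_models`, and the `example-model-9` id are hypothetical.

```python
import re

# Assumed snapshot formats: a trailing -YYYY-MM-DD or -YYYYMMDD.
_DATE_SUFFIX = re.compile(r"-(\d{4}-\d{2}-\d{2}|\d{8})$")


def canonical(name: str) -> str:
    """Lower-case, unify dots and hyphens, then drop a trailing date snapshot."""
    unified = name.lower().replace(".", "-")
    return _DATE_SUFFIX.sub("", unified)


def new_models(live: list[str], registry: list[str]) -> list[str]:
    """Return live model ids whose canonical form is not in the registry."""
    known = {canonical(n) for n in registry}
    return [n for n in live if canonical(n) not in known]


if __name__ == "__main__":
    registry = ["gpt-5.4-mini", "grok-4-20-reasoning"]
    live = [
        "gpt-5.4-mini-2026-03-01",  # made-up snapshot of a registered model
        "grok-4.20-reasoning",      # dotted API form of a hyphenated entry
        "example-model-9",          # hypothetical, genuinely unregistered id
    ]
    print(new_models(live, registry))
```

Only the unregistered id is reported here; the dated snapshot and the dotted Grok form collapse onto existing registry entries, which is the kind of false positive the canonical matching is meant to remove.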

Testing

4783 passed, 1 skipped, coverage 81.96% (Python 3.13, the CI command)
ruff check deepr/ and ruff format --check deepr/ clean
New regression tests: TestCostEstimateMatching, TestGrokHyphenatedRegistryForms

New models (registry + provider mappings + benchmark tiers + docs), all validated against live provider APIs and benchmarked via `eval new`:
- Gemini 3.5 Flash, Gemini 3.1 Flash-Lite (GA)
- Claude Opus 4.7, Claude Sonnet 4.6
- GPT-5.4-mini, GPT-5.4-nano

Cost/pricing bug fixes (silent-money class, with regression tests):
- registry.get_cost_estimate: longest-match-first + dot/hyphen normalize. Was first-substring-match: under-estimated gpt-5.4-pro snapshots to the cheaper gpt-5.4 price, letting budget pre-flight approve expensive jobs.
- gemini provider _calculate_cost: longest-match-first (Flash-Lite was billed at the Flash rate, ~5x overcharge).
- grok provider: register hyphenated grok-4-20 forms in mappings + pricing. A routed registry name (grok-4-20-reasoning) went unmapped -> wrong API id + ~11x cost undercharge.
- api/app.py: estimate job cost from the registry instead of a "mini in name" heuristic that mis-estimated nano/flash-lite/deep-research.
- benchmark: report this-run cost instead of the merged-history total; fix dotted grok-4.3 tier-list keys that dropped results from routing.

Model discovery tooling:
- `deepr providers models`: diff live provider model lists against the registry, scoped to newer versions of families already in use, with paste-ready ModelCapability stubs.
- discover_models.py: load .env, fix Windows cp1252 unicode crash, and canonical (dot/hyphen + date-snapshot) matching to kill false positives.
- `deepr eval` preflight warns when relevant new models are missing.

Tests: 4783 passed, coverage 81.96% (py3.13).

Copilot

Pull request overview

This PR expands Deepr’s model registry and tooling for 2026-era provider models, and hardens cost estimation/matching logic to prevent silent mispricing (especially around snapshot/variant model names and dot-vs-hyphen differences). It also adds model-discovery UX (CLI + script improvements) and updates benchmarking/reporting to better reflect actual run cost.

Changes:

Add new 2026 models across OpenAI/Gemini/Anthropic and update benchmark tier lists + docs.
Fix cost-estimate/pricing matching to prefer most-specific (longest) matches and normalize dot/hyphen variants; add regression tests.
Add/extend model discovery tooling (deepr providers models, discover_models.py) and add a benchmark preflight warning for newer provider models.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.

Show a summary per file

File	Description
`tests/unit/test_providers/test_registry.py`	Adds regression coverage for `get_cost_estimate()` specificity + normalization + tiered pricing.
`tests/unit/test_providers/test_grok_provider.py`	Adds regression tests ensuring hyphenated Grok registry names map/price correctly.
`scripts/discover_models.py`	Adds `.env` loading, canonical name matching (date/dot/hyphen), relevance filtering, JSON shape updates, and stub emission.
`scripts/benchmark_models.py`	Updates tier model lists, adds best-effort “newer models available” preflight warning, and corrects reported cost to “this run” only.
`ROADMAP.md`	Updates roadmap notes/checklists to reflect model discovery and May 2026 status.
`docs/MODELS.md`	Updates model guide with new models and discovery command guidance.
`deepr/providers/registry.py`	Adds new model capability entries and fixes `get_cost_estimate()` matching/normalization logic.
`deepr/providers/grok_provider.py`	Adds hyphenated registry forms to mappings/pricing to avoid unmapped routing + mispricing.
`deepr/providers/gemini_provider.py`	Adds new Gemini models and fixes pricing-key matching to prefer longest key first.
`deepr/cli/commands/providers.py`	Adds `deepr providers models` CLI command that shells out to discovery script.
`deepr/api/app.py`	Replaces a name-based cost heuristic with registry-based estimation for API job submission responses.
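
To picture the `deepr/api/app.py` row, the sketch below contrasts a name-based heuristic with a table lookup. The `"mini" -> $0.5 else $5.0` rule comes from the diff comment in this PR; everything else (the `ILLUSTRATIVE_REGISTRY` table, its figures, the `example-deep-research` id, and the `registry_cost` helper) is hypothetical and stands in for the real `get_cost_estimate()`.

```python
def heuristic_cost(model: str) -> float:
    """Name heuristic as described in the diff comment: "mini" -> 0.5 else 5.0."""
    return 0.5 if "mini" in model else 5.0


# Hypothetical per-query figures, for illustration only.
ILLUSTRATIVE_REGISTRY = {
    "gpt-5.4-mini": 0.5,
    "gpt-5.4-nano": 0.05,
    "example-deep-research": 20.0,
}


def registry_cost(model: str, default: float = 5.0) -> float:
    """Stand-in for a registry lookup; falls back to a default when unknown."""
    return ILLUSTRATIVE_REGISTRY.get(model, default)


if __name__ == "__main__":
    for model in ILLUSTRATIVE_REGISTRY:
        print(model, heuristic_cost(model), registry_cost(model))
```

In this sketch the heuristic prices the nano model at the non-mini default (too high) and the deep-research model at the same default (too low), while the lookup returns each model's own figure. Note also that in this sketch any name containing "gemini" contains the substring "mini".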


+    for canon_key, dm in discovered_by_canon.items():
+        rm = registry_by_canon.get(canon_key)
+        if rm is not None:
+            key = f"{rm.provider}/{rm.model}"


+        reg = dm.load_registry()
+        discovered = dm.discover_via_api()
+        if not discovered:
+            return
+        report = dm.compare_registry(reg, discovered)
+        relevant, _ = dm.classify_new_models(report["new_models"], reg)


+    # Calculate cost estimate from the registry (source of truth). A prior
+    # name heuristic ("mini" -> $0.5 else $5.0) wildly misestimated nano /
+    # flash-lite (over) and deep-research (under) models.
+    from deepr.providers.registry import get_cost_estimate
+
+    avg_cost = get_cost_estimate(model)
    estimated_cost = {


+    return (
+        f'    "{m.provider}/{m.model_id}": ModelCapability(\n'
+        f'        provider="{m.provider}",\n'
+        f'        model="{m.model_id}",\n'
+        f"        cost_per_query=0.0,  # TODO: estimate per-query cost\n"
+        f"        latency_ms=2000,  # TODO: measure\n"
+        f"        context_window={cw if cw else 'TODO'},\n"


Copilot AI review requested due to automatic review settings May 23, 2026 00:55

Copilot started reviewing on behalf of blisspixel May 23, 2026 00:55

Copilot AI reviewed May 23, 2026


blisspixel merged commit defb8a4 into main May 23, 2026
4 checks passed

blisspixel deleted the add-2026-models-cost-fixes branch May 23, 2026 01:02
