feat(llm): add shared model and run registry by houtanb · Pull Request #26 · forecastingresearch/utils

houtanb · 2026-05-29T07:25:47Z

Move canonical LLM model metadata and benchmarkable model-run declarations into utils so downstream repos can select shared runs by stable model_run_key.

Add Models.dev and Artificial Analysis metadata snapshots and loaders. Resolve release dates from Models.dev with manual fallbacks, separate canonical model_key values from provider_model_id routing strings, and validate model declarations during registry construction.
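A minimal sketch of the release-date resolution described above, assuming the snapshot and the manual fallbacks are both plain `model_key -> ISO date` mappings. The names `MODELS_DEV_SNAPSHOT`, `MANUAL_RELEASE_DATES`, and `resolve_release_date` are hypothetical and may not match what the PR uses.

```python
from datetime import date

# Hypothetical stand-in for a Models.dev snapshot: model_key -> ISO release date.
MODELS_DEV_SNAPSHOT = {"glm-4.6": "2025-09-30"}

# Hypothetical manual fallbacks for models the snapshot does not cover.
MANUAL_RELEASE_DATES = {"example-internal-model": "2025-01-01"}


def resolve_release_date(model_key: str) -> date:
    """Prefer the snapshot date, fall back to a manual entry, fail loudly otherwise."""
    raw = MODELS_DEV_SNAPSHOT.get(model_key) or MANUAL_RELEASE_DATES.get(model_key)
    if raw is None:
        raise KeyError(f"no release date for model_key={model_key!r}")
    return date.fromisoformat(raw)
```

Raising on a missing date, rather than returning `None`, is one way to surface gaps while the registry is being built instead of at benchmark time.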

Require every ModelRun to declare an explicit, filename-safe model_run_key. Keep build_model_run_key as a naming helper for option coverage, validate that keys are unique, and expose MODEL_RUNS_BY_KEY/select_model_runs for benchmark selection.
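The key handling above could look roughly like the sketch below. The real signatures are not shown in this description, so the `ModelRun` fields, the `index_model_runs` helper, the `select_model_runs` signature, and the exact definition of "filename-safe" are all assumptions.

```python
import re
from dataclasses import dataclass

# Assumed definition of "filename-safe": letters, digits, dot, underscore, hyphen.
_SAFE_KEY = re.compile(r"[A-Za-z0-9._-]+")


@dataclass(frozen=True)
class ModelRun:
    model_run_key: str
    model_key: str


def index_model_runs(runs):
    """Build a key -> run mapping, rejecting unsafe or duplicate keys."""
    by_key = {}
    for run in runs:
        if not _SAFE_KEY.fullmatch(run.model_run_key):
            raise ValueError(f"model_run_key is not filename-safe: {run.model_run_key!r}")
        if run.model_run_key in by_key:
            raise ValueError(f"duplicate model_run_key: {run.model_run_key!r}")
        by_key[run.model_run_key] = run
    return by_key


def select_model_runs(by_key, keys):
    """Return runs in the requested order; unknown keys raise instead of being skipped."""
    missing = [k for k in keys if k not in by_key]
    if missing:
        raise KeyError(f"unknown model_run_key(s): {missing}")
    return [by_key[k] for k in keys]
```

A downstream repo would then pin its benchmark set as a list of key strings and call `select_model_runs` with it, so a renamed or removed run fails immediately.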

Add Model.active and ACTIVE_MODEL_RUNS so historical runs remain in MODEL_RUNS while runs depending on inactive provider routes are excluded from current live-callable benchmark sweeps. Mark the Together deepseek-v3.1 route inactive and replace live smoke tests with the active MiniMax M2.7 route.
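As an illustration of the active/inactive split, the sketch below derives `ACTIVE_MODEL_RUNS` by filtering `MODEL_RUNS` on `Model.active`. The dataclass fields, the `model_run_key` strings, and the `provider_model_id` values are placeholders chosen for the example, not the ones declared in the PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    model_key: str           # canonical identity
    provider_model_id: str   # provider routing string (placeholder values below)
    active: bool = True      # False when the provider route is no longer callable


@dataclass(frozen=True)
class ModelRun:
    model_run_key: str
    model: Model


# Historical runs stay listed here even when their route is inactive.
MODEL_RUNS = [
    ModelRun(
        "deepseek-v3.1-together",
        Model("deepseek-v3.1", "hypothetical/together-route-id", active=False),
    ),
    ModelRun("minimax-m2.7", Model("minimax-m2.7", "hypothetical/minimax-route-id")),
]

# Only runs whose model route is active are eligible for live sweeps.
ACTIVE_MODEL_RUNS = [run for run in MODEL_RUNS if run.model.active]
```

Keeping the flag on the model rather than on the run means every run that depends on a retired route drops out of live sweeps together.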

Add Artificial Analysis-backed model-run declarations as benchmark-selectable runs that are automatically included in MODEL_RUNS, with display names resolved from a minimized checked-in AA snapshot containing only stable IDs and display names.
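One way to resolve display names from a minimized snapshot is sketched below. The JSON layout, the `id` and `name` field names, and both helper functions are assumptions made for illustration; the checked-in AA snapshot may be structured differently.

```python
import json

# Hypothetical minimized snapshot: only a stable ID and a display name per entry.
AA_SNAPSHOT_JSON = """
[
  {"id": "aa-example-id-1", "name": "Example Model A"},
  {"id": "aa-example-id-2", "name": "Example Model B"}
]
"""


def load_display_names(raw: str) -> dict[str, str]:
    """Map stable ID -> display name, rejecting repeated IDs."""
    names: dict[str, str] = {}
    for entry in json.loads(raw):
        if entry["id"] in names:
            raise ValueError(f"duplicate AA id in snapshot: {entry['id']!r}")
        names[entry["id"]] = entry["name"]
    return names


def display_name_for(aa_id: str, names: dict[str, str]) -> str:
    """Look up a display name; a missing ID is an error, not a silent default."""
    if aa_id not in names:
        raise KeyError(f"AA id {aa_id!r} missing from snapshot")
    return names[aa_id]
```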

Add third-party notices for Models.dev's MIT license and Artificial Analysis attribution, and include those notices in built wheel license metadata.

Move shared LLM provider dependencies into pyproject metadata, make requirements.txt delegate to .[dev], configure the package for Python 3.14, and preserve pytest-xdist for parallel integration tests.

Document registry conventions, local dev setup, validation commands, and Claude/agent handoff files. Add unit and integration coverage for metadata snapshots, registry validation, provider routing, explicit model-run keys, active model-run filtering, third-party notices, and selectable shared model runs.

As a byproduct of using Models.dev, the following model release dates have changed:

mistral-large-2411: 2024-11-18 -> 2024-11-01
deepseek-r1: 2025-01-20 -> 2024-12-26
deepseek-v3: 2024-12-25 -> 2025-01-20
glm-4.6: 2025-11-13 -> 2025-09-30
kimi-k2-thinking: 2025-11-05 -> 2025-11-06
kimi-k2.5: 2026-01-30 -> 2026-01-27
glm-5: 2026-02-12 -> 2026-02-11
glm-5.1: 2026-04-07 -> 2026-03-27
kimi-k2.6: 2026-04-20 -> 2026-04-21
claude-3-7-sonnet-20250219: 2025-02-24 -> 2025-02-19
claude-haiku-4-5-20251001: 2025-10-01 -> 2025-10-15
claude-opus-4-5-20251101: 2025-11-24 -> 2025-11-01
grok-4.3: 2026-05-01 -> 2026-04-17
gemini-2.5-flash: 2025-06-17 -> 2025-03-20
gemini-2.5-pro: 2025-06-17 -> 2025-03-20
gemini-3.1-flash-lite: 2026-05-08 -> 2026-05-07
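To gauge how far each date moved, the list above can be parsed into day offsets. This is only a convenience sketch over three of the entries; `parse_shift` and `SHIFTS` are hypothetical names, and a negative value means the new date is earlier than the old one.

```python
from datetime import date

# Three entries copied from the list above, in the same "key: old -> new" format.
CHANGES = """\
mistral-large-2411: 2024-11-18 -> 2024-11-01
kimi-k2-thinking: 2025-11-05 -> 2025-11-06
gemini-2.5-pro: 2025-06-17 -> 2025-03-20
"""


def parse_shift(line: str) -> tuple[str, int]:
    """Return (model_key, new_date - old_date in days) for one list line."""
    model_key, _, dates = line.partition(": ")
    old, _, new = dates.partition(" -> ")
    delta = date.fromisoformat(new.strip()) - date.fromisoformat(old.strip())
    return model_key, delta.days


SHIFTS = dict(parse_shift(line) for line in CHANGES.splitlines() if line.strip())
```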


houtanb requested a review from elsehow May 29, 2026 07:25

houtanb wants to merge 1 commit into main from llm-model-runs
