feat(llm): add shared model and run registry #26

Open
houtanb wants to merge 1 commit into main from llm-model-runs

@houtanb (Member) commented on May 29, 2026

Move canonical LLM model metadata and benchmarkable model-run declarations into utils so downstream repos can select shared runs by stable model_run_key.

Add Models.dev and Artificial Analysis metadata snapshots and loaders. Resolve release dates from Models.dev with manual fallbacks, separate canonical model_key values from provider_model_id routing strings, and validate model declarations during registry construction.
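
A minimal sketch of the release-date resolution and declaration-time validation described above. The snapshot dictionaries, `resolve_release_date`, `declare_model`, the `Model` fields, and the provider strings are hypothetical stand-ins for illustration; the loaders and registry in this PR may be shaped differently.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical stand-ins for the checked-in Models.dev snapshot and the
# manual fallback table; the real loaders and file layout may differ.
MODELS_DEV_RELEASE_DATES = {"glm-4.6": "2025-09-30"}
MANUAL_RELEASE_DATES = {"example-unlisted-model": "2025-01-01"}


def resolve_release_date(model_key: str) -> date:
    """Prefer the Models.dev snapshot, then fall back to the manual table."""
    raw = MODELS_DEV_RELEASE_DATES.get(model_key)
    if raw is None:
        raw = MANUAL_RELEASE_DATES.get(model_key)
    if raw is None:
        raise ValueError(f"no release date known for model_key {model_key!r}")
    return date.fromisoformat(raw)


@dataclass(frozen=True)
class Model:
    model_key: str          # canonical, provider-independent identifier
    provider: str
    provider_model_id: str  # routing string sent to the provider API
    release_date: date


def declare_model(model_key: str, provider: str, provider_model_id: str) -> Model:
    """Validate a declaration up front so a bad entry fails when the registry is built."""
    if not model_key or not provider_model_id:
        raise ValueError("model_key and provider_model_id must be non-empty")
    return Model(model_key, provider, provider_model_id, resolve_release_date(model_key))


# "example-provider" and "example-org/GLM-4.6" are made-up routing values.
glm = declare_model("glm-4.6", "example-provider", "example-org/GLM-4.6")
print(glm.model_key, glm.provider_model_id, glm.release_date)
```

Keeping `model_key` separate from `provider_model_id` lets the same canonical model be routed through several providers without changing the key that downstream results are stored under.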

Require every ModelRun to declare an explicit, filename-safe model_run_key. Keep build_model_run_key as a naming helper for option coverage, validate duplicate keys, and expose MODEL_RUNS_BY_KEY/select_model_runs for benchmark selection.
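
The key handling above could look roughly like the following sketch. Only the names `ModelRun`, `model_run_key`, `build_model_run_key`, `MODEL_RUNS_BY_KEY`, and `select_model_runs` come from this description; the signatures, the key format, the `index_model_runs` helper, and the regular expression used for "filename-safe" are assumptions.

```python
import re
from dataclasses import dataclass

# Assumed definition of "filename-safe": lowercase letters, digits, '.', '_', '-'.
_KEY_PATTERN = re.compile(r"^[a-z0-9][a-z0-9._-]*$")


@dataclass(frozen=True)
class ModelRun:
    model_run_key: str  # explicit and stable; never derived implicitly
    model_key: str


def build_model_run_key(model_key: str, **options: str) -> str:
    """Naming helper: join the model key with its sorted options (assumed format)."""
    parts = [model_key] + [f"{name}-{value}" for name, value in sorted(options.items())]
    return "__".join(parts)


def index_model_runs(runs: list[ModelRun]) -> dict[str, ModelRun]:
    """Hypothetical helper: build the by-key index, rejecting unsafe or duplicate keys."""
    by_key: dict[str, ModelRun] = {}
    for run in runs:
        if not _KEY_PATTERN.fullmatch(run.model_run_key):
            raise ValueError(f"model_run_key is not filename-safe: {run.model_run_key!r}")
        if run.model_run_key in by_key:
            raise ValueError(f"duplicate model_run_key: {run.model_run_key!r}")
        by_key[run.model_run_key] = run
    return by_key


def select_model_runs(by_key: dict[str, ModelRun], keys: list[str]) -> list[ModelRun]:
    """Return runs in the requested order, failing loudly on unknown keys."""
    missing = [key for key in keys if key not in by_key]
    if missing:
        raise KeyError(f"unknown model_run_key(s): {missing}")
    return [by_key[key] for key in keys]


MODEL_RUNS = [
    ModelRun("glm-5", "glm-5"),
    ModelRun(build_model_run_key("glm-5", reasoning_effort="high"), "glm-5"),
]
MODEL_RUNS_BY_KEY = index_model_runs(MODEL_RUNS)
selected = select_model_runs(MODEL_RUNS_BY_KEY, ["glm-5__reasoning_effort-high"])
print([run.model_run_key for run in selected])
```

Requiring the key to be written out explicitly means that renaming an option or changing the helper's format cannot silently change the key that existing result files are stored under.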

Add Model.active and ACTIVE_MODEL_RUNS so historical runs remain in MODEL_RUNS while runs depending on inactive provider routes are excluded from current live-callable benchmark sweeps. Mark the Together deepseek-v3.1 route inactive and replace live smoke tests with the active MiniMax M2.7 route.
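
As an illustration of the active/inactive split, here is a small sketch. `Model.active`, `MODEL_RUNS`, and `ACTIVE_MODEL_RUNS` are named in this description; the dataclass layout, the run keys, and the `example-provider` route for MiniMax M2.7 are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    model_key: str
    provider: str
    active: bool = True  # False when the provider route is no longer callable


@dataclass(frozen=True)
class ModelRun:
    model_run_key: str
    model: Model


# Illustrative entries only; keys and the MiniMax provider name are made up.
MODEL_RUNS = [
    ModelRun("deepseek-v3.1-together", Model("deepseek-v3.1", "together", active=False)),
    ModelRun("minimax-m2.7", Model("minimax-m2.7", "example-provider")),
]

# Historical runs stay in MODEL_RUNS; live sweeps iterate only the active subset.
ACTIVE_MODEL_RUNS = [run for run in MODEL_RUNS if run.model.active]

print([run.model_run_key for run in ACTIVE_MODEL_RUNS])
```

Filtering on the model rather than deleting the run keeps past benchmark results resolvable by key while preventing new sweeps from calling a dead route.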

Add Artificial Analysis-backed model-run declarations as benchmark-selectable runs that are automatically included in MODEL_RUNS, with display names resolved from a minimized checked-in AA snapshot containing only stable IDs and display names.
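
A rough sketch of resolving display names from a minimized snapshot that holds only stable IDs and display names. The JSON field names (`id`, `name`), the sample entry, and both helper functions are hypothetical; the checked-in snapshot format may differ.

```python
import json

# Hypothetical minimized snapshot: stable IDs and display names only.
AA_SNAPSHOT_JSON = """
[
  {"id": "aa-example-0001", "name": "Example Model (high)"}
]
"""


def load_aa_display_names(raw: str) -> dict[str, str]:
    """Map each stable ID to its display name, rejecting duplicate IDs."""
    names: dict[str, str] = {}
    for entry in json.loads(raw):
        if entry["id"] in names:
            raise ValueError(f"duplicate Artificial Analysis id: {entry['id']!r}")
        names[entry["id"]] = entry["name"]
    return names


def resolve_display_name(names: dict[str, str], aa_id: str) -> str:
    """Look up a display name, failing loudly if the snapshot lacks the ID."""
    try:
        return names[aa_id]
    except KeyError:
        raise KeyError(f"id {aa_id!r} not found in the checked-in snapshot") from None


AA_DISPLAY_NAMES = load_aa_display_names(AA_SNAPSHOT_JSON)
print(resolve_display_name(AA_DISPLAY_NAMES, "aa-example-0001"))
```

Checking in only IDs and names keeps the snapshot small and limits how much third-party data is redistributed with the package.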

Add third-party notices for Models.dev's MIT license and Artificial Analysis attribution, and include those notices in built wheel license metadata.

Move shared LLM provider dependencies into pyproject metadata, make requirements.txt delegate to .[dev], configure the package for Python 3.14, and preserve pytest-xdist for parallel integration tests.

Document registry conventions, local dev setup, validation commands, and Claude/agent handoff files. Add unit and integration coverage for metadata snapshots, registry validation, provider routing, explicit model-run keys, active model-run filtering, third-party notices, and selectable shared model runs.

As a byproduct of using Models.dev, the following model release dates have changed:

mistral-large-2411: 2024-11-18 -> 2024-11-01
deepseek-r1: 2025-01-20 -> 2024-12-26
deepseek-v3: 2024-12-25 -> 2025-01-20
glm-4.6: 2025-11-13 -> 2025-09-30
kimi-k2-thinking: 2025-11-05 -> 2025-11-06
kimi-k2.5: 2026-01-30 -> 2026-01-27
glm-5: 2026-02-12 -> 2026-02-11
glm-5.1: 2026-04-07 -> 2026-03-27
kimi-k2.6: 2026-04-20 -> 2026-04-21
claude-3-7-sonnet-20250219: 2025-02-24 -> 2025-02-19
claude-haiku-4-5-20251001: 2025-10-01 -> 2025-10-15
claude-opus-4-5-20251101: 2025-11-24 -> 2025-11-01
grok-4.3: 2026-05-01 -> 2026-04-17
gemini-2.5-flash: 2025-06-17 -> 2025-03-20
gemini-2.5-pro: 2025-06-17 -> 2025-03-20
gemini-3.1-flash-lite: 2026-05-08 -> 2026-05-07
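
To gauge how large these shifts are, the day deltas can be computed with a few lines. The sketch below uses three entries copied from the list above; the `RELEASE_DATE_CHANGES` table and `shift_in_days` helper are illustrative names, not part of this PR.

```python
from datetime import date

# (old date, new date) pairs copied from the list above.
RELEASE_DATE_CHANGES = {
    "deepseek-v3": ("2024-12-25", "2025-01-20"),
    "glm-4.6": ("2025-11-13", "2025-09-30"),
    "kimi-k2-thinking": ("2025-11-05", "2025-11-06"),
}


def shift_in_days(old: str, new: str) -> int:
    """Signed number of days from the old release date to the new one."""
    return (date.fromisoformat(new) - date.fromisoformat(old)).days


for model_key, (old, new) in RELEASE_DATE_CHANGES.items():
    print(f"{model_key}: {shift_in_days(old, new):+d} days")
```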

@houtanb requested a review from elsehow on May 29, 2026 07:25