feat(llm): add shared model and run registry #26
Open
houtanb wants to merge 1 commit into
Move canonical LLM model metadata and benchmarkable model-run declarations into utils so downstream repos can select shared runs by stable model_run_key.
Add Models.dev and Artificial Analysis metadata snapshots and loaders. Resolve release dates from Models.dev with manual fallbacks, separate canonical model_key values from provider_model_id routing strings, and validate model declarations during registry construction.
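The release-date resolution described above might look roughly like the sketch below. It is not the code in this PR: the two dictionaries and the `resolve_release_date` helper are hypothetical stand-ins for the snapshot loader and the manual fallback table, and only the glm-4.6 date is taken from the change list in this description.

```python
from datetime import date

# Hypothetical stand-in for a parsed Models.dev snapshot: model_key -> ISO date.
SNAPSHOT_RELEASE_DATES = {"glm-4.6": "2025-09-30"}

# Hypothetical manual fallbacks for models the snapshot does not cover.
MANUAL_RELEASE_DATES = {"example-internal-model": "2025-01-01"}


def resolve_release_date(model_key: str) -> date:
    """Prefer the snapshot date, fall back to a manual entry, fail loudly otherwise."""
    raw = SNAPSHOT_RELEASE_DATES.get(model_key) or MANUAL_RELEASE_DATES.get(model_key)
    if raw is None:
        raise KeyError(f"no release date for model_key={model_key!r}")
    return date.fromisoformat(raw)
```

Raising on a missing date, instead of returning `None`, matches the stated goal of validating model declarations during registry construction.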
Require every ModelRun to declare an explicit, filename-safe model_run_key. Keep build_model_run_key as a naming helper for option coverage, validate duplicate keys, and expose MODEL_RUNS_BY_KEY/select_model_runs for benchmark selection.
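A minimal sketch of the key validation and selection described above, under stated assumptions: the `ModelRun` fields, the regular expression used for "filename-safe", the `index_model_runs` helper, and the `select_model_runs` signature are all illustrative and may differ from the real registry.

```python
import re
from dataclasses import dataclass

# Assumed definition of "filename-safe": lowercase letters, digits, '.', '_' and '-'.
_KEY_PATTERN = re.compile(r"[a-z0-9][a-z0-9._-]*")


@dataclass(frozen=True)
class ModelRun:
    # Hypothetical minimal shape; the real class likely carries more options.
    model_run_key: str
    model_key: str


def index_model_runs(runs):
    """Build a key -> run mapping, rejecting unsafe or duplicate keys."""
    by_key = {}
    for run in runs:
        if not _KEY_PATTERN.fullmatch(run.model_run_key):
            raise ValueError(f"model_run_key is not filename-safe: {run.model_run_key!r}")
        if run.model_run_key in by_key:
            raise ValueError(f"duplicate model_run_key: {run.model_run_key!r}")
        by_key[run.model_run_key] = run
    return by_key


def select_model_runs(keys, by_key):
    """Return runs in the requested order; unknown keys raise instead of being skipped."""
    missing = [key for key in keys if key not in by_key]
    if missing:
        raise KeyError(f"unknown model_run_key(s): {missing}")
    return [by_key[key] for key in keys]


# Illustrative keys only; they are not taken from the registry.
MODEL_RUNS = [ModelRun("example-a-default", "example-a"), ModelRun("example-b-default", "example-b")]
MODEL_RUNS_BY_KEY = index_model_runs(MODEL_RUNS)
selected = select_model_runs(["example-b-default"], MODEL_RUNS_BY_KEY)
```

Failing on unknown keys means a downstream benchmark config with a typo stops early instead of silently running fewer models.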
Add Model.active and ACTIVE_MODEL_RUNS so historical runs remain in MODEL_RUNS while runs depending on inactive provider routes are excluded from current live-callable benchmark sweeps. Mark the Together deepseek-v3.1 route inactive and replace live smoke tests with the active MiniMax M2.7 route.
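The relationship between MODEL_RUNS and ACTIVE_MODEL_RUNS can be sketched as a simple filter on the model's active flag. The dataclass fields, run keys, and provider_model_id strings below are placeholders chosen for illustration, not the values in the registry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    # Hypothetical minimal shape of a model declaration.
    model_key: str
    provider_model_id: str  # provider routing string, kept separate from model_key
    active: bool = True


@dataclass(frozen=True)
class ModelRun:
    model_run_key: str
    model: Model


# Placeholder routing strings; the real provider_model_id values are not shown here.
inactive_route = Model("deepseek-v3.1", "placeholder/inactive-route", active=False)
active_route = Model("minimax-m2.7", "placeholder/active-route")

# Historical runs stay in MODEL_RUNS so earlier results remain addressable by key.
MODEL_RUNS = [
    ModelRun("deepseek-v3.1-example", inactive_route),
    ModelRun("minimax-m2.7-example", active_route),
]

# Only runs whose provider route is still callable are swept in live benchmarks.
ACTIVE_MODEL_RUNS = [run for run in MODEL_RUNS if run.model.active]
```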
Add Artificial Analysis-backed model-run declarations as benchmark-selectable runs that are automatically included in MODEL_RUNS, with display names resolved from a minimized checked-in AA snapshot containing only stable IDs and display names.
Add third-party notices for Models.dev's MIT license and Artificial Analysis attribution, and include those notices in built wheel license metadata.
Move shared LLM provider dependencies into pyproject metadata, make requirements.txt delegate to .[dev], configure the package for Python 3.14, and preserve pytest-xdist for parallel integration tests.
Document registry conventions, local dev setup, validation commands, and Claude/agent handoff files. Add unit and integration coverage for metadata snapshots, registry validation, provider routing, explicit model-run keys, active model-run filtering, third-party notices, and selectable shared model runs.
As a byproduct of using Models.dev, the following model release dates have changed:
mistral-large-2411: 2024-11-18 -> 2024-11-01
deepseek-r1: 2025-01-20 -> 2024-12-26
deepseek-v3: 2024-12-25 -> 2025-01-20
glm-4.6: 2025-11-13 -> 2025-09-30
kimi-k2-thinking: 2025-11-05 -> 2025-11-06
kimi-k2.5: 2026-01-30 -> 2026-01-27
glm-5: 2026-02-12 -> 2026-02-11
glm-5.1: 2026-04-07 -> 2026-03-27
kimi-k2.6: 2026-04-20 -> 2026-04-21
claude-3-7-sonnet-20250219: 2025-02-24 -> 2025-02-19
claude-haiku-4-5-20251001: 2025-10-01 -> 2025-10-15
claude-opus-4-5-20251101: 2025-11-24 -> 2025-11-01
grok-4.3: 2026-05-01 -> 2026-04-17
gemini-2.5-flash: 2025-06-17 -> 2025-03-20
gemini-2.5-pro: 2025-06-17 -> 2025-03-20
gemini-3.1-flash-lite: 2026-05-08 -> 2026-05-07
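To gauge how far each date moved, a reviewer could compute the shift in days. The sketch below copies three entries from the list above; the `shift_in_days` helper is hypothetical and not part of this PR.

```python
from datetime import date

# (old, new) release dates copied from the list above.
CHANGES = {
    "kimi-k2-thinking": ("2025-11-05", "2025-11-06"),
    "glm-5": ("2026-02-12", "2026-02-11"),
    "gemini-2.5-pro": ("2025-06-17", "2025-03-20"),
}


def shift_in_days(old: str, new: str) -> int:
    """Positive means the new date is later than the old one."""
    return (date.fromisoformat(new) - date.fromisoformat(old)).days


for model_key, (old, new) in CHANGES.items():
    print(f"{model_key}: {shift_in_days(old, new):+d} days")
```

Shifts of a day or two are unlikely to matter, while multi-month shifts such as the gemini-2.5 entries can reorder models in any analysis keyed on release date.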