ping page: per-row Ping buttons + show every model #669
Merged
Conversation
Reworks `/ping` for troubleshooting individual models.

Old behaviour: the page opened an SSE stream that pinged every *prioritized* model in sequence and appended rows as results came back. Models without a `priority` field were silently skipped, and there was no way to retry a single model.

New behaviour:
- The full table renders on page load with one row per configured model across every llm_config profile, prioritized models first.
- A new "Priority" column shows each model's priority value (blank when unset).
- Each row has its own "Ping" button.
- A top "Ping All" button iterates the rows sequentially.

Backend:
- New `GET /llm-list` returns every `(profile, llm_name, priority)` tuple as JSON, replacing the prior priority-only enumeration inside the SSE handler.
- New `GET /llm-ping-one?profile=X&llm_name=Y` pings one model and returns the result as JSON.
- Removed `/llm-ping` (SSE) and the matching `/ping/stream` proxy in `frontend_multi_user`.
- New helper `get_all_llm_names_with_priority()` in `llm_factory.py` returns every model in the profile, prioritized first (sorted ascending), unprioritized after in declaration order.

Bundled config changes (caught while exercising the new page):
- Add `deepseek-v4-flash` (DeepSeek native API at `api.deepseek.com/v1`, thinking mode disabled via `additional_kwargs.extra_body`, $0.14/M input / $0.28/M output).
- Replace `openrouter-elephant-alpha` with `openrouter-ling-2.6-flash`: Elephant Alpha was a stealth alias and now returns 404; the production name is `inclusionai/ling-2.6-flash` (262K context, $0.08/M input / $0.24/M output).
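The ordering rule for `get_all_llm_names_with_priority()` (prioritized models first, sorted ascending by priority; unprioritized after, in declaration order) can be sketched as follows. This is a minimal standalone version, assuming a profile is a dict mapping llm_name to a config dict with an optional `priority` key; the real signature in `llm_factory.py` may differ.

```python
def get_all_llm_names_with_priority(profile):
    """Return (llm_name, priority) pairs: models with a priority first
    (ascending), then models without one in declaration order."""
    prioritized = [
        (name, cfg["priority"])
        for name, cfg in profile.items()
        if "priority" in cfg
    ]
    prioritized.sort(key=lambda pair: pair[1])  # ascending priority
    # Python dicts preserve insertion order, so this keeps declaration order.
    unprioritized = [
        (name, None) for name, cfg in profile.items() if "priority" not in cfg
    ]
    return prioritized + unprioritized


# Hypothetical profile for illustration only:
profile = {
    "model-c": {},                  # no priority: listed after prioritized ones
    "model-a": {"priority": 2},
    "model-b": {"priority": 1},
}
print(get_all_llm_names_with_priority(profile))
# -> [('model-b', 1), ('model-a', 2), ('model-c', None)]
```

Because `list.sort` is stable, models sharing the same priority value also keep their declaration order.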
Add:
- `openrouter-granite-4.1-8b` (`ibm-granite/granite-4.1-8b`, 131K context, $0.05/M input / $0.10/M output).
- `openrouter-laguna-xs.2-free` (`poolside/laguna-xs.2:free`, 128K context, free).
- `openrouter-nemotron-3-nano-omni-30b-reasoning-free` (`nvidia/nemotron-3-nano-omni-30b-a3b-reasoning:free`, 256K context, free, reasoning model).

Rename:
- `deepseek-v4-flash` -> `deepseek-v4-flash-thinking-disabled`. The `model` argument is unchanged (`deepseek-v4-flash` is the DeepSeek API id); only the config key gets the suffix to make the no-thinking behaviour explicit alongside any future thinking-enabled variant.
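For reference, the renamed entry might look roughly like this in `llm_config/baseline.json`. This is a hedged sketch only: the PR confirms the config key, the model id, the API base, and that thinking is disabled via `additional_kwargs.extra_body`, but the surrounding schema keys and the exact `extra_body` payload DeepSeek expects are assumptions, not taken from the repo.

```json
{
  "deepseek-v4-flash-thinking-disabled": {
    "model": "deepseek-v4-flash",
    "api_base": "https://api.deepseek.com/v1",
    "additional_kwargs": {
      "extra_body": { "thinking": { "type": "disabled" } }
    }
  }
}
```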
Test plan

- Open `/ping`; confirm every model appears (incl. ones without `priority`).
- Per-row Ping shows `success` + response time.
- `deepseek-v4-flash` pings green with a `DEEPSEEK_API_KEY` set.
- `openrouter-ling-2.6-flash` pings green (no longer 404).

🤖 Generated with Claude Code