Fix: report model_is_loaded status immediately instead of waiting up to 10s #430

Open
cemalgnlts wants to merge 2 commits into vast-ai:master from cemalgnlts:cemalgnlts

cemalgnlts commented Jun 22, 2026

Hi,

This commit reduces the serverless cold start time by ~10 seconds.

Additionally, I've updated the legacy scripts/start_server.sh file to match the one at vast-ai/pyworker/main/start_server.sh.

_model_loaded() sets model_is_loaded=True but doesn't set update_pending,
so _send_metrics_loop can wait up to 10s before notifying report_addr.
This adds update_pending = True to push the status on the next tick (~1s).
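
The effect of the flag can be illustrated with a small simulation. This is only a sketch, not the pyworker code: the names `model_is_loaded`, `update_pending`, `_model_loaded` and `_send_metrics_loop` come from the description above, while the `Metrics` class, the `tick` method, the two constants and the `set_pending` switch are hypothetical stand-ins that assume a 1-second loop tick and a 10-second periodic report.

```python
from dataclasses import dataclass, field

TICK_SECONDS = 1.0      # assumed loop tick
REPORT_INTERVAL = 10.0  # assumed periodic report interval


@dataclass
class Metrics:
    """Hypothetical stand-in for the worker's metrics state."""
    model_is_loaded: bool = False
    update_pending: bool = False
    last_report: float = 0.0
    sent: list = field(default_factory=list)  # (time, model_is_loaded) pairs

    def _model_loaded(self, set_pending: bool) -> None:
        self.model_is_loaded = True
        if set_pending:
            # The one-line change described above.
            self.update_pending = True

    def tick(self, now: float) -> None:
        """One iteration of a simplified _send_metrics_loop."""
        if self.update_pending or now - self.last_report >= REPORT_INTERVAL:
            # Stand-in for the request that notifies report_addr.
            self.sent.append((now, self.model_is_loaded))
            self.update_pending = False
            self.last_report = now


def seconds_until_reported(set_pending: bool, loaded_at: float) -> float:
    """Simulated delay between the model loading and the first report that says so."""
    m = Metrics()
    now = 0.0
    while True:
        if not m.model_is_loaded and now >= loaded_at:
            m._model_loaded(set_pending)
        m.tick(now)
        if m.sent and m.sent[-1][1]:
            return now - loaded_at
        now += TICK_SECONDS
```

Under these assumed constants, `seconds_until_reported(False, 0.5)` returns 9.5 and `seconds_until_reported(True, 0.5)` returns 0.5, which mirrors the roughly 10-second difference reported in this PR.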

Example
Before commit template: cloud.vast.ai/?ref_id=404507&creator_id=404507&name=Qwen3.5 9B - Slow
After commit template: cloud.vast.ai/?ref_id=404507&creator_id=404507&name=Qwen3.5 9B - Fast

Example request template

curl -o /dev/null -s -w "%{time_total} seconds" --request POST \
  --url "https://openai.vast.ai/$ENDPOINT/chat/completions" \
  --header "authorization: Bearer $API_KEY" \
  --header 'content-type: application/json' \
  --data '{"messages": [{ "role": "user", "content": "1 + 1 = ?" }]}'

Before commit: 17 - 20 seconds
After commit: 7.9 - 10 seconds
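
For repeated measurements, the same request could be timed from Python. This is a rough sketch using only the standard library; `build_request` and `time_request` are hypothetical helper names, the 120-second timeout is an assumption, and the URL and payload are copied from the curl template above. Wall-clock time measured this way may differ slightly from curl's `time_total`.

```python
import json
import time
import urllib.request


def build_request(endpoint: str, api_key: str) -> urllib.request.Request:
    """Build the same POST request as the curl template above."""
    body = json.dumps(
        {"messages": [{"role": "user", "content": "1 + 1 = ?"}]}
    ).encode("utf-8")
    return urllib.request.Request(
        f"https://openai.vast.ai/{endpoint}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def time_request(req: urllib.request.Request, timeout: float = 120.0) -> float:
    """Send the request and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        resp.read()  # wait for the full response body, as curl does
    return time.perf_counter() - start
```

Calling `time_request(build_request(endpoint, api_key))` against a cold endpoint, with both values taken from the environment, should give a number comparable to the figures above.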

@cemalgnlts cemalgnlts requested a review from a team as a code owner June 22, 2026 14:08