Add Ollama warmth lifetime scoring as bounded placement tiebreaker #151
Code Review
This pull request introduces model warmth scoring and ranking for Ollama resident models. It updates the local discovery script to parse JSON from ollama ps -qq and retrieve the process-level default_keep_alive duration, which is then used to compute a continuous warmth score based on the model's expiration time. This score is integrated into the candidate ranking logic as a bounded tiebreaker. Feedback on the changes includes making the embedded Python script more robust against unexpected JSON structures or types, and handling bare integers representing seconds in DefaultOllamaKeepAlive to prevent parsing failures.
Promote the resident-model 'is loaded' boolean into a continuous 0.0-1.0
warmth score derived from Ollama's /api/ps expires_at and
default_keep_alive. Warmth becomes a bounded tiebreaker at position 10 of
the rank comparator, after RAM, GPU, pressure, and reservation ratio.
FilterCandidates is unchanged: warmth is consulted only among nodes that
already passed eval.Eligible(). It cannot promote an undersized node, and
the three-bucket discretization (cold/warm/hot at 0.5 and 0.9) keeps
ranking stable.
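
A minimal sketch of the bucketing and tiebreak described above, assuming strict `>` comparisons at 0.5 and 0.9. The `candidate` type and `rankCandidates` function are hypothetical simplifications: the comparator in this change consults GPU, pressure, and reservation ratio before warmth, while this sketch compares only allocatable RAM.

```go
package main

import (
	"fmt"
	"sort"
)

// candidate is a hypothetical, simplified stand-in for a placement candidate.
type candidate struct {
	Name           string
	AllocatableRAM uint64  // bytes; compared before warmth
	WarmthScore    float64 // 0.0-1.0, where 0 means cold
}

// warmthToRank maps a continuous score onto three buckets using strict
// comparisons, so exactly 0.5 is still cold and exactly 0.9 is still warm.
func warmthToRank(score float64) int {
	switch {
	case score > 0.9:
		return 2 // hot
	case score > 0.5:
		return 1 // warm
	default:
		return 0 // cold
	}
}

// rankCandidates sorts best-first. More allocatable RAM wins outright; the
// warmth bucket is consulted only when RAM is equal.
func rankCandidates(cs []candidate) {
	sort.SliceStable(cs, func(i, j int) bool {
		if cs[i].AllocatableRAM != cs[j].AllocatableRAM {
			return cs[i].AllocatableRAM > cs[j].AllocatableRAM
		}
		return warmthToRank(cs[i].WarmthScore) > warmthToRank(cs[j].WarmthScore)
	})
}

func main() {
	cs := []candidate{
		{Name: "cold-big", AllocatableRAM: 64 << 30, WarmthScore: 0},
		{Name: "hot-small", AllocatableRAM: 32 << 30, WarmthScore: 0.95},
		{Name: "warm-big", AllocatableRAM: 64 << 30, WarmthScore: 0.7},
	}
	rankCandidates(cs)
	for _, c := range cs {
		fmt.Println(c.Name, warmthToRank(c.WarmthScore))
	}
}
```

Here the hot but smaller node still sorts last, and warmth only decides the order of the two nodes with equal RAM. Bucketing also means small score differences (say 0.70 versus 0.72) never reorder candidates.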
Probe layer reads the new fields when Ollama 0.3.10+ is present
('ollama ps -qq' JSON path), and degrades gracefully to the existing awk
parser on older Ollama - no expires_at is emitted in that case and
WarmthScore remains 0 (cold). /api/ps is also queried for
default_keep_alive, falling back to 5m when missing or unparseable.
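
A minimal sketch of why entries from the older parser stay cold, assuming the probe output is decoded from a JSON array. The `residentModel` type, the `parseResident` helper, and the sample payloads are hypothetical and only illustrate that a missing expires_at decodes to the zero time.Time.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// residentModel is a hypothetical, trimmed-down mirror of one resident
// model entry; it is used for decoding only.
type residentModel struct {
	Name      string    `json:"name"`
	ExpiresAt time.Time `json:"expires_at"`
}

// parseResident decodes a JSON array of resident models. Entries without
// an expires_at key keep the zero time.Time.
func parseResident(data []byte) ([]residentModel, error) {
	var models []residentModel
	if err := json.Unmarshal(data, &models); err != nil {
		return nil, err
	}
	return models, nil
}

func main() {
	newer := []byte(`[{"name":"llama3:8b","expires_at":"2030-01-01T00:00:00Z"}]`)
	older := []byte(`[{"name":"llama3:8b"}]`)
	for _, payload := range [][]byte{newer, older} {
		models, err := parseResident(payload)
		if err != nil {
			fmt.Println("parse error:", err)
			continue
		}
		for _, m := range models {
			fmt.Println(m.Name, "has expiry:", !m.ExpiresAt.IsZero())
		}
	}
}
```

A caller can then treat `ExpiresAt.IsZero()` as "no expiry known" and leave the warmth score at 0.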
Adds ResidentModel.ExpiresAt, ResidentModel.WarmthScore, and
OllamaInfo.DefaultKeepAlive (all omitempty, additive JSON), plus
ApplyOllamaWarmth / DefaultOllamaKeepAlive helpers in the facts layer and
modelWarmthRank in the ranker. Tests cover: warmth loses to allocatable
RAM, warmth breaks ties on equal RAM, warmth is ignored when FilterCandidates
rejects, boundary cases (0, 0.5, 0.51, 0.9, 0.91, 1.0), highest-relevant
wins, other-runtime warmth is ignored, and time math for zero / future /
past ExpiresAt.
…ing robust
Improve robustness of the Ollama resident model discovery script by handling non-list JSON formats and string type conversions for VRAM size safely.
Handle bare integer keep-alive duration strings in DefaultOllamaKeepAlive by appending seconds ("s") unit prior to duration parsing.
Addresses review comments from gemini-code-assist[bot] on PR 151.
Co-Authored-By: Antigravity <noreply@gemini.google.com>
bc78644 to 9ff97d6
Summary
Promote the resident-model "is loaded" boolean into a continuous 0.0-1.0 warmth score derived from Ollama's `/api/ps` `expires_at` and `default_keep_alive`. Warmth becomes a bounded tiebreaker at position 10 of the placement rank comparator, after RAM, GPU, pressure, and reservation ratio. `FilterCandidates` is unchanged: warmth is consulted only among nodes that already passed `eval.Eligible()`. It cannot promote an undersized node, and the three-bucket discretization (cold/warm/hot at 0.5 and 0.9) keeps ranking stable.
What ships
- `ResidentModel.ExpiresAt` (`time.Time`, omitempty) and `ResidentModel.WarmthScore` (`float64`, omitempty) in `internal/models/types.go`
- `OllamaInfo.DefaultKeepAlive` (string, omitempty) for the process-level Ollama default
- `ApplyOllamaWarmth` and `DefaultOllamaKeepAlive` helpers in `internal/facts/local.go` (exported for testability)
- `OllamaDiscoveryScript` updated to read `ollama ps -qq` (JSON path, Ollama 0.3.10+) and fall back to the existing awk parser on older Ollama; queries `/api/ps` for `default_keep_alive` and falls back to 5m when missing or unparseable
- `modelWarmthRank` and `warmthToRank` in `internal/placement/empirical.go` (3 buckets: 0 cold, 1 warm, 2 hot at 0.5 and 0.9)
- `modelWarmthRank` wired into `rankKey` at position 10 of the comparator in `internal/placement/ranker.go`
- `internal/placement/warmth_test.go` covering: warmth loses to allocatable RAM, warmth breaks ties on equal RAM, warmth is ignored when `FilterCandidates` rejects, boundary cases (0, 0.5, 0.51, 0.9, 0.91, 1.0, 2.0), highest-relevant-wins, other-runtime warmth is ignored, and time math for zero / future / past `ExpiresAt`
Safety contract
- `FilterCandidates` (`ranker.go`) calls `eval.Eligible()` before any ranking begins; warmth cannot override RAM, GPU, or pressure eligibility.
- The new fields are `omitempty`, so older Ollama, other runtimes (`llama-server`, `mlx_lm.server`), and zero/expired `ExpiresAt` all leave `WarmthScore` at 0 and behave as cold.
- When `ollama ps -qq` fails on older Ollama (e.g., `0.23.3` rejects `-qq` with "unknown shorthand flag"), the probe gracefully falls back to the existing awk parser, which emits no `expires_at`; those entries remain cold.
Quality gates
- `go build ./...` ✓
- `go test ./... -count=1` ✓
- `go test -race ./... -count=1` ✓
- `gofmt -l .` clean ✓
- `go vet ./...` clean ✓
- `make coverage` ✓ (knowledge 90.9% / api 80.9% / mcp 83.7% / ui 94.0% / total 69.1%)
- `./hack/verify-repo-truth.sh` ✓
- `make build` ✓: binary exposes `expires_at` and `default_keep_alive` in `axis facts --format json`
Test coverage
11 new tests in `internal/placement/warmth_test.go`:
- `TestRankCandidatesWarmthLosesToAllocatableRAM`
- `TestRankCandidatesWarmthBreaksTieOnEqualAllocatableRAM`
- `TestRankCandidatesWarmthFilteredBeforeRanking` (`FilterCandidates`)
- `TestWarmthToRankBoundaries`: `>` comparisons at 0.5 and 0.9 (exactly 0.5 is cold; exactly 0.9 is warm)
- `TestModelWarmthRankPicksHighestRelevant`
- `TestModelWarmthRankIgnoresOtherRuntimes`
- `TestApplyOllamaWarmthTimeZero`: `time.Time{}` (omitted) → all cold
- `TestApplyOllamaWarmthInFuturePopulates`: future `ExpiresAt` → `WarmthScore > 0`
- `TestApplyOllamaWarmthPastExpiresAtIsCold`: past `ExpiresAt` → cold
- `TestDefaultOllamaKeepAliveFallbacks`
- `TestDefaultOllamaKeepAliveParses`
Backward compatibility
- The new fields are `omitempty`, so older `/api/ps` payloads and existing fixtures continue to unmarshal cleanly.
Notes