
Add Ollama warmth lifetime scoring as bounded placement tiebreaker #151

Merged
toasterbook88 merged 2 commits into main from feat/o9-ollama-warmth on Jun 2, 2026

@toasterbook88 (Owner)

Summary

Promote the resident-model "is loaded" boolean into a continuous 0.0-1.0 warmth score derived from Ollama's /api/ps expires_at and default_keep_alive. Warmth becomes a bounded tiebreaker at position 10 of the placement rank comparator, after RAM, GPU, pressure, and reservation ratio.

FilterCandidates is unchanged: warmth is consulted only among nodes that already passed eval.Eligible(). It cannot promote an undersized node, and the three-bucket discretization (cold/warm/hot at 0.5 and 0.9) keeps ranking stable.

What ships

  • ResidentModel.ExpiresAt (time.Time, omitempty) and ResidentModel.WarmthScore (float64, omitempty) in internal/models/types.go
  • OllamaInfo.DefaultKeepAlive (string, omitempty) for the process-level Ollama default
  • ApplyOllamaWarmth and DefaultOllamaKeepAlive helpers in internal/facts/local.go (exported for testability)
  • OllamaDiscoveryScript updated to read ollama ps -qq (JSON path, Ollama 0.3.10+) and fall back to the existing awk parser on older Ollama; queries /api/ps for default_keep_alive and falls back to 5m when missing or unparseable
  • modelWarmthRank and warmthToRank in internal/placement/empirical.go (3 buckets: 0 = cold, 1 = warm, 2 = hot, with thresholds at 0.5 and 0.9)
  • modelWarmthRank wired into rankKey at position 10 of the comparator in internal/placement/ranker.go
  • 11 new tests in internal/placement/warmth_test.go covering: warmth loses to allocatable RAM, warmth breaks ties on equal RAM, warmth is ignored when FilterCandidates rejects, boundary cases (0, 0.5, 0.51, 0.9, 0.91, 1.0, 2.0), highest-relevant-wins, other-runtime warmth is ignored, and time math for zero / future / past ExpiresAt

Safety contract

  • Warmth is strictly a tiebreaker. FilterCandidates (ranker.go) calls eval.Eligible() before any ranking begins; warmth cannot override RAM, GPU, or pressure eligibility.
  • The 3-bucket discretization (cold/warm/hot) keeps warmth stable under small time changes: a node's warmth rank changes only when its score crosses 0.5 or 0.9, so transient observations rarely flip the ranking.
  • Older Ollama, other runtimes (llama-server, mlx_lm.server), and zero or expired ExpiresAt all leave WarmthScore at 0, so those models behave as cold. The new fields are also tagged omitempty, keeping the JSON additive.
  • Probe path tolerates ollama ps -qq failing on older Ollama (e.g., 0.23.3 rejects -qq with "unknown shorthand flag") and gracefully falls back to the existing awk parser, which emits no expires_at — those entries remain cold.
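To illustrate why warmth cannot override capacity, here is a deliberately reduced comparator with only two keys: allocatable RAM first, then warmth rank. The candidate type and less function are hypothetical; the real rankKey has more keys (GPU, pressure, reservation ratio, and others) ahead of warmth, and candidates reach it only after FilterCandidates.

```go
package main

import (
	"fmt"
	"sort"
)

// candidate is a hypothetical, heavily simplified stand-in for a node that
// already passed FilterCandidates / eval.Eligible().
type candidate struct {
	Name           string
	AllocatableRAM uint64 // bytes
	WarmthRank     int    // 0 cold, 1 warm, 2 hot
}

// less reports whether a should rank ahead of b. Warmth is consulted only
// when every earlier key ties.
func less(a, b candidate) bool {
	if a.AllocatableRAM != b.AllocatableRAM {
		return a.AllocatableRAM > b.AllocatableRAM
	}
	if a.WarmthRank != b.WarmthRank {
		return a.WarmthRank > b.WarmthRank
	}
	return a.Name < b.Name // deterministic final tiebreak (assumption)
}

// rank returns a best-first copy of the candidates.
func rank(cs []candidate) []candidate {
	out := append([]candidate(nil), cs...)
	sort.SliceStable(out, func(i, j int) bool { return less(out[i], out[j]) })
	return out
}

func main() {
	const gib = 1 << 30
	// Small-hot loses to large-cold: warmth cannot promote an undersized node.
	fmt.Println(rank([]candidate{{"small-hot", 8 * gib, 2}, {"large-cold", 32 * gib, 0}})[0].Name) // large-cold
	// Equal RAM: warmth breaks the tie.
	fmt.Println(rank([]candidate{{"cold", 16 * gib, 0}, {"hot", 16 * gib, 2}})[0].Name) // hot
}
```

Because warmth sits after every capacity key, it can only reorder nodes that already tie on those keys, which is exactly what the first two ranker tests assert.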

Quality gates

  • go build ./...
  • go test ./... -count=1
  • go test -race ./... -count=1
  • gofmt -l . clean ✓
  • go vet ./... clean ✓
  • make coverage ✓ (knowledge 90.9% / api 80.9% / mcp 83.7% / ui 94.0% / total 69.1%)
  • ./hack/verify-repo-truth.sh
  • make build ✓ — binary exposes expires_at and default_keep_alive in axis facts --format json

Test coverage

11 new tests in internal/placement/warmth_test.go:

  • TestRankCandidatesWarmthLosesToAllocatableRAM: small-hot loses to large-cold (warmth cannot promote an undersized node)
  • TestRankCandidatesWarmthBreaksTieOnEqualAllocatableRAM: warmth resolves ties on equal RAM
  • TestRankCandidatesWarmthFilteredBeforeRanking: a hot node with insufficient RAM is dropped by FilterCandidates
  • TestWarmthToRankBoundaries: strict > comparisons at 0.5 and 0.9 (exactly 0.5 is cold; exactly 0.9 is warm)
  • TestModelWarmthRankPicksHighestRelevant: ranks by the highest matching model's warmth, not the average
  • TestModelWarmthRankIgnoresOtherRuntimes: non-Ollama warmth is ignored
  • TestApplyOllamaWarmthTimeZero: time.Time{} (omitted) → all cold
  • TestApplyOllamaWarmthInFuturePopulates: future ExpiresAt → WarmthScore > 0
  • TestApplyOllamaWarmthPastExpiresAtIsCold: past ExpiresAt → cold
  • TestDefaultOllamaKeepAliveFallbacks: empty/garbage/zero → 5m
  • TestDefaultOllamaKeepAliveParses: valid durations parse; 5m default when negative

Backward compatibility

  • All new fields are additive and tagged omitempty; older /api/ps payloads and existing fixtures that lack them still unmarshal cleanly, leaving the fields at their zero values.
  • Older Ollama (<0.3.10) falls back to the awk parser; warmth for those nodes is 0 (cold), which is the correct conservative behavior.
  • The new comparator key is additive: no existing comparator field is renamed, retyped, or removed.

Notes

  • No public-repo-sensitive content (no real hostnames, IPs, SSH users, model names beyond generic placeholders, or per-host output) is included in this PR.
  • Author identity is the project's public-safe contributor handle.

gemini-code-assist[bot] (Contributor) left a comment

Code Review

This pull request introduces model warmth scoring and ranking for Ollama resident models. It updates the local discovery script to parse JSON from ollama ps -qq and retrieve the process-level default_keep_alive duration, which is then used to compute a continuous warmth score based on the model's expiration time. This score is integrated into the candidate ranking logic as a bounded tiebreaker. Feedback on the changes includes making the embedded Python script more robust against unexpected JSON structures or types, and handling bare integers representing seconds in DefaultOllamaKeepAlive to prevent parsing failures.

Review comment threads: internal/facts/tools.go (outdated), internal/facts/local.go
AXIS Contributor and others added 2 commits June 2, 2026 12:15
Promote the resident-model 'is loaded' boolean into a continuous 0.0-1.0
warmth score derived from Ollama's /api/ps expires_at and
default_keep_alive. Warmth becomes a bounded tiebreaker at position 10 of
the rank comparator, after RAM, GPU, pressure, and reservation ratio.

FilterCandidates is unchanged: warmth is consulted only among nodes that
already passed eval.Eligible(). It cannot promote an undersized node, and
the three-bucket discretization (cold/warm/hot at 0.5 and 0.9) keeps
ranking stable.

Probe layer reads the new fields when Ollama 0.3.10+ is present
('ollama ps -qq' JSON path), and degrades gracefully to the existing awk
parser on older Ollama - no expires_at is emitted in that case and
WarmthScore remains 0 (cold). /api/ps is also queried for
default_keep_alive, falling back to 5m when missing or unparseable.

Adds ResidentModel.ExpiresAt, ResidentModel.WarmthScore, and
OllamaInfo.DefaultKeepAlive (all omitempty, additive JSON), plus
ApplyOllamaWarmth / DefaultOllamaKeepAlive helpers in the facts layer and
modelWarmthRank in the ranker. Tests cover: warmth loses to allocatable
RAM, warmth breaks ties on equal RAM, warmth is ignored when FilterCandidates
rejects, boundary cases (0, 0.5, 0.51, 0.9, 0.91, 1.0), highest-relevant
wins, other-runtime warmth is ignored, and time math for zero / future /
past ExpiresAt.
…ing robust

Improve robustness of the Ollama resident model discovery script by handling non-list JSON formats and string type conversions for VRAM size safely.

Handle bare integer keep-alive duration strings in DefaultOllamaKeepAlive by appending seconds ("s") unit prior to duration parsing.

Addresses review comments from gemini-code-assist[bot] on PR 151.

Co-Authored-By: Antigravity <noreply@gemini.google.com>
@toasterbook88 toasterbook88 force-pushed the feat/o9-ollama-warmth branch from bc78644 to 9ff97d6 on June 2, 2026 16:40
@toasterbook88 toasterbook88 merged commit 5089e7f into main Jun 2, 2026
8 checks passed
@toasterbook88 toasterbook88 deleted the feat/o9-ollama-warmth branch June 2, 2026 19:36