Revise the default logic for the model cache RAM limit #7566

RyanJDick · 2025-01-16T22:30:22Z

Summary

This PR revises the logic for calculating the model cache RAM limit. See the code for thorough documentation of the change.

The updated logic is more conservative in the amount of RAM that it will use. This will likely be a better default for more users. Of course, users can still choose to set a more aggressive limit by overriding the logic with max_cache_ram_gb.
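
As a rough illustration of the override behavior described above, here is a minimal sketch. It is not the PR's implementation: the function name `resolve_ram_cache_limit_gb` and the `ASSUMED_DEFAULT_FRACTION` value are hypothetical, and the real heuristics are documented in the code itself. Only the `max_cache_ram_gb` setting name comes from this PR.

```python
from typing import Optional

# Hypothetical value for illustration only. The actual heuristics and their
# constants are documented in the PR's code, not reproduced here.
ASSUMED_DEFAULT_FRACTION = 0.5


def resolve_ram_cache_limit_gb(
    total_ram_gb: float,
    max_cache_ram_gb: Optional[float] = None,
) -> float:
    """Return the RAM cache limit in GB.

    An explicit max_cache_ram_gb always wins. Otherwise fall back to a
    conservative default derived from total system RAM (assumed fraction).
    """
    if total_ram_gb <= 0:
        raise ValueError("total_ram_gb must be positive")
    if max_cache_ram_gb is not None:
        # User override: skip the default logic entirely.
        return max_cache_ram_gb
    return total_ram_gb * ASSUMED_DEFAULT_FRACTION


if __name__ == "__main__":
    print(resolve_ram_cache_limit_gb(32.0))                         # default path
    print(resolve_ram_cache_limit_gb(32.0, max_cache_ram_gb=28.0))  # override path
```

The point of the sketch is the precedence order: a user-supplied limit bypasses the default calculation, so a more aggressive setting remains available.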

Related Issues / Discussions

Should help with "High Windows Committed Memory (Virtual Memory)" (#7563).

QA Instructions

Exercise all heuristics:

Heuristic 1
Heuristic 2
Heuristic 3
Heuristic 4

Merge Plan

Merge "Add keep_ram_copy_of_weights config option" (#7565) first, then update the target branch.

Checklist

The PR has a short but descriptive title, suitable for a changelog
Tests added / updated (if applicable)
Documentation added / updated (if applicable)
Updated What's New copy (if doing a release after this PR)

psychedelicious · 2025-01-17T00:22:03Z

Code review looks good. Let me know if I can help with testing.

## Summary

This PR adds a `keep_ram_copy_of_weights` config option. The default (and legacy) behavior is `true`. The tradeoffs for this setting are as follows:

- `keep_ram_copy_of_weights: true`: Faster model switching and LoRA patching.
- `keep_ram_copy_of_weights: false`: Lower average RAM load (may not help significantly with peak RAM).

## Related Issues / Discussions

- Helps with #7563
- The Low-VRAM docs are updated to include this feature in #7566

## QA Instructions

- Test with `enable_partial_load: false` and `keep_ram_copy_of_weights: false`.
  - [x] RAM usage when model is loaded is reduced.
  - [x] Model loading / unloading works as expected.
  - [x] LoRA patching still works.
- Test with `enable_partial_load: false` and `keep_ram_copy_of_weights: true`.
  - [x] Behavior should be unchanged.
- Test with `enable_partial_load: true` and `keep_ram_copy_of_weights: false`.
  - [x] RAM usage when model is loaded is reduced.
  - [x] Model loading / unloading works as expected.
  - [x] LoRA patching still works.
- Test with `enable_partial_load: true` and `keep_ram_copy_of_weights: true`.
  - [x] Behavior should be unchanged.
- [x] Smoke test CPU-only and MPS with default configs.

## Merge Plan

- [x] Merge #7564 first and change target branch.

## Checklist

- [x] _The PR has a short but descriptive title, suitable for a changelog_
- [x] _Tests added / updated (if applicable)_
- [ ] _Documentation added / updated (if applicable)_
- [ ] _Updated `What's New` copy (if doing a release after this PR)_
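
The QA instructions for #7565 cover every combination of `enable_partial_load` and `keep_ram_copy_of_weights`. A small sketch of enumerating that matrix is shown here; the helper name `qa_config_matrix` is hypothetical and is not part of the codebase.

```python
from itertools import product


def qa_config_matrix():
    """Enumerate all on/off combinations of the two settings under test."""
    keys = ("enable_partial_load", "keep_ram_copy_of_weights")
    # product(...) yields (False, False), (False, True), (True, False), (True, True),
    # which matches the order of the QA checklist.
    return [dict(zip(keys, values)) for values in product((False, True), repeat=2)]


if __name__ == "__main__":
    for cfg in qa_config_matrix():
        print(cfg)
```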

psychedelicious · 2025-01-17T04:55:14Z

@RyanJDick I wonder if we should figure out special handling for MPS devices and their unified memory architecture.

github-actions bot added the python (PRs that change python files) and backend (PRs that change backend files) labels Jan 16, 2025

RyanJDick added 2 commits January 16, 2025 23:46

Revise the logic for calculating the RAM model cache limit.

0cf51ce

Update the Low-VRAM docs.

ce57c4e

github-actions bot added the docs (PRs that change docs) label Jan 16, 2025

RyanJDick force-pushed the ryan/lower-virtual-memory-3 branch from 9235ede to ce57c4e Compare January 16, 2025 23:46

RyanJDick mentioned this pull request Jan 16, 2025

Add keep_ram_copy_of_weights config option #7565

Merged

14 tasks

psychedelicious approved these changes Jan 17, 2025

RyanJDick marked this pull request as ready for review January 17, 2025 00:43

RyanJDick requested review from lstein, blessedcoolant, brandonrising and hipsterusername as code owners January 17, 2025 00:43

hipsterusername approved these changes Jan 17, 2025

Base automatically changed from ryan/lower-virtual-memory-2 to main January 17, 2025 00:57

RyanJDick merged commit c5d2de3 into main Jan 17, 2025
29 checks passed

RyanJDick deleted the ryan/lower-virtual-memory-3 branch January 17, 2025 00:59
