
Revise the default logic for the model cache RAM limit #7566

Merged 2 commits from ryan/lower-virtual-memory-3 into main on Jan 17, 2025

Conversation

RyanJDick (Collaborator) commented on Jan 16, 2025

Summary

This PR revises the logic for calculating the model cache RAM limit. See the code for thorough documentation of the change.

The updated logic is more conservative in the amount of RAM that it will use. This will likely be a better default for most users. Of course, users can still choose to set a more aggressive limit by overriding the logic with max_cache_ram_gb.
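For illustration only, a conservative default with a user override could look like the sketch below. The function name and the 50% fraction are invented for this sketch and are not the PR's actual heuristics, which are documented in the model cache code:

```python
import psutil

# Hypothetical sketch -- the function name, threshold, and fraction here are
# invented for illustration, not the heuristics implemented in this PR.
def get_model_cache_ram_limit_gb(max_cache_ram_gb: float | None = None) -> float:
    """Return a RAM limit (in GB) for the model cache.

    An explicit user override always wins; otherwise fall back to a
    conservative fraction of total system RAM.
    """
    if max_cache_ram_gb is not None:
        return max_cache_ram_gb

    total_gb = psutil.virtual_memory().total / 2**30
    # Conservative default: cap the cache at half of total RAM, leaving
    # headroom for the rest of the process and the OS.
    return 0.5 * total_gb
```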

Related Issues / Discussions

QA Instructions

Exercise all heuristics:

  • Heuristic 1
  • Heuristic 2
  • Heuristic 3
  • Heuristic 4

Merge Plan

Checklist

  • The PR has a short but descriptive title, suitable for a changelog
  • Tests added / updated (if applicable)
  • Documentation added / updated (if applicable)
  • Updated What's New copy (if doing a release after this PR)

github-actions bot added the python and backend labels on Jan 16, 2025
github-actions bot added the docs label on Jan 16, 2025
RyanJDick force-pushed the ryan/lower-virtual-memory-3 branch from 9235ede to ce57c4e on January 16, 2025 23:46
psychedelicious (Collaborator) commented:
Code review looks good. Let me know if I can help with testing.

RyanJDick marked this pull request as ready for review on January 17, 2025 00:43
RyanJDick added a commit that referenced this pull request Jan 17, 2025
## Summary

This PR adds a `keep_ram_copy_of_weights` config option. The default (and legacy) behavior is `true`. The tradeoffs for this setting are as follows (see the sketch after this list):
- `keep_ram_copy_of_weights: true`: Faster model switching and LoRA patching.
- `keep_ram_copy_of_weights: false`: Lower average RAM load (may not help significantly with peak RAM).
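As a rough sketch of the mechanism behind this tradeoff (hypothetical code, not InvokeAI's actual loader):

```python
import torch

# Hypothetical sketch of the tradeoff described above -- this is not
# InvokeAI's actual model-loading code.
def load_model_to_gpu(model: torch.nn.Module, keep_ram_copy_of_weights: bool):
    """Move a model to the GPU, optionally retaining a CPU copy of its weights.

    Keeping the CPU copy costs RAM, but it makes unloading the model (or
    restoring the original weights after LoRA patching) a cheap in-memory copy
    instead of a reload from disk.
    """
    cpu_state_dict = None
    if keep_ram_copy_of_weights:
        # Clone before the .to() call so the copies stay on the CPU.
        cpu_state_dict = {k: v.clone() for k, v in model.state_dict().items()}
    model.to("cuda")
    return cpu_state_dict
```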

## Related Issues / Discussions

- Helps with #7563
- The Low-VRAM docs are updated to include this feature in #7566

## QA Instructions

- Test with `enable_partial_load: false` and `keep_ram_copy_of_weights: false`.
  - [x] RAM usage when model is loaded is reduced.
  - [x] Model loading / unloading works as expected.
  - [x] LoRA patching still works.
- Test with `enable_partial_load: false` and `keep_ram_copy_of_weights: true`.
  - [x] Behavior should be unchanged.
- Test with `enable_partial_load: true` and `keep_ram_copy_of_weights: false`.
  - [x] RAM usage when model is loaded is reduced.
  - [x] Model loading / unloading works as expected.
  - [x] LoRA patching still works.
- Test with `enable_partial_load: true` and `keep_ram_copy_of_weights: true`.
  - [x] Behavior should be unchanged.
- [x] Smoke test CPU-only and MPS with default configs.

## Merge Plan

- [x] Merge #7564 first and change target branch.

## Checklist

- [x] _The PR has a short but descriptive title, suitable for a changelog_
- [x] _Tests added / updated (if applicable)_
- [ ] _Documentation added / updated (if applicable)_
- [ ] _Updated `What's New` copy (if doing a release after this PR)_

Base automatically changed from ryan/lower-virtual-memory-2 to main on January 17, 2025 00:57
RyanJDick merged commit c5d2de3 into main on Jan 17, 2025
29 checks passed
RyanJDick deleted the ryan/lower-virtual-memory-3 branch on January 17, 2025 00:59
psychedelicious (Collaborator) commented:
@RyanJDick I wonder if we should figure out special handling for MPS devices and their unified memory architecture.
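A speculative sketch of the kind of special handling being floated here; none of this is in the PR, and the function and fraction are invented for illustration:

```python
import psutil
import torch

# Speculative sketch only -- not part of this PR. On Apple silicon, the CPU
# and GPU share one physical memory pool, so separate RAM and VRAM budgets
# would double-count the same bytes; a single shared budget may fit better.
def shared_memory_budget_gb(fraction: float = 0.5) -> float | None:
    """Return a shared memory budget on unified-memory (MPS) devices,
    or None on machines with discrete GPU memory."""
    if not torch.backends.mps.is_available():
        return None
    total_gb = psutil.virtual_memory().total / 2**30
    return fraction * total_gb
```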
