feat(llama.cpp): estimate vram usage #5299
✅ Deploy Preview for localai ready!
Pull Request Overview
This PR introduces VRAM usage estimation improvements for models by extending GPU functions and updating GGUF parsing logic. The key changes include:
- Adding a function (TotalAvailableVRAM) to sum available GPU memory.
- Implementing VRAM estimation in the GGUF parsing using model metadata and architecture.
- Replacing outdated metadata calls (f.Model().Name) with the newer f.Metadata().Name.
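As a rough illustration of the first point, summing available memory across detected GPUs could look like the sketch below. The `GPU` type, its `UsableBytes` field, and the lowercase `totalAvailableVRAM` helper are hypothetical stand-ins chosen for this example; the PR's actual `TotalAvailableVRAM` in pkg/xsysinfo/gpu.go may be structured differently.

```go
package main

import "fmt"

// GPU is a hypothetical, simplified description of one detected device.
// It is not the type used in the PR.
type GPU struct {
	Name        string
	UsableBytes uint64 // memory the device reports as usable, in bytes
}

// totalAvailableVRAM sums the usable memory of every GPU in the slice.
// A nil or empty slice yields 0, which callers can treat as "no GPU".
func totalAvailableVRAM(gpus []GPU) uint64 {
	var total uint64
	for _, g := range gpus {
		total += g.UsableBytes
	}
	return total
}

func main() {
	gpus := []GPU{
		{Name: "gpu0", UsableBytes: 8 << 30},
		{Name: "gpu1", UsableBytes: 16 << 30},
	}
	fmt.Printf("total usable VRAM: %d GiB\n", totalAvailableVRAM(gpus)>>30)
}
```

Returning 0 for an empty device list keeps the caller's "no GPU detected" check to a single comparison.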
Reviewed Changes
Copilot reviewed 6 out of 7 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| pkg/xsysinfo/gpu.go | Adds TotalAvailableVRAM which aggregates memory from all available GPUs. |
| pkg/xsysinfo/gguf.go | Implements EstimateGGUFVRAMUsage for calculating estimated VRAM usage. |
| core/config/guesser.go | Removes redundant GPU option assignment. |
| core/config/gguf.go | Updates GGUF configuration to use new metadata methods and adds VRAM estimation. |
| core/cli/util.go | Updates logging calls to use f.Metadata().Name instead of f.Model().Name. |
Files not reviewed (1)
- go.mod: Language not supported
e33bb1b to c809ec5
Pull Request Overview
This PR introduces VRAM estimation functionality for gguf models while consolidating and updating GPU-related configuration logic. Key changes include:
- Adding a new function to calculate total available VRAM from detected GPUs.
- Implementing VRAM usage estimation for gguf models with a new VRAMEstimate struct.
- Replacing outdated model metadata accesses and adjusting GPU options and context estimations in configuration files.
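To make the estimation idea concrete, here is a minimal sketch of how a VRAM estimate might be turned into a number of layers to offload. Everything in it is an assumption for illustration: the fields of this `VRAMEstimate`, the `estimateVRAM` helper, the fixed overhead term, and the premise that every layer costs the same amount of memory. It is not the algorithm implemented by `EstimateGGUFVRAMUsage` in the PR.

```go
package main

import "fmt"

// VRAMEstimate is a hypothetical result type; its fields are illustrative
// and are not taken from the PR.
type VRAMEstimate struct {
	TotalLayers   int
	LayersOnGPU   int
	RequiredBytes uint64 // bytes needed to offload every layer plus overhead
	FitsEntirely  bool
}

// estimateVRAM assumes each layer costs ceil(modelBytes / layers) bytes and
// that a fixed overhead (for example context buffers) must always fit first.
func estimateVRAM(modelBytes, overheadBytes, availableBytes uint64, layers int) VRAMEstimate {
	est := VRAMEstimate{TotalLayers: layers, RequiredBytes: modelBytes + overheadBytes}
	if layers <= 0 || availableBytes <= overheadBytes {
		return est // nothing can be offloaded
	}
	if availableBytes >= est.RequiredBytes {
		est.LayersOnGPU = layers
		est.FitsEntirely = true
		return est
	}
	// Round the per-layer cost up so the estimate never overcommits memory.
	perLayer := (modelBytes + uint64(layers) - 1) / uint64(layers)
	est.LayersOnGPU = int((availableBytes - overheadBytes) / perLayer)
	return est
}

func main() {
	const gib = uint64(1) << 30
	est := estimateVRAM(4*gib, 1*gib, 3*gib, 32)
	fmt.Printf("offload %d of %d layers (fits entirely: %v)\n",
		est.LayersOnGPU, est.TotalLayers, est.FitsEntirely)
}
```

Rounding the per-layer cost up is a deliberately conservative choice: a slightly low layer count only costs some speed, while overcommitting VRAM can fail at load time.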
Reviewed Changes
Copilot reviewed 6 out of 7 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| pkg/xsysinfo/gpu.go | Added TotalAvailableVRAM() for aggregating GPU memory. |
| pkg/xsysinfo/gguf.go | Introduced VRAMEstimate struct and EstimateGGUFVRAMUsage(). |
| core/config/guesser.go | Removed redundant GPU options logic. |
| core/config/gguf.go | Updated metadata access, GPU options, and VRAM estimation logic. |
| core/cli/util.go | Updated logging to use updated metadata syntax. |
Files not reviewed (1)
- go.mod: Language not supported
Comments suppressed due to low confidence (1)
core/config/gguf.go:152
- Ensure that replacing EstimateLLaMACppUsage() with EstimateLLaMACppRun() maintains the intended estimation behavior, as these methods may have differing implementations.
ctxSize := f.EstimateLLaMACppRun().ContextSize
c809ec5 to b8dc637
Pull Request Overview
This PR introduces a VRAM usage estimation feature for GGUF models while refactoring GPU handling and dependency imports.
- Added a function to calculate total available VRAM based on detected GPUs.
- Updated GGUF-related methods to use metadata instead of model properties and replaced old gguf parser imports.
- Refactored configuration logic to set GPU options and layer estimates appropriately.
Reviewed Changes
Copilot reviewed 6 out of 7 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| pkg/xsysinfo/gpu.go | Added TotalAvailableVRAM to aggregate usable GPU memory. |
| pkg/xsysinfo/gguf.go | Introduced VRAM estimation for GGUF models using metadata. |
| core/config/guesser.go | Removed redundant GPU options setting, aligning with refactoring. |
| core/config/gguf.go | Updated GGUF defaults, including renaming functions and metadata usage. |
| core/cli/util.go | Replaced model name references with metadata name in logging. |
Files not reviewed (1)
- go.mod: Language not supported
Signed-off-by: Ettore Di Giacinto <[email protected]>
b8dc637 to 7f654fe
Description
This PR fixes #3541 and supersedes #3737.
Notes for Reviewers
Not tested yet
Signed commits