A (simple) command-line tool to estimate inference memory requirements for models on the Hugging Face Hub
```bash
cargo install hf-mem
```

And then:

```bash
hf-mem --model-id meta-llama/Llama-3.1-8B-Instruct --token ...
```
- Fast and lightweight command-line tool, shipped as a single installable binary
- Fetches just the required bytes from the `safetensors` files on the Hugging Face Hub, i.e. only the bytes that contain the metadata (see the sketch after this list)
- Provides an estimation based on the parameter count for each dtype
- Supports both sharded files, i.e. `model-00000-of-00000.safetensors`, and non-sharded files, i.e. `model.safetensors`
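For reference, the snippet below is a minimal sketch of how the metadata fetch and the per-dtype estimation can work; it is not the actual `hf-mem` implementation. It assumes the `reqwest` crate (with the `blocking` feature) and `serde_json`, and uses a hypothetical file URL; gated repositories such as `meta-llama/*` would additionally need an `Authorization: Bearer <token>` header. A `safetensors` file starts with an 8-byte little-endian `u64` holding the length of a JSON header, so two small HTTP Range requests are enough to recover every tensor's dtype and shape without downloading any weights.

```rust
// Assumed dependencies (not necessarily what hf-mem itself uses):
//   reqwest = { version = "0.12", features = ["blocking"] }
//   serde_json = "1"
use std::collections::HashMap;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Hypothetical file URL, for illustration only.
    let url = "https://huggingface.co/some-org/some-model/resolve/main/model.safetensors";
    let client = reqwest::blocking::Client::new();

    // The first 8 bytes are a little-endian u64 with the length of the
    // JSON header that follows.
    let head = client
        .get(url)
        .header("Range", "bytes=0-7")
        .send()?
        .error_for_status()?
        .bytes()?;
    let header_len = u64::from_le_bytes(head[..8].try_into()?);

    // Fetch only the JSON header, not the tensor data.
    let header = client
        .get(url)
        .header("Range", format!("bytes=8-{}", 7 + header_len))
        .send()?
        .error_for_status()?
        .bytes()?;
    let metadata: serde_json::Value = serde_json::from_slice(&header)?;

    // The header maps tensor names to {"dtype", "shape", "data_offsets"};
    // count parameters per dtype by multiplying out each shape.
    let mut params: HashMap<String, u64> = HashMap::new();
    for (name, tensor) in metadata.as_object().unwrap() {
        if name.as_str() == "__metadata__" {
            continue;
        }
        let dtype = tensor["dtype"].as_str().unwrap().to_string();
        let count: u64 = tensor["shape"]
            .as_array()
            .unwrap()
            .iter()
            .map(|d| d.as_u64().unwrap())
            .product();
        *params.entry(dtype).or_default() += count;
    }

    // Rough estimate: parameter count times bytes per dtype
    // (unknown dtypes fall back to 1 byte here).
    for (dtype, count) in &params {
        let bytes_per_param = match dtype.as_str() {
            "F64" | "I64" | "U64" => 8,
            "F32" | "I32" | "U32" => 4,
            "F16" | "BF16" | "I16" | "U16" => 2,
            _ => 1,
        };
        println!("{dtype}: {count} params, ~{} bytes", count * bytes_per_param);
    }
    Ok(())
}
```

Summing parameter counts times bytes per dtype gives the weight-memory estimate: e.g. an 8B-parameter model stored in BF16 (2 bytes per parameter) needs roughly 16 GB for the weights alone, before any KV cache or runtime overhead. For sharded checkpoints, the same header fetch would be repeated for each shard, e.g. for every file listed in the repository's `model.safetensors.index.json`.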
Some things that are planned next:

- Add tracing and progress bars when fetching from the Hub
- Support other file types, e.g. `gguf`
- Read metadata from local files when available, instead of fetching from the Hub every single time
- Add more flags to support estimations that account for quantization, extended context lengths, any added memory overhead, etc.
This project is licensed under either of the following licenses, at your option:

- Apache License, Version 2.0
- MIT License
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in this project by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.