Skip to content

Conversation

@assafvayner
Copy link
Contributor

With the change to disable the chunk cache in hf-xet there was no way that was setup to allow users to optionally enable the chunk cache if they want to (I want to do this 😁).

In this PR I add a config env var to override the default behavior in hf-xet and enable the cache.

@assafvayner assafvayner requested a review from rajatarya October 29, 2025 21:31
@rajatarya
Copy link
Collaborator

Can't the chunk_cache be enabled by setting the size to > 0?

Meaning, why have an override when we already have a variable to control the behavior of the cache. If set to 0 then no chunk_cache, if set to something > 0 then that value is used and chunk_cache is used.

If setting HF_XET_CHUNK_CACHE_SIZE_BYTES > 0 does not enable the chunk cache then I would consider this a bug.

@assafvayner
Copy link
Contributor Author

If setting HF_XET_CHUNK_CACHE_SIZE_BYTES > 0 does not enable the chunk cache then I would consider this a bug.

Then we have a bug here. The Some(0) overwrites the config which contains previously the value that would be set by setting HF_XET_CHUNK_CACHE_SIZE_BYTES.

I could check for HF_XET_CHUNK_CACHE_SIZE_BYTES being set explicitly instead of a new environment variable.

I do think the way we pipe this configuration down is weird today.

@assafvayner assafvayner marked this pull request as ready for review October 29, 2025 22:45
@rajatarya
Copy link
Collaborator

If setting HF_XET_CHUNK_CACHE_SIZE_BYTES > 0 does not enable the chunk cache then I would consider this a bug.

Then we have a bug here. The Some(0) overwrites the config which contains previously the value that would be set by setting HF_XET_CHUNK_CACHE_SIZE_BYTES.

I could check for HF_XET_CHUNK_CACHE_SIZE_BYTES being set explicitly instead of a new environment variable.

I do think the way we pipe this configuration down is weird today.

Yeah, I think I introduced this bug in my changes last week. The configurable_constants macro maybe needs to be refactored or adjusted. I think the desired behavior / requirements are:

  1. default set in the utils crate
  2. can have overwritten default in utilizing crate, ex. hf_xet, git-xet
  3. can be overwritten at runtime by setting env variable.

Right now I think requirement 2. isn't possible - I think that is why things are wonky now.

Does this sound right? @hoytak is there a way to get these requirements met with what we've got now?

@hoytak
Copy link
Collaborator

hoytak commented Nov 4, 2025

There shouldn't be any need to change the configurable_constants! macro.

Let's redo the previous PR to simply make this a config feature on compile time. Introduce a default-no-cache feature and just have --features=default-no-cache and have the default value be DEFAULT_CACHE_SIZE = if cfg(features(default-no-cache)) { 0 } else { 10gb }; Then hf-xet has that feature on by default and git-xet does not.

We should be able to get the desired behavior without changing or introducing any code except that single line (from the pre-PR version) and using cargo's builtin features.

@rajatarya
Copy link
Collaborator

There shouldn't be any need to change the configurable_constants! macro.

Let's redo the previous PR to simply make this a config feature on compile time. Introduce a default-no-cache feature and just have --features=default-no-cache and have the default value be DEFAULT_CACHE_SIZE = if cfg(features(default-no-cache)) { 0 } else { 10gb }; Then hf-xet has that feature on by default and git-xet does not.

We should be able to get the desired behavior without changing or introducing any code except that single line (from the pre-PR version) and using cargo's builtin features.

Just to clarify, the desired behavior is git-xet has chunk_cache on by default and hf-xet does not. The goal of @assafvayner PR here is to make sure that env variable configuration for cache size still works for hf-xet. Meaning, if he wants to enable the chunk_cache for hf-xet he should be able to set the env variable and have it be honored.

Put another way - we want the cache size to be configurable at runtime through env variable at process startup - and if cache size is 0 then to disable the cache. This way by default on hf-xet the cache is 0 (disabled) but the user can override this by setting env variable on process startup. And in git-xet the cache is enabled (10gb).

@assafvayner assafvayner closed this Nov 4, 2025
@hoytak
Copy link
Collaborator

hoytak commented Nov 6, 2025

Just to clarify, the desired behavior is git-xet has chunk_cache on by default and hf-xet does not. The goal of @assafvayner PR here is to make sure that env variable configuration for cache size still works for hf-xet. Meaning, if he wants to enable the chunk_cache for hf-xet he should be able to set the env variable and have it be honored.

Put another way - we want the cache size to be configurable at runtime through env variable at process startup - and if cache size is 0 then to disable the cache. This way by default on hf-xet the cache is 0 (disabled) but the user can override this by setting env variable on process startup. And in git-xet the cache is enabled (10gb).

Yes, exactly, and what I am suggesting does exactly that. Let me put up a PR for that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants