Hi,
I'm using Z-Image-Turbo on an RX 7600 8GB card.
I'm trying to unload the text encoder model (Qwen3-4B-Q4KM) after prompt processing to save about 2.8 GB of VRAM.
I placed the unloadModel node right after the Prompt node.
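For reference, here is roughly what I expected the unload step to do. This is only a sketch based on my reading of ComfyUI's comfy.model_management module (unload_all_models / soft_empty_cache); the actual unloadModel node may be implemented differently:

import comfy.model_management as mm

def unload_text_encoder():
    # Push every model ComfyUI currently holds on the GPU back to its
    # offload device (CPU in my case)...
    mm.unload_all_models()
    # ...then release the cached allocator blocks so the memory is
    # actually reported as free VRAM again.
    mm.soft_empty_cache()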
Here is the console output. Before loading the Z-Image-Turbo-Q4KM model (4.8 GB) there is only 3193.65 MB of usable VRAM, down from the 6613.77 MB available when the text encoder loaded; the difference is roughly the encoder's 2813.50 MB footprint, which indicates it was never actually unloaded:
got prompt
Using split attention in VAE
Using split attention in VAE
VAE load device: cuda:0, offload device: cpu, dtype: torch.bfloat16
gguf qtypes: F32 (145), Q6_K (37), Q4_K (216)
Dequantizing token_embd.weight to prevent runtime OOM.
CLIP/text encoder model load device: cuda:0, offload device: cpu, current: cpu, dtype: torch.float16
Requested to load ZImageTEModel_
loaded completely; 6613.77 MB usable, 2813.50 MB loaded, full load: True
Unload Model:
- Clearing Cache...
gguf qtypes: F32 (245), BF16 (28), Q4_K (120), Q5_K (30), Q6_K (30)
model weight dtype torch.bfloat16, manual cast: None
model_type FLOW
Requested to load Lumina2
loaded partially; 3193.65 MB usable, 3108.40 MB loaded, 1725.66 MB offloaded, 85.25 MB buffer reserved, lowvram patches: 0
Is there anything wrong with my workflow? How can I get it working as expected?
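For what it's worth, this is how I'm checking free VRAM between runs (a small snippet using torch.cuda.mem_get_info, which PyTorch's ROCm build also exposes on this card):

import torch

# Returns (free_bytes, total_bytes) for the current device.
free, total = torch.cuda.mem_get_info()
print(f"free VRAM: {free / 1024**2:.0f} MB / {total / 1024**2:.0f} MB")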
Thanks,