Nothing unloaded in a Z-Image-Turbo workflow #11

@wdx04

Description

Hi,

I'm using Z-Image-Turbo on an RX 7600 8GB card.
I'm trying to unload the text encoder model (QWen3-4B-Q4KM) after prompt processing, to save about 2.8 GB of VRAM.
I placed the UnloadModel node right after the Prompt node.

[Image: workflow screenshot]

Here is the console output. Before loading the Z-Image-Turbo-Q4KM model (4.8 GB), there is only about 3.2 GB of free VRAM, indicating the text encoder model is not actually unloaded:
got prompt
Using split attention in VAE
Using split attention in VAE
VAE load device: cuda:0, offload device: cpu, dtype: torch.bfloat16
gguf qtypes: F32 (145), Q6_K (37), Q4_K (216)
Dequantizing token_embd.weight to prevent runtime OOM.
CLIP/text encoder model load device: cuda:0, offload device: cpu, current: cpu, dtype: torch.float16
Requested to load ZImageTEModel_
loaded completely; 6613.77 MB usable, 2813.50 MB loaded, full load: True
Unload Model:
Clearing Cache...
gguf qtypes: F32 (245), BF16 (28), Q4_K (120), Q5_K (30), Q6_K (30)
model weight dtype torch.bfloat16, manual cast: None
model_type FLOW
Requested to load Lumina2
loaded partially; 3193.65 MB usable, 3108.40 MB loaded, 1725.66 MB offloaded, 85.25 MB buffer reserved, lowvram patches: 0
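Independent of the node graph, one way to confirm whether the text encoder is actually freed is to watch free VRAM directly from PyTorch. This is a standalone diagnostic sketch, not part of the workflow; `torch.cuda.mem_get_info` is PyTorch's public API for this, and ROCm builds expose it under the same `torch.cuda` namespace:

```python
import torch

def bytes_to_gib(n: int) -> float:
    # Convert raw bytes to GiB for readable logging.
    return n / (1024 ** 3)

def log_free_vram(tag: str, device: int = 0) -> None:
    # mem_get_info returns (free_bytes, total_bytes) for the device.
    free_b, total_b = torch.cuda.mem_get_info(device)
    print(f"{tag}: {bytes_to_gib(free_b):.2f} GiB free "
          f"of {bytes_to_gib(total_b):.2f} GiB")

if torch.cuda.is_available():
    log_free_vram("before diffusion model load")
```

On an 8 GB card this makes it easy to see whether the ~2.8 GB held by the text encoder is actually returned before the 4.8 GB diffusion model starts loading.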

Is there anything wrong with my workflow? How can I get it working as expected?

Thanks,
