Quantized model time performance #1151

@HemanLin-cl

Description

Dear authors,

Thanks for the great help. We are currently trying to measure the performance of the SDXL base model.
I've tried the following (Windows, Vulkan backend):

sd-cli.exe -M convert -m sd_xl_diffusion_base_x1.safetensors -o sd_xl_diffusion_base_x1_q8_0.gguf -v --type q8_0
sd-cli.exe -M convert -m sd_xl_diffusion_base_x1.safetensors -o sd_xl_diffusion_base_x1_q4_0.gguf -v --type q4_0
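As a quick sanity check that quantization took effect, the on-disk sizes can be compared (a minimal PowerShell sketch; the filenames are the ones from the convert commands above):

# Sanity check: the quantized GGUF files should be noticeably smaller on disk
$models = @(
    "sd_xl_diffusion_base_x1.safetensors",
    "sd_xl_diffusion_base_x1_q8_0.gguf",
    "sd_xl_diffusion_base_x1_q4_0.gguf"
)
Get-Item $models | Select-Object Name, @{ n = 'SizeGB'; e = { [math]::Round($_.Length / 1GB, 2) } }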

The conversion to q8_0/q4_0 GGUF works. I then compared the generation time of the FP16, q8_0, and q4_0 models with:

sd-cli.exe -m sd_xl_diffusion_base_x1.safetensors -W 1024 -H 1024 -p "a lovely cat" --vae-tiling --vae-conv-direct --steps 7 -v
sd-cli.exe -m sd_xl_diffusion_base_x1_q8_0.gguf -W 1024 -H 1024 -p "a lovely cat" --vae-tiling --vae-conv-direct --type q8_0 -v --steps 7
sd-cli.exe -m sd_xl_diffusion_base_x1_q4_0.gguf -W 1024 -H 1024 -p "a lovely cat" --vae-tiling --vae-conv-direct --type q4_0 -v --steps 7
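For reference, the wall-clock time of the three runs can be compared with Measure-Command (a minimal PowerShell sketch; model filenames and flags are taken from the commands above, with the per-run --type/-v flags dropped for brevity):

# Rough wall-clock comparison: run each model once and print elapsed seconds.
$runs = [ordered]@{
    "fp16" = "sd_xl_diffusion_base_x1.safetensors"
    "q8_0" = "sd_xl_diffusion_base_x1_q8_0.gguf"
    "q4_0" = "sd_xl_diffusion_base_x1_q4_0.gguf"
}
foreach ($name in $runs.Keys) {
    $t = Measure-Command {
        .\sd-cli.exe -m $runs[$name] -W 1024 -H 1024 -p "a lovely cat" `
            --vae-tiling --vae-conv-direct --steps 7 | Out-Null
    }
    "{0}: {1:N1} s" -f $name, $t.TotalSeconds
}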

However, the generation times of the three models are almost the same:
FP16: 64 s, q8_0: 68 s, q4_0: 63 s
Is this expected?

My platform info is as follows:

Lunar Lake
CPU: Intel Ultra 268V
GPU: Intel(R) AI Boost
RAM: 32G

Thanks
