Quantized model time performance #1151

@HemanLin-cl

Description

Dear authors,

Thanks for the great help. We are currently trying to measure the performance of the SDXL base model.
I've tried the following (Windows, Vulkan backend):

sd-cli.exe -M convert -m sd_xl_diffusion_base_x1.safetensors -o sd_xl_diffusion_base_x1_q8_0.gguf -v --type q8_0
sd-cli.exe -M convert -m sd_xl_diffusion_base_x1.safetensors -o sd_xl_diffusion_base_x1_q4_0.gguf -v --type q4_0
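As a quick sanity check that quantization took effect, the on-disk sizes can be compared (a minimal PowerShell sketch; the filenames are the ones from the convert commands above):

# Sanity check: the quantized GGUF files should be noticeably smaller on disk
$models = @(
    "sd_xl_diffusion_base_x1.safetensors",
    "sd_xl_diffusion_base_x1_q8_0.gguf",
    "sd_xl_diffusion_base_x1_q4_0.gguf"
)
Get-Item $models | Select-Object Name, @{ n = 'SizeGB'; e = { [math]::Round($_.Length / 1GB, 2) } }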

The conversion to q8_0/q4_0 GGUF works. I then compared the generation time of the FP16, q8_0, and q4_0 models with:

sd-cli.exe -m sd_xl_diffusion_base_x1.safetensors -W 1024 -H 1024 -p "a lovely cat" --vae-tiling --vae-conv-direct --steps 7 -v
sd-cli.exe -m sd_xl_diffusion_base_x1_q8_0.gguf -W 1024 -H 1024 -p "a lovely cat" --vae-tiling --vae-conv-direct --type q8_0 -v --steps 7
sd-cli.exe -m sd_xl_diffusion_base_x1_q4_0.gguf -W 1024 -H 1024 -p "a lovely cat" --vae-tiling --vae-conv-direct --type q4_0 -v --steps 7
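For reference, the wall-clock time of the three runs can be compared with Measure-Command (a minimal PowerShell sketch; model filenames and flags are taken from the commands above, with the per-run --type/-v flags dropped for brevity):

# Rough wall-clock comparison: run each model once and print elapsed seconds.
$runs = [ordered]@{
    "fp16" = "sd_xl_diffusion_base_x1.safetensors"
    "q8_0" = "sd_xl_diffusion_base_x1_q8_0.gguf"
    "q4_0" = "sd_xl_diffusion_base_x1_q4_0.gguf"
}
foreach ($name in $runs.Keys) {
    $t = Measure-Command {
        .\sd-cli.exe -m $runs[$name] -W 1024 -H 1024 -p "a lovely cat" `
            --vae-tiling --vae-conv-direct --steps 7 | Out-Null
    }
    "{0}: {1:N1} s" -f $name, $t.TotalSeconds
}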

However, the generation times of the three models are almost the same:
FP16: 64 s, q8_0: 68 s, q4_0: 63 s
Is this expected?

My platform info is as follows:

Lunar Lake
CPU: Intel Ultra 268V
GPU: Intel(R) AI Boost
RAM: 32G

Thanks
