GGUF appears to run much slower compared to same version of GGML model #959
Unanswered · J-Scott-Dav asked this question in Q&A
Replies: 0 comments
I am comparing the performance of similar models pulled from Hugging Face. Using the question "Name the planets in the solar system?" I found a striking difference in response time. Could someone comment on my observations below? Is this a fair comparison? Are these performance differences expected, or am I doing something wrong?
Model name: wizardlm-13b-v1.1-superhot-8k-ggmlv3.q4_0.bin
File size: 7.1 GB
llama-cpp-python version: 0.1.78
Time to answer question: 7.9 seconds

Model name: wizardlm-13b-v1.2.ggmlv3.q4_0.bin
File size: 7.1 GB
llama-cpp-python version: 0.1.78
Time to answer question: 15.6 seconds

Model name: wizardlm-13b-v1.2.q4_k_m.gguf
File size: 7.6 GB
llama-cpp-python version: 0.2.12
Time to answer question: 64.4 seconds
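For context, a timing comparison like the one above can be sketched roughly as follows. This is a minimal illustration, not necessarily how the measurements were taken; the model path, prompt wrapper, and `max_tokens` value are placeholders, and the `Llama` usage assumes llama-cpp-python's standard constructor and call interface.

```python
import time

def time_generation(generate, prompt):
    """Time a single call to a text-generation callable.

    Returns the generation result and the elapsed wall-clock seconds.
    """
    start = time.perf_counter()
    result = generate(prompt)
    elapsed = time.perf_counter() - start
    return result, elapsed

# Hypothetical usage with llama-cpp-python (model path is a placeholder):
# from llama_cpp import Llama
# llm = Llama(model_path="wizardlm-13b-v1.2.q4_k_m.gguf")
# answer, seconds = time_generation(
#     lambda p: llm(p, max_tokens=256),
#     "Name the planets in the solar system?",
# )
# print(f"Answered in {seconds:.1f} seconds")
```

Note that wall-clock time for a full answer depends on how many tokens the model generates, so tokens-per-second (from llama.cpp's verbose output) is usually a fairer metric than total answer time.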
Any comments/suggestions would be appreciated.