Description:
I would like to request GPU acceleration support for Qualcomm devices, using a build of llama.cpp with the new Adreno OpenCL backend enabled. This enhancement could significantly improve the performance of the PocketPal AI application on devices with Qualcomm Adreno GPUs.
My test results (tested on Snapdragon 8 Elite platform):
GPU Inference
OP5D0DL1:/data/local/tmp $ ./bin/llama-cli -m ggml-model-qwen1.5-7b-chat-Q4_0.gguf -b 128 -ngl 99 -c 2048 -p "Hello"
llama_perf_sampler_print: sampling time = 24.03 ms / 46 runs ( 0.52 ms per token, 1914.51 tokens per second)
llama_perf_context_print: load time = 11765.90 ms
llama_perf_context_print: prompt eval time = 0.00 ms / 1 tokens ( 0.00 ms per token, inf tokens per second)
llama_perf_context_print: eval time = 5038.54 ms / 45 runs ( 111.97 ms per token, 8.93 tokens per second)
llama_perf_context_print: total time = 5120.83 ms / 46 tokens
CPU Inference
llama_perf_sampler_print: sampling time = 1.34 ms / 13 runs ( 0.10 ms per token, 9708.74 tokens per second)
llama_perf_context_print: load time = 4218.12 ms
llama_perf_context_print: prompt eval time = 0.00 ms / 1 tokens ( 0.00 ms per token, inf tokens per second)
llama_perf_context_print: eval time = 6745.14 ms / 12 runs ( 562.10 ms per token, 1.78 tokens per second)
llama_perf_context_print: total time = 8294.89 ms / 13 tokens
On my device, GPU-accelerated llama.cpp decoding is approximately 5 times faster than CPU decoding (8.93 vs. 1.78 tokens per second, i.e. 111.97 vs. 562.10 ms per token).
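The 5× figure follows directly from the per-token eval times reported in the two logs above; a quick check:

```shell
# Ratio of CPU to GPU per-token decode time, taken from the
# llama_perf_context_print eval lines above (562.10 ms vs. 111.97 ms).
awk 'BEGIN { printf "decode speedup: %.2fx\n", 562.10 / 111.97 }'
```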
Reference:
For more details, please refer to the Qualcomm developer blog post: Introducing the new OpenCL GPU backend for llama.cpp
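For context, the blog post describes enabling this backend at build time via the `GGML_OPENCL` CMake option when cross-compiling for Android. A rough sketch of such a build follows; the NDK path, ABI, and platform level here are illustrative assumptions, not details taken from this issue:

```shell
# Cross-compile llama.cpp for Android with the OpenCL backend enabled.
# $ANDROID_NDK is assumed to point at an installed Android NDK.
cd llama.cpp
mkdir -p build-android && cd build-android
cmake .. \
  -DCMAKE_TOOLCHAIN_FILE=$ANDROID_NDK/build/cmake/android.toolchain.cmake \
  -DANDROID_ABI=arm64-v8a \
  -DANDROID_PLATFORM=android-28 \
  -DGGML_OPENCL=ON
cmake --build . --config Release
```

The resulting binaries can then be pushed to the device with adb and run as in the log above, where `-ngl 99` offloads all layers to the GPU.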
Benefits:
Roughly 5× faster on-device token generation (per the measurements above), with decoding offloaded to the Adreno GPU instead of occupying the CPU.
Thank you for considering this feature request.