[Feat]: Support for GPU Acceleration on Newer Qualcomm Devices #196

Open
shadow3aaa opened this issue Feb 2, 2025 · 1 comment
Labels
enhancement New feature or request

Comments

shadow3aaa commented Feb 2, 2025

Description:
I would like to request the addition of GPU acceleration support for Qualcomm devices in the custom llama.cpp build used by this project, via the new OpenCL backend. This enhancement could significantly improve the performance of the PocketPal AI application on devices with Qualcomm Adreno GPUs.

Reference:
For more details, please refer to the Qualcomm developer blog post: Introducing the new OpenCL GPU backend for llama.cpp
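
For context, a minimal cross-compilation sketch of what enabling that backend could look like, based on the upstream llama.cpp OpenCL backend documentation; the option names, NDK paths, and API level below are assumptions and would need to be checked against this project's actual build setup:

  # Sketch: cross-compile llama-cli for Android with the OpenCL (Adreno) backend.
  # Assumes $ANDROID_NDK points at an installed NDK and that the Android OpenCL
  # headers/ICD loader have already been made available to the toolchain.
  cmake -B build-android \
    -DCMAKE_TOOLCHAIN_FILE=$ANDROID_NDK/build/cmake/android.toolchain.cmake \
    -DANDROID_ABI=arm64-v8a \
    -DANDROID_PLATFORM=android-28 \
    -DGGML_OPENCL=ON \
    -DBUILD_SHARED_LIBS=OFF
  cmake --build build-android --config Release -j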

Benefits:

  • Enhanced performance on Qualcomm devices
  • Better utilization of device hardware capabilities
  • Potential for more responsive and efficient AI processing on mobile devices

Thank you for considering this feature request.

shadow3aaa added the enhancement (New feature or request) label on Feb 2, 2025
shadow3aaa (Author) commented:

My test results (tested on the Snapdragon 8 Elite platform):

  • GPU Inference
OP5D0DL1:/data/local/tmp $ ./bin/llama-cli -m ggml-model-qwen1.5-7b-chat-Q4_0.gguf -b 128 -ngl 99 -c 2048 -p "Hello"

llama_perf_sampler_print:    sampling time =      24.03 ms /    46 runs   (    0.52 ms per token,  1914.51 tokens per second)
llama_perf_context_print:        load time =   11765.90 ms
llama_perf_context_print: prompt eval time =       0.00 ms /     1 tokens (    0.00 ms per token,      inf tokens per second)
llama_perf_context_print:        eval time =    5038.54 ms /    45 runs   (  111.97 ms per token,     8.93 tokens per second)
llama_perf_context_print:       total time =    5120.83 ms /    46 tokens
  • CPU Inference
llama_perf_sampler_print:    sampling time =       1.34 ms /    13 runs   (    0.10 ms per token,  9708.74 tokens per second)
llama_perf_context_print:        load time =    4218.12 ms
llama_perf_context_print: prompt eval time =       0.00 ms /     1 tokens (    0.00 ms per token,      inf tokens per second)
llama_perf_context_print:        eval time =    6745.14 ms /    12 runs   (  562.10 ms per token,     1.78 tokens per second)
llama_perf_context_print:       total time =    8294.89 ms /    13 tokens

GPU-accelerated llama.cpp inference is approximately 5 times faster than CPU inference on my device (8.93 vs. 1.78 tokens per second in the eval phase).
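
For anyone wanting to reproduce a test like this, a rough adb sequence is sketched below, assuming a static llama-cli binary built as in the earlier sketch and the same model file; the paths are illustrative, not taken from the original test:

  # Push the cross-compiled binary and the model to the device
  adb shell mkdir -p /data/local/tmp/bin
  adb push build-android/bin/llama-cli /data/local/tmp/bin/
  adb push ggml-model-qwen1.5-7b-chat-Q4_0.gguf /data/local/tmp/
  adb shell chmod +x /data/local/tmp/bin/llama-cli

  # Run with all layers offloaded to the Adreno GPU (-ngl 99), as in the GPU test above
  adb shell "cd /data/local/tmp && ./bin/llama-cli -m ggml-model-qwen1.5-7b-chat-Q4_0.gguf -b 128 -ngl 99 -c 2048 -p 'Hello'"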
