Slow generation #14

Closed
KintCark opened this issue May 22, 2025 · 1 comment

@KintCark

Why is the generation speed so slow with SDXL? I'm trying to use Playground 2.5, but it takes forever to get a picture, and then it gives me a black image, so I waited all that time for nothing. Will you be making better optimizations soon? By the way, there is no supported APK for the 8 Gen 1; it only lists 8 Gen 1+. I tried using it and it just crashes when loading a quantized model.

@rmatif
Owner

rmatif commented May 22, 2025

@KintCark

There's nothing I can do about the speed — it’s not related to the Flutter/Dart side. It has to do with the ggml library, and the inference is slow on mobile due to the massive compute workload. That’s also why you don’t see many local diffusion inference apps around.

As for CPU performance, ggml is already fairly well optimized. I might revisit the KleidiAI microkernels, but in my last tests they didn't perform well with more than 4 threads.
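
If you want to check thread scaling on your own device, here is a small generic probe (a standalone sketch of mine, not ggml or KleidiAI code) that times a naive matrix-vector product at different OpenMP thread counts. On most mobile SoCs the timings stop improving after a few threads because memory bandwidth, not compute, becomes the bottleneck:

```c
/* Generic thread-scaling probe (a standalone sketch, unrelated to ggml's internals).
 * Build with: cc -O2 -fopenmp probe.c -o probe */
#include <stdio.h>
#include <omp.h>

#define N 4096

static float a[N][N], x[N], y[N];

int main(void) {
    for (int i = 0; i < N; i++) {
        x[i] = 1.0f;
        for (int j = 0; j < N; j++) a[i][j] = 0.5f;
    }

    for (int t = 1; t <= 8; t *= 2) {
        omp_set_num_threads(t);
        double t0 = omp_get_wtime();
        #pragma omp parallel for
        for (int i = 0; i < N; i++) {
            float acc = 0.0f;
            for (int j = 0; j < N; j++) acc += a[i][j] * x[j];
            y[i] = acc;
        }
        /* If timings stop improving past a few threads, the kernel is
         * memory-bandwidth bound, so extra threads (or fancier microkernels)
         * won't buy much. */
        printf("%d thread(s): %.2f ms\n", t, (omp_get_wtime() - t0) * 1e3);
    }
    printf("checksum: %f\n", y[0]); /* reference y so the loop isn't optimized away */
    return 0;
}
```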

I’m doing my best to improve performance by adding an OpenCL backend:
leejet/stable-diffusion.cpp#680
and also by tuning Vulkan:
ggml-org/llama.cpp#13483
Still, it's tough to beat the current CPU performance.

The best path forward is to use distilled models that can converge in a single step. I’ve already added some:
leejet/stable-diffusion.cpp#675
and plan to include more in the future.

I tested the app on a Snapdragon 8 Gen 3 and it works fine. Some vendors may not support custom CPU instructions, so I recommend sticking with the generic APK. On this device, I get around 8s/it for SD1.5, which is decent — you can get good results in under a minute.
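
To put that number in context, here is a rough back-of-the-envelope cost model (a plain-C sketch, not the app's actual code), using the ~8 s/it figure above; note that whether one iteration includes one or two UNet passes depends on whether CFG is enabled:

```c
/* Back-of-the-envelope generation-time estimate: total ≈ steps * seconds_per_iteration.
 * The 8.0 s/it figure is the SD1.5 timing reported above for a Snapdragon 8 Gen 3. */
#include <stdio.h>

static double gen_seconds(int steps, double sec_per_it) {
    return steps * sec_per_it;
}

int main(void) {
    printf("4-step distilled run:  ~%.0f s\n", gen_seconds(4, 8.0));  /* ~32 s, under a minute */
    printf("20-step standard run: ~%.0f s\n", gen_seconds(20, 8.0));  /* ~160 s */
    return 0;
}
```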

The black-image output is caused by SDXL's VAE producing NaNs in FP16. As mentioned in a previous issue, Playground 2.5 also needs a custom scheduler anyway.
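
For what it's worth, the failure mode is easy to reproduce in isolation: FP16 tops out around 65504, the SDXL VAE is known to produce activations beyond that range, so values overflow to inf, later turn into NaN, and the decoded image comes out black. A minimal sketch, assuming a compiler with _Float16 support (recent clang/GCC):

```c
/* Minimal illustration of FP16 overflow turning into NaN (a standalone sketch,
 * assuming _Float16 support; the actual NaNs happen inside the VAE decode). */
#include <stdio.h>

int main(void) {
    float big_activation = 70000.0f;        /* beyond the FP16 max of ~65504 */
    _Float16 h = (_Float16)big_activation;  /* overflows to +inf in FP16 */
    float back = (float)h;
    printf("fp32 %.1f -> fp16 -> fp32 %f\n", big_activation, back);  /* prints inf */
    printf("inf - inf = %f\n", back - back);                         /* prints nan */
    return 0;
}
```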

For now, I recommend sticking to distilled models with CFG-free sampling.

@rmatif rmatif closed this as completed May 22, 2025