With Gemma 3n E2B/E4B just dropping, on-device inference on Android is genuinely good now. But serious tasks still warrant a larger cloud model.
Would love the ability to configure cloud API keys (OpenAI, Anthropic, etc.) and switch between local and cloud backends per message — same chat UI, shared conversation history. The goal is using local inference for most things to cut API costs, and reaching for cloud only when needed.
Essentially the same model-switching UX that Open WebUI offers, but built into PocketPal.
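To make the idea concrete, here's a minimal sketch of what per-message backend routing over a shared history could look like. All names here (`ChatBackend`, `LocalBackend`, `CloudBackend`, `sendMessage`) are hypothetical illustrations, not PocketPal's actual internals, and the backends just echo instead of running real inference:

```typescript
interface Message {
  role: "user" | "assistant";
  content: string;
  backend: string; // record which backend produced/received each message
}

// Common interface both local and cloud inference would implement.
interface ChatBackend {
  readonly id: string;
  complete(history: Message[], prompt: string): Promise<string>;
}

// Stand-in for on-device (e.g. llama.cpp-based) inference.
class LocalBackend implements ChatBackend {
  readonly id = "local";
  async complete(_history: Message[], prompt: string): Promise<string> {
    return `[local] ${prompt}`; // placeholder response
  }
}

// Stand-in for an OpenAI/Anthropic-style client configured with a user-supplied key.
class CloudBackend implements ChatBackend {
  readonly id: string;
  constructor(provider: string, private apiKey: string) {
    this.id = provider;
  }
  async complete(_history: Message[], prompt: string): Promise<string> {
    return `[${this.id}] ${prompt}`; // real impl would call the provider's API
  }
}

// One shared conversation history; the backend is chosen per message.
async function sendMessage(
  history: Message[],
  backend: ChatBackend,
  prompt: string
): Promise<string> {
  history.push({ role: "user", content: prompt, backend: backend.id });
  const reply = await backend.complete(history, prompt);
  history.push({ role: "assistant", content: reply, backend: backend.id });
  return reply;
}
```

The point is that the chat UI only ever talks to `ChatBackend`, so switching from local to cloud mid-conversation is just passing a different backend into the next `sendMessage` call.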