# on-device-llm

Here are 22 public repositories matching this topic...

HiveForensics-AI / knolo-core

KnoLo Core is a local-first knowledge base engine built for small language models. It packages your documents into a compact .knolo file and enables fully deterministic querying, with no embeddings, vector databases, or cloud services required. Designed for on-device and edge LLM deployments.

offline-first knowledge-base document-retrieval edge-computing edge-ai local-first lexical-search offline-llm rag-alternative vector-database-alternative small-llms on-device-llm retrieval-engine deterministic-search knolo

Updated May 19, 2026
TypeScript

es617 / hunch

On-device shell command generator for macOS Tahoe. Uses Apple's 3B model with dynamic few-shot retrieval from 21k tldr examples.

shell zsh cli terminal developer-tools on-device-ai foundation-models llm-tools apple-intelligence on-device-llm local-llm-macos

Updated Apr 21, 2026
Python

rufolangus / AAOSP

Agentic Android Open Source Project (AAOSP) — Android fork with native LLM system service, MCP-aware apps, and an agent-driven launcher. On-device Qwen 2.5 via llama.cpp. Apps declare tools in their manifest. The OS runs the model.

android agent mcp aosp android-framework edge-ai tool-use jetpack-compose cuttlefish on-device-ai system-service ai-agent llm android-15 llama-cpp agentic qwen model-context-protocol on-device-llm

Updated Apr 23, 2026
Java

Jibar-OS / JibarOS

Android 16 fork. AI as a platform primitive. Twelve capabilities, one shared runtime, every app. OEM-pluggable. Apache 2.0.

android operating-system aosp vlm platform-service multimodal on-device-ai llm oir ai-runtime android-16 on-device-llm inference-runtime jibaros

Updated May 6, 2026
Shell

whyisitworking / llama-bro

High-performance Android SDK for on-device LLM inference (GGUF). Privacy-focused, offline-first, and powered by llama.cpp with a clean Kotlin Coroutines API.

android cmake ai ndk android-library llama android-app android-package on-device-ai ndk-jni ai-assistant llamacpp llama-cpp on-device-models on-device-inference on-device-llm

Updated Mar 27, 2026
Kotlin

john-rocky / PrivateFoundationModels

Apple FoundationModels API on iOS 18+. Same call site, native passthrough on iOS 26 (Apple Intelligence), CoreML / MLX backends on older OSes. Drop-in source compatible.

macos swift ios mlx swift-package coreml on-device-ai llm generative-ai apple-neural-engine visionos mlx-swift apple-intelligence foundationmodels on-device-llm

Updated May 14, 2026
Swift

carrycooldude / Nova-IDE

on-device-ai on-device-inference on-device-llm qualcomm-gpu

Updated May 27, 2026
JavaScript

YueLich / aios-wiki

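📱 A panoramic knowledge base of mobile AI operating systems: 334+ in-depth pages covering on-device large models, AI agents, chip adaptation, and inference optimization | Automatically updated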

wiki xiaomi arxiv knowledge-base quantization npu inference-optimization edge-ai ai-os harmonyos mobile-ai ai-assistant llm mobile-agent on-device-llm

Updated May 24, 2026

dsngeu / on-device-ai-model

iOS app that runs a local LLM on-device to transcribe meetings and generate structured notes — action items, decisions, and summaries. No cloud, no API keys, no data leaves the phone.

llama meeting-notes privacy-first edge-ai local-llm gguf privacy-first-ai on-device-llm

Updated May 26, 2026
Swift

RaccoonOnion / ash

Ash — offline survival assistant for iOS. Gemma 4 E2B/E4B fully on-device (text · image · voice) with RAG-grounded answers over 56 emergency-response packs. Built for the Kaggle Gemma 4 Good Hackathon.

ios offline-first survival flutter gemma emergency-response objectbox litert rag hnsw on-device-ai minilm speculative-decoding on-device-llm litert-lm gemma-4

Updated May 24, 2026
Dart

Jibar-OS / .github

JibarOS organization profile.

Updated Apr 23, 2026

coreline-ai / kotlin_llm_playlists

Android local music recommendation app powered by on-device LLM and RAG

audio android kotlin open-source ai music-recommendation rag llm on-device-llm coreline-ai

Updated May 3, 2026

privane-ai / privane-core

Execution infrastructure for local-first AI. Reason locally, execute globally.

mcp webgpu ai-agents local-first playwright secure-sandbox llm-orchestration ai-infrastructure agentic-workflows hybrid-inference sovereign-ai on-device-llm

Updated May 26, 2026
TypeScript

zxuhan / distill-sql

On-device text-to-SQL distilled from GPT-4o-mini into Qwen2.5 (0.5B → 3B locally on M1 via mlx-lm LoRA, 7B+ on cloud A100). 847 MB at 62.5% on Spider dev; 3B variant hits 72.6%, 7B variant hits 75.0%.

lora quantization knowledge-distillation mlx peft text-to-sql apple-silicon qlora qwen on-device-llm spider-benchmark llm-distillation

Updated May 12, 2026
Python

sagar-develop / litertlm-kmp

Production KMP framework for Google LiteRT-LM. Run Gemma on-device with OEM-aware RAM fixes, resilient Ktor chunked downloads, and schema-driven function calling. Plain Android support. AGPL-3.0 / Commercial dual-licensed.

android kotlin ios clean-architecture gemma litert kotlin-multiplatform rag edge-ai mediapipe local-llm kotlin-inject offline-ai structured-outputs on-device-llm dual-licensing android-llm gemma-4

Updated May 29, 2026
Kotlin

avisre / snapdragon-npu-llm

Run LLMs on Snapdragon NPU — including the 'unsupported' 8 Gen 1 (Hexagon v69). Verified at 31 tok/s on OnePlus 10 Pro.

snapdragon qnn edge-ai llm-inference oneplus-10-pro mobile-llm executorch on-device-llm android-llm hexagon-npu qualcomm-ai snapdragon-8-gen-1 samsung-galaxy-s22 hexagon-v69

Updated May 23, 2026
Shell

aurascoper / NeuralCompose

Privacy-first macOS BCI prototype: Muse EEG → BrainFlow → Core ML (ANE) → local MLX LLM → SwiftUI carousel. Fully on-device.

bci mlx core-ml swiftui privacy-first muse-eeg on-device-llm

Updated May 24, 2026
Swift

ChaseDreamInfinity / voice-agent

Hands-free voice assistant for Android, fully on-device. Gemma 4 multimodal LLM via LiteRT-LM with optional ElevenLabs cloud TTS, smart-turn-classified barge-in, and a vision channel.

android kotlin tts gemma voice-assistant multimodal jetpack-compose elevenlabs on-device-llm litert-lm

Updated May 4, 2026
Kotlin

martinkorelic / mobiletransformers-docs

Documentation for MobileTransformers - a lightweight, modular framework based on ONNX Runtime for running and adapting large language models (LLMs) directly on mobile and edge devices. It supports on-device fine-tuning (PEFT), efficient inference, quantization, weight merging, and direct inference from merged models.

android mobile mars lora llm llm-finetuning on-device-llm

Updated Feb 14, 2026

avisre / gemma4-e2b-snapdragon-npu

Compile Gemma 4 E2B for Snapdragon v69 NPU (8 Gen 1). Pipeline scaffolding from 32-agent research synthesis.

qnn mobile-llm executorch on-device-llm gemma4 gemma-4-e2b snapdragon-8-gen-1 hexagon-v69 snapdragon-npu qualcomm-ai-hub

Updated May 23, 2026
Python
