Run large language models like Qwen and LLaMA locally on Android for offline, private, real-time question answering and chat - powered by ONNX Runtime.
Updated Mar 27, 2026 - Kotlin
Run a <400ms latency Voice Agent on just 4GB VRAM. Fully offline, no API keys required. Optimized for GTX 1650 and edge robotics with zero-copy inference. (Apache 2.0)
🚀 A powerful Flutter-based AI chat application that lets you run LLMs directly on your mobile device or connect to local model servers. Features offline model execution, Ollama/LM Studio integration, and a beautiful modern UI. Privacy-focused, cross-platform, and fully open source.
Voice-powered productivity for Windows
🖼️ Python Image and 🎥 Video Generator using LLM providers and models — built with Claude Code 💻 CLI
Local LLM proxy, DevOps friendly
A framework for using local LLMs (Qwen2.5-coder 7B) that are fine-tuned using RL to generate, debug, and optimize code solutions through iterative refinement.
An advanced, fully local, and GPU-accelerated RAG pipeline. Features a sophisticated LLM-based preprocessing engine, state-of-the-art Parent Document Retriever with RAG Fusion, and a modular, Hydra-configurable architecture. Built with LangChain, Ollama, and ChromaDB for 100% private, high-performance document Q&A.
A fully customizable, super lightweight, cross-platform, GenAI-based personal assistant that can be run locally on your private hardware!
LLM Router is a service that can be deployed on‑premises or in the cloud. It adds a layer between any application and the LLM provider. In real time it controls traffic, distributes load across providers of a specific LLM, and enables analysis of outgoing requests from a security perspective (masking, anonymization, prohibited content).
🤖 An Intelligent Chatbot: Powered by a locally hosted Llama 3.2 LLM served through Ollama 🧠 and ChromaDB 🗂️, this chatbot offers semantic search 🔍, session-aware responses 🗨️, and an interactive Streamlit interface 🎨 for seamless user interaction. 🚀
An AI-powered assistant to streamline knowledge management, member discovery, and content generation across Telegram and Twitter, while ensuring privacy with local LLM deployment.
An autonomous AI agent for intelligently updating, maintaining, and curating a LightRAG knowledge base.
**Ask CLI** is a command-line tool for interacting with a local LLM (Large Language Model) server. It allows you to send queries and receive concise command-line responses.
This repository contains code to securely run small language models (SLMs) locally using Node.js (server side) or inside the browser.
A lightweight frontend for LM Studio local server APIs. Built using React, Vite, and Tailwind CSS with full support for streaming responses and GitHub Flavored Markdown.
JV-Archon is my personal offline LLM ecosystem.
Python CLI/TUI for intelligent media file organization. Features atomic operations, rollback safety, and integrity checks, with a local LLM workflow for context-aware renaming and categorization from API-sourced metadata.
WoolyChat - open-source AI chat app for locally hosted Ollama models. Written in Flask/JavaScript.
PlantDeck is an offline herbal RAG that indexes your PDF books and monographs, extracts text/images with OCR, and answers questions with page-level citations using a local LLM via Ollama. Runs on your machine; no cloud. Field guide only; not medical advice.