ai-core exposes a powerful, lightweight LLM stack (text‑generation, embeddings, multimodal vision) via a single self‑contained AAR.
The binary contains the JNI glue plus a native `ai_core.so` built with llama.cpp and the mtmd (multimodal) back-end.
| Feature | Supported |
|---|---|
| Text‑generation | ✅ single‑threaded inference (CPU) via NativeLib |
| Text‑embeddings | ✅ EmbedLib – returns a FloatArray embedding vector |
| Multimodal vision | ✅ MtmdLib – image + text streaming generation |
| Streaming callbacks | ✅ IGenerationCallback (token, tool‑call, error, done) |
| State persistence | ✅ KV‑cache save / load |
| Speech‑to‑text (STT) | ✅ Sherpa‑ONNX AIDL service |
| Text‑to‑speech (TTS) | ✅ Sherpa‑ONNX TTS flow & API |
| Model swapping | ✅ ModelSwapper ensures only one native instance at a time |
| Configurable prompt / template | ✅ system prompt, chat template, tools JSON |
| Debug & diagnostics | ✅ llamaPrintTimings, modelInfo JSON |
| Background threads | ✅ Coroutines + Dispatcher.IO/Default for all heavy work |
⚠️ GPU support is currently disabled (CPU-only). Feel free to enable MIGraphX or Metal in `ai_core.cpp` if needed.
```
app/
├─ libs/
│  ├─ ai_core-1.0.0.aar   # provided in `build-output/`
└─ build.gradle
```

```gradle
// app/build.gradle
dependencies {
    implementation(fileTree(dir: 'libs', include: ['*.aar']))
}
```

**Important:** add the NDK path in `local.properties`:

```properties
ndk.dir=/path/to/ndk
```
```xml
<uses-permission android:name="android.permission.MANAGE_EXTERNAL_STORAGE" />
<uses-permission android:name="android.permission.FOREGROUND_SERVICE" />
<uses-permission android:name="android.permission.POST_NOTIFICATIONS" />
```

Text generation:

```kotlin
val lib = NativeLib.getInstance()
val ok = lib.init(
    path = "/sdcard/models/llama-2-7B.gguf",
    threads = 4,
    ctxSize = 4096,
    temp = 0.7f,
    topK = 20,
    topP = 0.9f,
    minP = 0.0f
)
if (ok) {
    lib.generateStreaming(
        prompt = "Hello AI!",
        maxTokens = 128,
        callback = object : IGenerationCallback { ... }
    )
}
```

Embedding:
```kotlin
val embed = EmbedLib.getInstance()
val vec: FloatArray? = embed.encode("some text")
```

Multimodal:
```kotlin
MtmdLib.getInstance().init("mmproj.bin", threads = 4)
val imgBytes = /* load PNG/JPEG */
MtmdLib.getInstance().nativeGenerateStreamWithImage(
    prompt = "Describe image",
    imageData = imgBytes,
    imageWidth = 640,
    imageHeight = 480,
    maxTokens = 256,
    callback = object : StreamCallback { ... }
)
```

Release all native instances when you are done:

```kotlin
NativeLib.releaseAll()
EmbedLib.release()
MtmdLib.releaseInstance()
```

Build the native libraries:

```sh
sh scripts/build_llama.sh /path/to/llama.cpp
```
- Builds `libllama.so`, `libggml.so`, `libggml-cpu.so`, `libggml-base.so`, and `libmtmd.so` for `arm64-v8a` and `x86_64`.
- Output goes to `build-output/<abi>/bin/`.
- Requires the `ANDROID_NDK` environment variable to be set.
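The `{ ... }` callback bodies in the streaming examples above are left to the caller. Below is a minimal sketch of what such a callback might do, assuming the four events listed in the feature table (token, tool-call, error, done); the interface name and signatures here are assumptions, so check `IGenerationCallback` in the AAR for the real API:

```kotlin
// Hypothetical interface mirroring the events in the feature table;
// the real IGenerationCallback may use different names and signatures.
interface GenerationCallback {
    fun onToken(token: String)
    fun onToolCall(json: String)
    fun onError(message: String)
    fun onDone()
}

// Accumulates streamed tokens so the full reply is available on completion.
class CollectingCallback : GenerationCallback {
    val text = StringBuilder()
    override fun onToken(token: String) { text.append(token) }
    override fun onToolCall(json: String) { println("tool call: $json") }
    override fun onError(message: String) { println("generation error: $message") }
    override fun onDone() { println("reply: $text") }
}
```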
| Path | Purpose |
|---|---|
| `chat/...` | Prompt / chat template rendering (`chat_template.cpp`). |
| `cpu/...` | CPU helper utilities; `cpu_helper.cpp` furnishes thread counters. |
| `global_state/...` | Singleton context (`g_state`) holding the LLM model / tokenizer. |
| `state/...` | `ModelState` implementation, tokenisation, detokenisation, cache. |
| `tool_calling/...` | Ragged tool-call parser (`tool_call_state.cpp`). |
| `utils/...` | `jni_utils.cpp` (callbacks), `logger.h`, `utf8_utils.cpp` (UTF-8 conversions). |
```
app/
├─ src/main/AndroidManifest.xml
├─ src/main/java/com/mp/ai_core/MainActivity.kt
├─ src/main/java/com/mp/ai_core/text/GenerationService.kt
├─ src/main/java/com/mp/ai_core/ModelSwapper.kt
├─ src/main/java/com/mp/ai_core/stt/   (Sherpa STT)
├─ src/main/java/com/mp/ai_core/tts/   (Sherpa TTS)
└─ build.gradle
```
Functionality
- Demonstrates how to bind `GenerationService` (a foreground service) and call the LLM APIs from the UI.
- Shows embedding, chunked generation, multimodal, and STT / TTS usage.
- Uses `ViewModel` + Compose for the UI; all heavy work runs on `Dispatchers.IO`.

Build
```gradle
implementation(fileTree(dir: 'libs', include: ['*.aar']))
```

```
app/src/main/java/com/mp/ai_core/stt/
├─ SherpaSTTManager.kt   (singleton manager)
├─ SherpaSTTService.kt   (AIDL service, runs in the :stt process)
└─ SherpaSTTClient.kt    (client wrapper)
```
- Uses Sherpa-ONNX (offline) for fast voice recognition.
- Exposes a remote AIDL service (`ISherpaSTTService`), preferable for memory-heavy models.
- Clients bind to the service via `SherpaSTTClient`.
- Thread-safe init, transcribe file/samples, release.
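The thread-safe client-wrapper pattern above can be sketched as follows. The interface shape is an assumption standing in for the generated `ISherpaSTTService` AIDL stub, which will have its own signatures:

```kotlin
// Hypothetical contract standing in for ISherpaSTTService.
interface SttService {
    fun initialize(modelDir: String): Boolean
    fun transcribeFile(path: String): String
    fun release()
}

// Client wrapper in the spirit of SherpaSTTClient: serializes access so
// initialize / transcribe / release are safe to call from any thread.
class SttClient(private val service: SttService) {
    private val lock = Any()
    fun transcribe(path: String): String =
        synchronized(lock) { service.transcribeFile(path) }
    fun close() = synchronized(lock) { service.release() }
}
```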
```
app/src/main/java/com/mp/ai_core/tts/
├─ ITtsService.kt
├─ TtsEngine.kt          (Sherpa-ONNX implementation)
├─ TtsServiceFactory.kt
└─ (AIDL service optional)
```

- `ITtsService` contract: `initialize`, `generateAudioStream`, `stop`, `release`.
- TTS samples are streamed as `Flow<AudioChunk>`.
- Thread-safe hot re-initialization.
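The chunked-streaming idea can be sketched like this. Note the real `TtsEngine` returns `Flow<AudioChunk>`; the sketch below uses a `Sequence` to stay dependency-free, and the `AudioChunk` shape is an assumption:

```kotlin
// Assumed chunk shape; the real AudioChunk may also carry sample rate, etc.
class AudioChunk(val samples: FloatArray)

// Fake engine illustrating chunked synthesis: each piece of text yields one
// chunk of PCM samples as soon as it is ready, instead of one big buffer.
class FakeTtsEngine {
    fun generateAudioStream(text: String): Sequence<AudioChunk> = sequence {
        for (piece in text.chunked(4)) {
            yield(AudioChunk(FloatArray(piece.length))) // placeholder PCM
        }
    }
}
```

A consumer would collect chunks and write them to an `AudioTrack` as they arrive, rather than waiting for the full utterance to be synthesized.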
- Compile via `gradlew assembleRelease`.
- Take `ai_core-1.0.0.aar` from `build/libs`.
- Include the `libs/` folder in your Android project, or host it on JCenter/GitHub Packages.
- Add `implementation(fileTree(dir: 'libs', include: ['*.aar']))`.
```kotlin
// Text generation
NativeLib.init(...)
NativeLib.generateStreaming(...)

// Embedding
EmbedLib.getInstance().encode(...)

// Multimodal
MtmdLib.getInstance().init(...)
MtmdLib.getInstance().nativeGenerateStreamWithImage(...)

// STT (via AIDL)
SherpaSTTClient(...)

// TTS (via ITtsService)
TtsServiceFactory.createTtsService()
```

All public methods are `suspend` where necessary or return callback interfaces.
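The single-instance guarantee that `ModelSwapper` provides (see the feature table) boils down to releasing the previous native engine before creating the next one. A hedged sketch of that pattern, with stand-in names rather than the library's actual API:

```kotlin
// Stand-in for a native engine such as NativeLib or MtmdLib.
class Engine(val modelPath: String) : AutoCloseable {
    var closed = false
        private set
    override fun close() { closed = true } // would free native memory here
}

// Sketch of the ModelSwapper idea: at most one live engine at any time.
object Swapper {
    private var current: Engine? = null

    @Synchronized
    fun swap(modelPath: String): Engine {
        current?.close()                    // release the old instance first
        return Engine(modelPath).also { current = it }
    }
}
```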
```
ai-core/
├─ src/main/cpp/src/
│  ├─ chat/
│  ├─ cpu/
│  ├─ global_state/
│  ├─ state/
│  ├─ tool_calling/
│  └─ utils/
├─ src/main/java/com/mp/ai_core/
│  ├─ text/
│  ├─ stt/
│  ├─ tts/
│  └─ helpers/
├─ CMakeLists.txt
├─ build_llama.sh
└─ README.md
```
⭐️ Happy coding – the library is designed to be plug‑and‑play. Use the README as your “starter kit”.
