A full-stack AI application that replicates a specific user's personality and voice. The system utilizes a fine-tuned Llama-3 8B model for text generation and the ElevenLabs API for low-latency voice synthesis, delivered via a Streamlit interface.
This project demonstrates the end-to-end pipeline of creating a persona LLM: from raw data extraction (iMessage), to cleaning/structuring, fine-tuning (QLoRA), and deployment via a RAG-enhanced app.
- Fine-Tuned LLM: Utilized Unsloth (QLoRA) to fine-tune Llama-3-8B on 50k+ private text messages, capturing specific slang, sentence structure, and personality quirks.
- Voice Synthesis: Integrated ElevenLabs for realistic text-to-speech generation with custom voice cloning (Optional).
- RAG Pipeline: Implemented a context-aware system prompt utilizing "Emotional Mirroring" to adapt tone based on the partner's input.
- Local Inference: Optimized for local GPU execution using llama.cpp (GGUF format) for privacy and speed.
- Remote Deployment: Includes a custom Colab notebook with TCP Tunneling support.
- Data Engineering:
  - Extraction: Custom Python scripts (sqlite3) to extract raw messages from the local iMessage chat.db.
  - Cleaning: Regex pipelines to strip system messages ("Loved an image") and group conversations.
- Model Training:
  - Framework: Unsloth (PyTorch) + Hugging Face TRL.
  - Technique: QLoRA (4-bit quantization) on a Tesla T4 GPU.
  - Notebook: See notebooks/fine_tuning_pipeline.ipynb.
- Deployment:
  - Backend: llama-cpp-python for GGUF inference.
  - Frontend: Streamlit for chat interface and audio playback.
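To give a rough sense of why 4-bit quantization matters on a 16 GB Tesla T4, here is a back-of-the-envelope estimate of the memory needed for the model weights alone. It is only a sketch: it assumes a nominal 8 billion parameters and ignores activations, LoRA adapters, optimizer state, and quantization overhead, so real usage will be higher.

```python
def weight_memory_gib(n_params: float, bits_per_param: float) -> float:
    """Approximate memory for model weights alone, in GiB."""
    return n_params * bits_per_param / 8 / 2**30


# Nominal "8B" parameter count (assumption; the exact count differs slightly).
N_PARAMS = 8e9

fp16_gib = weight_memory_gib(N_PARAMS, 16)
int4_gib = weight_memory_gib(N_PARAMS, 4)

print(f"fp16 weights:  {fp16_gib:.1f} GiB")
print(f"4-bit weights: {int4_gib:.1f} GiB")
```

Under these assumptions, half-precision weights alone would nearly fill the card, while 4-bit weights leave most of it free for training.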
Follow this pipeline if you want to create a model trained on your own text messages.
To preface: I trained my persona on my own texts, pulled from iMessage on my own Apple devices, so this method only works on macOS. I'm positive other platforms would be much easier to use, but this is just how I did it.
Use the included script to pull messages from your local chat.db.
- Open scripts/export_messages.py.
- Update TARGET_HANDLE_ID with the phone number/email of the person you text the most.
- Run the script: python scripts/export_messages.py

Output: data/grouped_training_data.jsonl
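For reference, the extraction step can be sketched roughly as below. This is not the contents of scripts/export_messages.py. It is a minimal illustration that assumes the usual chat.db layout (a `message` table with `text`, `is_from_me`, `date`, and `handle_id` columns, joined to a `handle` table whose `id` column holds the phone number or email). The function name `fetch_messages` is hypothetical.

```python
import sqlite3


def fetch_messages(db_path: str, handle: str) -> list[tuple[str, bool]]:
    """Return (text, is_from_me) pairs for one contact, oldest first.

    Assumes the iMessage schema described above; adjust the query if
    your chat.db differs.
    """
    query = """
        SELECT m.text, m.is_from_me
        FROM message AS m
        JOIN handle AS h ON m.handle_id = h.ROWID
        WHERE h.id = ? AND m.text IS NOT NULL
        ORDER BY m.date
    """
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(query, (handle,)).fetchall()
    finally:
        conn.close()
    return [(text, bool(from_me)) for text, from_me in rows]
```

On macOS the database normally lives at ~/Library/Messages/chat.db, and your terminal needs Full Disk Access to read it. Working on a copy of the file is the safer option.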
Raw texts are messy. The cleaning script removes "Tapback" reactions and stray formatting, preparing the data for the LLM.
- Run the cleaning script: python scripts/clean_chats.py

Output: data/clean_training_data.jsonl
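To illustrate the kind of filtering involved, here is a small sketch of a Tapback filter. It is not the actual scripts/clean_chats.py. The verb list, the attachment nouns, and the handling of the object-replacement character are all assumptions about how reactions and attachments show up as plain text, so check them against your own export.

```python
import re

# Assumed, non-exhaustive pattern for Tapback reactions stored as text,
# e.g. 'Loved "see you soon"' or 'Loved an image'.
TAPBACK_RE = re.compile(
    r'^(Loved|Liked|Disliked|Laughed at|Emphasized|Questioned)\s+'
    r'(["\u201c].*["\u201d]|an? (image|video|attachment))$',
    re.DOTALL,
)


def clean_messages(messages: list[str]) -> list[str]:
    """Drop Tapback reactions and empty messages, and normalize whitespace."""
    cleaned = []
    for text in messages:
        # U+FFFC is assumed to be a placeholder left behind by attachments.
        text = text.replace("\ufffc", "").strip()
        if not text or TAPBACK_RE.match(text):
            continue
        cleaned.append(re.sub(r"\s+", " ", text))
    return cleaned
```

An ordinary message such as "Loved the movie last night" is kept, because only quoted text or a known attachment noun after the verb counts as a reaction.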
Format Example: The script produces a JSONL file compatible with Alpaca/Unsloth:
{"instruction": "Hey, how was your day?", "input": "", "output": "It was pretty good. Just grounded out some code. Hbu?"}
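If you want to build records in this format yourself, one simple approach is to pair each incoming message with the reply that directly follows it. The sketch below assumes a chronological list of (text, is_from_me) tuples. The helper names `to_alpaca_records` and `write_jsonl` are hypothetical, and the real script may group consecutive messages before pairing them.

```python
import json


def to_alpaca_records(turns: list[tuple[str, bool]]) -> list[dict]:
    """Pair each incoming message with the reply that directly follows it.

    `turns` holds (text, is_from_me) tuples in chronological order.
    """
    records = []
    for (prev_text, prev_mine), (text, mine) in zip(turns, turns[1:]):
        # Keep only "they said X, I answered Y" transitions.
        if not prev_mine and mine:
            records.append({"instruction": prev_text, "input": "", "output": text})
    return records


def write_jsonl(records: list[dict], path: str) -> None:
    """Write one JSON object per line, keeping emoji and accents readable."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
```

The "input" field stays empty because each example is a plain prompt and reply with no extra context.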
- Open notebooks/fine_tuning_pipeline.ipynb in Google Colab.
- Upload your clean_training_data.jsonl to the session.
- Run the notebook. It uses Unsloth to fine-tune Llama-3 8B (Unsloth advertises roughly 2x faster training with 60% less memory).
- Export: At the end of the notebook, ensure you run the cell to save as GGUF (q4_k_m).
If the automatic export crashes Colab (common with memory limits), use this manual script in a new Colab cell to build llama.cpp and convert the model yourself:

    import os

    # 1. Clone & Build llama.cpp
    if not os.path.exists("/content/llama.cpp"):
        !git clone https://github.com/ggerganov/llama.cpp /content/llama.cpp
    %cd /content/llama.cpp
    !pip install -r requirements.txt
    !cmake -B build
    !cmake --build build --config Release -j 4

    # 2. Convert & Quantize
    print("Converting...")
    !python convert_hf_to_gguf.py /content/model --outfile /content/temp.gguf --outtype f16
    print("Quantizing...")
    !./build/bin/llama-quantize /content/temp.gguf /content/final_model.gguf q4_k_m
    print("✅ Done! Download 'final_model.gguf'")

Once you have a model (or if you want to skip training and use a generic one), follow these steps.
Prerequisites: A computer with a GPU (NVIDIA or Mac M-series) is highly recommended.

    git clone https://github.com/micccon/Persona-AI-Clone.git
    cd Persona-AI-Clone
    pip install -r requirements.txt

- If you trained your own: Move your final_model.gguf (from Part 1) into models/.
- If skipping training: Download a generic Llama-3 model:
  - Action: Save it as models/default_model.gguf.
Create a .env file in the root directory (copy from .env.example) and fill in your keys.
- ELEVENLABS_API_KEY: (Optional) Required for voice mode.
- VOICE_ID: (Optional) The ID of the voice you want to clone.
- MODEL_PATH: Path to your model (e.g., models/default_model.gguf).
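As a rough illustration of how these three variables might be consumed, here is a stdlib-only sketch. It is not the code in app.py. It assumes the .env values have already been loaded into the process environment (for example by a tool such as python-dotenv). The `Settings` class, the `load_settings` helper, and the fallback to models/default_model.gguf are all hypothetical.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class Settings:
    model_path: str
    elevenlabs_api_key: Optional[str]
    voice_id: Optional[str]

    @property
    def voice_enabled(self) -> bool:
        # Voice mode needs an API key; without one the app stays text-only.
        return bool(self.elevenlabs_api_key)


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the three variables, treating empty strings as unset."""
    return Settings(
        # Assumed default; the real app may require MODEL_PATH explicitly.
        model_path=env.get("MODEL_PATH") or "models/default_model.gguf",
        elevenlabs_api_key=env.get("ELEVENLABS_API_KEY") or None,
        voice_id=env.get("VOICE_ID") or None,
    )
```

Passing the environment in as a mapping makes the function easy to try with a plain dict, e.g. `load_settings({"MODEL_PATH": "models/final_model.gguf"})`.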

    streamlit run app.py

If you don't have a strong GPU, use the provided deployment notebook. This method allows you to run the app in the cloud and access it via a secure tunnel.
- Open notebooks/colab_deployment.ipynb in Google Colab.
- Run the Cells: The notebook will guide you through the setup.
  - Cloud Setup: It automates cloning the repo and installing dependencies.
  - Model Selection: You will be prompted to either:
    - Use a generic Llama-3 demo model (default).
    - OR provide a direct download link to your own fine-tuned .gguf model if you have trained one (e.g., from Hugging Face).
  - Launch: It establishes a TCP Tunnel via Ngrok so you can interact with the app from your browser.

(Note: You will need a free Ngrok Authtoken for the tunnel.)
Finally! Click the tcp://... link provided in the output to use the app!