A production-ready Python package with three integrated modules that cover the full AI-data workflow for Retrieval-Augmented Generation (RAG) and hybrid search applications on MariaDB: data ingestion, a vector store, and persistent chat memory.
**Ingestor**
- Automates ingestion of structured data (CSV files, database tables) into MariaDB for vector search.
- Column mapping: Specify which columns are used for embeddings and which are stored as JSON metadata.
- Embeddings: Uses HuggingFace models for semantic vector generation.
- MariaDB Table Creation: Automatically creates tables with VECTOR and JSON columns, and a VECTOR INDEX for fast search.
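The DDL the ingestor produces can be sketched as follows. The table and column names (`routes`, `content`, `embedding`, `metadata`) and the helper function are assumptions for illustration, not the toolkit's actual API; the column mapping decides which CSV columns feed the embedding model and which are packed into the JSON metadata column.

```python
# Hypothetical sketch of the generated DDL: a VECTOR column for
# embeddings, a JSON column for metadata, and a VECTOR INDEX.

EMBED_COLS = ["source_airport", "dest_airport"]  # embedded as one text blob
META_COLS = ["airline", "stops"]                 # stored as JSON metadata

def build_create_table(table: str, dim: int) -> str:
    """Build a CREATE TABLE statement with VECTOR and JSON columns."""
    return (
        f"CREATE TABLE IF NOT EXISTS {table} ("
        "id INT AUTO_INCREMENT PRIMARY KEY, "
        "content TEXT, "
        f"embedding VECTOR({dim}) NOT NULL, "
        "metadata JSON, "
        "VECTOR INDEX (embedding)"
        ")"
    )

# 384 dimensions, e.g. for a small HuggingFace model such as all-MiniLM-L6-v2
ddl = build_create_table("routes", 384)
print(ddl)
```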
**Vector Store**
- LangChain-compatible vector store for MariaDB.
- Hybrid search: Combines semantic similarity (VEC_DISTANCE_COSINE) with structured JSON filtering in a single query.
- Efficient retrieval: Leverages MariaDB's unified data platform for scalable, precise search.
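A hybrid query of this shape can be sketched in one statement. The table and column names are assumptions carried over from the ingestor sketch above, and the query builder is illustrative, not the toolkit's real implementation; the point is that `VEC_DISTANCE_COSINE` ranking and `JSON_VALUE` filtering compose in a single MariaDB query.

```python
# Hypothetical sketch: one SQL statement combining semantic ranking
# (VEC_DISTANCE_COSINE) with structured JSON metadata filters.

def build_hybrid_query(table: str, filters: dict, k: int = 5) -> str:
    """Build a parameterized hybrid-search query (values bound via %s)."""
    where = " AND ".join(
        f"JSON_VALUE(metadata, '$.{key}') = %s" for key in filters
    )
    sql = (
        "SELECT content, metadata, "
        "VEC_DISTANCE_COSINE(embedding, VEC_FromText(%s)) AS dist "
        f"FROM {table} "
    )
    if where:
        sql += f"WHERE {where} "
    sql += f"ORDER BY dist LIMIT {k}"
    return sql

sql = build_hybrid_query("routes", {"airline": "BA", "stops": "0"})
print(sql)
```

Keeping both predicates in one statement lets MariaDB filter and rank in a single round trip instead of post-filtering results in Python.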
**Chat History**
- Persistent chat memory for AI applications using MariaDB's JSON type.
- Simple API: Add and retrieve chat messages for any session.
- Fast and flexible: Ideal for conversational AI and state management.
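The add/retrieve API shape can be illustrated with an in-memory stand-in. The class and method names below are assumptions, not the toolkit's actual interface; the real chat manager would persist each message as a JSON document in MariaDB rather than in a Python dict.

```python
import json
from collections import defaultdict

# Hypothetical in-memory stand-in for the chat manager's API shape.
# Each message is serialized as a JSON document, mirroring what would
# be INSERTed into a MariaDB JSON column keyed by session id.

class ChatHistorySketch:
    def __init__(self):
        self._sessions = defaultdict(list)

    def add_message(self, session_id, role, content):
        row = json.dumps({"role": role, "content": content})
        self._sessions[session_id].append(row)
        return row

    def get_history(self, session_id):
        return [json.loads(r) for r in self._sessions[session_id]]

chat = ChatHistorySketch()
chat.add_message("session-1", "user", "Which routes fly nonstop to LHR?")
chat.add_message("session-1", "assistant", "Here are three nonstop options.")
print(chat.get_history("session-1"))
```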
- Install the dependencies:

  ```shell
  pip install -r requirements.txt
  ```
- Configure your MariaDB connection in `run_and_demo.py`:

  ```python
  DB_CONNECTION_DETAILS = {
      "host": "127.0.0.1",
      "port": 3306,
      "user": "root",
      "password": "your_password",
      "database": "mydb",
  }
  ```
- Run the demonstration:

  ```shell
  python run_and_demo.py
  ```
- Ingestor Demo: Loads a sample OpenFlights CSV, mapping columns for embeddings and metadata.
- Hybrid Search Demo: Finds routes similar to a query, filtered by airline and stops.
- Chat Manager Demo: Simulates a chat session and retrieves the history.
```
mariadb-ai-toolkit/
├── mariadb_ai_toolkit/
│   ├── ingestor.py
│   ├── vectorstore.py
│   ├── chathistory.py
│   └── __init__.py
├── docs/
│   ├── ingestor.md
│   ├── vectorstore.md
│   └── chathistory.md
├── run_and_demo.py
├── requirements.txt
├── README.md
└── routes_demo.csv
```
- Python 3.8+
- MariaDB server with VECTOR and JSON support (the VECTOR type requires MariaDB 11.7 or later)
- See `requirements.txt` for Python dependencies