Powered by Dipamkara
With the rapid development of big data and artificial intelligence technologies, the demand for effective processing and retrieval of vector data is growing. Against this backdrop, I developed the Bhakti vector database, a lightweight and easy-to-deploy solution for the storage and semantic search needs of small and medium-sized datasets. Bhakti supports a variety of similarity calculation methods and a domain-specific language (DSL) for document-based pattern-matching pre-filtering. Its portable data files make data migration straightforward, and it offers flexible data management and seamless integration with Python 3. Furthermore, I propose a memory-enhanced large language model dialogue solution based on Bhakti that assigns different weights to the question and the answer in the dialogue history, giving fine-grained control over the semantic importance of each segment within a single dialogue history. In experimental validation, the method performed well in semantic search and question-answering applications. Bhakti has limitations on large datasets, notably that it does not support approximate search methods such as HNSW, but its lightweight design gives it a clear advantage for small and medium-sized datasets.
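As a rough illustration of the weighting idea described above, the sketch below scores each dialogue turn as a weighted sum of the query's similarity to the stored question and to the stored answer. This is only one possible reading, not Bhakti's actual implementation: the names `weighted_score`, `recall`, `w_question`, and `w_answer` are hypothetical, and cosine similarity is an assumed choice of metric.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def weighted_score(query_vec, question_vec, answer_vec, w_question=0.7, w_answer=0.3):
    """Hypothetical scoring rule: weight the question and answer segments separately."""
    return (w_question * cosine(query_vec, question_vec)
            + w_answer * cosine(query_vec, answer_vec))


def recall(query_vec, history, top_k=1, **weights):
    """Return the top_k dialogue turns ranked by their weighted score."""
    ranked = sorted(
        history,
        key=lambda turn: weighted_score(query_vec, turn["q_vec"], turn["a_vec"], **weights),
        reverse=True,
    )
    return ranked[:top_k]


# Toy 2-dimensional embeddings, purely for illustration.
history = [
    {"id": "turn-1", "q_vec": [1.0, 0.0], "a_vec": [0.0, 1.0]},
    {"id": "turn-2", "q_vec": [0.0, 1.0], "a_vec": [1.0, 0.0]},
]
query = [1.0, 0.0]

best_by_question = recall(query, history, w_question=0.9, w_answer=0.1)[0]["id"]
best_by_answer = recall(query, history, w_question=0.1, w_answer=0.9)[0]["id"]
```

Shifting the weights changes which turn is recalled for the same query, which is the kind of per-segment control the abstract refers to.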
If you use Bhakti in your research, please cite it properly to acknowledge its contribution to your work.
@article{wu2025bhakti,
author = {Zihao Wu},
title = {Bhakti: A Lightweight Vector Database Management System for Endowing Large Language Models with Semantic Search Capabilities and Memory},
journal = {arXiv preprint},
year = {2025},
eprint = {2504.01553},
archivePrefix = {arXiv},
primaryClass = {cs.DB},
url = {https://arxiv.org/abs/2504.01553}
}
From PyPI
pip install bhakti
From GitHub
Download the .whl file first, then run
pip install ./bhakti-X.X.X-py3-none-any.whl
First of all, make sure you've successfully installed Bhakti :)
To begin, create a directory for storing data
mkdir -p /path/to/db
Create a configuration file (.yaml)
# bhakti.yaml
DIMENSION: 1024
DB_PATH: /path/to/db
DB_ENGINE: dipamkara  # optional, default to dipamkara
CACHED: false  # optional, default to false
HOST: 0.0.0.0  # optional, default to 0.0.0.0
PORT: 23860  # optional, default to 23860
EOF: <eof>  # optional, default to <eof>
TIMEOUT: 4.0  # optional, default to 4.0 seconds
BUFFER_SIZE: 256  # optional, default to 256 bytes
VERBOSE: false  # optional, default to false
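To make the required/optional split in the configuration explicit, here is a small sketch that fills in the documented defaults and rejects a config missing `DIMENSION` or `DB_PATH`. The helper `resolve_config` is hypothetical and not part of Bhakti; Bhakti's own loader may validate differently. The default values are taken from the comments in the YAML file.

```python
# Defaults as documented in the comments of bhakti.yaml.
DEFAULTS = {
    "DB_ENGINE": "dipamkara",
    "CACHED": False,
    "HOST": "0.0.0.0",
    "PORT": 23860,
    "EOF": "<eof>",
    "TIMEOUT": 4.0,
    "BUFFER_SIZE": 256,
    "VERBOSE": False,
}

# Keys without a documented default are treated as required here.
REQUIRED = ("DIMENSION", "DB_PATH")


def resolve_config(user_config):
    """Hypothetical helper: merge user settings over the documented defaults."""
    missing = [key for key in REQUIRED if key not in user_config]
    if missing:
        raise ValueError(f"missing required keys: {', '.join(missing)}")
    # User-supplied values win over defaults.
    return {**DEFAULTS, **user_config}


config = resolve_config({"DIMENSION": 1024, "DB_PATH": "/path/to/db", "VERBOSE": True})
```

With this reading, a minimal `bhakti.yaml` only needs the `DIMENSION` and `DB_PATH` entries.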
Run Bhakti from the shell
# bash
bhakti ./bhakti.yaml
Or start the server from a Python script
# main.py
from bhakti import BhaktiServer
from bhakti.database import DBEngine

if __name__ == '__main__':
    bhakti_server = BhaktiServer(
        dimension=1024,  # required, only vectors with 1024 dimensions are acceptable
        db_path='/path/to/db',  # required, path where data is stored, portable
        db_engine=DBEngine.DIPAMKARA,  # optional, default to dipamkara
        cached=False,  # optional, default to false
        host='0.0.0.0',  # optional, default to 0.0.0.0
        port=23860,  # optional, default to 23860
        eof=b'<eof>',  # optional, default to b'<eof>'
        timeout=4.0,  # optional, default to 4.0 seconds
        buffer_size=256,  # optional, default to 256 bytes
        verbose=False  # optional, default to false
    )
    # run server
    bhakti_server.run()
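The `eof` and `buffer_size` options suggest that messages are read in fixed-size chunks until an end-of-message marker arrives. The sketch below shows that general framing technique only; it is an assumption about how such options are typically used, not a description of Bhakti's wire protocol, and `read_message` is a hypothetical helper.

```python
import io


def read_message(stream, eof=b"<eof>", buffer_size=256):
    """Read chunks of up to buffer_size bytes until the data ends with the eof marker."""
    data = bytearray()
    while True:
        chunk = stream.read(buffer_size)
        if not chunk:
            raise ConnectionError("stream closed before the end-of-message marker arrived")
        data.extend(chunk)
        # Checking the accumulated buffer handles a marker split across two chunks.
        if data.endswith(eof):
            return bytes(data[:-len(eof)])


# io.BytesIO stands in for a socket here; the payload is larger than one chunk.
payload = b"x" * 600
message = read_message(io.BytesIO(payload + b"<eof>"))
```

Under this scheme, client and server must agree on the same `eof` value, and the marker must not appear at a chunk boundary inside the payload itself.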
Currently, Python (>=3.10) is supported
# main.py
import asyncio
import numpy as np
from bhakti import BhaktiClient
from bhakti.database import Metric
from bhakti.database import DBEngine


async def main():
    client = BhaktiClient(
        server='127.0.0.1',  # optional, default to 127.0.0.1
        port=23860,  # optional, default to 23860
        eof=b'<eof>',  # optional, default to b'<eof>'
        timeout=4.0,  # optional, default to 4.0 seconds
        buffer_size=256,  # optional, default to 256 bytes
        db_engine=DBEngine.DIPAMKARA,  # optional, default to dipamkara
        verbose=False  # optional, default to false
    )
    vector = np.random.randn(1024)
    await client.create(vector=vector, document={'age': 31, 'gender': 'male'})
    await client.create_index('age')
    await client.create_index('gender')
    results = await client.find_documents_by_vector_indexed(
        query='age <= 31 && gender != "female"',
        vector=vector,
        metric=Metric.EUCLIDEAN_Z_SCORE,
        top_k=3
    )
    print(results)


if __name__ == '__main__':
    asyncio.run(main())
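Conceptually, the query above first filters documents with the DSL expression and then ranks the survivors by vector distance. The following brute-force sketch mimics that flow in plain Python so the semantics are easier to see. It is not how Bhakti is implemented: `filtered_top_k` and `matches` are hypothetical names, the DSL expression is hand-translated into a Python predicate rather than parsed, and plain Euclidean distance is swapped in for `Metric.EUCLIDEAN_Z_SCORE`.

```python
import math


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def filtered_top_k(records, query_vec, predicate, top_k=3):
    """Keep records whose document passes the predicate, then rank by distance."""
    candidates = [r for r in records if predicate(r["document"])]
    candidates.sort(key=lambda r: euclidean(r["vector"], query_vec))
    return [(r["document"], euclidean(r["vector"], query_vec)) for r in candidates[:top_k]]


def matches(doc):
    """Hand-written equivalent of: age <= 31 && gender != "female"."""
    return doc["age"] <= 31 and doc["gender"] != "female"


# Toy 2-dimensional vectors instead of the 1024 dimensions used above.
records = [
    {"vector": [0.0, 0.0], "document": {"age": 31, "gender": "male"}},
    {"vector": [0.1, 0.0], "document": {"age": 25, "gender": "female"}},
    {"vector": [3.0, 4.0], "document": {"age": 29, "gender": "male"}},
    {"vector": [0.2, 0.0], "document": {"age": 40, "gender": "male"}},
]

results = filtered_top_k(records, [0.0, 0.0], matches, top_k=3)
print(results)
```

Only two records pass the filter here, so fewer than `top_k` results come back even though closer vectors exist among the filtered-out documents.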