Can you provide documentation? How do I call this work and use it in my own Python program? Thank you! #10

Open
willson113 opened this issue Aug 23, 2024 · 4 comments

Comments

@willson113

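I have a dataset of more than 10,000 question-answer pairs (built with Wikipedia as the background knowledge base) and would like to use your work for RAG. For each question, I want to retrieve the relevant passages from Wikipedia and then pass them to different large language models (the models I plan to experiment with are ChatGLM, LLaMA, GLM, and Wenxin). Can your work do this? How should I go about it? Looking forward to your reply, thank you!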

@willson113
Author

I have a dataset containing more than 10,000 questions and answers (built with Wikipedia as the background knowledge base). I want to use your work to do RAG. First, each question is used to search Wikipedia and the relevant paragraphs are returned; these are then provided to different large models (the models I want to experiment with are ChatGLM, LLaMA, GLM, and Wenxin). Can your work do this? How should I set it up? Looking forward to your reply, thank you!

@ytkimirti
Collaborator

We use our own RAG Chat package in this repo, which does most of the heavy lifting for us when doing RAG. You can look at the examples in the RAG Chat repository and this project as a reference.
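For readers working in Python, the retrieve-then-prompt flow described in the question can be sketched roughly as follows. This is not the RAG Chat API, only a plain-Python illustration of the general idea: `build_prompt`, `ask_all_models`, and the stand-in model callables are hypothetical names, and the retrieved passages are assumed to come from whatever vector search you use.

```python
from typing import Callable, Dict, List


def build_prompt(question: str, passages: List[str]) -> str:
    """Combine retrieved passages and the question into one prompt string."""
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


def ask_all_models(
    question: str,
    passages: List[str],
    models: Dict[str, Callable[[str], str]],
) -> Dict[str, str]:
    """Send the same prompt to every model and collect the answers by model name."""
    prompt = build_prompt(question, passages)
    return {name: generate(prompt) for name, generate in models.items()}


if __name__ == "__main__":
    # Stand-in "models" (hypothetical): swap in real ChatGLM / LLaMA / GLM / Wenxin clients.
    models = {
        "prompt-length": lambda prompt: f"prompt had {len(prompt)} characters",
        "constant": lambda prompt: "I do not know.",
    }
    passages = ["Paris is the capital of France."]
    answers = ask_all_models("What is the capital of France?", passages, models)
    for name, answer in answers.items():
        print(name, "->", answer)
```

Keeping each model behind a simple `prompt -> answer` callable means the same retrieved context can be reused across all models in an experiment, so differences in the answers come from the models rather than from the retrieval step.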

From what I understand, you just need to upsert your questions and answers as context to the vector database; that should be enough.
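As a rough sketch of that upsert step in Python, the question-answer pairs can be turned into records and sent in batches. The record field names (`id`, `data`, `metadata`) and the `upsert_batch` callback are assumptions made for illustration; in practice `upsert_batch` would wrap the upsert call of whichever vector database client you use.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Record = Dict[str, object]


def to_records(qa_pairs: Iterable[Dict[str, str]]) -> Iterator[Record]:
    """Turn each question/answer pair into one record with an id, text, and metadata."""
    for i, pair in enumerate(qa_pairs):
        yield {
            "id": f"qa-{i}",
            "data": f"Question: {pair['question']}\nAnswer: {pair['answer']}",
            "metadata": {"question": pair["question"]},
        }


def batched(records: Iterable[Record], size: int) -> Iterator[List[Record]]:
    """Group records into lists of at most `size` items."""
    batch: List[Record] = []
    for record in records:
        batch.append(record)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly smaller, batch
        yield batch


def upsert_all(
    qa_pairs: Iterable[Dict[str, str]],
    upsert_batch: Callable[[List[Record]], None],
    batch_size: int = 100,
) -> int:
    """Send all records through `upsert_batch` and return how many were sent."""
    total = 0
    for batch in batched(to_records(qa_pairs), batch_size):
        upsert_batch(batch)  # hypothetical hook: call your vector client's upsert here
        total += len(batch)
    return total
```

Batching keeps each request small, which matters for a dataset of more than 10,000 pairs; the batch size of 100 is an arbitrary default, not a documented limit.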

@willson113
Author

> We use our own RAG Chat package in this repo, which does most of the heavy lifting for us when doing RAG. You can look at the examples in the RAG Chat repository and this project as a reference.
>
> From what I understand, you just need to upsert your questions and answers as context to the vector database; that should be enough.

Chinese Wikipedia is a very large dataset. Can the free vector database provided by Upstash store that much data? Could we get access to the 1.44 vectors you have already vectorized? I am only learning and would use it locally; I am not doing anything commercial with it.

@ytkimirti
Collaborator

We use a single Upstash Vector database for this project. For technical details, you can refer to this blog post. You should be able to implement a similar setup; however, given the large dataset, the free tier may not be sufficient. You can review the pricing page for more information on the available plans.

Please note that we are using this index exclusively for this project and currently do not plan to make it publicly accessible.
