Add Multimodal RAG (Text + Image) for Retrieval-Augmented Generation Using Llama #64
Comments
I would love to contribute to this issue, @MayankChaturvedi.
I love the idea! So to make it even more clear:
If that is the workflow, I think we should go forward. Also, please keep it simple: we would like to see a very simple notebook that does what it needs to and nothing more. PS: I have added this issue to the main issue #43.
Hello @MayankChaturvedi, as proposed by @ariG23498 in issue #55, I would love to contribute to this work as well. Let me know if you want to divide up or collaborate on any subtask.
Hey @ariG23498, I was redirected here from #47, thank you for that! I would also love to join this team, @MayankChaturvedi. Please let me know if there is space for collaboration here too!
Hi folks, thanks for your interest in the issue. We need a simple notebook. I will create a branch so that three of us can collaborate on it. Meanwhile I'll also come up with a distribution of tasks.
@MayankChaturvedi the collaboration sounds great! Let me know if you folks need help; the best way of reaching me is this issue. It would be open for others to view and learn 🤗
Hi @MayankChaturvedi, I would love to collaborate on this issue. Let me know how I can contribute.
A notebook that demonstrates multimodal RAG: it combines two types of input, such as text and images, to retrieve relevant information from a dataset and generate new output based on the retrieved data.
Example
Input: Takes a text query along with an image (e.g., "Which fruit is this?")
Retrieval: Uses the image and the text to retrieve relevant documents or facts from a knowledge base or external dataset (e.g., Wikipedia articles on animals).
Generation: The system generates a coherent response based on the retrieved information (e.g., "This is a blueberry!").
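To make the three steps above concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration, not the notebook's actual design: the toy knowledge base, the hand-written three-number "image vectors" (standing in for embeddings from a real image encoder), the bag-of-words text embedding, and the helper names `embed_text`, `retrieve`, and `build_prompt` are all hypothetical. The generation step is only represented by the prompt that would be handed to a Llama model; no model is called.

```python
import math
from collections import Counter

# Hypothetical toy knowledge base. Each entry pairs a text fact with a
# hand-made image feature vector (a stand-in for a real image embedding).
KNOWLEDGE_BASE = [
    {"text": "Blueberries are small round blue-purple berries.", "image_vec": [0.1, 0.1, 0.8]},
    {"text": "Bananas are long curved yellow fruits.", "image_vec": [0.9, 0.8, 0.1]},
    {"text": "Strawberries are red fruits with seeds on the outside.", "image_vec": [0.9, 0.1, 0.1]},
]


def embed_text(text):
    """Toy bag-of-words embedding (assumption: replaces a real text encoder)."""
    cleaned = text.lower().replace("?", "").replace(".", "")
    return Counter(cleaned.split())


def cosine_sparse(a, b):
    """Cosine similarity between two word-count mappings."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cosine_dense(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def retrieve(query_text, query_image_vec, k=1, image_weight=0.7):
    """Rank entries by a weighted mix of text and image similarity."""
    query_words = embed_text(query_text)
    scored = []
    for doc in KNOWLEDGE_BASE:
        text_score = cosine_sparse(query_words, embed_text(doc["text"]))
        image_score = cosine_dense(query_image_vec, doc["image_vec"])
        score = (1 - image_weight) * text_score + image_weight * image_score
        scored.append((score, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]


def build_prompt(query_text, docs):
    """Assemble the prompt a generator model would receive."""
    context = "\n".join(f"- {d['text']}" for d in docs)
    return f"Context:\n{context}\n\nQuestion: {query_text}\nAnswer:"


# Input: a text query plus a (made-up) feature vector for a bluish image.
question = "Which fruit is this?"
image_vec = [0.15, 0.1, 0.75]

# Retrieval, then prompt construction for the generation step.
top_docs = retrieve(question, image_vec, k=1)
prompt = build_prompt(question, top_docs)
print(prompt)
```

In this sketch the text query carries almost no signal on its own ("Which fruit is this?" shares no words with the facts), so the image similarity decides the ranking. That is the reason for weighting the image score more heavily here; a real notebook would likely tune or replace this scoring with embeddings from an actual multimodal encoder.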