
Add Multimodal RAG (Text + Image) for Retrieval-Augmented Generation Using Llama #64

Open
Tracked by #43
MayankChaturvedi opened this issue Sep 30, 2024 · 7 comments · May be fixed by #72
Comments

@MayankChaturvedi

A notebook that demonstrates a multimodal RAG pipeline combining two input types, text and images, to retrieve relevant information from a dataset and generate new outputs grounded in the retrieved data.

Example
Input: takes a text query along with an image (e.g., "Which fruit is this?")
Retrieval: uses the image and the text to retrieve relevant documents or facts from a knowledge base or external dataset (e.g., Wikipedia articles on fruits).
Generation: the system generates a coherent response based on the retrieved information (e.g., "This is a blueberry!").
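To make the retrieve-then-generate flow concrete, here is a minimal, self-contained sketch. The `embed` function is a toy stand-in for a real multimodal embedding model (e.g., CLIP), and the knowledge base is a hypothetical three-document list; a real pipeline would embed the query image and text jointly and generate the answer with Llama rather than stitching strings.

```python
# Illustrative sketch of the Input -> Retrieval -> Generation flow above.
# embed() is a toy character-frequency embedding, NOT a real multimodal
# model; it only serves to show where an embedding model plugs in.

def embed(text: str) -> list[float]:
    # Hypothetical embedding: normalized letter-frequency vector.
    alphabet = "abcdefghijklmnopqrstuvwxyz"
    counts = [text.lower().count(ch) for ch in alphabet]
    total = sum(counts) or 1
    return [c / total for c in counts]

def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity between two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

# Placeholder knowledge base; in practice this would be a dataset
# of documents (or image/text pairs) with precomputed embeddings.
knowledge_base = [
    "Blueberries are small, round, blue fruits rich in antioxidants.",
    "The cheetah is the fastest land animal.",
    "Bananas are elongated yellow fruits high in potassium.",
]

def retrieve(query: str, k: int = 1) -> list[str]:
    # Rank documents by similarity to the query embedding, return top-k.
    q = embed(query)
    ranked = sorted(knowledge_base,
                    key=lambda d: cosine(q, embed(d)),
                    reverse=True)
    return ranked[:k]

def generate(query: str, docs: list[str]) -> str:
    # Stand-in for the Llama generation step: just assemble the prompt.
    return f"Question: {query}\nContext: {' '.join(docs)}"

docs = retrieve("Which fruit is this? A small round blue berry.")
print(generate("Which fruit is this?", docs))
```

The same three functions map one-to-one onto the example: `embed` + `retrieve` implement the retrieval step, and `generate` marks where the retrieved context is handed to the language model.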

@neural-navigator

I would love to contribute to this issue @MayankChaturvedi

@ariG23498
Collaborator

ariG23498 commented Oct 1, 2024

I love the idea!

So to make it even more clear:

  1. Use a multimodal (image and text pair) dataset from the Hugging Face Hub
  2. Embed the dataset using an embedding model
  3. RAG with multimodal Llama

If that is the workflow, I think we should go forward. It would also be great to keep this simple: a short notebook that does what it needs to without extra complexity.

PS: I have added this issue to the main issue #43
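The three steps sketched above could end up looking roughly like the following once the model calls are wired in. This is a hedged outline, not a working implementation: the dataset name, model checkpoint, and retrieved passages are placeholders, and only the prompt-assembly part is executed here. The message layout follows the interleaved image/text content format used by vision-language chat templates in `transformers`.

```python
# Step 3 sketch: assemble a RAG prompt for a multimodal Llama model.
# retrieved_docs stands in for the output of steps 1-2 (embedding the
# Hub dataset and retrieving the nearest entries for the user's query).

retrieved_docs = [
    "Blueberries are small, round, blue fruits rich in antioxidants.",
]

def build_messages(question: str, docs: list[str]) -> list[dict]:
    # One user turn containing the query image plus the text prompt;
    # the actual image tensor is passed separately to the processor.
    context = "\n".join(f"- {d}" for d in docs)
    return [
        {
            "role": "user",
            "content": [
                {"type": "image"},
                {
                    "type": "text",
                    "text": (
                        "Use the context to answer the question.\n"
                        f"Context:\n{context}\n"
                        f"Question: {question}"
                    ),
                },
            ],
        }
    ]

messages = build_messages("Which fruit is this?", retrieved_docs)

# With a real model one would then do something like (not run here,
# checkpoint name is an assumption):
# processor = AutoProcessor.from_pretrained(
#     "meta-llama/Llama-3.2-11B-Vision-Instruct")
# prompt = processor.apply_chat_template(messages,
#                                        add_generation_prompt=True)
print(messages[0]["content"][1]["text"])
```

Keeping the retrieval output as a plain list of strings keeps the notebook simple, which matches the goal stated above.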

@silvererudite

Hello @MayankChaturvedi, as proposed by @ariG23498 in issue #55, I would love to contribute to this work as well. Let me know if you want to divide up or collaborate on any subtask.

@atharv-jiwane
Contributor

Hey @ariG23498, I was redirected here from #47, thank you for that! I would also love to join this team, @MayankChaturvedi. Please let me know if there is space for collaboration here too!

@MayankChaturvedi
Author

Hi folks, thanks for your interest in this issue. We need a simple notebook. I will create a branch so that the three of us can collaborate on it; meanwhile, I'll also come up with a distribution of tasks.
Shall we collaborate in a Discord group? https://discord.gg/rhbqXsyX
@ariG23498 does this setup sound good?

@ariG23498
Collaborator

@MayankChaturvedi the collaboration sounds great!

Let me know if you folks need help -- the best way to reach me is through this issue. It will stay open for others to view and learn 🤗

@renuka010

Hi @MayankChaturvedi, I would love to collaborate on this issue. Let me know how I can contribute.
