
[Model] Qwen-2-VL Support #3125

Draft · wants to merge 3 commits into main

Conversation

nihalgeorge01

This PR adds support for the Qwen-2-VL (vision-language) model.

@nihalgeorge01 changed the title from [MODEL] Qwen-2-VL Support to [Model] Qwen-2-VL Support on Feb 10, 2025
@buqimaolvshangxue

Does qwen2_vl work with this commit, @nihalgeorge01? I also need Qwen2-VL support.

@nihalgeorge01
Author

Not yet; we are fixing some bugs in the code locally. We are working on pushing this out soon.

@buqimaolvshangxue
Copy link

Thank you very much for your work! While thinking about this problem, I found that when mlc processes the llava model, the text embeddings and the image embeddings are directly concatenated to get the final embedding. In vllm's approach for qwen2_vl, however, the image embeddings replace specific placeholder positions in the already-expanded embedding. I wonder whether llava's direct concatenation method is feasible here? If the concatenation approach is not adopted, it seems the public interface functions would need to be modified. @nihalgeorge01
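For readers comparing the two merge strategies described above, here is a minimal PyTorch sketch. It is not code from this PR; the tensor shapes and the placeholder token id are illustrative assumptions (151655 is the `<|image_pad|>` id in the Qwen2-VL HF config, but treat it as an assumption here).

```python
import torch

# Assumed placeholder id for illustration only; Qwen2-VL expands the prompt
# with one image-pad token per visual patch before the merge happens.
IMAGE_TOKEN_ID = 151655

def merge_by_concat(text_embeds: torch.Tensor,
                    image_embeds: torch.Tensor) -> torch.Tensor:
    """LLaVA-style merge: splice the image embeddings directly in front of
    the text embeddings along the sequence dimension."""
    # text_embeds: [num_text_tokens, hidden]; image_embeds: [num_image_tokens, hidden]
    return torch.cat([image_embeds, text_embeds], dim=0)

def merge_by_replacement(input_ids: torch.Tensor,
                         text_embeds: torch.Tensor,
                         image_embeds: torch.Tensor) -> torch.Tensor:
    """Qwen2-VL-style merge (as in vLLM): scatter the image embeddings into
    the placeholder positions of the already-expanded text embeddings."""
    # input_ids: [seq_len]; text_embeds: [seq_len, hidden];
    # image_embeds: [num_image_tokens, hidden], where num_image_tokens must
    # equal the number of placeholder tokens present in input_ids.
    merged = text_embeds.clone()
    mask = input_ids == IMAGE_TOKEN_ID
    merged[mask] = image_embeds.to(merged.dtype)
    return merged
```

The replacement form keeps the sequence length and token positions fixed after prompt expansion, which matters for position-dependent schemes such as Qwen2-VL's multimodal rotary embeddings; plain concatenation shifts where the text tokens land, which is likely why adopting it would require changes to the shared embedding interface.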
