
[Feature] Support llava onevision #2783

Open

deepindeed2022 wants to merge 10 commits into base: main from support_llava_onevision

Conversation

deepindeed2022 (Contributor) commented on Nov 21, 2024:

Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help it receive feedback more easily. If you do not understand some items, don't worry, just make the pull request and seek help from maintainers.

Motivation

Support llava-onevision with LMDeploy

Modification

  • support the llava-onevision series
  • refactor the llava-related code for device_map handling
  • refactor llava_next.py to use new transformers features

Use cases (Optional)

```shell
git clone https://huggingface.co/lmms-lab/llava-onevision-qwen2-7b-ov
lmdeploy serve api_server llava-onevision-qwen2-7b-ov
```
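
For reference, once the server above is running it can be queried through LMDeploy's OpenAI-compatible endpoint. A minimal sketch, assuming the default port 23333 and the model name shown above; the image URL is a placeholder:

```python
# Minimal sketch: query the api_server via its OpenAI-compatible API.
# Assumes the default port 23333; the image URL is a placeholder.
from openai import OpenAI

client = OpenAI(base_url='http://0.0.0.0:23333/v1', api_key='none')

response = client.chat.completions.create(
    model='llava-onevision-qwen2-7b-ov',
    messages=[{
        'role': 'user',
        'content': [
            {'type': 'text', 'text': 'Describe this image.'},
            {'type': 'image_url',
             'image_url': {'url': 'https://example.com/image.jpg'}},
        ],
    }],
)
print(response.choices[0].message.content)
```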

Checklist

  1. Pre-commit or other linting tools are used to fix the potential lint issues.
  2. The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness.
  3. If the modification has a dependency on downstream projects of a newer version, this PR should be tested with all supported versions of downstream projects.
  4. The documentation has been modified accordingly, like docstring or example tutorials.

deepindeed2022 changed the title from "Support llava onevision" to "[Feature] Support llava onevision" on Nov 21, 2024
lvhan028 added the enhancement label on Nov 21, 2024
deepindeed2022 force-pushed the support_llava_onevision branch from c15538a to 948fffe on November 25, 2024 at 07:20
lvhan028 (Collaborator) commented:

Hi @deepindeed2022, thanks for your contribution to LMDeploy.
We are currently refactoring the VLM modules in PR #2810, which conflicts with this PR.
I'd like to push the review of #2810 first. After it is merged, I'll come back to this PR and resolve the conflicts.
Does that work for you?

deepindeed2022 (Contributor, Author) commented:

> Hi @deepindeed2022, thanks for your contribution to LMDeploy. We are currently refactoring the VLM modules in PR #2810, which conflicts with this PR. I'd like to push the review of #2810 first. After it is merged, I'll come back to this PR and resolve the conflicts. Does that work for you?

Yes, I'll follow up after PR #2810 is merged.

lvhan028 (Collaborator) commented on Dec 13, 2024:

Hi @deepindeed2022, it has been a while. How are you doing?
We finally merged the VLM refactoring PR #2810.
Would it be possible for you to keep working on this PR?

    - update llava doc
    - fix multi-gpus load issue
    - support llava onevision qwen
deepindeed2022 force-pushed the support_llava_onevision branch from 948fffe to 7b5a83d on December 24, 2024 at 11:51
deepindeed2022 (Contributor, Author) commented:

@lvhan028 the pr_ete_test failed because of GPU resource contention. How can I confirm that CUDA_VISIBLE_DEVICES=5 is free, or how can I retry the job?

RunningLeon (Collaborator) commented:

> @lvhan028 the pr_ete_test failed because of GPU resource contention. How can I confirm that CUDA_VISIBLE_DEVICES=5 is free, or how can I retry the job?

@deepindeed2022 Hi, there's a known issue with pr_ete_test and a fix PR will come soon. Please ignore the failure for now.

```diff
 from transformers import AutoConfig, AutoModelForCausalLM

 from lmdeploy.utils import get_logger
 from lmdeploy.vl.model.llava_hf import VISION_MODELS, LlavaHfVisionModel
-from lmdeploy.vl.model.utils import disable_logging, rewrite_ctx
+from lmdeploy.vl.model.utils import (disable_logging,
+                                     get_vision_encoder_device_map,
```

Collaborator:

get_vision_encoder_device_map is shared by LlavaHfVisionModel, LlavaVisionModel, and LlavaNextVisionModel.
I think it's better to put this function in the base class, LlavaHfVisionModel.
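
For illustration only, a sketch of what hoisting the shared helper into the base class could look like; the signature and placement policy below are assumptions, not the PR's actual code:

```python
# Hypothetical sketch: sharing the device_map helper through the base
# class instead of a free function. Names and policy are assumptions.
class LlavaHfVisionModel:

    @staticmethod
    def get_vision_encoder_device_map(vision_prefix: str = 'vision_tower',
                                      device: int = 0) -> dict:
        # transformers' `from_pretrained(device_map=...)` accepts module
        # prefixes, so one entry pins the whole vision encoder to `device`.
        return {vision_prefix: device}


class LlavaVisionModel(LlavaHfVisionModel):
    pass  # reuses the inherited helper instead of re-importing it


class LlavaNextVisionModel(LlavaHfVisionModel):
    pass
```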

```python
return device_map

for keys in same_device_keys:
    fuzzy_keys = [kk for kk in device_map for k in keys if kk.find(k)]
```
Collaborator:

`fuzzy_keys = [kk for kk in device_map for k in keys if kk.find(k) != -1]`?

Collaborator:

After changing it to

```python
fuzzy_keys = [kk for kk in device_map for k in keys if k in kk]
```

it fails with `RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:2!`

I will fix it later.
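
For context on the bug being discussed here: `str.find` returns an index, so a match at position 0 is falsy while a miss (-1) is truthy, and the original filter keeps exactly the wrong keys. A minimal, self-contained demonstration (the dict contents are made up):

```python
# str.find returns an index, not a bool: 0 (match at start) is falsy,
# -1 (no match) is truthy, so the original filter inverts the intent
# for keys that match at position 0. Example data is made up.
device_map = {'vision_tower.layers.0': 0, 'language_model.lm_head': 1}
keys = ['vision_tower']

buggy = [kk for kk in device_map for k in keys if kk.find(k)]
fixed = [kk for kk in device_map for k in keys if k in kk]

print(buggy)  # ['language_model.lm_head'] -- the real match was dropped
print(fixed)  # ['vision_tower.layers.0']
```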

RunningLeon (Collaborator) left a comment:

Tested OK on llava-onevision, llava, and llava-hf.

RunningLeon (Collaborator) left a comment:

LGTM

deepindeed2022 (Contributor, Author) commented:

Is there any problem with this PR? When is it expected to be usable directly from the official code repository?

lvhan028 (Collaborator) commented:

Hi @deepindeed2022, it hasn't passed our tests yet.
