
MASA adapter usage with segmentation foundation models #31

Open
ederev opened this issue Aug 29, 2024 · 3 comments

Comments

ederev commented Aug 29, 2024

Hello! Thank you for your great and interesting work. I have a question regarding MASA adapter usage with segmentation models.
Your article states that you "designed a universal MASA adapter which can work in tandem with foundational segmentation or detection models and enable them to track any detected objects".
However, the provided demo script demo/video_demo_with_text.py supports (apart from the unified model) only detection + adapter usage, with segmentation applied afterwards via SAM as post-processing on the already-tracked video. This does not seem aligned with the described design.
So my question is: could you please provide more details on using the MASA adapter together with segmentation? For example, how exactly should I use the demo script (if applicable), or could you share a code snippet?
Did I understand the inference idea correctly, per Figure 3 (b) in https://arxiv.org/pdf/2406.04221?

It is also not clear how the comparison in Figure 12 (Qualitative Comparison between MASA and DEVA) was conducted in terms of which models were used.

Many thanks for considering my request.

siyuanliii (Owner) commented Aug 30, 2024 via email

ederev (Author) commented Aug 30, 2024

Alright, thank you.
As far as I know, DEVA uses SAM masks directly, whereas the MASA adapter makes no direct use of segmentation masks, only detection bboxes. So if I understood correctly, with your current implementation of the MASA adapter for segmentation, I should first obtain segmentation masks and then convert them to bboxes, using the segmentation model as a detector. Am I right? But in that case it is unclear how this improves segmentation (other than assigning IDs to the masks inscribed in the bboxes), since we are doing the same bbox tracking.

I would appreciate any guidance on this issue.

siyuanliii (Owner) commented

Thanks! Regarding "I should get segmentation masks first and after that convert them to bboxes to use as detection model": SAM is actually a prompt-driven model; it doesn't give you masks directly. The prompt can be bboxes or points. So the actual order is: you get the bboxes first as prompts, then call SAM to produce the masks. Those boxes can be tracked using MASA. MASA is a pure appearance model for association, so it is not intended to improve segmentation performance.
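The pipeline order described above (detect boxes → associate boxes with MASA → prompt SAM with the tracked boxes) can be sketched roughly as follows. All function names here are hypothetical stand-ins for illustration only, not the real masa or segment-anything APIs, and the detector/tracker/SAM bodies are stubs so the sketch is self-contained:

```python
# Sketch of the order of operations only. Every function here is a
# hypothetical stand-in, NOT the actual masa / SAM API.

def detect_boxes(frame):
    # Stand-in for any detector: returns [x1, y1, x2, y2] boxes.
    # Hard-coded so the sketch runs without a model.
    return [[10, 10, 50, 50], [60, 20, 90, 80]]

def masa_associate(boxes, frame):
    # Stand-in for MASA's appearance-based association, which assigns
    # a track ID to each box. Real MASA matches appearance embeddings
    # across frames; here we simply enumerate.
    return [(track_id, box) for track_id, box in enumerate(boxes)]

def sam_segment(frame, box):
    # Stand-in for SAM with a box prompt: the box comes first, then
    # SAM produces the mask (represented here by its box and area).
    x1, y1, x2, y2 = box
    return {"box": box, "area": (x2 - x1) * (y2 - y1)}

def track_and_segment(frame):
    boxes = detect_boxes(frame)            # 1. detection
    tracks = masa_associate(boxes, frame)  # 2. association (MASA)
    return {tid: sam_segment(frame, box)   # 3. segmentation (SAM)
            for tid, box in tracks}

results = track_and_segment(frame=None)
# Each track ID now maps to a mask that inherits that ID.
```

Note that segmentation happens last: the masks inherit the track IDs from the box association step, which is why MASA affects tracking quality but not mask quality.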
