MASA adapter usage with segmentation foundation models #31
Thanks for the question! "MASA adapter + segmentation" means that we use SAM as the base detection model: SAM outputs masks for every object in the scene, and MASA is responsible for associating them across frames. DEVA likewise requires a pre-trained model to provide instance masks for multiple object tracking and segmentation tasks. Figure 12 compares MASA and DEVA on BDD100K sequences using the same instance segmentation model (UNINEXT) to provide the masks, with MASA and DEVA each used only for association.
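To make the division of labour concrete, the following is a minimal, self-contained sketch of appearance-only association in the spirit of what the MASA adapter does (cosine similarity between per-object embeddings plus Hungarian matching). The function name, signature, and threshold are illustrative assumptions, not the actual MASA API:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(track_feats, det_feats, sim_thresh=0.5):
    """Match current-frame detections to existing tracks by appearance only.

    track_feats: (T, D) L2-normalised appearance embeddings of existing tracks
    det_feats:   (N, D) L2-normalised embeddings of current-frame detections
    Returns a list of (track_idx, det_idx) matches above `sim_thresh`.
    """
    if len(track_feats) == 0 or len(det_feats) == 0:
        return []
    sim = track_feats @ det_feats.T            # cosine similarity matrix (T, N)
    rows, cols = linear_sum_assignment(-sim)   # maximise total similarity
    return [(t, d) for t, d in zip(rows, cols) if sim[t, d] >= sim_thresh]
```

Unmatched detections would spawn new tracks; the key point is that the association step consumes only embeddings, regardless of whether the boxes or masks came from a detector, SAM, or UNINEXT.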
From: ederev
Date: Thursday, 29 August 2024 at 14:37
To: siyuanliii/masa
Subject: [siyuanliii/masa] MASA adapter usage with segmentation foundation models (Issue #31)
Hello! Thank you for your great and interesting work. I have a question about using the MASA adapter with segmentation models.
Your article states that you "designed a universal MASA adapter which can work in tandem with foundational segmentation or detection models and enable them to track any detected objects".
However, the provided demo script demo/video_demo_with_text.py supports (apart from the unified model) only detection + adapter usage, with segmentation added as a post-processing step via SAM on a video whose tracks have already been computed. That does not seem aligned with the described design.
So my question is: could you please provide more details on using the MASA adapter together with segmentation? For example, how exactly should I use the demo script (if applicable), or could you share a code snippet?
Did I understand the inference idea correctly, with reference to Figure 3 (b) in https://arxiv.org/pdf/2406.04221?
It is also unclear which models are used in Figure 12, the qualitative comparison between MASA and DEVA.
Thanks in advance.
Alright, thank you. I would appreciate any guidance on this issue.
Thanks! Regarding "I should get segmentation masks first and after that convert them to bboxes to use as detection model": SAM is actually a prompt-driven model; it does not give you masks directly. The prompt can be bounding boxes or points, so the actual order is that you get the bounding box first as the prompt, then call SAM to produce the mask. Those boxes can be tracked using MASA. MASA is a pure appearance model for association, so it is not intended to improve segmentation performance.
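The order described above (boxes first, then SAM, with MASA only associating the boxes) can be sketched as a per-frame loop. This is a hedged illustration, not the repository's actual API: `detect`, `sam_segment`, and `masa_track` are hypothetical stand-ins for a real detector, SAM's box-prompted predictor (e.g. `SamPredictor.predict(box=...)` in segment-anything), and the MASA tracker.

```python
import numpy as np

def detect(frame):
    # Hypothetical stand-in detector: returns (N, 4) boxes in xyxy format.
    return np.array([[10, 10, 50, 50], [60, 20, 90, 80]], dtype=float)

def sam_segment(frame, boxes):
    # Hypothetical stand-in for SAM prompted with boxes: one binary mask per
    # box. A real call would be SamPredictor.predict(box=...) per detection.
    h, w = frame.shape[:2]
    masks = np.zeros((len(boxes), h, w), dtype=bool)
    for i, (x0, y0, x1, y1) in enumerate(boxes.astype(int)):
        masks[i, y0:y1, x0:x1] = True
    return masks

def masa_track(boxes, frame_id):
    # Hypothetical stand-in for MASA's appearance-based association,
    # which would assign a persistent track ID to each box.
    return np.arange(len(boxes))

def run(frames):
    results = []
    for t, frame in enumerate(frames):
        boxes = detect(frame)               # 1. boxes come first
        masks = sam_segment(frame, boxes)   # 2. SAM is prompted with the boxes
        ids = masa_track(boxes, t)          # 3. MASA associates the boxes
        results.append((ids, boxes, masks))
    return results

frames = [np.zeros((100, 100, 3), dtype=np.uint8)] * 2
out = run(frames)
```

The masks are attached to tracks via their prompt boxes, so the association quality depends only on the boxes' appearance embeddings, not on SAM's segmentation quality.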