Commit

Merge branch 'master' of github.com:gaomingqi/Track-Anything
memoryunreal committed May 6, 2023
2 parents 0b52bd7 + 3aa5f67 commit 322ae43
Showing 12 changed files with 102 additions and 54 deletions.
13 changes: 7 additions & 6 deletions README.md
@@ -11,11 +11,12 @@
<a src="https://img.shields.io/badge/%F0%9F%A4%97-Open_in_Spaces-informational.svg?style=flat-square" href="https://huggingface.co/spaces/watchtowerss/Track-Anything?duplicate=true">
<img src="https://img.shields.io/badge/%F0%9F%A4%97-Hugging_Face_Space-informational.svg?style=flat-square">
</a>
<a src="https://img.shields.io/badge/%F0%9F%97%BA-Tutorials in Steps-2e8b57.svg?style=flat-square" href="./doc/tutorials.md">
<img src="https://img.shields.io/badge/%F0%9F%97%BA-Tutorials in Steps-2e8b57.svg?style=flat-square">
<a src="https://img.shields.io/badge/%F0%9F%97%BA-Tutorials in Steps-2bb7b3.svg?style=flat-square" href="./doc/tutorials.md">
<img src="https://img.shields.io/badge/%F0%9F%97%BA-Tutorials in Steps-2bb7b3.svg?style=flat-square">

</a>
<a src="https://img.shields.io/badge/%F0%9F%9A%80-SUSTech_VIP_Lab-important.svg?style=flat-square" href="https://zhengfenglab.com/">
<img src="https://img.shields.io/badge/%F0%9F%9A%80-SUSTech_VIP_Lab-important.svg?style=flat-square">
<a src="https://img.shields.io/badge/%F0%9F%9A%80-SUSTech_VIP_Lab-ed6c00.svg?style=flat-square" href="https://zhengfenglab.com/">
<img src="https://img.shields.io/badge/%F0%9F%9A%80-SUSTech_VIP_Lab-ed6c00.svg?style=flat-square">
</a>
</div>

@@ -38,11 +39,11 @@

- 2023/04/25: We are delighted to introduce [Caption-Anything](https://github.com/ttengwang/Caption-Anything) :writing_hand:, an inventive project from our lab that combines the capabilities of Segment Anything, Visual Captioning, and ChatGPT.

- 2023/04/20: We deployed [[DEMO]](https://huggingface.co/spaces/watchtowerss/Track-Anything?duplicate=trueg) on Hugging Face :hugs:!
- 2023/04/20: We deployed [DEMO](https://huggingface.co/spaces/watchtowerss/Track-Anything?duplicate=trueg) on Hugging Face :hugs:!

- 2023/04/14: We made Track-Anything public!

## :world_map: Video Tutorials ([Try Track-Anything in Steps](./doc/tutorials.md))
## :world_map: Video Tutorials ([Track-Anything Tutorials in Steps](./doc/tutorials.md))

https://user-images.githubusercontent.com/30309970/234902447-a4c59718-fcfe-443a-bd18-2f3f775cfc13.mp4

10 changes: 5 additions & 5 deletions app.py
@@ -448,23 +448,23 @@ def generate_video_from_frames(frames, output_path, fps=30):
point_prompt = gr.Radio(
choices=["Positive", "Negative"],
value="Positive",
label="Point Prompt",
label="Point prompt",
interactive=True,
visible=False)
remove_mask_button = gr.Button(value="Remove mask", interactive=True, visible=False)
clear_button_click = gr.Button(value="Clear Clicks", interactive=True, visible=False).style(height=160)
clear_button_click = gr.Button(value="Clear clicks", interactive=True, visible=False).style(height=160)
Add_mask_button = gr.Button(value="Add mask", interactive=True, visible=False)
template_frame = gr.Image(type="pil",interactive=True, elem_id="template_frame", visible=False).style(height=360)
image_selection_slider = gr.Slider(minimum=1, maximum=100, step=1, value=1, label="Image Selection", visible=False)
track_pause_number_slider = gr.Slider(minimum=1, maximum=100, step=1, value=1, label="Track end frames", visible=False)
image_selection_slider = gr.Slider(minimum=1, maximum=100, step=1, value=1, label="Track start frame", visible=False)
track_pause_number_slider = gr.Slider(minimum=1, maximum=100, step=1, value=1, label="Track end frame", visible=False)

with gr.Column():
run_status = gr.HighlightedText(value=[("Text","Error"),("to be","Label 2"),("highlighted","Label 3")], visible=False)
mask_dropdown = gr.Dropdown(multiselect=True, value=[], label="Mask selection", info=".", visible=False)
video_output = gr.Video(autosize=True, visible=False).style(height=360)
with gr.Row():
tracking_video_predict_button = gr.Button(value="Tracking", visible=False)
inpaint_video_predict_button = gr.Button(value="Inpaint", visible=False)
inpaint_video_predict_button = gr.Button(value="Inpainting", visible=False)

# first step: get the video information
extract_frames_button.click(
Binary file modified doc/tutorial_imgs/2-3-5.png
Binary file modified doc/tutorial_imgs/3-2.png
Binary file added doc/tutorial_imgs/4-0.png
Binary file added doc/tutorial_imgs/4-1.png
Binary file added doc/tutorial_imgs/4-2.png
Binary file modified doc/tutorial_imgs/4-3.png
Binary file added doc/tutorial_imgs/4-4.png
Binary file added doc/tutorial_imgs/5-3.png
Binary file modified doc/tutorial_imgs/tracking-preparation.png
133 changes: 90 additions & 43 deletions doc/tutorials.md
@@ -1,104 +1,151 @@

<style>
.img_containter img{max-width:1000px}
</style>

## Welcome to Track-Anything Tutorials

Here we illustrate how to use Track-Anything as an interactive tool to segment, track, and inpaint anything in videos.

In the current version, Track-Anything follows a linear procedure of :one: [video selection](#step1), :two: [tracking preparation](#step2), :three: [tracking](#step3), and :four: [inpainting](#step4).
In the current version, Track-Anything follows a linear procedure of :one: [video selection](#step1), :two: [tracking preparation](#step2), :three: [tracking](#step3), :four: [correction](#step4) (optional), and :five: [inpainting](#step5).

---

### <span id="step1">1 Video Selection</span>
When starting Track-Anything, the panel looks like:

<div align=center>
<img src="./tutorial_imgs/video-selection.png" width="93%"/>
<div align=center class="img_containter">
<img src="./tutorial_imgs/video-selection.png" width="90%"/>
</div>

**Recommended steps in this stage**:
**1-1**. Select one video from your local space or examples.

**1-1**. Select one video from your local computer or examples.

**1-2**. Click "***Get video info***" to unlock other controllers.

---

### <span id="step2">2 Tracking Preparation</span>
After video selection, all controllers are unlocked and the panel looks like:
After "***Get video info***", all controllers are unlocked and the panel looks like:

<div align=center>
<img src="./tutorial_imgs/tracking-preparation.png" width="93%"/>
<div align=center class="img_containter">
<img src="./tutorial_imgs/tracking-preparation.png" width="91%"/>
</div>

**Recommended steps in this stage**:

**2-1**. Select ***Track End Frame*** (the last frame by default), via sliders (rough selection) and tuning buttons (precise selection).
**2-2**. Select ***Track Start Frame*** (***Image Selection***, the first frame by default) to add masks, via sliders (rough selection) and tuning buttons (precise selection).

<div align=center>
<img src="./tutorial_imgs/2-1.png" width="69%"/>
</div>
**2-1**. Select ***Track end frame*** (last frame by default).

- **Note**: Typing indices is also supported, but after typing, click somewhere on the panel (outside the image and video areas) to refresh the shown frame.
**2-2**. Select ***Track start frame*** (first frame by default).
- **Note**: Follow the order of 2-1, 2-2 to make sure the image shown is the start frame.

**2-3**. Select one object/region on the ***Track Start Frame***, via adding positive / negative points:
**2-3**. Add one mask on the ***Track start frame***, via clicking positive / negative points:

- **2-3-1**. Add one POSITIVE point on the target region. After this, a mask appears:
- **2-3-1**. Click one POSITIVE point on the target region. After this, a mask appears:

<div align=center>
<img src="./tutorial_imgs/2-3-1.png" width="99%"/>
<div align=center class="img_containter">
<img src="./tutorial_imgs/2-3-1.png" width="91%"/>
</div>

- **2-3-2**. If the mask looks good, go to step 2-3-5. If not, go to step 2-3-3.

- **2-3-3**. If the mask does not fully cover the target region, add one POSITIVE point on the missing part. Conversely, if the mask covers part of the background, add one NEGATIVE point on the over-covered background. After each positive or negative point is added, the mask is updated:
- **2-3-3**. If the mask does not fully cover the target, click one POSITIVE point on the missing part. Conversely, if the mask covers part of the background, click one NEGATIVE point on the over-covered region. After each positive or negative click, the mask is updated:

<div align=center>
<img src="./tutorial_imgs/2-3-3-1.png" width="99%"/>
<div align=center class="img_containter">
<img src="./tutorial_imgs/2-3-3-1.png" width="91%"/>
</div>

<div align=center>
<img src="./tutorial_imgs/2-3-3-2.png" width="99%"/>
<div align=center class="img_containter">
<img src="./tutorial_imgs/2-3-3-2.png" width="91%"/>
</div>

- **2-3-4**. If the mask looks good, go to step 2-3-5. If not, go to step 2-3-3.

- **2-3-5**. Click "***Add Mask***".
- **2-3-5**. Click "***Add mask***".

- **Note**: If the mask cannot be refined after many added points, click "***Clear Clicks***" to restart from step 2-3-1.
- **Note**: If the mask cannot be refined after many clicks, click "***Clear clicks***" to restart from step 2-3-1.

- **Note**: After each "***Add Mask***", one item appears on the Dropdown List below. More operations on this controller are described in [Tracking](#step3):
- **Note**: After each "***Add mask***", one item appears on the Dropdown List below:


<div align=center>
<img src="./tutorial_imgs/2-3-5.png" width="99%"/>
<div align=center class="img_containter">
<img src="./tutorial_imgs/2-3-5.png" width="91%"/>
</div>

- **Note**: Click "***Remove Mask***" to remove all masks from the list.
- **Note**: All masks can be removed by clicking "***Remove mask***".

**2-4**. To add another mask, go to 2-2. Otherwise, go to [Tracking](#step3).

**2-4**. To add another object/region, go to 2-2. Otherwise, go to [Tracking](#step3).

**Note**: ALL masks have to be added on the ***Track start frame*** only.

**Note**: ALL masks have to be added on the ***Track Start Frame*** only.
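
The click workflow in step 2-3 can be sketched as a small data structure. This is a minimal illustration, not the code in `app.py`: the `ClickSession` class and its method names are hypothetical. The 1/0 label convention (1 for a positive point, 0 for a negative point) follows Segment Anything's point prompts; whether Track-Anything stores clicks in exactly this form is an assumption.

```python
from dataclasses import dataclass, field

POSITIVE = 1  # click on the target region
NEGATIVE = 0  # click on over-covered background


@dataclass
class ClickSession:
    """Hypothetical container for the clicks made on the track start frame."""
    points: list = field(default_factory=list)  # [(x, y), ...]
    labels: list = field(default_factory=list)  # [1 or 0, ...], parallel to points

    def click(self, x, y, positive=True):
        # Each click adds one coordinate and one matching label.
        self.points.append((x, y))
        self.labels.append(POSITIVE if positive else NEGATIVE)

    def clear(self):
        # Mirrors "Clear clicks": drop every point and restart from step 2-3-1.
        self.points.clear()
        self.labels.clear()

    def as_prompt(self):
        # Parallel lists of coordinates and labels, ready to hand to a segmenter.
        return {"point_coords": list(self.points), "point_labels": list(self.labels)}


session = ClickSession()
session.click(120, 80)                  # step 2-3-1: positive point on the target
session.click(40, 200, positive=False)  # step 2-3-3: negative point on background
print(session.as_prompt())
```

Keeping coordinates and labels in parallel lists means every refinement click simply extends the prompt, and the mask can be recomputed from the full click history each time.
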
---

### <span id="step3">3 Tracking</span>

Track-Anything only tracks the objects shown in the Dropdown List.
Track-Anything only tracks object masks shown in the Dropdown List.

**Recommended steps in this stage**:

**3-1**. Confirm the objects on the list.
**3-1**. Confirm object masks on the list.

**3-2**. Click "***Tracking***".

After step 3-2, tracking runs for seconds or minutes, depending on video resolution and length, and the results are shown on the right video panel:

<div align=center>
<img src="./tutorial_imgs/3-2.png" width="99%"/>
<div align=center class="img_containter">
<img src="./tutorial_imgs/3-2.png" width="91%"/>
</div>
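
The rule that only masks shown in the Dropdown List are tracked can be illustrated with a short sketch. Everything here is an assumption made for illustration: the function name `merge_selected`, the mask names such as `mask_001`, and the representation of masks as nested lists of 0/1 values are hypothetical and are not taken from `app.py`.

```python
def merge_selected(masks, selected_names):
    """Combine the selected binary masks into one label map.

    masks: dict mapping a mask name to a 2D list of 0/1 values.
    selected_names: names currently selected in the dropdown.
    Pixels of the i-th selected mask get the object id i (starting at 1);
    unselected masks are ignored, and 0 means background.
    """
    if not selected_names:
        raise ValueError("select at least one mask before tracking")
    first = masks[selected_names[0]]
    height, width = len(first), len(first[0])
    label_map = [[0] * width for _ in range(height)]
    for object_id, name in enumerate(selected_names, start=1):
        mask = masks[name]
        for y in range(height):
            for x in range(width):
                if mask[y][x]:
                    # Later selections overwrite earlier ones where masks overlap.
                    label_map[y][x] = object_id
    return label_map


masks = {
    "mask_001": [[1, 0], [0, 0]],
    "mask_002": [[0, 1], [0, 0]],
    "mask_003": [[0, 0], [1, 1]],
}
# mask_002 is deselected, so its pixels stay background.
print(merge_selected(masks, ["mask_001", "mask_003"]))
```
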

---

### <span id="step4">4 Correction</span>

This stage is optional and is recommended when tracking results degrade, for example due to shot changes or occlusions:

<div align=center class="img_containter">
<img src="./tutorial_imgs/4-0.png" width="91%"/>
</div>

### <span id="step4">4 Inpainting</span>
**Recommended steps in this stage**:

**4-1**. Find the frame where degradation begins, and set the frame as "***Track start frame***".

<div align=center class="img_containter">
<img src="./tutorial_imgs/4-1.png" width="66%"/>
</div>

**4-2**. Click "***Remove mask***" to clear previous tracking results from "***Track start frame***".

<div align=center class="img_containter">
<img src="./tutorial_imgs/4-2.png" width="91%"/>
</div>

**4-3**. Re-add masks as in Step 2-3 on the "***Track start frame***".

<div align=center class="img_containter">
<img src="./tutorial_imgs/4-3.png" width="91%"/>
</div>

**4-4**. Click "***Tracking***".

<div align=center class="img_containter">
<img src="./tutorial_imgs/4-4.png" width="91%"/>
</div>

**Note**: Tracking between start and end frames provides a flexible method to address degradation.
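
One way to picture the correction stage is as replacing a slice of the earlier results with the re-tracked ones. The sketch below is a simplified model under stated assumptions: `apply_correction` is a hypothetical helper, per-frame results are stand-in values in a list, and frame indices are 1-based and inclusive to match the sliders' `minimum=1`. How Track-Anything actually stores and merges results is not shown in this diff.

```python
def apply_correction(previous_masks, corrected_masks, start_frame, end_frame):
    """Return a new list where frames start_frame..end_frame hold corrected results.

    Frame indices are 1-based and inclusive. The input list is not modified.
    """
    if not 1 <= start_frame <= end_frame <= len(previous_masks):
        raise ValueError("need 1 <= start_frame <= end_frame <= number of frames")
    expected = end_frame - start_frame + 1
    if len(corrected_masks) != expected:
        raise ValueError(f"expected {expected} corrected masks, got {len(corrected_masks)}")
    merged = list(previous_masks)
    # Convert the 1-based inclusive range to a 0-based half-open slice.
    merged[start_frame - 1:end_frame] = corrected_masks
    return merged


previous = ["a1", "a2", "a3", "a4", "a5"]  # stand-ins for per-frame masks
corrected = ["b3", "b4"]                   # re-tracked results for frames 3..4
print(apply_correction(previous, corrected, start_frame=3, end_frame=4))
```

Frames outside the chosen range keep their earlier results, which is why restarting from the frame where degradation begins is enough.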

---

### <span id="step5">5 Inpainting</span>

Track-Anything only "removes" the tracked objects from the input video.

**Recommended steps in this stage**:

**4-1**. Complete steps 3-1 and 3-2 to get tracking results.
**5-1**. Get tracking results.

**4-2**. Select "***Resize Ratio***" to down-scale the video.
**5-2**. Select "***Resize Ratio***" to down-scale the video.
- **Why down-scale?** Unlike tracking, inpainting costs much more GPU memory. Down-scaling can effectively avoid Out-Of-Memory (OOM) errors. The estimated GPU memory requirements are as follows:

|Resolution|50 frames|100 frames|1000 frames|
@@ -110,10 +157,10 @@ Track-Anything only "removes" the tracked objects from the input video.
|320 x 240|4GB|4.5GB|4.5GB|
|160 x 120|2.5GB|3GB|3GB|

**4-3**. Click "***Inpainting***".
**5-3**. Click "***Inpainting***".

After step 4-3, inpainting runs for seconds or minutes, depending on video resolution and length, and the results are shown on the panel below:
After step 5-3, inpainting runs for seconds or minutes, depending on video resolution and length, and the results are shown on the panel below:

<div align=center>
<img src="./tutorial_imgs/4-3.png" width="99%"/>
<div align=center class="img_containter">
<img src="./tutorial_imgs/5-3.png" width="91%"/>
</div>
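
The down-scaling advice in step 5-2 can be turned into a small lookup. This sketch uses only the two table rows visible in this diff (320 x 240 and 160 x 120); the other rows are hidden by the collapsed hunk and are deliberately not guessed. The function names are hypothetical, and rounding a frame count up to the next table column is a conservative assumption, not something the tutorial states.

```python
# Estimated GPU memory in GB, keyed by (width, height) and then by frame count.
# Only the rows visible in this excerpt are included.
ESTIMATED_GB = {
    (320, 240): {50: 4.0, 100: 4.5, 1000: 4.5},
    (160, 120): {50: 2.5, 100: 3.0, 1000: 3.0},
}


def estimated_memory_gb(resolution, num_frames):
    """Look up the estimate, rounding num_frames up to the next table column."""
    row = ESTIMATED_GB[resolution]
    for frames in sorted(row):
        if num_frames <= frames:
            return row[frames]
    return None  # longer than any column in the table


def largest_fitting_resolution(num_frames, budget_gb):
    """Return the largest listed resolution whose estimate fits the budget."""
    by_area = sorted(ESTIMATED_GB, key=lambda r: r[0] * r[1], reverse=True)
    for resolution in by_area:
        estimate = estimated_memory_gb(resolution, num_frames)
        if estimate is not None and estimate <= budget_gb:
            return resolution
    return None  # nothing in the table fits


print(largest_fitting_resolution(num_frames=80, budget_gb=4.0))
```

A result of `None` means the listed resolutions are either too large for the budget or the video is longer than the table covers, so the estimate should not be trusted in that case.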
