Add video_object_segmenting_mapper and video_depth_estimation_mapper #801
Summary of Changes
Hello @Qirui-jiao, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request expands the video processing capabilities of the data_juicer library by introducing two mappers. The video_object_segmenting_mapper enables text-guided identification and segmentation of objects across video sequences, while the video_depth_estimation_mapper provides depth information for video frames, including the generation of 3D point clouds. These additions let users extract richer, more granular metadata from video content for downstream tasks in multimodal data processing.
Code Review
This pull request introduces two new video analysis operators: video_object_segmenting_mapper and video_depth_estimation_mapper. The video_object_segmenting_mapper performs text-guided semantic segmentation, while the video_depth_estimation_mapper estimates depth in videos. The changes include adding the new mapper files, updating __init__.py to include the new mappers, modifying constant.py to add meta keys for the new features, and updating model_utils.py to include the video depth estimation model. Additionally, test files for both mappers and updates to the Operators.md documentation are included.
Update (November 4):
Add two video analysis ops, including:
- video_object_segmenting_mapper: Performs text-guided semantic segmentation of valid objects throughout the video (using YOLOE and SAM2), with support for saving segmentation visualization results.
- video_depth_estimation_mapper: Performs depth estimation on the video, with support for saving both visualization results and point cloud data.
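
For reviewers unfamiliar with how point cloud data relates to a depth map, here is a minimal sketch of standard pinhole back-projection. It is not the code in this PR: the function name `depth_to_point_cloud` and the intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical, and the mapper may derive its point clouds differently.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def depth_to_point_cloud(
    depth: Sequence[Sequence[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Point]:
    """Back-project a depth map into camera-space 3D points.

    Illustrative only: assumes a pinhole camera with focal lengths
    (fx, fy) and principal point (cx, cy), all in pixels, and treats
    non-positive depth values as invalid.
    """
    points: List[Point] = []
    for v, row in enumerate(depth):        # v: pixel row (image y)
        for u, z in enumerate(row):        # u: pixel column (image x)
            if z <= 0:
                continue                   # skip invalid depth
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, float(z)))
    return points
```

Each valid pixel contributes one `(x, y, z)` point, so a per-frame depth map yields a per-frame point cloud that can be written out in whatever format the op chooses.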