feat: add human-centric video understanding operators for HumanVBench #938
SYSUzhouting wants to merge 1 commit into main
Conversation
Summary of Changes

This pull request significantly expands the video understanding capabilities of the system by introducing a suite of human-centric operators tailored for the HumanVBench project. It integrates advanced third-party models for detailed analysis of human presence, speech, and actions within videos, enabling more nuanced data processing for tasks focused on human interaction. The changes also include crucial fixes for batch processing and refinements in data handling, alongside updated documentation to guide users through the new features.
Code Review
This pull request introduces a comprehensive set of new operators for human-centric video understanding, which is a significant feature addition. However, the review has identified several critical issues that need to be addressed. These include a regression in nlpaug_en_mapper that breaks batch processing, a critical bug in how several operators handle shared keyword arguments, and incorrect method signatures in the new mapper classes that will lead to runtime errors. Additionally, there are issues with temporary file handling, code duplication, and some potentially breaking changes regarding compressed file support that seem unrelated to the main feature. I recommend addressing these high-priority issues before merging.
```python
for key in samples:
    if key != self.text_key:
        res_samples[key] += [samples[key][idx]] * len(sample_texts)
texts_to_aug = samples[self.text_key][0]  # batch_size = 1
```
This change introduces a regression that breaks batch processing. The new implementation processes only the first sample of a batch (samples[self.text_key][0]), whereas the previous implementation correctly iterated over all samples. This will cause all but the first sample in a batch to be skipped during augmentation. The removal of tests/ops/mapper/test_nlpaug_en_mapper_batch_bug.py, which appears to have tested for this exact scenario, is also concerning.
Please revert to a batch processing logic that iterates over all samples in the batch, similar to the previous implementation.
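A minimal sketch of batch-safe augmentation logic, iterating over every text in the batch rather than only the first. The function and parameter names here are illustrative, not the project's actual API; `samples` is assumed to be a column-oriented dict of parallel lists, as in the hunk above.

```python
def augment_batch(samples, text_key, augment_fn):
    """Apply augment_fn to every text in the batch, not just the first.

    `samples` is a dict of parallel lists (column-oriented batch);
    `augment_fn` returns a list of augmented variants for one text.
    """
    res = {key: [] for key in samples}
    for idx, text in enumerate(samples[text_key]):
        aug_texts = augment_fn(text)
        res[text_key].extend(aug_texts)
        # Replicate the other fields so each augmented text keeps its row.
        for key in samples:
            if key != text_key:
                res[key].extend([samples[key][idx]] * len(aug_texts))
    return res
```

With this shape, a two-sample batch produces augmented rows for both inputs instead of silently dropping the second one.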
```python
self.extra_kwargs = self._default_kwargs
for key in kwargs:
    if key in self.extra_kwargs:
        self.extra_kwargs[key] = kwargs[key]
```
Assigning self.extra_kwargs = self._default_kwargs creates a reference to the mutable class attribute _default_kwargs, not a copy. This means that if multiple instances of this operator are created, they will all share and modify the same dictionary, leading to unexpected and hard-to-debug behavior. This should be self.extra_kwargs = self._default_kwargs.copy() to ensure each instance has its own copy.
This issue is also present in other files in this PR, including:
- data_juicer/ops/filter/image_face_ratio_filter.py
- data_juicer/ops/filter/video_motion_score_filter.py
- data_juicer/ops/mapper/image_face_blur_mapper.py
- data_juicer/ops/mapper/video_face_blur_mapper.py
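A small, self-contained illustration of the aliasing problem. The class and attribute names are hypothetical stand-ins for the operators listed above:

```python
class SharedOp:
    _default_kwargs = {"threshold": 0.5}

    def __init__(self, **kwargs):
        # BUG: this binds the class-level dict itself, so every
        # instance mutates the same shared object.
        self.extra_kwargs = self._default_kwargs
        self.extra_kwargs.update(kwargs)


class CopiedOp:
    _default_kwargs = {"threshold": 0.5}

    def __init__(self, **kwargs):
        # FIX: copy the defaults so each instance owns its dict.
        self.extra_kwargs = self._default_kwargs.copy()
        self.extra_kwargs.update(kwargs)


a = SharedOp(threshold=0.9)
b = SharedOp()    # inherits a's mutation: threshold is now 0.9
c = CopiedOp(threshold=0.9)
d = CopiedOp()    # unaffected: threshold stays 0.5
```

The second `SharedOp` instance silently picks up the first instance's override, which is exactly the cross-instance leakage the review warns about.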
Suggested change:

```diff
-self.extra_kwargs = self._default_kwargs
+self.extra_kwargs = self._default_kwargs.copy()
 for key in kwargs:
     if key in self.extra_kwargs:
         self.extra_kwargs[key] = kwargs[key]
```
```diff
 # Use ray.data functions directly with PyArrow filesystem support
 # Ray's read functions support filesystem parameter via PyArrow
-if data_format in {"json", "jsonl", "json.gz", "jsonl.gz", "json.zst", "jsonl.zst"}:
+if data_format in {"json", "jsonl"}:
```
This PR seems to remove support for compressed json files (.gz, .zst) in several places, which is a significant breaking change and seems unrelated to the main goal of adding video operators. The changes are also inconsistent across the codebase. For example, load_strategy.py and ray_dataset.py remove support for .jsonl.zst, but data_juicer/format/json_formatter.py retains it. Could you clarify if removing compressed file support is intended? If so, the implementation should be consistent across the codebase.
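If compressed support is meant to stay, one way to keep the check consistent across load_strategy.py, ray_dataset.py, and data_juicer/format/json_formatter.py is a single shared constant that every call site imports. The names below are illustrative, not existing project symbols:

```python
# One shared definition, imported everywhere the format check happens,
# so plain and compressed JSON variants can't drift out of sync.
JSON_FORMATS = {"json", "jsonl"}
COMPRESSED_JSON_FORMATS = {
    f"{base}.{ext}" for base in JSON_FORMATS for ext in ("gz", "zst")
}
SUPPORTED_JSON_FORMATS = JSON_FORMATS | COMPRESSED_JSON_FORMATS


def is_json_format(data_format: str) -> bool:
    """True for json/jsonl and their .gz/.zst compressed variants."""
    return data_format in SUPPORTED_JSON_FORMATS
```

Each `if data_format in {...}` check then becomes `if is_json_format(data_format):`, and dropping (or re-adding) compressed support is a one-line change.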
```python
def process_single(self, samples, rank=None):
```
This mapper is decorated with _batched_op = True, but it implements process_single instead of process_batched. The Mapper.process method will attempt to call process_batched(self, samples), which will fail due to the method name mismatch and different signature (process_single(self, samples, rank=None)). This will cause a TypeError at runtime.
Please rename the method to process_batched and adjust its signature to match the base class. This issue exists in several other new mappers in this PR as well (e.g., video_captioning_from_human_tracks_mapper, video_human_tracks_face_demographic_mapper).
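A sketch of the rename. The `Mapper` base class below is a simplified stand-in for the project's actual dispatch logic, and the mapper class name is hypothetical; only the `process_single` → `process_batched` rename reflects the review's request:

```python
class Mapper:
    """Simplified stand-in: dispatches to the batched or single method."""

    _batched_op = False

    def process(self, samples, *args, **kwargs):
        if self._batched_op:
            return self.process_batched(samples, *args, **kwargs)
        return self.process_single(samples, *args, **kwargs)


class ExampleHumanTracksMapper(Mapper):
    _batched_op = True

    # Renamed from process_single; the name and signature now match
    # what the base class forwards for batched operators.
    def process_batched(self, samples, rank=None):
        # ... per-batch processing of the column-oriented samples ...
        return samples
```

With `_batched_op = True` and a method still named `process_single`, the dispatch above raises `AttributeError` at the first call; the rename is what makes the operator reachable at all.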
```python
temp_dir = tempfile.mkdtemp(dir=self.temp_save_path)
pyaviPath = os.path.join(temp_dir, 'pyavi')
pyframesPath = os.path.join(temp_dir, 'pyframes')
pyworkPath = os.path.join(temp_dir, 'pywork')
pycropPath = os.path.join(temp_dir, 'pycrop')
if os.path.exists(temp_dir):
    rmtree(temp_dir)
```
The logic for handling temporary directories appears to be flawed. A temporary directory is created with tempfile.mkdtemp, then immediately removed with rmtree if it exists (which it always will). Subsequent calls to os.makedirs for subdirectories within this now-deleted path will fail with a FileNotFoundError. The rmtree call should be placed in a finally block to ensure cleanup after the directory has been used.
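A minimal sketch of the intended lifecycle, deferring cleanup until after the directory has been used. The subdirectory names follow the hunk above; the processing step is a placeholder:

```python
import os
import tempfile
from shutil import rmtree

temp_dir = tempfile.mkdtemp()
try:
    # Create the working subdirectories inside the live temp dir.
    for name in ("pyavi", "pyframes", "pywork", "pycrop"):
        os.makedirs(os.path.join(temp_dir, name))
    # ... run the track-extraction pipeline against these paths ...
finally:
    # Clean up only after the directory has actually been used.
    rmtree(temp_dir, ignore_errors=True)
```

Python's `tempfile.TemporaryDirectory` context manager achieves the same guaranteed cleanup with less code, if holding the directory open for the whole `with` block fits the operator's structure.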
```python
from .video_captioning_face_attribute_emotion_mapper import VideoCaptioningFaceAttributeEmotionMapper
from .video_captioning_from_human_tracks_mapper import VideoCaptioningFromHumanTracksMapper
from .video_captioning_face_attribute_emotion_mapper import VideoCaptioningFaceAttributeEmotionMapper
```
VideoCaptioningFaceAttributeEmotionMapper is imported twice. This duplication should be removed for code cleanliness. This is also reflected in the __all__ list (lines 234 and 236) and in data_juicer/config/config_all.yaml.
Suggested change:

```diff
 from .video_captioning_face_attribute_emotion_mapper import VideoCaptioningFaceAttributeEmotionMapper
 from .video_captioning_from_human_tracks_mapper import VideoCaptioningFromHumanTracksMapper
-from .video_captioning_face_attribute_emotion_mapper import VideoCaptioningFaceAttributeEmotionMapper
```
```python
class VideoFaceRatioFilter(Filter):
    """
    Keep data samples whose videos' durations are within a specified range.
```
The docstring for this filter seems to be a copy-paste from another operator. It states "Keep data samples whose videos' durations are within a specified range," but this filter operates on face ratios, not durations. Please update the docstring to accurately describe the filter's functionality.
Suggested change:

```diff
-    Keep data samples whose videos' durations are within a specified range.
+    Keep data samples whose videos' face-to-frame ratios are within a specified range.
```
No description provided.