-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize honeypot selection algorithm #8857
base: develop
Are you sure you want to change the base?
Conversation
Important Review skippedAuto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the WalkthroughThe changes focus on enhancing the frame selection mechanism for honeypot validation in the CVAT (Computer Vision Annotation Tool) project. A new Changes
Sequence DiagramsequenceDiagram
participant Serializer as Task/Job Serializer
participant Selector as HoneypotFrameSelector
participant Counter as _BaggedCounter
Serializer->>Selector: Create with validation frame counts
Selector->>Counter: Initialize from frame counts
Serializer->>Selector: select_next_frames(count)
Selector->>Counter: Use shuffle and select methods
Counter-->>Selector: Return selected frames
Selector-->>Serializer: Provide selected frames
Poem
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🔭 Outside diff range comments (1)
cvat/apps/engine/serializers.py (1)
Line range hint
1104-1115
: Consider adding input validation for negative counts passed to select_next_framesWithin JobValidationLayoutWriteSerializer, the code calls frame_selector.select_next_frames(segment_honeypots_count). If segment_honeypots_count is ever 0 or negative (due to incomplete cleanup or counts mismatch), an exception or unexpected behavior might result. Adding a quick check to ensure a nonnegative count might help.
+ if segment_honeypots_count <= 0: + raise serializers.ValidationError("No available honeypots to select.")
🧹 Nitpick comments (7)
cvat/apps/engine/task_validation.py (5)
16-24
: Consider renaming _BaggedCounter to BaggedCounter for clarityThe underscore prefix conventionally denotes a private or internal usage. However, this class is being used as a key part of the honeypot selection feature, making it relatively pivotal to the design. If it is intended to be widely used, renaming it to BaggedCounter (without the underscore) may make the intent clearer.
45-53
: Ensure deterministic seeding behavior for shuffleIf the shuffle is intended to be reproducible given a seed, ensure the random generator is consistently seeded outside of the function if it needs to be stable. If multiple calls to shuffle() share the same rng, verify that it will produce the intended random distribution across calls.
82-84
: Check concurrency when updating item usageIf multiple threads or processes use HoneypotFrameSelector, the shared data structures in _BaggedCounter could become corrupted during the pop() and push operations in use_item(). Ensuring thread/process safety (e.g., using locks, or restricting concurrency) might be necessary if usage can happen concurrently.
88-102
: Improve error handling or logging for missing frames in HoneypotFrameSelectorCurrently, the constructor ensures that frames are constructed from a valid dictionary. However, if the dictionary is missing certain keys or if frames are outside the expected domain, there is no explicit exception or log message before continuing. Consider adding raised exceptions or logs to make debugging easier if the dictionary is incomplete.
103-115
: Document the assumptions in select_next_framesThe docstring mentions that certain properties (e.g., no repetition, uniform usage) are guaranteed. It would help to clarify that only frames known to the
_counter
can be selected and how the function responds when the requested count exceeds the number of available frames.cvat/apps/engine/serializers.py (2)
1489-1491
: Check if frames are disabled or removed mid-processWhen re-selecting frames in the “RANDOM_UNIFORM” method, the code calls HoneypotFrameSelector. If some frames become disabled (i.e., removed from active_validation_frames) after random selection, the internal logic might be incomplete or lead to an inconsistent final distribution. Validate if you need to re-check frame status or re-run the selection in that scenario.
Would you like me to open a new GitHub issue to propose an approach for re-validating frames in real time?
Line range hint
1104-1115
: Suggest unit tests covering corner casesNo direct unit test references appear here. We recommend adding tests around these lines to cover:
• Zero or minimal values for honeypot frames.
• Attempted usage with an empty active_validation_frames.
• Large or random values where random_seed is used.Also applies to: 1489-1491
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
cvat/apps/engine/serializers.py
(5 hunks)cvat/apps/engine/task_validation.py
(1 hunks)
🔇 Additional comments (3)
cvat/apps/engine/task_validation.py (2)
25-41
: Validate negative or unexpected count inputs in from_iterable and from_counts methods
While the new code effectively constructs “bags” grouped by usage count, it does not guard against invalid usage counts (e.g., negative). This could happen if the input data source has errors or if the code inadvertently computes a negative count. Adding validation or safely clamping negative counts to zero would prevent silent anomalies.
Would you like a script to scan the codebase for possible negative usage count assignments or calls?
Line range hint 1489-1491
: Post-initialization check for random_seed usage in frame_selector
When assigning “frame_selector = HoneypotFrameSelector(active_validation_frame_counts)” (line 1490–1491), confirm that any random seed defined at the Task level or within the global context is properly passed to HoneypotFrameSelector. If the seed is omitted, the resulting random selection may yield nondeterministic behavior, which can complicate reproducibility.
cvat/apps/engine/serializers.py (1)
Line range hint 1362-1374
: Verify that the bulk context usage covers all frames thoroughly
The _TaskValidationLayoutBulkUpdateContext is a convenient way to manage bulk updates of frames. However, if frames are partially updated in other layers or if external code modifies frames after initialization, verify that these changes do not bypass the context. Consider adding a check or log if frames are found in an inconsistent state at the end of bulk operations.
|
||
def shuffle(self, *, rng: np.random.Generator | None): | ||
if not rng: | ||
rng = np.random.default_rng() |
Check notice
Code scanning / SonarCloud
Results that depend on random number generation should be reproducible Low
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## develop #8857 +/- ##
===========================================
+ Coverage 73.89% 73.94% +0.05%
===========================================
Files 408 408
Lines 44131 44214 +83
Branches 3986 3993 +7
===========================================
+ Hits 32611 32696 +85
+ Misses 11520 11518 -2
|
Quality Gate passedIssues Measures |
Motivation and context
Improved performance of the honeypot selection/update algorithm, noticeable on tasks with ~50k frames.
How has this been tested?
Checklist
develop
branch(cvat-canvas,
cvat-core,
cvat-data and
cvat-ui)
License
Feel free to contact the maintainers if that's a concern.
Summary by CodeRabbit
New Features
Bug Fixes
Documentation