-
Notifications
You must be signed in to change notification settings - Fork 33
Fix KNN Classification Logic and Optimize Class Count Aggregation #259
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
WalkthroughThe changes update the KNN classification logic in Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant KNN
participant Labels
User->>KNN: Request prediction with input and k
KNN->>KNN: Find k nearest neighbor indices
KNN->>KNN: Collect indices, round labels, count votes
KNN->>Labels: Access label for each neighbor
KNN->>KNN: Fold to select class with highest count (tie: pick smaller label)
KNN-->>User: Return predicted class
Poem
✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
Documentation and Community
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: .coderabbit.yml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
.coderabbit.yml(1 hunks)delta/src/algorithms.rs(1 hunks)
🔇 Additional comments (2)
.coderabbit.yml (1)
3-8:❓ Verification inconclusive
Verify configuration format compliance with CodeRabbit schema.
The change from a list format to a direct mapping key
rust-review:may not be compliant with the CodeRabbit configuration schema. According to the schema, thereviewsobject hasadditionalProperties: false, which means only predefined properties are allowed.
🏁 Script executed:
#!/bin/bash # Verify if rust-review is a valid configuration key echo "Checking if 'rust-review' is a standard CodeRabbit configuration key..." echo "Based on the provided schema, valid reviews properties include:" echo "- profile, request_changes_workflow, tools, path_instructions, etc." echo "The schema shows 'additionalProperties: false' for reviews object"Length of output: 587
🏁 Script executed:
#!/bin/bash # Locate JSON schema files in the repository schema_files=$(rg -l '"\\$schema"' -g '*.json') echo "Found schema files:" echo "$schema_files" # Search for 'reviews' definition in each schema file for file in $schema_files; do echo -e "\n=== Context in $file ===" rg -C3 '"reviews"' "$file" doneLength of output: 259
Unable to locate CodeRabbit schema—please confirm
rust-reviewis supportedI searched the repository for any JSON schema or docs defining the top-level
reviewsproperties but didn’t find a local schema file. Since the.coderabbit.ymlparser enforces"additionalProperties": falsefor thereviewsobject, this key may be rejected unless it’s explicitly allowed.• Please verify against the official CodeRabbit configuration reference whether
rust-reviewis a valid top-level review type.
• If it’s unsupported, convert this entry into a list item underreviews:or rename it to a supported profile/tool combination.delta/src/algorithms.rs (1)
446-452: LGTM! Improved tie-breaking logic for class prediction.The new fold-based approach with tie-breaking logic is well-implemented. It correctly selects the class with the highest vote count and breaks ties by choosing the smaller class label, which provides deterministic behavior.
| let k_indices_vec: Vec<usize> = k_indices.collect(); | ||
| for idx in k_indices_vec.iter().cycle().take(self.k) { | ||
| let label = y_train[*idx]; | ||
| let label_int = label.round() as usize; | ||
| *class_counts.entry(label_int).or_insert(0) += 1; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove unnecessary cycle() usage that could cause incorrect vote counting.
The use of .cycle().take(self.k) is problematic and unnecessary. Since k_indices is already created with .take(self.k) on line 432, it should contain exactly k elements. Using cycle() could theoretically repeat indices if fewer neighbors are found, leading to incorrect vote counting.
Apply this diff to fix the issue:
- let k_indices_vec: Vec<usize> = k_indices.collect();
- for idx in k_indices_vec.iter().cycle().take(self.k) {
+ for idx in k_indices {The label rounding improvement on line 441 is excellent for handling floating-point class labels.
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In delta/src/algorithms.rs around lines 438 to 443, remove the use of .cycle()
when iterating over k_indices_vec because k_indices already contains exactly
self.k elements. Replace the for loop to iterate directly over k_indices_vec
without cycling to avoid repeating indices and incorrect vote counting.
Summary by CodeRabbit
Bug Fixes
Chores