-
Notifications
You must be signed in to change notification settings - Fork 221
Added validation check for bounding boxes #1015
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @reddykkeerthi, for these contributions. I’ve added some comments—please take a look and let me know if you have any questions.
src/deepforest/dataset.py
Outdated
| image = np.array(Image.open(img_name).convert("RGB")) / 255 | ||
| self.image_dict[idx] = image.astype("float32") | ||
|
|
||
| def _validate_annotations(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you share how you've run this to confirm it addresses the issue? Please include some details on your process in the comments. Additionally, we should make sure to add tests for the new function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for reviewing!
In the validate_annotations() function, I’ve added a validation check for bounding boxes within the __init__ method of TreeDataset. This ensures that bounding boxes from CSV and other supported formats (Previously the test failed as I was not accounting for other formats) are verified before proceeding to model fine-tuning during the __getitem__ call.
To perform this check, I extracted the image_path from CSV and calculated the image’s height and width to compare against the xmin, xmax, ymin, and ymax values.
I’ve also added unit tests in test_dataset.py to cover different cases, including:
xmin < 0ymin < 0xmax > image widthymax > image height
Additionally, I tested this in a Jupyter notebook by importing the dataset, creating a custom CSV file with various types and combinations of invalid annotations, and initializing TreeDataset. It correctly raised the appropriate error, as shown in the attached screenshot:
I have commited these changes. Please let me know if anything else is needed.
|
Hi @reddykkeerthi! I was also looking into this issue and noticed your PR already addresses the core validation logic. While going through this, I have:
All of these are already implemented, tested locally, and ready to push from my side. If you’re open to it, and if Henry is okay with it, I’d love to contribute directly to this PR (if access is provided) and push these changes here. Otherwise, I’d be more than happy to open a separate PR adding you as a co-author, or share these changes/snippets here so you can incorporate them however you prefer. Let me know what works best for you! |
Hi @Samia35-2973 , really appreciate you taking the time to dive into this and for outlining the enhancements you’ve worked on—that’s great to see! For this particular PR, I’d like to finish addressing the remaining issues and follow through with the validation logic updates myself, since I have already worked on fixing the test failures and aligning it with the rest of the pipeline. However, I am currently researching about point annotations as this seems to be missed in the original description. I’d prefer to keep the changes centralized here to maintain consistency and clarity in the implementation. That said, your test case ideas and documentation updates sound helpful. If you’re open to it, perhaps you could open a follow-up PR once this is merged to build on top of the validation improvements? I’d be happy to review and collaborate then! Thanks again! |
I have added all the requested features. Please let me know if anything else is required. Thanks. |
|
Resolve conflicts and I think this is ready. Thanks! |
|
Resolved. Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for all of your hard work on this @reddykkeerthi!
The one last thing we need to do is cleanup the commits to help us maintain a clean readable git history by rebasing the entire PR into a single commit with a description very close to your original "Add validation check for bounding boxes".
This is typically done with an interactive rebase. Let me know if it would be helpful for me to walk you through the idea.

Fixes #1014
Implemented vectorized validation using Pandas operations to check all bounding boxes per image simultaneously for boundary violations (negative coords, out-of-image bounds, inverted boxes). Replaced row-wise iteration with batched
(xmin >= 0) & (xmax <= width)-style checks on entire annotation subsets, improving efficiency for large datasets. Validation errors aggregate all issues per image for clearer debugging.