@ethanwhite
Member

Checks to see if bounding boxes occur outside of image boundaries and clearly communicates to user when they do.

This is a version of #1015 that accounts for changes in the codebase over the last year. This was good work by @reddykkeerthi that I didn't want to see get lost.

Closes #1014
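
For readers who want the idea in code, here is a minimal sketch of the kind of check described above. It is not the code in this PR: `Box`, `boxes_outside_image`, and `validate_boxes` are hypothetical names, and the sketch assumes pixel coordinates with the origin at the top-left corner of the image.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Hypothetical box record in pixel coordinates."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def boxes_outside_image(boxes, width, height):
    """Return the indices of boxes that extend past the image edges."""
    bad = []
    for i, b in enumerate(boxes):
        inside = (
            0 <= b.xmin and 0 <= b.ymin
            and b.xmax <= width and b.ymax <= height
        )
        if not inside:
            bad.append(i)
    return bad


def validate_boxes(boxes, width, height, image_path="<image>"):
    """Raise a ValueError that names every out-of-bounds box."""
    bad = boxes_outside_image(boxes, width, height)
    if bad:
        details = ", ".join(f"row {i}: {tuple(boxes[i])}" for i in bad)
        raise ValueError(
            f"{len(bad)} bounding box(es) fall outside {image_path} "
            f"({width}x{height}): {details}"
        )
```

Reporting the offending rows and the image size in the message is what makes the failure actionable for the user, rather than surfacing later as an opaque training error.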

@ethanwhite
Member Author

@jveitchmichaelis I just noticed that you have a similar commit somewhere in the history of #1015. We can close this if you were planning to submit something similar. Sorry I didn't catch that earlier.

@codecov

codecov bot commented Nov 15, 2025

Codecov Report

❌ Patch coverage is 90.00000% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 87.50%. Comparing base (c10846a) to head (c8a48c2).
⚠️ Report is 3 commits behind head on main.

Files with missing lines Patch % Lines
src/deepforest/datasets/training.py 90.00% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1211      +/-   ##
==========================================
+ Coverage   87.47%   87.50%   +0.02%     
==========================================
  Files          20       20              
  Lines        2586     2616      +30     
==========================================
+ Hits         2262     2289      +27     
- Misses        324      327       +3     
Flag Coverage Δ
unittests 87.50% <90.00%> (+0.02%) ⬆️

@jveitchmichaelis
Collaborator

jveitchmichaelis commented Nov 15, 2025

I guess that's in the dino branch? Struggling to see where that commit belongs - it looks like I added it when I was testing the pretrain data and then squashed. So we can keep this one if it didn't make it into main.

Collaborator

@jveitchmichaelis left a comment

The coverage checker is requesting failure tests for image path and geometry, might not be a bad idea to add those.

For example: test_validate_dataset_missing_image, test_validate_dataset_bad_geometry, etc.
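
A rough sketch of what a missing-image failure test could look like. `validate_image_paths` is a hypothetical stand-in for the real validation code, which may have a different name, signature, and exception type; the test uses plain `try`/`except` so it runs without pytest, though `pytest.raises` would be the more idiomatic choice in the actual suite.

```python
import os
import tempfile


def validate_image_paths(image_paths, root_dir):
    """Hypothetical stand-in: raise if any annotated image is missing on disk."""
    missing = [
        p for p in image_paths
        if not os.path.exists(os.path.join(root_dir, p))
    ]
    if missing:
        raise FileNotFoundError(
            f"Missing image file(s) in {root_dir}: {missing}"
        )


def test_validate_dataset_missing_image():
    with tempfile.TemporaryDirectory() as root_dir:
        # Create one real file so only the second path should be reported.
        open(os.path.join(root_dir, "present.png"), "wb").close()
        try:
            validate_image_paths(["present.png", "absent.png"], root_dir)
        except FileNotFoundError as err:
            assert "absent.png" in str(err)
            assert "present.png" not in str(err)
        else:
            raise AssertionError("expected FileNotFoundError")
```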

Checks to see if bounding boxes occur outside of image boundaries and
clearly communicates to user when they do.

This is a version of weecology#1015 that accounts for changes in the codebase
over the last year.

Closes weecology#1014

Co-authored-by: Keerthi Reddy <[email protected]>
@ethanwhite
Member Author

> The coverage checker is requesting failure tests for image path and geometry, might not be a bad idea to add those.

Added a test for the missing image. Couldn't figure out how to trigger a bad geometry. You're welcome to PR one into this branch or point me in the right direction.

@jveitchmichaelis
Collaborator

Yeah it's not clear to me how you'd fail that. Unless it's possible to store a bad box in WKT format and Shapely blindly loads the bounds? But otherwise I think the only way to trigger that error would be to have more (or less) than 4 bounding coordinates. Or if the shape doesn't have bounds defined - like a point?

@ethanwhite
Member Author

> Yeah it's not clear to me how you'd fail that. Unless it's possible to store a bad box in WKT format and Shapely blindly loads the bounds? But otherwise I think the only way to trigger that error would be to have more (or less) than 4 bounding coordinates. Or if the shape doesn't have bounds defined - like a point?

Should I just go ahead and pull that try/except block then?

@jveitchmichaelis
Collaborator

jveitchmichaelis commented Nov 19, 2025

Apparently even a shapely.Point has "real" bounds: https://shapely.readthedocs.io/en/stable/reference/shapely.Point.html
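
A quick way to confirm this, assuming Shapely is installed (the coordinates are arbitrary illustration values):

```python
from shapely.geometry import Point

pt = Point(3.0, 4.0)

# A point's bounds collapse to a zero-width, zero-height rectangle,
# so unpacking four coordinates succeeds without raising.
xmin, ymin, xmax, ymax = pt.bounds
print(pt.bounds)
print(xmax - xmin, ymax - ymin)
```

So a check that only unpacks the bounds would not reject a point on its own.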

Two options? If the check is to verify that it's a box, and we want to be strict, we should test for it geometrically. This seems like a neat solution. I would use envelope instead of allowing rotation.

https://stackoverflow.com/questions/62467829/python-check-if-shapely-polygon-is-a-rectangle
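
A minimal sketch of the strict option using `envelope`, so that rotated rectangles are rejected too. `is_axis_aligned_box` is a hypothetical helper name, not something already in the codebase, and the sketch assumes Shapely is available.

```python
from shapely.geometry import Point, Polygon, box


def is_axis_aligned_box(geom):
    """True if geom is a polygon that coincides with its own envelope."""
    # Points, lines, and empty or zero-area shapes are never boxes.
    if geom.geom_type != "Polygon" or geom.is_empty or geom.area <= 0:
        return False
    # envelope is the axis-aligned bounding rectangle, so a rotated
    # rectangle or any other polygon will not be equal to it.
    return geom.equals(geom.envelope)


print(is_axis_aligned_box(box(0, 0, 10, 5)))
print(is_axis_aligned_box(Polygon([(0, 0), (10, 0), (0, 5)])))
print(is_axis_aligned_box(Polygon([(1, 0), (2, 1), (1, 2), (0, 1)])))
print(is_axis_aligned_box(Point(3, 4)))
```

Here `equals` is a topological comparison, so vertex order and starting vertex do not matter.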

Alternatively we could take the bounds and assert non-zero area which would allow us to load polygon data (degraded as boxes), but not points.

Adding a check for zero (or very small) area is also probably good.
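
The looser option could be sketched like this. It works on a plain `(xmin, ymin, xmax, ymax)` tuple such as the one Shapely's `bounds` returns; `bounds_area`, `has_usable_box`, and the `min_area` default of 1.0 square pixel are hypothetical choices for illustration only.

```python
def bounds_area(bounds):
    """Area of the axis-aligned rectangle described by a bounds tuple."""
    xmin, ymin, xmax, ymax = bounds
    # Clamp at zero so inverted coordinates never yield a positive area.
    return max(xmax - xmin, 0.0) * max(ymax - ymin, 0.0)


def has_usable_box(bounds, min_area=1.0):
    """True if the bounding rectangle is at least min_area in size."""
    return bounds_area(bounds) >= min_area


print(has_usable_box((0, 0, 10, 5)))      # polygon degraded to its bounds
print(has_usable_box((3, 4, 3, 4)))       # point: zero area
print(has_usable_box((0, 0, 0.5, 0.5)))   # very small box
```

This accepts polygon data (degraded to boxes) while rejecting points and near-degenerate boxes.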

Development

Successfully merging this pull request may close these issues.

Add validation check to ensure that all bounding boxes are valid for training/evaluation