Added validation check for bounding boxes #1015

reddykkeerthi · 2025-04-05T09:53:23Z

Fixes #1014
Implemented vectorized validation using Pandas operations to check all bounding boxes per image simultaneously for boundary violations (negative coords, out-of-image bounds, inverted boxes). Replaced row-wise iteration with batched (xmin >= 0) & (xmax <= width)-style checks on entire annotation subsets, improving efficiency for large datasets. Validation errors aggregate all issues per image for clearer debugging.
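
A minimal sketch of the batched check described above, for illustration only; it is not the code from this PR. It assumes an annotations DataFrame with image_path, xmin, ymin, xmax and ymax columns. The find_invalid_boxes name and the image_sizes mapping are hypothetical.

```python
import pandas as pd


def find_invalid_boxes(annotations, image_sizes):
    """Return the annotation rows whose boxes break a boundary rule.

    annotations: DataFrame with image_path, xmin, ymin, xmax, ymax columns.
    image_sizes: hypothetical dict mapping image_path -> (width, height).
    """
    # Attach each image's width and height to its rows so every
    # comparison below runs column-wise instead of row by row.
    sizes = pd.DataFrame.from_dict(
        image_sizes, orient="index", columns=["width", "height"]
    )
    merged = annotations.join(sizes, on="image_path")

    valid = (
        (merged["xmin"] >= 0)
        & (merged["ymin"] >= 0)
        & (merged["xmax"] <= merged["width"])
        & (merged["ymax"] <= merged["height"])
        & (merged["xmin"] < merged["xmax"])  # reject inverted boxes
        & (merged["ymin"] < merged["ymax"])
    )
    # Rows for images missing from image_sizes compare against NaN,
    # which evaluates to False, so they are reported as invalid too.
    return merged.loc[~valid]


df = pd.DataFrame(
    {
        "image_path": ["a.tif", "a.tif", "a.tif", "b.tif"],
        "xmin": [10, -10, 60, 0],
        "ymin": [10, -10, 10, 0],
        "xmax": [50, 50, 40, 300],
        "ymax": [50, 50, 50, 100],
    }
)
invalid = find_invalid_boxes(df, {"a.tif": (400, 400), "b.tif": (200, 200)})
print(invalid[["image_path", "xmin", "ymin", "xmax", "ymax"]])
```

Grouping the returned rows by image_path would give the per-image aggregation mentioned in the description.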

henrykironde

Thanks, @reddykkeerthi, for these contributions. I’ve added some comments—please take a look and let me know if you have any questions.

henrykironde · 2025-04-06T19:21:32Z

src/deepforest/dataset.py

                image = np.array(Image.open(img_name).convert("RGB")) / 255
                self.image_dict[idx] = image.astype("float32")

+    def _validate_annotations(self):


Could you share how you've run this to confirm it addresses the issue? Please include some details on your process in the comments. Additionally, we should make sure to add tests for the new function.

Thanks for reviewing!

I’ve added a bounding box validation check in the _validate_annotations() function, which is called from the __init__ method of TreeDataset. This ensures that bounding boxes from CSV and other supported formats are verified before __getitem__ is called during model fine-tuning. (Previously the tests failed because I was not accounting for the other formats.)

To perform this check, I extracted the image_path from CSV and calculated the image’s height and width to compare against the xmin, xmax, ymin, and ymax values.

I’ve also added unit tests in test_dataset.py to cover different cases, including:

xmin < 0

ymin < 0

xmax > image width

ymax > image height
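
The four cases listed above can be written out as rows of a small CSV. The sketch below is an illustration, not the tests in test_dataset.py: it assumes a 400x400 image, an assumed column layout, and a hypothetical violations helper rather than TreeDataset itself.

```python
import csv
import io

# Assumed image size and CSV layout, chosen only for this illustration.
WIDTH, HEIGHT = 400, 400

CSV_TEXT = """image_path,xmin,ymin,xmax,ymax,label
OSBS_029.tif,10,10,50,50,Tree
OSBS_029.tif,-5,10,50,50,Tree
OSBS_029.tif,10,-5,50,50,Tree
OSBS_029.tif,10,10,450,50,Tree
OSBS_029.tif,10,10,50,450,Tree
"""


def violations(row, width, height):
    """List the boundary rules that a single annotation row breaks."""
    xmin, ymin = float(row["xmin"]), float(row["ymin"])
    xmax, ymax = float(row["xmax"]), float(row["ymax"])
    problems = []
    if xmin < 0:
        problems.append("xmin < 0")
    if ymin < 0:
        problems.append("ymin < 0")
    if xmax > width:
        problems.append("xmax > image width")
    if ymax > height:
        problems.append("ymax > image height")
    return problems


rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
results = [violations(row, WIDTH, HEIGHT) for row in rows]
for row, problems in zip(rows, results):
    print(row["xmin"], row["ymin"], row["xmax"], row["ymax"], problems or "ok")
```

The first row is a valid box, so it acts as a control; each remaining row triggers exactly one of the listed cases.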

Additionally, I tested this in a Jupyter notebook by importing the dataset, creating a custom CSV file with various types and combinations of invalid annotations, and initializing TreeDataset. It correctly raised the appropriate error, as shown in the attached screenshot.

I have committed these changes. Please let me know if anything else is needed.

Samia35-2973 · 2025-04-06T21:36:32Z

Hi @reddykkeerthi! I was also looking into this issue and noticed your PR already addresses the core validation logic. While going through this, I have:

- Adjusted the existing test cases to align with the new validation logic.
- Added several new test cases covering additional edge cases, plus a dedicated test case for the validation logic as requested by @henrykironde. Running it in the debugger produced output like:
```
Invalid bounding boxes detected:
Image 'OSBS_029.tif' (400x400): 2 invalid boxes
  Box [xmin=-10, ymin=-10, xmax=50, ymax=50]
  Box [xmin=500, ymin=500, xmax=600, ymax=600]
```
- Updated and extended the related documentation to clarify when this validation runs and what kinds of errors users might encounter.
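
A report in the shape shown above could be assembled along these lines. This is a guess at the formatting step, not the code that produced that output; format_report and its input structure are hypothetical.

```python
def format_report(invalid_by_image):
    """Build one message covering every image that has invalid boxes.

    invalid_by_image: hypothetical dict mapping
        (image_name, width, height) -> list of (xmin, ymin, xmax, ymax).
    """
    lines = ["Invalid bounding boxes detected:"]
    for (name, width, height), boxes in invalid_by_image.items():
        lines.append(
            f"Image '{name}' ({width}x{height}): {len(boxes)} invalid boxes"
        )
        for xmin, ymin, xmax, ymax in boxes:
            lines.append(
                f"  Box [xmin={xmin}, ymin={ymin}, xmax={xmax}, ymax={ymax}]"
            )
    return "\n".join(lines)


report = format_report(
    {("OSBS_029.tif", 400, 400): [(-10, -10, 50, 50), (500, 500, 600, 600)]}
)
print(report)
```

Collecting every problem before raising a single error lets a user fix all of an image's annotations in one pass.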

All of these are already implemented, tested locally, and ready to push from my side. If you’re open to it, and if Henry is okay with it, I’d love to contribute directly to this PR (if access is provided) and push these changes here. Otherwise, I’d be more than happy to open a separate PR adding you as a co-author, or share these changes/snippets here so you can incorporate them however you prefer.

Let me know what works best for you!

reddykkeerthi · 2025-04-07T16:44:10Z

Hi @Samia35-2973, I really appreciate you taking the time to dive into this and for outlining the enhancements you’ve worked on. That’s great to see!

For this particular PR, I’d like to finish addressing the remaining issues and follow through with the validation logic updates myself, since I have already worked on fixing the test failures and aligning the check with the rest of the pipeline. I’d prefer to keep the changes centralized here to maintain consistency and clarity in the implementation. I am also currently looking into point annotations, which seem to have been missed in the original description.

That said, your test case ideas and documentation updates sound helpful. If you’re open to it, perhaps you could open a follow-up PR once this is merged to build on top of the validation improvements? I’d be happy to review and collaborate then!

Thanks again!

reddykkeerthi · 2025-04-12T11:29:15Z

I have added all the requested features. Please let me know if anything else is required. Thanks.

bw4sz · 2025-04-16T20:32:54Z

Resolve conflicts and I think this is ready. Thanks!

reddykkeerthi · 2025-04-17T04:26:11Z

Resolved. Thanks!

ethanwhite

Thanks for all of your hard work on this @reddykkeerthi!

The one last thing we need to do is clean up the commits, which helps us maintain a clean, readable git history. Please rebase the entire PR into a single commit with a description very close to your original "Add validation check for bounding boxes".

This is typically done with an interactive rebase. Let me know if it would be helpful for me to walk you through the idea.

Added validation check for bounding boxes

8945337

henrykironde requested changes Apr 6, 2025

reddykkeerthi added 2 commits April 12, 2025 16:42

Added tests

8405506

refractoring

816737a

reddykkeerthi requested a review from henrykironde April 12, 2025 11:25

bw4sz added the "Awaiting author contribution" label (Waiting on the issue author to do something before proceeding) Apr 16, 2025

github-actions bot removed the "Awaiting author contribution" label Apr 16, 2025

Merge branch 'main' into validation_check

8764ca8

ethanwhite requested changes Apr 23, 2025

ethanwhite added the "Awaiting author contribution" label Apr 23, 2025

ethanwhite mentioned this pull request Apr 23, 2025

Bounding Box error when training but no transformation function was passed #921

Closed

jveitchmichaelis mentioned this pull request Jul 21, 2025

validate labels in BoxDataset #1093

Merged

jveitchmichaelis added a commit to jveitchmichaelis/DeepForest that referenced this pull request Sep 11, 2025

validate bbox check adapted from weecology#1015

0e37f7c

5 participants
