Skip to content

Conversation

@reddykkeerthi
Copy link
Contributor

Fixes #1014
Implemented vectorized validation using Pandas operations to check all bounding boxes per image simultaneously for boundary violations (negative coords, out-of-image bounds, inverted boxes). Replaced row-wise iteration with batched (xmin >= 0) & (xmax <= width)-style checks on entire annotation subsets, improving efficiency for large datasets. Validation errors aggregate all issues per image for clearer debugging.

Copy link
Contributor

@henrykironde henrykironde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @reddykkeerthi, for these contributions. I’ve added some comments—please take a look and let me know if you have any questions.

image = np.array(Image.open(img_name).convert("RGB")) / 255
self.image_dict[idx] = image.astype("float32")

def _validate_annotations(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you share how you've run this to confirm it addresses the issue? Please include some details on your process in the comments. Additionally, we should make sure to add tests for the new function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for reviewing!

In the validate_annotations() function, I’ve added a validation check for bounding boxes within the __init__ method of TreeDataset. This ensures that bounding boxes from CSV and other supported formats (Previously the test failed as I was not accounting for other formats) are verified before proceeding to model fine-tuning during the __getitem__ call.

To perform this check, I extracted the image_path from CSV and calculated the image’s height and width to compare against the xmin, xmax, ymin, and ymax values.

I’ve also added unit tests in test_dataset.py to cover different cases, including:

  • xmin < 0
  • ymin < 0
  • xmax > image width
  • ymax > image height

Additionally, I tested this in a Jupyter notebook by importing the dataset, creating a custom CSV file with various types and combinations of invalid annotations, and initializing TreeDataset. It correctly raised the appropriate error, as shown in the attached screenshot:

WhatsApp Image 2025-04-12 at 16 19 47_b993a5f1

I have commited these changes. Please let me know if anything else is needed.

@Samia35-2973
Copy link
Contributor

Hi @reddykkeerthi! I was also looking into this issue and noticed your PR already addresses the core validation logic. While going through this, I have:

  • Adjusted the existing test cases to align with this new validation logic
  • Added several new test cases to cover additional edge scenarios, along with a dedicated test case specifically to check the validation logic as requested by @henrykironde, which I confirmed by debugging and got output like:
    Invalid bounding boxes detected:
    Image 'OSBS_029.tif' (400x400): 2 invalid boxes
      Box [xmin=-10, ymin=-10, xmax=50, ymax=50]
      Box [xmin=500, ymin=500, xmax=600, ymax=600]
    
  • Updated and extended the related documentation for better clarity on when this validation runs and what kind of errors users might encounter

All of these are already implemented, tested locally, and ready to push from my side. If you’re open to it, and if Henry is okay with it, I’d love to contribute directly to this PR (if access is provided) and push these changes here. Otherwise, I’d be more than happy to open a separate PR adding you as a co-author, or share these changes/snippets here so you can incorporate them however you prefer.

Let me know what works best for you!

@reddykkeerthi
Copy link
Contributor Author

Hi @reddykkeerthi! I was also looking into this issue and noticed your PR already addresses the core validation logic. While going through this, I have:

  • Adjusted the existing test cases to align with this new validation logic
  • Added several new test cases to cover additional edge scenarios, along with a dedicated test case specifically to check the validation logic as requested by @henrykironde, which I confirmed by debugging and got output like:
    Invalid bounding boxes detected:
    Image 'OSBS_029.tif' (400x400): 2 invalid boxes
      Box [xmin=-10, ymin=-10, xmax=50, ymax=50]
      Box [xmin=500, ymin=500, xmax=600, ymax=600]
    
  • Updated and extended the related documentation for better clarity on when this validation runs and what kind of errors users might encounter

All of these are already implemented, tested locally, and ready to push from my side. If you’re open to it, and if Henry is okay with it, I’d love to contribute directly to this PR (if access is provided) and push these changes here. Otherwise, I’d be more than happy to open a separate PR adding you as a co-author, or share these changes/snippets here so you can incorporate them however you prefer.

Let me know what works best for you!

Hi @Samia35-2973 , really appreciate you taking the time to dive into this and for outlining the enhancements you’ve worked on—that’s great to see!

For this particular PR, I’d like to finish addressing the remaining issues and follow through with the validation logic updates myself, since I have already worked on fixing the test failures and aligning it with the rest of the pipeline. However, I am currently researching about point annotations as this seems to be missed in the original description. I’d prefer to keep the changes centralized here to maintain consistency and clarity in the implementation.

That said, your test case ideas and documentation updates sound helpful. If you’re open to it, perhaps you could open a follow-up PR once this is merged to build on top of the validation improvements? I’d be happy to review and collaborate then!

Thanks again!

@reddykkeerthi
Copy link
Contributor Author

Thanks, @reddykkeerthi, for these contributions. I’ve added some comments—please take a look and let me know if you have any questions.

I have added all the requested features. Please let me know if anything else is required. Thanks.

@bw4sz
Copy link
Collaborator

bw4sz commented Apr 16, 2025

Resolve conflicts and I think this is ready. Thanks!

@bw4sz bw4sz added the Awaiting author contribution Waiting on the issue author to do something before proceeding label Apr 16, 2025
@github-actions github-actions bot removed the Awaiting author contribution Waiting on the issue author to do something before proceeding label Apr 16, 2025
@reddykkeerthi
Copy link
Contributor Author

Resolved. Thanks!

Copy link
Member

@ethanwhite ethanwhite left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all of your hard work on this @reddykkeerthi!

The one last thing we need to do is cleanup the commits to help us maintain a clean readable git history by rebasing the entire PR into a single commit with a description very close to your original "Add validation check for bounding boxes".

This is typically done with an interactive rebase. Let me know if it would be helpful for me to walk you through the idea.

@ethanwhite ethanwhite added the Awaiting author contribution Waiting on the issue author to do something before proceeding label Apr 23, 2025
jveitchmichaelis added a commit to jveitchmichaelis/DeepForest that referenced this pull request Sep 11, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Awaiting author contribution Waiting on the issue author to do something before proceeding

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add validation check to ensure that all bounding boxes are valid for training/evaluation

5 participants