Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: raise UploadErrors for images and annotations #266

Merged
merged 6 commits into from
Aug 15, 2024

Conversation

caiquejjx
Copy link
Contributor

Description

Fixes issue #235 by raising UploadError exceptions that happens inside single_upload

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)

How has this change been tested, please provide a testcase or example of how you tested the change?

Same test as the issue

import json
import roboflow

rf = roboflow.Roboflow(api_key="API_KEY")
img_path = "<path_to_img>"

workspace_id = "<ws ID>"
project_id = "<proj ID>"
workspace = rf.workspace(workspace_id)
project = workspace.project(project_id)

annotation = {
    "it is": [
        {
            "very frustrating": 0,
            "when you": "send",
            "an annotation": {
                "file and": 0.5,
                
                "function": 0.5,
                "call": 0.5,
                "completes": 0.5,
                "successfully,": 0.5,

            },
            "but": "there's",
            "no": "annotations",
            "on": "the website",
            "nor": "any",
            "errors": "raised",
        }
    ]
}
with open("annotation.json", "w") as f:
    json.dump(annotation, f)

image = project.upload(img_path, annotation_path="annotation.json")

Any specific deployment considerations

For example, documentation changes, usability, usage/costs, secrets, etc.

Docs

  • Docs updated? What were the changes:

@LinasKo
Copy link
Contributor

LinasKo commented Jun 26, 2024

Hey @caiquejjx 👋

Typically, we ask for a Colab to showcase the changes, which significantly speeds up the review process.

I made one for us this time:
https://colab.research.google.com/drive/1xNixTC2xljopKhSpatDhYr2pCY0b5HvG#scrollTo=xI1cabXdatjr

I like the change, but wonder what the broader impact would be. Whether there's anyone relying on us failing silently.

@caiquejjx
Copy link
Contributor Author

Thanks @LinasKo! yeah it makes sense

@CLAassistant
Copy link

CLAassistant commented Jul 12, 2024

CLA assistant check
All committers have signed the CLA.

@LinasKo
Copy link
Contributor

LinasKo commented Aug 13, 2024

Hi @caiquejjx, I'll be looking at this, and if you're around - let's get it merged!

I've tested the solution and it works well, but I'd like a slightly safer approach to parsing the error.

Here's what I have in mind:

    def _parse_upload_error(self, error: rfapi.UploadError) -> str:
        dict_part = str(error).split(": ", 2)[2]
        # Could be done with ast.literal_eval, but this is safer.
        if re.search(r"'\w+':", dict_part):
            temp_str = dict_part.replace(r"\'", "<PLACEHOLDER>")
            temp_str = temp_str.replace("'", '"')
            dict_part = temp_str.replace("<PLACEHOLDER>", "'")
        parsed_dict: dict = json.loads(dict_part)
        message = parsed_dict.get("message")
        return message or str(parsed_dict)

Could you replace the method (and import re instead of ast), and then test it?

Let's also add a correct annotation test, with annotation format such as this:

Annotations Dict
annotation = {
    "info": {
        "year": "2020",
        "version": "1",
        "description": "None",
        "contributor": "Linas",
        "url": "https://app.roboflow.ai/datasets/hard-hat-sample/1",
        "date_created": "2000-01-01T00:00:00+00:00",
    },
    "licenses": [{"id": 1, "url": "https://creativecommons.org/publicdomain/zero/1.0/", "name": "Public Domain"}],
    "categories": [
        {"id": 0, "name": "cat", "supercategory": "animals"},
    ],
    "images": [
        {
            "id": 0,
            "license": 1,
            "file_name": img_path,
            "height": 1024,
            "width": 1792,
            "date_captured": "2020-07-20T19:39:26+00:00",
        }
    ],
    "annotations": [
        {
            "id": 0,
            "image_id": 0,
            "category_id": 0,
            "bbox": [45, 2, 85, 85],
            "area": 7225,
            "segmentation": [],
            "iscrowd": 0,
        },
        {
            "id": 1,
            "image_id": 0,
            "category_id": 0,
            "bbox": [324, 29, 72, 81],
            "area": 5832,
            "segmentation": [],
            "iscrowd": 0,
        },
    ],
}

@caiquejjx
Copy link
Contributor Author

Hey @LinasKo sure thing, I'll work on these soon and I guess we should also have automated tests right? (not sure if it was what you meant)

@LinasKo
Copy link
Contributor

LinasKo commented Aug 13, 2024

Aye, I wasn't too clear. A Colab will be fine this time - no need for unit tests.

@caiquejjx
Copy link
Contributor Author

@LinasKo done, I had to recreate the colab in order to edit

Copy link
Contributor

@LinasKo LinasKo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good work spotting the missing replacings!

It's so tempting to revert back to ast.literal_eval at this point - seems so much simpler. Following the threads though, I see ways of crashing ast.literal_eval with just 301 symbols. If I knew whether the input is considered trusted, it would be an easy choice, but now I'd rather not take the chances.

I left one comment in the code. I think we're almost good to go.

roboflow/core/project.py Outdated Show resolved Hide resolved
if re.search(r"'\w+':", dict_part):
dict_part = dict_part.replace("True", "true")
dict_part = dict_part.replace("False", "false")
dict_part = dict_part.replace("None", "null")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if re.search(r"'\w+':", dict_part) is false?

We still wish to load from json, but since we didn't run dict_part = dict_part.replace("True", "true"), it might not be valid.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess now I got it right

Copy link
Contributor

@LinasKo LinasKo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much - looks good! 👍
Extra tests here show exactly what I'd expect: Colab

  1. Upload successful for new image + annotation
  2. Error saying annotations already exist, if you try again
  3. Incorrect annotation format error if wrong annotations are passed in.

@LinasKo
Copy link
Contributor

LinasKo commented Aug 15, 2024

@tonylampada, could you please have a look at this? I believe it is ready to merge.

@LinasKo LinasKo merged commit 1083a06 into roboflow:main Aug 15, 2024
6 checks passed
@tonylampada
Copy link
Collaborator

@LinasKo I think uploading with the CLI relies on the previous more lenient handling of annotation errors. Can you test that and see if it still has the same behaviour when importing datasets with the CLI?

Also, I think the backend already provides valid json in case of error, it just seems weird parsing it using regex like that.

@LinasKo
Copy link
Contributor

LinasKo commented Aug 15, 2024

@tonylampada, here's the repr of the error. It's not valid json - the strings are declared with '!.

UploadError("save annotation for MzqOjxfl22A5rxmFhrUI / bad response: 400: {'message': 'Image was already annotated.', 'type': 'InvalidImageException', 'hint': 'This image was already annotated; to overwrite the annotation, pass overwrite=true as a query parameter.'}")

I very much agree it's weird! The correct solution would be to change wherever the json is produced. Alternatively, let's use the whole string. Also, we can avoid all of this with ast.literal_eval, as per original suggestion by @caiquejjx, but that's a slight risk if the API error ever contains user-defined data.

CLI test results coming soon.

@LinasKo
Copy link
Contributor

LinasKo commented Aug 15, 2024

@tonylampada, I've tested:

  • Uploading image + annotation
  • Downloading object detection dataset (hard hats sample)
  • Importing hard hats into another obj detection project
    No issues during upload, no issues on the site.

However, I also labelled a keypoints dataset, created a version and attempted to download it, which failed.
I think it's related to json, but unrelated to my changes.

Could you test if the following works?

roboflow download https://app.roboflow.com/linas-ws/keypooint-test/1 -l dataset-kp
Error response
  ❯ roboflow download https://app.roboflow.com/linas-ws/keypooint-test/1 -l dataset-kp
loading Roboflow workspace...
loading Roboflow project...
Traceback (most recent call last):
  File "/Users/linasko/rf/pr/roboflow-python/.venv/lib/python3.8/site-packages/requests/models.py", line 974, in json
    return complexjson.loads(self.text, **kwargs)
  File "/Users/linasko/.pyenv/versions/3.8.19/lib/python3.8/json/__init__.py", line 357, in loads
    return _default_decoder.decode(s)
  File "/Users/linasko/.pyenv/versions/3.8.19/lib/python3.8/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/Users/linasko/.pyenv/versions/3.8.19/lib/python3.8/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/linasko/rf/pr/roboflow-python/.venv/lib/python3.8/site-packages/roboflow/core/version.py", line 258, in export
    raise RuntimeError(response.json())
  File "/Users/linasko/rf/pr/roboflow-python/.venv/lib/python3.8/site-packages/requests/models.py", line 978, in json
    raise RequestsJSONDecodeError(e.msg, e.doc, e.pos)
requests.exceptions.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/linasko/rf/pr/roboflow-python/.venv/bin/roboflow", line 8, in <module>
    sys.exit(main())
  File "/Users/linasko/rf/pr/roboflow-python/.venv/lib/python3.8/site-packages/roboflow/roboflowpy.py", line 423, in main
    args.func(args)
  File "/Users/linasko/rf/pr/roboflow-python/.venv/lib/python3.8/site-packages/roboflow/roboflowpy.py", line 45, in download
    version.download(args.format, location=args.location, overwrite=True)
  File "/Users/linasko/rf/pr/roboflow-python/.venv/lib/python3.8/site-packages/roboflow/core/version.py", line 205, in download
    self.export(model_format)
  File "/Users/linasko/rf/pr/roboflow-python/.venv/lib/python3.8/site-packages/roboflow/core/version.py", line 260, in export
    response.raise_for_status()
  File "/Users/linasko/rf/pr/roboflow-python/.venv/lib/python3.8/site-packages/requests/models.py", line 1024, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: https://api.roboflow.com/linas-ws/keypooint-test/1/voc?api_key=<MY_API_KEY>

@LinasKo
Copy link
Contributor

LinasKo commented Aug 16, 2024

@tonylampada, responding to your Loom, I see the bigger picture now. We're both working towards the same goal of not hiding useful error messages.

The problem is that dataset import captures the result of project.single_upload and logs it with _log_img_upload, whereas uploading a single image/annotation completely ignores the result.

The correct solution would keep one of these approaches. Should we implement an equivalent of _log_img_upload in roboflow/roboflowpy.py::upload_image(), and then remove the raise introduced by this PR?

@tonylampada
Copy link
Collaborator

Aforementioned loom here just for reference (so we don't exclude @caiquejjx from the conversation :-))
https://www.loom.com/share/5f7a6520e0364b7588fd6130c682e0ef

@tonylampada
Copy link
Collaborator

tonylampada commented Aug 16, 2024

@LinasKo here's what I'm thinking.

project.single_upload has an "identity crisis" because it's not exactly "single" upload, is it? :-)
It can upload:

  • one image
  • one annotation (I doubt this is much used, but it's there)
  • one image, and then one annotation

Then there's the question of what do we do when something goes wrong with one of those operations.
Ideally we should decouple those in different methods, but that would be a breaking change, probably not worth it. So, scratch that.

Then, when should single_upload throw an error?
One possible answer is:

  • one image = RAISE if fails
  • one annotation = RAISE if fails
  • one image, and then one annotation
    • RAISE if image fails
    • RETURN image id + annotation error if annotation fails

I think this kinda makes sense from the perspective of what error means.
But I don't like it. Because it's a little more complex to implement inside (single_upload) and that complexity also leaks by forcing the caller to know when to handle and/or look inside the results.

So I'm tending to think that a better answer is:

  • Always return, never raise. The return value is a dictionary that contains the result for each operation (like: "image upload=ok, here's the id, annotation upload failed, here's the error")

Now, duplicating _log_img_upload is something I don't like. DRY, right?
Maybe we should move that responsibility into single_upload as well, before returning the result dic. wdyt?

@caiquejjx
Copy link
Contributor Author

@tonylampada thanks for the loom! one note is that single_upload kinda did that before already but the problem was that the upload method didn't return anything and one of the reasons I went for raising is because people probably don't check the return of the upload since it doesn't returns anything, so besides the change on single_upload we should also start returning the result in upload, right? or maybe logging

@tonylampada
Copy link
Collaborator

tonylampada commented Aug 16, 2024

we should also start returning the result in upload, right?

Yup. That makes sense

or maybe logging

I think it might be better to move the logging responsibility into single_upload.

@lrosemberg
Copy link
Contributor

lrosemberg commented Aug 18, 2024

I'm writing some tests for my PR (that maintains the compatibility with CLI), and this PR breaks the upload parse in some cases, I'll describe it better below.

In the line where the split is done by a colon and you access position 2, not all returns from the rfapi.upload_image have the status code, and so they don't have this second colon.

image

CleanShot 2024-08-18 at 17 57 44@2x

@caiquejjx @LinasKo @tonylampada

This PR should fix that, and another things:

#312

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants