-
Notifications
You must be signed in to change notification settings - Fork 3.5k
Fix heavyweight backups of tasks that have images and a non-default frame range #10004
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
d1c7b51 to
aeccf57
Compare
aeccf57 to
bc066b3
Compare
…rame range Let's say we have a cloud storage-based task that was created with 6 input images, and the following settings: start frame 1, stop frame 4, frame step 2. A heavyweight backup of such a task will then contain only frames 1 and 3 in the `data` directory; however, the manifest will still contain all 6 frames. If you try to restore such a backup, CVAT will fail, because it checks that the files in the `data` directory correspond 1:1 to the manifest. This could potentially be fixed in the restore code; however, it seems to me that the backups created in this case are incorrect, as they have manifests referencing nonexistent files. As such, I think it's more appropriate to fix this in the backup code. The fix is to filter the manifest during backup, leaving only entries corresponding to frames that actually get backed up. We also need to reset the frame range to the default, so that it matches the filtered manifest. Note that the same bug also affects backups of tasks based on the mounted share. It should be reasonably easy to fix (just use `_write_filtered_media_manifest` in the `StorageChoice.SHARE` branch), but I cannot test such a fix, because share backups are currently broken entirely. So I will defer this fix until cvat-ai#9972.
bc066b3 to
c37def9
Compare
|
| target_dir=target_dir, | ||
| ) | ||
|
|
||
| def _write_filtered_media_manifest(self, zip_object: ZipFile, target_dir: str) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like these frame filtering parameters (start, stop, and step) only make sense and should only apply to video tasks. This might be a big change for a bug fix, but could you check if we can move this into task creation instead? I'd probably prefer having lightweight and regular backups with the same data configurations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like these frame filtering parameters (start, stop, and step) only make sense and should only apply to video tasks.
I don't think I agree. For example, when a user creates a task, they can select a directory full of images and then set these parameters to only use a subset of them.
Also, regardless of whether creating tasks like this should be supported, de facto it already works. So we should be able to back up and restore such tasks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think I agree. For example, when a user creates a task, they can select a directory full of images and then set these parameters to only use a subset of them.
My point was only that we probably should stop including all the unused images into task manifests for non-video tasks. Basically, do the same manifest filtering that was added in this PR, but during the task creation after we apply all the frame filters (these parameters, wildcards etc.).



Motivation and context
Let's say we have a cloud storage-based task that was created with 6 input images, and the following settings: start frame 1, stop frame 4, frame step 2. A heavyweight backup of such a task will then contain only frames 1 and 3 in the
datadirectory; however, the manifest will still contain all 6 frames. If you try to restore such a backup, CVAT will fail, because it checks that the files in thedatadirectory correspond 1:1 to the manifest.This could potentially be fixed in the restore code; however, it seems to me that the backups created in this case are incorrect, as they have manifests referencing nonexistent files. As such, I think it's more appropriate to fix this in the backup code.
The fix is to filter the manifest during backup, leaving only entries corresponding to frames that actually get backed up. We also need to reset the frame range to the default, so that it matches the filtered manifest.
Note that the same bug also affects backups of tasks based on the mounted share. It should be reasonably easy to fix (just use
_write_filtered_media_manifestin theStorageChoice.SHAREbranch), but I cannot test such a fix, because share backups are currently broken entirely. So I will defer this fix until #9972.This bug does not affect backups of tasks based on local files, because those currently include all frames, even those that fall outside that frame range. I think that behavior is probably wrong, but it doesn't require immediate fixing.
How has this been tested?
Unit tests. Also manually.
Checklist
developbranch[ ] I have updated the documentation accordingly[ ] I have linked related issues (see GitHub docs)License
Feel free to contact the maintainers if that's a concern.