-
Notifications
You must be signed in to change notification settings - Fork 3.5k
Fix backups for tasks using a mounted file share #9972
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
62343ef to
c25755a
Compare
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## develop #9972 +/- ##
===========================================
+ Coverage 75.45% 75.49% +0.03%
===========================================
Files 427 427
Lines 46234 46233 -1
Branches 4132 4132
===========================================
+ Hits 34888 34903 +15
+ Misses 11346 11330 -16
🚀 New features to boost your workflow:
|
35affd4 to
8abde7d
Compare
zhiltsov-max
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you check if this PR closes any other issues from the list? I can see there are 4 similar issues.
Wow, apparently everyone on the team encountered this at some point. 🤪 Updated the description. |
Currently, backing up such a task succeeds, but restoring fails with "No such file or directory: <root>/share/manifest.jsonl". IMO, the problem is not really in the restore logic, but in the backup logic. Share task backups are "heavyweight", in that they include a copy of the media files. But the `storage` field in `task.json` is still set to "share". As a result, CVAT tries to import it as a share task, and that logic is broken. I think we should handle such backups consistently with heavyweight cloud storage backups, and set `storage` to "local". The restore logic will then work perfectly well. In the future, if we happen to implement lightweight backups for share tasks, we can use `storage: "share"` for those.
There doesn't seem to be much purpose to this check - chunks are not even used for backups, and backups of such tasks work fine with the check removed.
8abde7d to
60ad3f3
Compare
|
…rame range Let's say we have a cloud storage-based tasks that was created with 6 input images, and the following settings: start frame 1, stop frame 4, frame step 2. A heavyweight backup of such a task will then contain only frames 1 and 3 in the `data` directory; however, the manifest will still contain all 6 frames. If you try to restore such a backup, CVAT will fail, because it checks that the files in the `data` directory correspond 1:1 to the manifest. This could potentially be fixed in the restore code; however, it seems to me that the backups created in this case are incorrect, as they have manifests referencing nonexistent files. As such, I think it's more appropriate to fix this in the backup code. The fix is to filter the manifest during backup, leaving only entries corresponding to frames that actually get backed up. We also need to reset the frame range to the default, so that it matches the filtered manifest. Note that the same bug also affects backups of tasks based on the mounted share. It should be reasonably easy to fix (just use `_write_filtered_media_manifest` in the `StorageChoice.SHARE` branch), but I cannot test such a fix, because share backups are currently broken entirely. So I will defer this fix until cvat-ai#9972.
…rame range Let's say we have a cloud storage-based tasks that was created with 6 input images, and the following settings: start frame 1, stop frame 4, frame step 2. A heavyweight backup of such a task will then contain only frames 1 and 3 in the `data` directory; however, the manifest will still contain all 6 frames. If you try to restore such a backup, CVAT will fail, because it checks that the files in the `data` directory correspond 1:1 to the manifest. This could potentially be fixed in the restore code; however, it seems to me that the backups created in this case are incorrect, as they have manifests referencing nonexistent files. As such, I think it's more appropriate to fix this in the backup code. The fix is to filter the manifest during backup, leaving only entries corresponding to frames that actually get backed up. We also need to reset the frame range to the default, so that it matches the filtered manifest. Note that the same bug also affects backups of tasks based on the mounted share. It should be reasonably easy to fix (just use `_write_filtered_media_manifest` in the `StorageChoice.SHARE` branch), but I cannot test such a fix, because share backups are currently broken entirely. So I will defer this fix until cvat-ai#9972.
…rame range Let's say we have a cloud storage-based tasks that was created with 6 input images, and the following settings: start frame 1, stop frame 4, frame step 2. A heavyweight backup of such a task will then contain only frames 1 and 3 in the `data` directory; however, the manifest will still contain all 6 frames. If you try to restore such a backup, CVAT will fail, because it checks that the files in the `data` directory correspond 1:1 to the manifest. This could potentially be fixed in the restore code; however, it seems to me that the backups created in this case are incorrect, as they have manifests referencing nonexistent files. As such, I think it's more appropriate to fix this in the backup code. The fix is to filter the manifest during backup, leaving only entries corresponding to frames that actually get backed up. We also need to reset the frame range to the default, so that it matches the filtered manifest. Note that the same bug also affects backups of tasks based on the mounted share. It should be reasonably easy to fix (just use `_write_filtered_media_manifest` in the `StorageChoice.SHARE` branch), but I cannot test such a fix, because share backups are currently broken entirely. So I will defer this fix until cvat-ai#9972.
…rame range Let's say we have a cloud storage-based task that was created with 6 input images, and the following settings: start frame 1, stop frame 4, frame step 2. A heavyweight backup of such a task will then contain only frames 1 and 3 in the `data` directory; however, the manifest will still contain all 6 frames. If you try to restore such a backup, CVAT will fail, because it checks that the files in the `data` directory correspond 1:1 to the manifest. This could potentially be fixed in the restore code; however, it seems to me that the backups created in this case are incorrect, as they have manifests referencing nonexistent files. As such, I think it's more appropriate to fix this in the backup code. The fix is to filter the manifest during backup, leaving only entries corresponding to frames that actually get backed up. We also need to reset the frame range to the default, so that it matches the filtered manifest. Note that the same bug also affects backups of tasks based on the mounted share. It should be reasonably easy to fix (just use `_write_filtered_media_manifest` in the `StorageChoice.SHARE` branch), but I cannot test such a fix, because share backups are currently broken entirely. So I will defer this fix until cvat-ai#9972.



Fixes #4989
Fixes #6149
Fixes #7965
Fixes #8297
Motivation and context
There are two problems here.
First, CVAT refuses to back up such tasks if they use static chunks. There doesn't seem to be much purpose for this; the backup process doesn't even use chunks. With the check removed, the backups work fine.
Second, even when backing up succeeds, restoring fails with "No such file or directory: /share/manifest.jsonl".
IMO, the problem here is not really in the restore logic, but in the backup logic. Share task backups are "heavyweight", in that they include a copy of the media files. But the
storagefield intask.jsonis still set to "share". As a result, CVAT tries to import it as a share task, and that logic is broken.I think we should handle such backups consistently with heavyweight cloud storage backups, and set
storageto "local". The restore logic will then work perfectly well. In the future, if we happen to implement lightweight backups for share tasks, we can usestorage: "share"for those.How has this been tested?
Unit tests.
Checklist
developbranch[ ] I have updated the documentation accordinglyLicense
Feel free to contact the maintainers if that's a concern.