@SpecLad SpecLad commented Nov 4, 2025

Fixes #4989
Fixes #6149
Fixes #7965
Fixes #8297

Motivation and context

There are two problems with backing up and restoring tasks created from the mounted share.

First, CVAT refuses to back up such tasks if they use static chunks. There doesn't seem to be much purpose to this restriction, since the backup process doesn't use chunks at all. With the check removed, backups work fine.

Second, even when backing up succeeds, restoring the backup fails with "No such file or directory: <root>/share/manifest.jsonl".

IMO, the problem here is not really in the restore logic, but in the backup logic. Share task backups are "heavyweight", in that they include a copy of the media files. But the `storage` field in `task.json` is still set to "share". As a result, CVAT tries to import it as a share task, and that logic is broken.

I think we should handle such backups consistently with heavyweight cloud storage backups, and set `storage` to "local". The restore logic will then work perfectly well. In the future, if we happen to implement lightweight backups for share tasks, we can use `storage: "share"` for those.
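A minimal sketch of the proposed rule, assuming a simplified `task.json` that is a flat dict with a top-level `storage` key. The `StorageChoice` enum here is a hypothetical stand-in carrying the values discussed in this PR ("local", "share", plus "cloud_storage" for the cloud case), and `storage_for_backup` / `write_task_meta` are illustrative names, not CVAT's actual backup code.

```python
import json
from enum import Enum


class StorageChoice(str, Enum):
    # Hypothetical mirror of the storage values discussed in this PR.
    LOCAL = "local"
    SHARE = "share"
    CLOUD_STORAGE = "cloud_storage"


def storage_for_backup(original: StorageChoice, *, includes_media: bool) -> StorageChoice:
    """Choose the `storage` value to record in the backup's task.json."""
    if includes_media:
        # A heavyweight backup carries its own copy of the media, so the
        # restored task should read it as local data, wherever it came from.
        return StorageChoice.LOCAL
    # A lightweight backup (not implemented for share tasks) would keep
    # pointing at the original storage.
    return original


def write_task_meta(task_meta: dict, *, includes_media: bool) -> str:
    """Serialize task metadata with the storage field adjusted for backup."""
    meta = dict(task_meta)  # leave the caller's dict untouched
    meta["storage"] = storage_for_backup(
        StorageChoice(meta["storage"]), includes_media=includes_media
    ).value
    return json.dumps(meta, sort_keys=True)


print(write_task_meta({"name": "demo", "storage": "share"}, includes_media=True))
# {"name": "demo", "storage": "local"}
```

The point of the design is that `storage` in a backup describes where the *restored* task will find its media, not where the original task found it.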

How has this been tested?

Unit tests.

Checklist

  • I submit my changes into the develop branch
  • I have created a changelog fragment
  • [ ] I have updated the documentation accordingly
  • I have added tests to cover my changes
  • I have linked related issues (see GitHub docs)

License

  • I submit my code changes under the same MIT License that covers the project.
    Feel free to contact the maintainers if that's a concern.

@SpecLad SpecLad force-pushed the restore-backup-from-share branch from 62343ef to c25755a on November 4, 2025 13:48
@SpecLad SpecLad marked this pull request as ready for review November 4, 2025 13:48
codecov-commenter commented Nov 4, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 75.49%. Comparing base (57ceda9) to head (8abde7d).
⚠️ Report is 1 commits behind head on develop.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #9972      +/-   ##
===========================================
+ Coverage    75.45%   75.49%   +0.03%     
===========================================
  Files          427      427              
  Lines        46234    46233       -1     
  Branches      4132     4132              
===========================================
+ Hits         34888    34903      +15     
+ Misses       11346    11330      -16     
Components     Coverage Δ
cvat-ui        76.90% <ø> (ø)
cvat-server    74.29% <100.00%> (+0.06%) ⬆️

@zhiltsov-max zhiltsov-max left a comment

Could you check if this PR closes any other issues from the list? I can see there are 4 similar issues.

SpecLad commented Nov 5, 2025

> Could you check if this PR closes any other issues from the list? I can see there are 4 similar issues.

Wow, apparently everyone on the team encountered this at some point. 🤪 Updated the description.

@SpecLad SpecLad marked this pull request as draft November 5, 2025 18:09
Currently, backing up such a task succeeds, but restoring fails with "No
such file or directory: <root>/share/manifest.jsonl".

IMO, the problem is not really in the restore logic, but in the backup
logic. Share task backups are "heavyweight", in that they include a copy of
the media files. But the `storage` field in `task.json` is still set to
"share". As a result, CVAT tries to import it as a share task, and that
logic is broken.

I think we should handle such backups consistently with heavyweight cloud
storage backups, and set `storage` to "local". The restore logic will then
work perfectly well. In the future, if we happen to implement lightweight
backups for share tasks, we can use `storage: "share"` for those.

There doesn't seem to be much purpose to the check that rejects such tasks
when they use static chunks; chunks are not even used for backups, and
backups of such tasks work fine with the check removed.
@SpecLad SpecLad force-pushed the restore-backup-from-share branch from 8abde7d to 60ad3f3 on November 5, 2025 18:15
sonarqubecloud bot commented Nov 5, 2025

SpecLad added a commit to SpecLad/cvat that referenced this pull request Nov 14, 2025
…rame range

Let's say we have a cloud storage-based task that was created with 6 input
images, and the following settings: start frame 1, stop frame 4, frame step
2. A heavyweight backup of such a task will then contain only frames 1 and 3
in the `data` directory; however, the manifest will still contain all 6
frames. If you try to restore such a backup, CVAT will fail, because it
checks that the files in the `data` directory correspond 1:1 to the
manifest.
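The frame arithmetic in this example can be checked with a few lines of Python. This sketch treats the stop frame as inclusive and uses made-up file names (`image_0.jpg` and so on); the final comparison only imitates the kind of 1:1 check the restore code performs and is not CVAT's actual check.

```python
def backed_up_frames(num_images: int, start: int, stop: int, step: int) -> list[int]:
    """Indices of the input images that fall inside the task's frame range."""
    # `stop` is inclusive, matching "stop frame 4" in the example.
    last = min(stop, num_images - 1)
    return list(range(start, last + 1, step))


num_images = 6
frames = backed_up_frames(num_images, start=1, stop=4, step=2)
print(frames)  # [1, 3]

# Hypothetical file names for the 6 input images.
names = [f"image_{i}.jpg" for i in range(num_images)]

# A heavyweight backup copies only the frames inside the range...
data_dir = {names[i] for i in frames}
# ...but the unfiltered manifest still lists every input image.
manifest_names = set(names)

print(len(data_dir), len(manifest_names))  # 2 6
print(data_dir == manifest_names)  # False, so a 1:1 check rejects the backup
```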

This could potentially be fixed in the restore code; however, it seems to me
that the backups created in this case are incorrect, as they have manifests
referencing nonexistent files. As such, I think it's more appropriate to fix
this in the backup code.

The fix is to filter the manifest during backup, leaving only entries
corresponding to frames that actually get backed up. We also need to reset
the frame range to the default, so that it matches the filtered manifest.
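A sketch of the filtering step, under simplifying assumptions: the manifest is modeled as a list of per-image dicts (real CVAT manifests also carry header lines, which are ignored here), and the frame range is a plain dataclass with an inclusive stop frame. `FrameRange` and `filter_manifest` are hypothetical; the actual helper named in this commit message is `_write_filtered_media_manifest`, whose signature is not shown in this discussion.

```python
import json
from dataclasses import dataclass


@dataclass
class FrameRange:
    start_frame: int
    stop_frame: int  # inclusive
    frame_step: int


def filter_manifest(
    entries: list[dict], frame_range: FrameRange
) -> tuple[list[dict], FrameRange]:
    """Keep only the manifest entries whose frames are actually backed up."""
    kept = entries[
        frame_range.start_frame : frame_range.stop_frame + 1 : frame_range.frame_step
    ]
    # The filtered manifest describes exactly the backed-up media, so the
    # frame range is reset to the default: every entry, in order.
    default_range = FrameRange(start_frame=0, stop_frame=len(kept) - 1, frame_step=1)
    return kept, default_range


def to_jsonl(entries: list[dict]) -> str:
    """Serialize entries one JSON object per line, as in a .jsonl manifest."""
    return "".join(json.dumps(entry) + "\n" for entry in entries)


entries = [{"name": f"image_{i}", "extension": ".jpg"} for i in range(6)]
kept, new_range = filter_manifest(entries, FrameRange(1, 4, 2))
print([e["name"] for e in kept])  # ['image_1', 'image_3']
print(new_range)  # FrameRange(start_frame=0, stop_frame=1, frame_step=1)
```

Resetting the range matters because, without it, a restored task would presumably apply "start 1, step 2" a second time to a manifest that has already been filtered, selecting only `image_3`.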

Note that the same bug also affects backups of tasks based on the mounted
share. It should be reasonably easy to fix (just use
`_write_filtered_media_manifest` in the `StorageChoice.SHARE` branch), but I
cannot test such a fix, because share backups are currently broken entirely.
So I will defer this fix until cvat-ai#9972.