Description
Setup is:
- Server: AlmaLinux 9, rustic-server 0.4.4, running two instances
- Instance 1: 4 repos
- Instance 2: 2 repos
- Instance 2 clients: all Windows 10, rustic 0.9.5 installed using `cargo binstall`
On multiple occasions the hard drive of instance 2 has filled up completely, leaving no free space for a running rustic backup operation. When this happened I sometimes didn't notice right away, so subsequent (automated) backups failed. On some occasions I may have freed some space on the drive and done nothing else rustic-wise, so that after a failed backup another one ran successfully, and so on. I figured the worst that could happen with no free space was simply no new snapshots, but I didn't account for the drive getting full while a backup is running.
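To catch the full-drive condition earlier, a small check could run from cron on the server. This is only a rough sketch and has nothing to do with rustic or rustic-server themselves: the function name `check_free_space` and the 10% threshold are my own assumptions, and the path would be wherever the repos for the instance live.

```python
import shutil
import sys


def check_free_space(path, min_free_fraction=0.10):
    """Return (ok, free_fraction) for the filesystem that contains `path`.

    `min_free_fraction` is an arbitrary example threshold, not a value
    recommended by rustic or rustic-server.
    """
    usage = shutil.disk_usage(path)
    # Guard against a filesystem that reports a total size of zero.
    free_fraction = usage.free / usage.total if usage.total else 0.0
    return free_fraction >= min_free_fraction, free_fraction


if __name__ == "__main__":
    # Hypothetical usage: pass the repo directory as the first argument.
    repo_path = sys.argv[1] if len(sys.argv) > 1 else "."
    ok, fraction = check_free_space(repo_path)
    print(f"{repo_path}: {fraction:.1%} free")
    # A non-zero exit code lets cron or a monitoring tool raise an alert.
    sys.exit(0 if ok else 1)
```

This only warns about low space; it does not stop a client from starting a backup against a nearly full drive.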
Then I started running check on the repos. The first time I ran check was in August, and it detected some Hash mismatch errors. I "fixed" them with the repair option, following instructions I was given in the Discord chat. One month later I ran check again, and it again detected some errors, which I repaired following the same steps. Then a few days ago I ran check a third time, and it detected different errors, like:
subtree blob cb96a9ac is missing in index
And
[WARN] tree 8fb287a6 could not be loaded: Error: Tree ID `8fb287a6` not found in index (kind: related to internal operations)
[WARN] dir Camera: tree is missing
For these I ran the repair command and noticed that it tagged all the snapshots from one of the paths in one of the repos: not just the last 10 or 20 snapshots, all of them. So even snapshots that returned no errors the previous month are corrupted now.
I suspect this is related to the lack of free space while a backup runs. I obviously don't know exactly how rustic works, but could the explanation for old data being corrupted when new data could not be written be that rustic tried to modify a pack/index, failed because there was no space available, and that pack/index belonged to this old data, so all the snapshots now report corruption?
I'm using rustic-server which is why I'm reporting this here.
Drive SMART health is OK, there are no errors. Filesystem is XFS.
Related to #10?
EDIT: Something I forgot to mention is that I have also gotten multiple lines like:
[WARN] pack 378ccb33 not referenced in index. Can be a parallel backup job. To repair: 'rustic repair index'.
I always assumed this was because backups failed midway when the drive got full. I don't know if this matters.