dvc pull/fetch: corrupted cache with GDrive #10525

Open
@ermolaev94

Bug Report

Description

I've run into corrupted files when pulling from a GDrive remote. The corruption is not reliably reproducible: it happens on one machine but not on another. I'll describe it in more detail below:

File Description

The artifact is a folder consisting of 4 elements. DVC fails to download only one of them.

  • folder hash: 68cfceac0911615ed2e552b1e52c0eaa.dir
  • folder content: [{"md5": "0ed51826c555065daf92319e2c7a56d2", "relpath": "bin.h5"}, {"md5": "bf6a6868cef0b76c896bb2fd4d494e7b", "relpath": "fractures.h5"}, {"md5": "9cc12ebcc12ebee440e4ad02e1dbdb6a0182b2", "relpath": "meta.h5"}, {"md5": "eba375f97ab33bee5d96785111d8d1f9", "relpath": "ribs.h5"}]
  • dvc pulls all files except bin.h5
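
For reference, the mapping from an entry in the .dir listing to its location under the cache can be sketched as below. This assumes the layout visible in the logs of this report (files/md5/<first two hex chars>/<remaining chars>); `cache_relpath` is a hypothetical helper written for illustration, not a DVC API, and only two of the four entries are shown.

```python
import json
from pathlib import PurePosixPath


def cache_relpath(md5, prefix="files/md5"):
    """Return the cache-relative path for an object hash.

    Assumption: the first two hex characters form the directory name and
    the remaining characters form the file name, as seen in the logs.
    """
    return str(PurePosixPath(prefix) / md5[:2] / md5[2:])


# Two entries copied from the .dir listing in this report.
dir_listing = json.loads(
    '[{"md5": "0ed51826c555065daf92319e2c7a56d2", "relpath": "bin.h5"},'
    ' {"md5": "eba375f97ab33bee5d96785111d8d1f9", "relpath": "ribs.h5"}]'
)

paths = {entry["relpath"]: cache_relpath(entry["md5"]) for entry in dir_listing}
print(paths["bin.h5"])  # files/md5/0e/d51826c555065daf92319e2c7a56d2
```

The printed path for bin.h5 is the same object path that appears in the fetch logs below.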

Error Scenario №1

$ ssh <server>
$ cd /path/to/repo
$ git checkout develop
$ dvc pull ribs/data/full_datasets/fractures_1123_seg/h5-corrected/test/bin.h5 --no-run-cache -v
2024-08-15 12:49:02,036 DEBUG: v3.53.2 (pip), CPython 3.10.12 on Linux-5.15.0-71-generic-x86_64-with-glibc2.35
2024-08-15 12:49:02,037 DEBUG: command: /home/ermolaev/projects/radml/venv/bin/dvc pull ../../data/full_datasets/fractures_1123_seg/h5-corrected/test/bin.h5 --no-run-cache -v
2024-08-15 12:49:02,481 DEBUG: Lockfile '../05_unf_dt/dvc.lock' needs to be updated.
2024-08-15 12:49:02,773 DEBUG: Lockfile for '../06_cage_sgm/dvc.yaml' not found
2024-08-15 12:49:04,241 DEBUG: Checking if stage '../../data/full_datasets/fractures_1123_seg/h5-corrected/test/bin.h5' is in 'dvc.yaml'
Collecting                                                                                                                        |2.00 [00:00, 4.05entry/s]
2024-08-15 12:49:07,769 DEBUG: Preparing to transfer data from 'gdrive://1GkI-Tgwdi1r2oat1E0wxFG3sXc6oL2UL/files/md5' to '/home/ermolaev/projects/radml/cvl-cvisionrad-ml/.dvc/cache/files/md5'
2024-08-15 12:49:07,770 DEBUG: Preparing to collect status from '/home/ermolaev/projects/radml/cvl-cvisionrad-ml/.dvc/cache/files/md5'
2024-08-15 12:49:07,770 DEBUG: Collecting status from '/home/ermolaev/projects/radml/cvl-cvisionrad-ml/.dvc/cache/files/md5'
2024-08-15 12:49:07,775 DEBUG: Preparing to collect status from '1GkI-Tgwdi1r2oat1E0wxFG3sXc6oL2UL/files/md5'
2024-08-15 12:49:07,775 DEBUG: Collecting status from '1GkI-Tgwdi1r2oat1E0wxFG3sXc6oL2UL/files/md5'                                                         
2024-08-15 12:49:07,783 DEBUG: Querying 19 oids via object_exists
2024-08-15 12:49:11,906 DEBUG: Querying 0 oids via object_exists              
Fetching                               
  0%|          |Fetching from gdrive                                                                                              0/1 [00:00<?,     ?file/s]
  1%|▏         |1GkI-Tgwdi1r2oat1E0wxFG3sXc6oL2UL/files/md5/0e/d51826c555065daf92319e2c7a56d2                          100M/7.31G [00:02<03:13,    40.0MB/s]
Computing md5 for a large file '/home/ermolaev/projects/radml/cvl-cvisionrad-ml/.dvc/cache/files/md5/0e/d51826c555065daf92319e2c7a56d2'. This is only done once.                                                                                                                                                        
2024-08-15 12:52:16,408 DEBUG: corrupted cache file '/home/ermolaev/projects/radml/cvl-cvisionrad-ml/.dvc/cache/files/md5/0e/d51826c555065daf92319e2c7a56d2'.                                                                                                                                                           
2024-08-15 12:52:16,408 DEBUG: Removing '/home/ermolaev/projects/radml/cvl-cvisionrad-ml/.dvc/cache/files/md5/0e/d51826c555065daf92319e2c7a56d2'            
2024-08-15 12:52:17,292 DEBUG: Preparing to transfer data from 'gdrive://1GkI-Tgwdi1r2oat1E0wxFG3sXc6oL2UL' to '/home/ermolaev/projects/radml/cvl-cvisionrad-ml/.dvc/cache'                                                                                                                                             
2024-08-15 12:52:17,292 DEBUG: Preparing to collect status from '/home/ermolaev/projects/radml/cvl-cvisionrad-ml/.dvc/cache'
2024-08-15 12:52:17,292 DEBUG: Collecting status from '/home/ermolaev/projects/radml/cvl-cvisionrad-ml/.dvc/cache'
Fetching
2024-08-15 12:52:17,805 DEBUG: Lockfile for '../06_cage_sgm/dvc.yaml' not found
2024-08-15 12:52:19,269 DEBUG: Checking if stage '../../data/full_datasets/fractures_1123_seg/h5-corrected/test/bin.h5' is in 'dvc.yaml'
Building workspace index                                                                                                          |7.00 [00:00, 7.96entry/s]
Comparing indexes                                                                                                                |8.00 [00:00, 1.14kentry/s]
2024-08-15 12:52:20,318 DEBUG: Removing '/home/ermolaev/projects/radml/cvl-cvisionrad-ml/ribs/data/full_datasets/fractures_1123_seg/h5-corrected/test/bin.h5'
Applying changes                                                                                                                  |1.00 [00:00, 1.14kfile/s]
M       ../../data/full_datasets/fractures_1123_seg/h5-corrected/test/
1 file modified and 1 file fetched
2024-08-15 12:52:20,358 DEBUG: Analytics is enabled.
2024-08-15 12:52:20,441 DEBUG: Trying to spawn ['daemon', 'analytics', '/tmp/tmpvgz885uf', '-v']
2024-08-15 12:52:20,449 DEBUG: Spawned ['daemon', 'analytics', '/tmp/tmpvgz885uf', '-v'] with pid 5035

So DVC detected corrupted data but reported it only in the debug output: no error, no warning. I saw this message only because I ran the command with the -v flag. At first I assumed that the data really was corrupted, i.e. that the cached file did not match its hash.

Manual Cache Check

I decided to check the actual md5 of the file that was reported as corrupted. I went to "1GkI-Tgwdi1r2oat1E0wxFG3sXc6oL2UL/files/md5/0e/d51826c555065daf92319e2c7a56d2" and downloaded it.

$ md5sum d51826c555065daf92319e2c7a56d2 
0ed51826c555065daf92319e2c7a56d2  d51826c555065daf92319e2c7a56d2

So the file stored on the remote is OK.
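
The same check can be scripted when md5sum is not at hand. The sketch below is only an illustration of the manual check above: it hashes a file in chunks (so a 7 GB file does not need to fit in memory) and compares the result with the hash implied by the cache path. Both function names are hypothetical, and the demo uses a small temporary file rather than the real object.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex md5 of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def expected_md5_from_cache_path(path):
    """Rebuild the expected hash from a '.../<2 hex chars>/<rest>' path.

    Assumption: the cache layout matches the paths shown in the logs.
    """
    parent, name = os.path.split(path)
    return os.path.basename(parent) + name


# Demo on a throwaway file laid out like a cache object.
payload = b"example payload"
expected = hashlib.md5(payload).hexdigest()
with tempfile.TemporaryDirectory() as root:
    obj_dir = os.path.join(root, expected[:2])
    os.makedirs(obj_dir)
    obj_path = os.path.join(obj_dir, expected[2:])
    with open(obj_path, "wb") as f:
        f.write(payload)
    demo_ok = md5_of_file(obj_path) == expected_md5_from_cache_path(obj_path)

print(demo_ok)  # True
```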

Using my own account instead of the service account

I noticed that the owner differs between the files that were pulled and the files that weren't. This happens because I switched to a service account several months ago. I tried to go back to the old auth flow and authenticate on the server by forwarding port 8080, but Google blocks it (#10516).

Error Scenario №2

I decided to pull the same data on my local machine, and there was no error:

$ git clone <...>
$ git checkout develop
$ dvc pull ribs/data/full_datasets/fractures_1123_seg/h5-corrected/test/bin.h5 -v --no-run-cache
2024-08-15 13:13:47,731 DEBUG: v3.53.1 (pip), CPython 3.10.14 on Linux-6.8.0-31-generic-x86_64-with-glibc2.39
2024-08-15 13:13:47,731 DEBUG: command: /home/ermolaev/projects/radml/venv/bin/dvc pull ribs/data/full_datasets/fractures_1123_seg/h5-corrected/test/bin.h5 -v --no-run-cache
2024-08-15 13:13:48,649 DEBUG: Lockfile for 'ribs/pipelines/06_cage_sgm/dvc.yaml' not found
2024-08-15 13:13:48,678 DEBUG: Lockfile 'ribs/pipelines/05_unf_dt/dvc.lock' needs to be updated.
2024-08-15 13:13:48,836 DEBUG: Checking if stage 'ribs/data/full_datasets/fractures_1123_seg/h5-corrected/test/bin.h5' is in 'dvc.yaml'
2024-08-15 13:13:49,609 DEBUG: failed to load ('ribs', 'data', 'full_datasets', 'fractures_1123_seg', 'h5-corrected', 'test') from storage local (/tmp/cvl-cvisionrad-ml/.dvc/cache/files/md5) - [Errno 2] No such file or directory: '/tmp/cvl-cvisionrad-ml/.dvc/cache/files/md5/68/cfceac0911615ed2e552b1e52c0eaa.dir'
Traceback (most recent call last):
  File "/home/ermolaev/projects/radml/venv/lib/python3.10/site-packages/dvc_data/index/index.py", line 611, in _load_from_storage
    _load_from_object_storage(trie, entry, storage)
  File "/home/ermolaev/projects/radml/venv/lib/python3.10/site-packages/dvc_data/index/index.py", line 547, in _load_from_object_storage
    obj = Tree.load(storage.odb, root_entry.hash_info, hash_name=storage.odb.hash_name)
  File "/home/ermolaev/projects/radml/venv/lib/python3.10/site-packages/dvc_data/hashfile/tree.py", line 193, in load
    with obj.fs.open(obj.path, "r") as fobj:
  File "/home/ermolaev/projects/radml/venv/lib/python3.10/site-packages/dvc_objects/fs/base.py", line 324, in open
    return self.fs.open(path, mode=mode, **kwargs)
  File "/home/ermolaev/projects/radml/venv/lib/python3.10/site-packages/dvc_objects/fs/local.py", line 131, in open
    return open(path, mode=mode, encoding=encoding)  # noqa: SIM115
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/cvl-cvisionrad-ml/.dvc/cache/files/md5/68/cfceac0911615ed2e552b1e52c0eaa.dir'

Collecting                                                                                                                        |2.00 [00:04, 2.24s/entry]
2024-08-15 13:13:54,091 DEBUG: Preparing to transfer data from 'gdrive://1GkI-Tgwdi1r2oat1E0wxFG3sXc6oL2UL/files/md5' to '/tmp/cvl-cvisionrad-ml/.dvc/cache/files/md5'
2024-08-15 13:13:54,091 DEBUG: Preparing to collect status from '/tmp/cvl-cvisionrad-ml/.dvc/cache/files/md5'
2024-08-15 13:13:54,091 DEBUG: Collecting status from '/tmp/cvl-cvisionrad-ml/.dvc/cache/files/md5'
2024-08-15 13:13:54,093 DEBUG: Preparing to collect status from '1GkI-Tgwdi1r2oat1E0wxFG3sXc6oL2UL/files/md5'
2024-08-15 13:13:54,093 DEBUG: Collecting status from '1GkI-Tgwdi1r2oat1E0wxFG3sXc6oL2UL/files/md5'                                                         
2024-08-15 13:13:54,093 DEBUG: Querying 1 oids via object_exists
2024-08-15 13:13:57,453 DEBUG: Indexing new .dir '68cfceac0911615ed2e552b1e52c0eaa.dir' with '4' nested files                                               
2024-08-15 13:13:59,302 DEBUG: transfer dir: md5: 68cfceac0911615ed2e552b1e52c0eaa.dir with 1 files                                                         
Fetching                               
  0%|          |Fetching from gdrive                                                                                              0/1 [00:01<?,     ?file/s]
  4%|▍         |1GkI-Tgwdi1r2oat1E0wxFG3sXc6oL2UL/files/md5/0e/d51826c555065daf92319e2c7a56d2                          300M/7.31G [00:10<04:11,    29.9MB/s]
Computing md5 for a large file '/tmp/cvl-cvisionrad-ml/.dvc/cache/files/md5/0e/d51826c555065daf92319e2c7a56d2'. This is only done once.                     
2024-08-15 13:18:56,911 DEBUG: Preparing to transfer data from 'gdrive://1GkI-Tgwdi1r2oat1E0wxFG3sXc6oL2UL' to '/tmp/cvl-cvisionrad-ml/.dvc/cache'          
2024-08-15 13:18:56,912 DEBUG: Preparing to collect status from '/tmp/cvl-cvisionrad-ml/.dvc/cache'                                                         
2024-08-15 13:18:56,913 DEBUG: Collecting status from '/tmp/cvl-cvisionrad-ml/.dvc/cache'                                                                   
Fetching
2024-08-15 13:18:57,664 DEBUG: Lockfile for 'ribs/pipelines/06_cage_sgm/dvc.yaml' not found
2024-08-15 13:18:57,859 DEBUG: Checking if stage 'ribs/data/full_datasets/fractures_1123_seg/h5-corrected/test/bin.h5' is in 'dvc.yaml'
Building workspace index                                                                                                         |5.00 [00:00, 12.9kentry/s]
Comparing indexes                                                                                                                 |8.00 [00:00,  627entry/s]
Applying changes                                                                                                                  |1.00 [00:00,   550file/s]
A       ribs/data/full_datasets/fractures_1123_seg/h5-corrected/test/
1 file added and 2 files fetched
2024-08-15 13:18:58,499 DEBUG: Analytics is enabled.
2024-08-15 13:18:58,533 DEBUG: Trying to spawn ['daemon', 'analytics', '/tmp/tmp3cojpnij', '-v']
2024-08-15 13:18:58,540 DEBUG: Spawned ['daemon', 'analytics', '/tmp/tmp3cojpnij', '-v'] with pid 173652

Error Scenario №3

I also tried another approach on the machine that had failed to download this file:

$ dvc get ../../../ ../../data/full_datasets/fractures_1123_seg/h5-corrected/test/bin.h5 -v
2024-08-15 13:15:32,242 DEBUG: v3.53.2 (pip), CPython 3.10.12 on Linux-5.15.0-71-generic-x86_64-with-glibc2.35
2024-08-15 13:15:32,242 DEBUG: command: /home/ermolaev/projects/radml/venv/bin/dvc get ../../../ ../../data/full_datasets/fractures_1123_seg/h5-corrected/test/bin.h5 -v
2024-08-15 13:15:32,696 DEBUG: Lockfile '../05_unf_dt/dvc.lock' needs to be updated.
2024-08-15 13:15:32,989 DEBUG: Lockfile for '../06_cage_sgm/dvc.yaml' not found
2024-08-15 13:19:55,415 DEBUG: Analytics is enabled.                                                                                                        
2024-08-15 13:19:55,482 DEBUG: Trying to spawn ['daemon', 'analytics', '/tmp/tmp_5_14ube', '-v']                                                            
2024-08-15 13:19:55,489 DEBUG: Spawned ['daemon', 'analytics', '/tmp/tmp_5_14ube', '-v'] with pid 5162
(venv) ermolaev@ed943d457e9c:~/projects/radml/cvl-cvisionrad-ml/ribs/pipelines/01_ds_gen_and_analysis$ l
bin.h5  correction.json  dvc.lock  dvc.yaml  params.yaml  README.md
(venv) ermolaev@ed943d457e9c:~/projects/radml/cvl-cvisionrad-ml/ribs/pipelines/01_ds_gen_and_analysis$ md5sum bin.h5 
b053f9713f406497bbe6881d926718f3  bin.h5

The same command, on the same revision, with the same dependencies, but on the other machine:

$ dvc get . ribs/data/full_datasets/fractures_1123_seg/h5-corrected/test/bin.h5 -v
2024-08-15 13:25:02,189 DEBUG: v3.53.1 (pip), CPython 3.10.14 on Linux-6.8.0-31-generic-x86_64-with-glibc2.39
2024-08-15 13:25:02,189 DEBUG: command: /home/ermolaev/projects/radml/venv/bin/dvc get . ribs/data/full_datasets/fractures_1123_seg/h5-corrected/test/bin.h5 -v
2024-08-15 13:25:03,052 DEBUG: Lockfile for 'ribs/pipelines/06_cage_sgm/dvc.yaml' not found
2024-08-15 13:25:03,081 DEBUG: Lockfile 'ribs/pipelines/05_unf_dt/dvc.lock' needs to be updated.
2024-08-15 13:29:30,956 DEBUG: Analytics is enabled.                                                                                                        
2024-08-15 13:29:30,984 DEBUG: Trying to spawn ['daemon', 'analytics', '/tmp/tmp96wu6zyl', '-v']                                                            
2024-08-15 13:29:30,990 DEBUG: Spawned ['daemon', 'analytics', '/tmp/tmp96wu6zyl', '-v'] with pid 174332
(venv) ermolaev@alamak:/tmp/cvl-cvisionrad-ml$ md5sum bin.h5 
0ed51826c555065daf92319e2c7a56d2  bin.h5

Now I can see the file and its hash, and for some reason the hash differs.

Comparing files content

I compared the actual array data and found 3 places with byte differences, which is why the files and their hashes differ. Otherwise the content is nearly identical. It's not clear why this happens.
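
A byte-level comparison of two copies can be sketched as follows. This is not the script used for the comparison above (that one worked on the array data); it is a minimal example, with hypothetical helper names, that lists the offsets where two files differ and groups adjacent offsets into runs. The demo builds two small temporary files instead of using the real 7 GB objects.

```python
import os
import tempfile


def diff_offsets(path_a, path_b, chunk_size=1 << 20, limit=1000):
    """Return up to `limit` byte offsets at which the two files differ.

    Only the common prefix is compared; compare file sizes separately.
    """
    offsets = []
    position = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while len(offsets) < limit:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if not a and not b:
                break
            if a != b:
                for i in range(min(len(a), len(b))):
                    if a[i] != b[i]:
                        offsets.append(position + i)
                        if len(offsets) >= limit:
                            break
            position += max(len(a), len(b))
    return offsets


def group_runs(offsets, max_gap=1):
    """Merge sorted offsets that are at most `max_gap` apart into (start, end) runs."""
    runs = []
    for off in offsets:
        if runs and off - runs[-1][1] <= max_gap:
            runs[-1][1] = off
        else:
            runs.append([off, off])
    return [(start, end) for start, end in runs]


# Demo: two files that differ at offsets 10, 11 and 500.
good = bytes(1000)
bad = bytearray(good)
bad[10] = bad[11] = bad[500] = 1
with tempfile.TemporaryDirectory() as root:
    path_good = os.path.join(root, "good.bin")
    path_bad = os.path.join(root, "bad.bin")
    with open(path_good, "wb") as f:
        f.write(good)
    with open(path_bad, "wb") as f:
        f.write(bad)
    demo_offsets = diff_offsets(path_good, path_bad)

demo_runs = group_runs(demo_offsets)
print(demo_offsets)  # [10, 11, 500]
print(demo_runs)     # [(10, 11), (500, 500)]
```

Knowing the exact offsets and run lengths would show, for example, whether the damaged regions line up with a download chunk boundary.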

  • error-system dvc doctor
DVC version: 3.53.2 (pip)
-------------------------
Platform: Python 3.10.12 on Linux-5.15.0-71-generic-x86_64-with-glibc2.35
Subprojects:
        dvc_data = 3.15.1
        dvc_objects = 5.1.0
        dvc_render = 1.0.2
        dvc_task = 0.4.0
        scmrepo = 3.3.6
Supports:
        gdrive (pydrive2 = 1.20.0),
        http (aiohttp = 3.9.5, aiohttp-retry = 2.8.3),
        https (aiohttp = 3.9.5, aiohttp-retry = 2.8.3),
        s3 (s3fs = 2024.6.1, boto3 = 1.34.131)
Config:
        Global: /home/ermolaev/.config/dvc
        System: /etc/xdg/dvc
Cache types: hardlink, symlink
Cache directory: ext4 on /dev/mapper/cvg-home
Caches: local
Remotes: gdrive, gdrive, gdrive, s3
Workspace directory: ext4 on /dev/mapper/cvg-home
Repo: dvc, git
Repo.site_cache_dir: /var/tmp/dvc/repo/35b0ceeafaa400653846b67148221777
  • ok-system dvc doctor
DVC version: 3.53.1 (pip)
-------------------------
Platform: Python 3.10.14 on Linux-6.8.0-31-generic-x86_64-with-glibc2.39
Subprojects:
        dvc_data = 3.15.1
        dvc_objects = 5.1.0
        dvc_render = 1.0.2
        dvc_task = 0.4.0
        scmrepo = 3.3.6
Supports:
        gdrive (pydrive2 = 1.20.0),
        http (aiohttp = 3.9.5, aiohttp-retry = 2.8.3),
        https (aiohttp = 3.9.5, aiohttp-retry = 2.8.3),
        s3 (s3fs = 2024.6.1, boto3 = 1.34.131)
Config:
        Global: /home/ermolaev/.config/dvc
        System: /etc/xdg/xdg-ubuntu/dvc
Cache types: <https://error.dvc.org/no-dvc-cache>
Caches: local
Remotes: gdrive, gdrive, gdrive, s3
Workspace directory: ext4 on /dev/nvme0n1p8
Repo: dvc, git
Repo.site_cache_dir: /var/tmp/dvc/repo/2df345bb176cbb6c202588cdd37f2a72

RClone

I decided to download the same file using rclone to rule out a system-level error, e.g. Ubuntu altering the data while flushing it to disk.

$ rclone copy -P 'radml:/Y_DVC_CACHE_(DO_NOT_MODIFY!!!)/files/md5/0e/d51826c555065daf92319e2c7a56d2' .                                
Transferred:        4.567 GiB / 7.313 GiB, 62%, 42.477 MiB/s, ETA 1m6s
Transferred:            0 / 1, 0%
Elapsed time:      1m17.3s
Transferring:
 *                d51826c555065daf92319e2c7a56d2: 62% /7.313Gi, 42.418Mi/s, 1m6s
$ md5sum d51826c555065daf92319e2c7a56d2 
0ed51826c555065daf92319e2c7a56d2  d51826c555065daf92319e2c7a56d2

No error via rclone, i.e. the file hash is correct.

S3

I decided to download the same file from the S3 remote that I also have:

$ dvc pull ../../data/full_datasets/fractures_1123_seg/h5-corrected/test/bin.h5 --no-run-cache -v -r yadrive
2024-08-15 14:16:56,230 DEBUG: v3.53.2 (pip), CPython 3.10.12 on Linux-5.15.0-71-generic-x86_64-with-glibc2.35
2024-08-15 14:16:56,230 DEBUG: command: /home/ermolaev/projects/radml/venv/bin/dvc pull ../../data/full_datasets/fractures_1123_seg/h5-corrected/test/bin.h5 --no-run-cache -v -r yadrive
2024-08-15 14:16:56,685 DEBUG: Lockfile '../05_unf_dt/dvc.lock' needs to be updated.
2024-08-15 14:16:56,977 DEBUG: Lockfile for '../06_cage_sgm/dvc.yaml' not found
2024-08-15 14:16:58,429 DEBUG: Checking if stage '../../data/full_datasets/fractures_1123_seg/h5-corrected/test/bin.h5' is in 'dvc.yaml'
Collecting                                                                                                                        |2.00 [00:00, 4.33entry/s]
2024-08-15 14:17:00,671 DEBUG: Preparing to transfer data from 's3://cvisionrad-ml-data/files/md5' to '/home/ermolaev/projects/radml/cvl-cvisionrad-ml/.dvc/cache/files/md5'
2024-08-15 14:17:00,671 DEBUG: Preparing to collect status from '/home/ermolaev/projects/radml/cvl-cvisionrad-ml/.dvc/cache/files/md5'
2024-08-15 14:17:00,671 DEBUG: Collecting status from '/home/ermolaev/projects/radml/cvl-cvisionrad-ml/.dvc/cache/files/md5'
2024-08-15 14:17:00,674 DEBUG: Preparing to collect status from 'cvisionrad-ml-data/files/md5'
2024-08-15 14:17:00,674 DEBUG: Collecting status from 'cvisionrad-ml-data/files/md5'                                                                        
2024-08-15 14:17:00,674 DEBUG: Querying 1 oids via object_exists
2024-08-15 14:17:00,868 DEBUG: Indexing new .dir '68cfceac0911615ed2e552b1e52c0eaa.dir' with '4' nested files                                               
Fetching                               
  0%|          |Fetching from s3                                                                                                  0/1 [00:00<?,     ?file/s]
  1%|          |cvisionrad-ml-data/files/md5/0e/d51826c555065daf92319e2c7a56d2                                        69.4M/7.31G [00:08<19:40,    6.59MB/s]
2024-08-15 14:19:51,356 DEBUG: Preparing to transfer data from 's3://cvisionrad-ml-data' to '/home/ermolaev/projects/radml/cvl-cvisionrad-ml/.dvc/cache'    
2024-08-15 14:19:51,357 DEBUG: Preparing to collect status from '/home/ermolaev/projects/radml/cvl-cvisionrad-ml/.dvc/cache'                                
2024-08-15 14:19:51,357 DEBUG: Collecting status from '/home/ermolaev/projects/radml/cvl-cvisionrad-ml/.dvc/cache'                                          
Fetching
2024-08-15 14:19:51,928 DEBUG: Lockfile for '../06_cage_sgm/dvc.yaml' not found
2024-08-15 14:19:53,406 DEBUG: Checking if stage '../../data/full_datasets/fractures_1123_seg/h5-corrected/test/bin.h5' is in 'dvc.yaml'
Computing md5 for a large file '/home/ermolaev/projects/radml/cvl-cvisionrad-ml/ribs/data/full_datasets/fractures_1123_seg/h5-corrected/test/bin.h5'. This is only done once.
Building workspace index                                                                                                          |7.00 [00:11, 1.67s/entry]
Comparing indexes                                                                                                                |8.00 [00:00, 1.14kentry/s]
Applying changes                                                                                                                  |0.00 [00:00,     ?file/s]
1 file fetched
2024-08-15 14:20:05,358 DEBUG: Analytics is enabled.
2024-08-15 14:20:05,449 DEBUG: Trying to spawn ['daemon', 'analytics', '/tmp/tmpx2_m5qv7', '-v']
2024-08-15 14:20:05,456 DEBUG: Spawned ['daemon', 'analytics', '/tmp/tmpx2_m5qv7', '-v'] with pid 5487

$ md5sum ../../data/full_datasets/fractures_1123_seg/h5-corrected/test/bin.h5 
0ed51826c555065daf92319e2c7a56d2  ../../data/full_datasets/fractures_1123_seg/h5-corrected/test/bin.h5

We can see that the file hash is still correct, so the error happens only with GDrive, even though GDrive stores the correct data.

Conclusions

I see one of 2 possible errors:

  • the DVC downloader for Google Drive introduces byte errors during download; it happens very rarely, but such errors change the final hash
  • the DVC step that copies from the temporary location to the cache introduces byte errors
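
One way to tell these two hypotheses apart would be to hash the bytes as they arrive and again after they have been written. The sketch below is a generic illustration of that idea, not DVC code: `copy_and_hash` and `classify` are hypothetical names, and the source is any binary file-like object. Note that re-reading a file right after writing it may be served from the OS page cache, so a match on the second hash does not fully rule out a disk-level fault.

```python
import hashlib
import io
import os
import tempfile


def copy_and_hash(src, dst_path, chunk_size=1 << 20):
    """Copy a binary stream to dst_path.

    Returns (md5 of the bytes as received, md5 of the bytes re-read from disk).
    """
    in_flight = hashlib.md5()
    with open(dst_path, "wb") as dst:
        for chunk in iter(lambda: src.read(chunk_size), b""):
            in_flight.update(chunk)
            dst.write(chunk)
    on_disk = hashlib.md5()
    with open(dst_path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            on_disk.update(chunk)
    return in_flight.hexdigest(), on_disk.hexdigest()


def classify(expected, in_flight, on_disk):
    """Name the stage at which the data first stopped matching `expected`."""
    if in_flight != expected:
        return "download"
    if on_disk != expected:
        return "write"
    return "ok"


# Demo with an in-memory source standing in for the remote stream.
payload = b"x" * 3_000_000
expected = hashlib.md5(payload).hexdigest()
with tempfile.TemporaryDirectory() as root:
    received, written = copy_and_hash(io.BytesIO(payload), os.path.join(root, "obj"))

demo_stage = classify(expected, received, written)
print(demo_stage)  # ok
```

A result of "download" would point at the first hypothesis and "write" at the second.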

Reproduce

I don't know how to reproduce this error. In more than 2 years of usage I've run into it only 3 times, and I don't see a definite scenario for triggering it intentionally.

Expected

The hash must be the same.

Environment information

Ubuntu 22.04

Output of dvc doctor:

DVC version: 3.53.2 (pip)
-------------------------
Platform: Python 3.10.12 on Linux-5.15.0-71-generic-x86_64-with-glibc2.35
Subprojects:
        dvc_data = 3.15.1
        dvc_objects = 5.1.0
        dvc_render = 1.0.2
        dvc_task = 0.4.0
        scmrepo = 3.3.6
Supports:
        gdrive (pydrive2 = 1.20.0),
        http (aiohttp = 3.9.5, aiohttp-retry = 2.8.3),
        https (aiohttp = 3.9.5, aiohttp-retry = 2.8.3),
        s3 (s3fs = 2024.6.1, boto3 = 1.34.131)
Config:
        Global: /home/ermolaev/.config/dvc
        System: /etc/xdg/dvc
Cache types: hardlink, symlink
Cache directory: ext4 on /dev/mapper/cvg-home
Caches: local
Remotes: gdrive, gdrive, gdrive, s3
Workspace directory: ext4 on /dev/mapper/cvg-home
Repo: dvc, git
Repo.site_cache_dir: /var/tmp/dvc/repo/35b0ceeafaa400653846b67148221777

Additional Information (if any):

Setting "verify = false" helps; the data is then correct.
