Bug Report
dvc pull
Description
dvc pull crashes with sqlite3.OperationalError: disk I/O error
Reproduce
This happens when trying to pull 420G of data onto an Amazon FSx for Lustre filesystem.
The git clone completes.
I then run only dvc pull. After many hours of operation, I get
the mentioned error.
Expected
dvc pull to complete
Environment information
[ec2-user@ip-10-0-1-122 ~]$ dvc doctor
DVC version: 3.53.0 (pip)
Platform: Python 3.9.16 on Linux-6.1.97-104.177.amzn2023.x86_64-x86_64-with-glibc2.34
Subprojects:
dvc_data = 3.15.1
dvc_objects = 5.1.0
dvc_render = 1.0.2
dvc_task = 0.4.0
scmrepo = 3.3.6
Supports:
http (aiohttp = 3.10.0, aiohttp-retry = 2.8.3),
https (aiohttp = 3.10.0, aiohttp-retry = 2.8.3),
s3 (s3fs = 2024.6.1, boto3 = 1.34.131)
Config:
Global: /home/ec2-user/.config/dvc
System: /etc/xdg/dvc
Additional Information (if any):
Traceback (most recent call last):
File "/usr/local/lib/python3.9/site-packages/dvc/cli/__init__.py", line 211, in main
ret = cmd.do_run()
File "/usr/local/lib/python3.9/site-packages/dvc/cli/command.py", line 27, in do_run
return self.run()
File "/usr/local/lib/python3.9/site-packages/dvc/commands/data_sync.py", line 35, in run
stats = self.repo.pull(
File "/usr/local/lib/python3.9/site-packages/dvc/repo/__init__.py", line 58, in wrapper
return f(repo, *args, **kwargs)
File "/usr/local/lib/python3.9/site-packages/dvc/repo/pull.py", line 42, in pull
stats = self.checkout(
File "/usr/local/lib/python3.9/site-packages/dvc/repo/__init__.py", line 58, in wrapper
return f(repo, *args, **kwargs)
File "/usr/local/lib/python3.9/site-packages/dvc/repo/checkout.py", line 142, in checkout
diff = compare(old, new, relink=relink, delete=True, callback=pb.as_callback())
File "/usr/local/lib/python3.9/site-packages/dvc_data/index/checkout.py", line 315, in compare
ret = _compare(
File "/usr/local/lib/python3.9/site-packages/dvc_data/index/checkout.py", line 243, in _compare
for change in idiff(
File "/usr/local/lib/python3.9/site-packages/dvc_data/index/diff.py", line 320, in diff
yield from changes
File "/usr/local/lib/python3.9/site-packages/dvc_data/index/diff.py", line 230, in _diff
new_dir_items, new_unknown = _get_items(new, key, new_entry, **kwargs)
File "/usr/local/lib/python3.9/site-packages/dvc_data/index/diff.py", line 152, in _get_items
items = dict(index.ls(key, detail=True))
File "/usr/local/lib/python3.9/site-packages/dvc_data/index/view.py", line 128, in ls
self._index._ensure_loaded(root_key)
File "/usr/local/lib/python3.9/site-packages/dvc_data/index/index.py", line 759, in _ensure_loaded
entry = self.get(prefix)
File "/usr/lib64/python3.9/_collections_abc.py", line 763, in get
return self[key]
File "/usr/local/lib/python3.9/site-packages/dvc_data/index/index.py", line 671, in __getitem__
item = self._trie.get(key)
File "/usr/lib64/python3.9/_collections_abc.py", line 763, in get
return self[key]
File "/usr/local/lib/python3.9/site-packages/sqltrie/serialized.py", line 58, in __getitem__
raw = self._trie[key]
File "/usr/local/lib/python3.9/site-packages/sqltrie/sqlite/sqlite.py", line 266, in __getitem__
row = self._get_node(key)
File "/usr/local/lib/python3.9/site-packages/sqltrie/sqlite/sqlite.py", line 202, in _get_node
rows = list(self._traverse(key))
File "/usr/local/lib/python3.9/site-packages/sqltrie/sqlite/sqlite.py", line 191, in _traverse
self._conn.executescript(STEPS_SQL.format(path=path, root=self._root_id))
MemoryError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/bin/dvc", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.9/site-packages/dvc/cli/__init__.py", line 236, in main
ret = _log_exceptions(exc) or 255
File "/usr/local/lib/python3.9/site-packages/dvc/cli/__init__.py", line 147, in _log_exceptions
_log_unknown_exceptions()
File "/usr/local/lib/python3.9/site-packages/dvc/cli/__init__.py", line 49, in _log_unknown_exceptions
logger.debug("Version info for developers:\n%s", get_dvc_info())
File "/usr/local/lib/python3.9/site-packages/dvc/info.py", line 38, in get_dvc_info
with Repo() as repo:
File "/usr/local/lib/python3.9/site-packages/dvc/repo/__init__.py", line 209, in __init__
self.state = State(self.root_dir, self.site_cache_dir, self.dvcignore)
File "/usr/local/lib/python3.9/site-packages/dvc_data/hashfile/state.py", line 92, in __init__
self.links = Cache(links_dir)
File "/usr/local/lib/python3.9/site-packages/dvc_data/hashfile/cache.py", line 59, in __init__
super().__init__(directory=directory, timeout=timeout, disk=disk, **settings)
File "/usr/local/lib/python3.9/site-packages/diskcache/core.py", line 478, in __init__
self.reset(key, value, update=False)
File "/usr/local/lib/python3.9/site-packages/diskcache/core.py", line 2431, in reset
((old_value,),) = sql(
sqlite3.OperationalError: disk I/O error
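
Both tracebacks end inside SQLite calls (sqltrie in the first, diskcache in the second), so it may help to check whether SQLite works at all on the filesystem in question, independently of DVC. The following is only a minimal sketch for that check: the function name sqlite_smoke_test and the file name smoke_test.db are hypothetical, and the directory you pass in (for example one on the FSx for Lustre mount) is up to you. It does not reproduce the long-running pull and does not establish the cause of the error.

```python
import os
import sqlite3
import tempfile


def sqlite_smoke_test(directory):
    """Return True if SQLite can create, write, and read a database in `directory`.

    Hypothetical helper for diagnosis only; not part of DVC.
    """
    path = os.path.join(directory, "smoke_test.db")
    try:
        conn = sqlite3.connect(path, timeout=5)
        try:
            conn.execute(
                "CREATE TABLE IF NOT EXISTS t (k INTEGER PRIMARY KEY, v TEXT)"
            )
            conn.execute("INSERT INTO t (v) VALUES (?)", ("ok",))
            conn.commit()
            row = conn.execute(
                "SELECT v FROM t ORDER BY k DESC LIMIT 1"
            ).fetchone()
            return row == ("ok",)
        finally:
            conn.close()
    except sqlite3.OperationalError as exc:
        # Errors such as "disk I/O error" or "unable to open database file"
        # surface here.
        print(f"SQLite failed in {directory}: {exc}")
        return False
    finally:
        # Remove the scratch database if it was created.
        if os.path.exists(path):
            os.remove(path)


if __name__ == "__main__":
    # Replace the temporary directory with a directory on the filesystem
    # under test.
    with tempfile.TemporaryDirectory() as tmp:
        print(sqlite_smoke_test(tmp))
```

If this returns False on the Lustre mount but True on a local disk, that would suggest the problem lies with SQLite on that filesystem rather than with the pull itself.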