
py", line 581, in _load deserialized_objects[key]._set_from_file(f, offset, f_should_read_directly) RuntimeError: unexpected EOF, expected 20664312 more bytes. The file might be corrupted. terminate called after throwing an instance of 'c10::Error' #98

Closed
perlman-izzy opened this issue May 31, 2020 · 1 comment

Comments

@perlman-izzy

Running on a Vast.AI server: 4x GTX 1080 Ti, Xeon® E5-2650 0, 16.0/16 cores, 64/64 GB.

The program runs for about 2-3 minutes and then throws the following error. Any help appreciated.

py", line 581, in _load

deserialized_objects[key]._set_from_file(f, offset, f_should_read_directly)

RuntimeError: unexpected EOF, expected 20664312 more bytes. The file might be corrupted.

terminate called after throwing an instance of 'c10::Error'

what(): owning_ptr == NullType::singleton() || owning_ptr->refcount_.load() > 0 ASSERT FAILED at /opt/conda/conda-bld/pytorch_1556653114079/work/c10/util/intrusive_ptr.h:350, please report a bug to PyTorch. intrusive_ptr: Can only intrusive_ptr::reclaim() owning pointers that were created using intrusive_ptr::release(). (reclaim at /opt/conda/conda-bld/pytorch_1556653114079/work/c10/util/intrusive_ptr.h:350)

frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x45 (0x7fc5a1c8ddc5 in /opt/conda/envs/jukebox/lib/python3.7/site-packages/torch/lib/libc10.so)

frame #1: THStorage_free + 0xca (0x7fc5a29d120a in /opt/conda/envs/jukebox/lib/python3.7/site-packages/torch/lib/libcaffe2.so)

frame #2: + 0x14872d (0x7fc5d0cb272d in /opt/conda/envs/jukebox/lib/python3.7/site-packages/torch/lib/libtorch_python.so)

frame #26: __libc_start_main + 0xf0 (0x7fc5df535830 in /lib/x86_64-linux-gnu/libc.so.6)

Aborted (core dumped)

@prafullasd
Collaborator

prafullasd commented May 31, 2020

See #25. Your model checkpoint got corrupted, so you need to delete the downloaded checkpoints from the cache in ~/.cache/ and redownload them.
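For anyone who wants to check which cached files are affected before deleting everything: "unexpected EOF, expected N more bytes" generally means the file on disk is shorter than the loader expects, which is typical of an interrupted download. Below is a minimal sketch, not part of this project, that compares on-disk sizes against sizes you supply and deletes files that are too short. The function names, the file names, and the `expected_sizes` mapping are all hypothetical; you would have to obtain the real sizes yourself (for example from the download server) and point `cache_dir` at wherever the checkpoints actually live under ~/.cache/.

```python
from pathlib import Path


def find_truncated(cache_dir, expected_sizes):
    """Return cached files whose on-disk size is smaller than expected.

    expected_sizes is a hypothetical {file_name: size_in_bytes} mapping
    that the caller must supply; it is not something the project provides.
    Files that are missing from cache_dir are simply skipped.
    """
    truncated = []
    for name, expected in expected_sizes.items():
        path = Path(cache_dir) / name
        if path.is_file() and path.stat().st_size < expected:
            truncated.append(path)
    return truncated


def remove_truncated(cache_dir, expected_sizes):
    """Delete truncated files so they get downloaded again on the next run."""
    removed = find_truncated(cache_dir, expected_sizes)
    for path in removed:
        path.unlink()
    return removed
```

A size check only catches truncation. If a file has the right size but still fails to load, deleting the cached checkpoints and redownloading, as suggested above, is the safer route.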
