Implement __getstate__ and __setstate__ on PyArrowFileIO and FsSpecFileIO so that they can be pickled #543

Merged: 1 commit, Apr 7, 2024


@amogh-jahagirdar amogh-jahagirdar commented Mar 24, 2024

FileIO implementations currently cannot be pickled because the file system initialization is wrapped in an lru_cache wrapper instance.

For example, calling pickle.dumps on a PyArrowFileIO instance currently fails with the following error:

_pickle.PicklingError: Can't pickle <functools._lru_cache_wrapper object at 0x75ebdf179010>: it's not the same object as pyiceberg.io.pyarrow.PyArrowFileIO._initialize_fs
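
The failure mode can be reproduced without pyiceberg. The following is a minimal sketch, not the library's code: CachedIO, its fs_by_scheme attribute, and the can_pickle helper are hypothetical stand-ins, assuming only that the instance stores an lru_cache wrapper around one of its own methods.

```python
import pickle
from functools import lru_cache


class CachedIO:
    """Hypothetical stand-in for a FileIO that caches one filesystem per scheme."""

    def __init__(self, properties):
        self.properties = properties
        # The wrapper lives in the instance __dict__, so pickle tries to serialize it.
        self.fs_by_scheme = lru_cache(self._initialize_fs)

    def _initialize_fs(self, scheme):
        # Placeholder for real filesystem construction.
        return f"filesystem for {scheme}"


def can_pickle(obj) -> bool:
    """Return True if pickle.dumps(obj) succeeds, False if it raises."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, TypeError, AttributeError):
        return False
    return True


file_io = CachedIO({"example": "value"})
print(can_pickle(file_io))
```

pickle serializes the wrapper by name, looks that name up on the class, finds the plain method rather than the wrapper, and rejects it as "not the same object".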

This change implements __getstate__ and __setstate__ on PyArrowFileIO and FsSpecFileIO so that they can be pickled.

This is a prerequisite for serializing the Table instance.
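
One way such a pair of methods can work is to drop the cache wrapper from the pickled state and rebuild it on load. The sketch below illustrates that pattern under stated assumptions; PicklableIO, fs_by_scheme, and the "example" property are hypothetical, and the merged implementation may differ in detail.

```python
import pickle
from copy import copy
from functools import lru_cache
from typing import Any, Dict


class PicklableIO:
    """Hypothetical stand-in for a FileIO that caches one filesystem per scheme."""

    def __init__(self, properties: Dict[str, str]) -> None:
        self.properties = properties
        self.fs_by_scheme = lru_cache(self._initialize_fs)

    def _initialize_fs(self, scheme: str) -> str:
        # Placeholder for real filesystem construction.
        return f"filesystem for {scheme}"

    def __getstate__(self) -> Dict[str, Any]:
        # Copy the instance dict and blank out the unpicklable cache wrapper.
        state = copy(self.__dict__)
        state["fs_by_scheme"] = None
        return state

    def __setstate__(self, state: Dict[str, Any]) -> None:
        # Restore the plain fields, then recreate a fresh (empty) cache.
        self.__dict__ = state
        self.fs_by_scheme = lru_cache(self._initialize_fs)


original = PicklableIO({"example": "value"})
restored = pickle.loads(pickle.dumps(original))
print(restored.properties, restored.fs_by_scheme("s3"))
```

The restored object starts with an empty cache, so any filesystem is re-initialized on first use after unpickling.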

@amogh-jahagirdar amogh-jahagirdar force-pushed the serializable-file-io-impls branch 4 times, most recently from 76f9620 to 7ef4e5e Compare April 7, 2024 00:45
@amogh-jahagirdar amogh-jahagirdar requested review from Fokko and HonahX April 7, 2024 00:57

@HonahX HonahX left a comment

LGTM! Thanks for working on this, @amogh-jahagirdar. Just some small comments.

tests/io/test_fsspec.py (three review threads, all resolved; one outdated)
@amogh-jahagirdar amogh-jahagirdar force-pushed the serializable-file-io-impls branch from 7ef4e5e to 84eae82 Compare April 7, 2024 15:08
@amogh-jahagirdar amogh-jahagirdar force-pushed the serializable-file-io-impls branch from 84eae82 to 1720211 Compare April 7, 2024 15:18
@amogh-jahagirdar amogh-jahagirdar requested a review from HonahX April 7, 2024 15:43

@Fokko Fokko left a comment

This looks great, @amogh-jahagirdar. Thanks for fixing this!

@HonahX HonahX merged commit 1016b19 into apache:main Apr 7, 2024
7 checks passed

HonahX commented Apr 7, 2024

Merged, thanks!
