
Bump PyArrow to 18.0.0 #1256


Merged: 1 commit merged Oct 30, 2024


Conversation

Fokko (Contributor) commented Oct 28, 2024

Fixes #1265

bigluck (Contributor) commented Oct 28, 2024

@Fokko following up on our discussion on Slack: https://apache-iceberg.slack.com/archives/C029EE6HQ5D/p1730134036731089?thread_ts=1730122956.980119&cid=C029EE6HQ5D

PyArrow 17 also installed numpy, but starting with PyArrow 18 the dependency was removed.

apache/arrow#44148

The io/pyarrow.py file imports numpy, so when numpy is not installed, importing the PyArrow IO strategy can fail. PyIceberg then falls back to the s3fs strategy, hoping the user has that package installed.
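The fallback described above can be illustrated with a minimal sketch. It assumes a preference-ordered list of module names and tries each one with `importlib.import_module`. Because importing a module runs its top-level imports, a missing transitive dependency such as numpy surfaces as an `ImportError` on the first candidate, and the loop moves on to the next one. The helper `first_importable` is hypothetical and is not PyIceberg's actual FileIO loader.

```python
import importlib
import logging
from types import ModuleType
from typing import Optional, Sequence

logger = logging.getLogger(__name__)


def first_importable(candidates: Sequence[str]) -> Optional[ModuleType]:
    """Return the first module in `candidates` that imports cleanly (hypothetical helper).

    Importing a module executes its top-level imports, so a missing transitive
    dependency (e.g. numpy inside a PyArrow-based IO module) raises here as well.
    """
    for module_name in candidates:
        try:
            return importlib.import_module(module_name)
        except ImportError as exc:  # ModuleNotFoundError is a subclass of ImportError
            # Log the failure and fall through to the next implementation.
            logger.warning("Could not import %s: %s", module_name, exc)
    return None
```

Logging the failure, rather than swallowing it silently, makes an unexpected switch to the fallback strategy visible to the user.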

Fokko (Contributor, Author) commented Oct 28, 2024

@bigluck Thanks, that's a great catch. We only use numpy to combine the positional deletes (when there are multiple positional deletes for a file). It would be great to see if we can remove this and also make the numpy dependency optional. It is quite a big dependency, and it would be nice to get rid of it.

kevinjqliu (Contributor) commented Oct 28, 2024

Opened #1259 to continue the numpy deprecation conversation.
Optionally, we can temporarily bring in numpy as a project dependency before exploring its deprecation.

Fokko (Contributor, Author) commented Oct 29, 2024

Keep in mind that CI passes here only because numpy is pulled in as a dependency of PySpark :)
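One way to see which installed packages pull numpy in transitively (PySpark, in the CI environment mentioned above) is to scan the installed distributions' metadata with `importlib.metadata`. This is a rough sketch: the helper name `dependents_of` is hypothetical, the name matching is a simple prefix check rather than full PEP 508 parsing, and it also reports requirements gated behind an optional extra.

```python
import re
from importlib import metadata
from typing import List


def dependents_of(target: str) -> List[str]:
    """List installed distributions whose metadata declares a requirement on `target`."""
    # Match the requirement name exactly, so "numpy" does not match "numpy-financial".
    pattern = re.compile(rf"^{re.escape(target)}(?![A-Za-z0-9._-])", re.IGNORECASE)
    found = set()
    for dist in metadata.distributions():
        for requirement in dist.requires or []:
            if pattern.match(requirement.strip()):
                name = dist.metadata.get("Name")
                if name:
                    found.add(name)
                break
    return sorted(found)
```

Running `dependents_of("numpy")` in an environment without PySpark would show whether anything else still provides numpy once PyArrow 18 stops requiring it.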

kevinjqliu added this to the PyIceberg 0.8.0 release milestone Oct 30, 2024
Fokko merged commit b2da8c7 into apache:main Oct 30, 2024
7 checks passed
Fokko deleted the fd-bump-pyarrow branch October 30, 2024 20:29
sungwy pushed a commit to sungwy/iceberg-python that referenced this pull request Dec 7, 2024

Successfully merging this pull request may close these issues.

pyarrow 18 regression: ValueError: type(schema)=<class 'pyarrow.lib.Schema'>
5 participants