Allow setting write.parquet.row-group-limit
#1016
And update the docs
5b91696 to 46afeaf
LGTM @Fokko. I'm merging in the change from main to resolve the conflict in the docs.
…o/iceberg-python into fd-allow-setting-max-row-group-size
Also threw in a test here 👍
@@ -32,6 +32,7 @@ Iceberg tables support table properties to configure table behavior.
| -------------------------------------- | --------------------------------- | ------- | ------------------------------------------------------------------------------------------- |
| `write.parquet.compression-codec` | `{uncompressed,zstd,gzip,snappy}` | zstd | Sets the Parquet compression codec. |
| `write.parquet.compression-level` | Integer | null | Parquet compression level for the codec. If not set, it is up to PyIceberg |
| `write.parquet.row-group-limit` | Number of rows | 1048576 | The upper bound of the number of entries within a single row group |
@Fokko @sungwy Thanks, I believe this resolves my issue #1012 as well.
However, please note that this option already exists in the docs, right after `write.parquet.dict-size-bytes`. The UI doesn't allow me to leave a comment there, so please expand the collapsed area to see it.
Additionally, I'm curious why the default value used this time is significantly larger than the previous one.
Thank you for flagging this, @zhongyujiang. I'll remove the second entry below, the one with the older default value.
To my understanding, the new value is the correct default, since it matches the default in the PyArrow ParquetWriter: https://arrow.apache.org/docs/python/generated/pyarrow.parquet.ParquetWriter.html
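To make the effect of the limit discussed above concrete, here is a minimal, stdlib-only sketch of how a row-count upper bound splits a write into row groups. It is not PyIceberg's or PyArrow's implementation: the helpers `row_group_limit` and `row_group_sizes` are hypothetical names introduced for illustration, and the only values taken from this PR are the property key and the documented default of 1048576.

```python
from __future__ import annotations

# Property key and default as documented in the table touched by this PR.
ROW_GROUP_LIMIT_KEY = "write.parquet.row-group-limit"
DEFAULT_ROW_GROUP_LIMIT = 1048576


def row_group_limit(properties: dict[str, str]) -> int:
    """Hypothetical helper: read the limit from string-valued table properties."""
    raw = properties.get(ROW_GROUP_LIMIT_KEY)
    limit = DEFAULT_ROW_GROUP_LIMIT if raw is None else int(raw)
    if limit <= 0:
        raise ValueError(f"{ROW_GROUP_LIMIT_KEY} must be positive, got {limit}")
    return limit


def row_group_sizes(num_rows: int, limit: int) -> list[int]:
    """Hypothetical helper: split num_rows into groups of at most `limit` rows."""
    full_groups, remainder = divmod(num_rows, limit)
    return [limit] * full_groups + ([remainder] if remainder else [])


# A small limit makes the split easy to see.
small = row_group_limit({ROW_GROUP_LIMIT_KEY: "4"})
print(row_group_sizes(10, small))  # [4, 4, 2]

# Without the property, the documented default applies.
default = row_group_limit({})
print(len(row_group_sizes(3_000_000, default)))  # 3
```

With the default in place, a write only produces more than one row group once it exceeds 1048576 rows, which is why a smaller explicit limit is the practical way to get finer-grained row groups.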
* Allow setting `write.parquet.row-group-limit`
  And update the docs
* Add test
* Make ruff happy
---------
Co-authored-by: Sung Yun <[email protected]>
And update the docs
Fixes #1013