 |`write.parquet.compression-codec`|`{uncompressed,zstd,gzip,snappy}`| zstd | Sets the Parquet compression codec. |
-|`write.parquet.compression-level`| Integer | null | Parquet compression level for the codec. If not set, the level is left to PyIceberg |
-|`write.parquet.row-group-limit`| Number of rows | 1048576 | The upper bound of the number of entries within a single row group |
-|`write.parquet.page-size-bytes`| Size in bytes | 1MB | Set a target threshold for the approximate encoded size of data pages within a column chunk |
-|`write.parquet.page-row-limit`| Number of rows | 20000 | Set a target threshold for the maximum number of rows within a column chunk |
-|`write.parquet.dict-size-bytes`| Size in bytes | 2MB | Set the dictionary page size limit per row group |
-|`write.metadata.previous-versions-max`| Integer | 100 | The max number of previous version metadata files to keep before deleting after commit. |
-|`write.object-storage.enabled`| Boolean | True | Enables the [`ObjectStoreLocationProvider`](configuration.md#object-store-location-provider) that adds a hash component to file paths. Note: the default value of `True` differs from Iceberg's Java implementation |
-|`write.object-storage.partitioned-paths`| Boolean | True | Controls whether [partition values are included in file paths](configuration.md#partition-exclusion) when object storage is enabled |
-|`write.py-location-provider.impl`| String of form `module.ClassName`| null | Optional, [custom `LocationProvider`](configuration.md#loading-a-custom-location-provider) implementation |
+|`write.parquet.compression-level`| Integer | null | Parquet compression level for the codec. If not set, the level is left to PyIceberg |
+|`write.parquet.row-group-limit`| Number of rows | 1048576 | The upper bound of the number of entries within a single row group |
+|`write.parquet.page-size-bytes`| Size in bytes | 1MB | Set a target threshold for the approximate encoded size of data pages within a column chunk |
+|`write.parquet.page-row-limit`| Number of rows | 20000 | Set a target threshold for the maximum number of rows within a column chunk |
+|`write.parquet.dict-size-bytes`| Size in bytes | 2MB | Set the dictionary page size limit per row group |
+|`write.metadata.previous-versions-max`| Integer | 100 | The max number of previous version metadata files to keep before deleting after commit. |
+|`write.object-storage.enabled`| Boolean | True | Enables the [`ObjectStoreLocationProvider`](configuration.md#object-store-location-provider) that adds a hash component to file paths. Note: the default value of `True` differs from Iceberg's Java implementation |
+|`write.object-storage.partitioned-paths`| Boolean | True | Controls whether [partition values are included in file paths](configuration.md#partition-exclusion) when object storage is enabled |
+|`write.py-location-provider.impl`| String of form `module.ClassName`| null | Optional, [custom `LocationProvider`](configuration.md#loading-a-custom-location-provider) implementation |
+|`write.data.path`| String pointing to location |`{metadata.location}/data`| Sets the location under which data is written. |
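
As a rough illustration of how the write options in this table might be supplied, the sketch below builds a plain properties mapping. It assumes values are given as strings, as table properties conventionally are; the chosen values, the `s3://` location, and the `ALLOWED_CODECS` set are illustrative and not part of PyIceberg.

```python
# Option names come from the table above; the values are examples only.
ALLOWED_CODECS = {"uncompressed", "zstd", "gzip", "snappy"}  # hypothetical helper set

write_properties = {
    "write.parquet.compression-codec": "zstd",
    "write.parquet.compression-level": "3",
    "write.parquet.row-group-limit": "1048576",
    "write.metadata.previous-versions-max": "100",
    "write.object-storage.enabled": "false",
    # Hypothetical location; overrides the default `{metadata.location}/data`.
    "write.data.path": "s3://bucket/custom/data",
}

# Sanity check: the codec must be one of the values the table lists.
assert write_properties["write.parquet.compression-codec"] in ALLOWED_CODECS
```

A mapping like this would typically be passed as the table properties when a table is created through a catalog.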
 
 ### Table behavior options
 
@@ -210,8 +211,8 @@ file paths that are optimized for object storage.
 
 ### Simple Location Provider
 
-The `SimpleLocationProvider` places a table's file names underneath a `data` directory in the table's base storage
-location (this is `table.metadata.location` - see the [Iceberg table specification](https://iceberg.apache.org/spec/#table-metadata)).
+The `SimpleLocationProvider` provides paths prefixed by `{location}/data/`, where `location` comes from the [table metadata](https://iceberg.apache.org/spec/#table-metadata-fields). This can be overridden by setting the [`write.data.path` table configuration](#write-options).
+
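
The prefix rule described here can be sketched in a few lines. This is a minimal illustration of the documented layout, not PyIceberg's actual code: the helper `simple_data_file_path`, the bucket names, and the file name are all hypothetical.

```python
from typing import Dict, Optional


def simple_data_file_path(
    location: str,
    file_name: str,
    properties: Optional[Dict[str, str]] = None,
) -> str:
    """Hypothetical helper that mirrors the documented `{location}/data/` layout."""
    props = properties or {}
    # `write.data.path`, when set, replaces the default `{location}/data` prefix.
    data_path = props.get("write.data.path", f"{location.rstrip('/')}/data")
    return f"{data_path.rstrip('/')}/{file_name}"


default_path = simple_data_file_path("s3://bucket/ns/table", "0000-0-example.parquet")
custom_path = simple_data_file_path(
    "s3://bucket/ns/table",
    "0000-0-example.parquet",
    {"write.data.path": "s3://other-bucket/data"},
)
```

With no override the file lands under the table location's `data` directory; with `write.data.path` set, the table location no longer appears in the path.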
 For example, a non-partitioned table might have a data file with location:
 
```txt
@@ -239,9 +240,9 @@ When several files are stored under the same prefix, cloud object stores such as
 resulting in slowdowns. The `ObjectStoreLocationProvider` counteracts this by injecting deterministic hashes, in the form of binary directories,
 into file paths, to distribute files across a larger number of object store prefixes.
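
To make the idea of deterministic hash directories concrete, here is a minimal sketch. It is not PyIceberg's algorithm: SHA-256 stands in for whatever hash the library uses, the 20-bit width and the 4/4/4/8 directory split are assumptions, and `object_store_path` with its arguments is a hypothetical helper.

```python
import hashlib


def object_store_path(data_path: str, file_name: str, partition: str = "") -> str:
    """Hypothetical sketch: prefix a file path with binary hash directories."""
    key = f"{partition}/{file_name}" if partition else file_name
    # Deterministic hash of the file key (SHA-256 is a stand-in, see lead-in).
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Keep the top 20 bits of the first three bytes, rendered as a binary string.
    bits = format(int.from_bytes(digest[:3], "big") >> 4, "020b")
    # Assumed split: three 4-bit directories followed by one 8-bit directory.
    hash_dirs = "/".join([bits[0:4], bits[4:8], bits[8:12], bits[12:20]])
    parts = [data_path.rstrip("/"), hash_dirs]
    if partition:
        # Hive-style partition directory sits just before the file name.
        parts.append(partition)
    parts.append(file_name)
    return "/".join(parts)


example = object_store_path(
    "s3://bucket/ns/table/data", "0000-0-example.parquet", "category=orders"
)
```

Because the hash depends only on the file key, the same file always maps to the same prefix, while different files spread across many prefixes.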
 
-Paths still contain partitions just before the file name, in Hive-style, and a `data` directory beneath the table's location,
-in a similar manner to the [`SimpleLocationProvider`](configuration.md#simple-location-provider). For example, a table
-partitioned over a string column `category` might have a data file with location: (note the additional binary directories)
+Paths are prefixed by `{location}/data/`, where `location` comes from the [table metadata](https://iceberg.apache.org/spec/#table-metadata-fields), in a similar manner to the [`SimpleLocationProvider`](configuration.md#simple-location-provider). This can be overridden by setting the [`write.data.path` table configuration](#write-options).
+
+For example, a table partitioned over a string column `category` might have a data file with location: (note the additional binary directories)