-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Spec additions for encryption #12162
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -53,6 +53,7 @@ Version 3 of the Iceberg spec extends data types and existing metadata structure | |
* Multi-argument transforms for partitioning and sorting | ||
* Row Lineage tracking | ||
* Binary deletion vectors | ||
* Table encryption | ||
|
||
|
||
## Goals | ||
|
@@ -685,6 +686,8 @@ A snapshot consists of the following fields: | |
| _optional_ | _optional_ | _optional_ | **`schema-id`** | ID of the table's current schema when the snapshot was created | | ||
| | | _optional_ | **`first-row-id`** | The first `_row_id` assigned to the first row in the first data file in the first manifest, see [Row Lineage](#row-lineage) | | ||
| | | _optional_ | **`added-rows`** | Sum of the [`added_rows_count`](#manifest-lists) from all manifests added in this snapshot. Required if [Row Lineage](#row-lineage) is enabled | | ||
| | | _optional_ | **`encrypted-key-metadata`** | Base64-encoded key metadata of the manifest list file in an encrypted table. The key metadata is encrypted by a metadata encryption key before encoding | | ||
| | | _optional_ | **`key-metadata-key-id`** | The ID of the encryption key that encrypts the manifest list key metadata | | ||
|
||
|
||
The snapshot summary's `operation` field is used by some operations, like snapshot expiration, to skip processing certain snapshots. Possible `operation` values are: | ||
|
@@ -889,6 +892,9 @@ Table metadata consists of the following fields: | |
| _optional_ | _optional_ | _optional_ | **`partition-statistics`** | A list (optional) of [partition statistics](#partition-statistics). | | ||
| | | _optional_ | **`row-lineage`** | A boolean, defaulting to false, setting whether or not to track the creation and updates to rows in the table. See [Row Lineage](#row-lineage). | | ||
| | | _optional_ | **`next-row-id`** | A `long` higher than all assigned row IDs; the next snapshot's `first-row-id`. See [Row Lineage](#row-lineage). | | ||
| | | _optional_ | **`key-cache`** | A list of encryption keys (key-id/key-wrap pairs), used to encrypt the manifest list file key metadata. See [Snapshot](#key-metadata-key-id). | | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I was wondering, if we really need multiple keys here. This would allow folks to use old keys after a key rotation but is that something we really want to enable? Shouldn't rotating a key wipe out the ability to use a new key and require writing new manifest lists and replacing old ones? I know we can get similar behavior with a Rotate + Expire, but is that a real use case? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I talked a little about this idea with Russell and I think the trade-off is whether to re-encrypt all of the snapshot keys when rotating the table key. If we re-encrypt the snapshot keys, that could take a while when there are potentially thousands of snapshots in the table. On the other hand, in practice it would be a small set and rotating the table key would be rare. I think I like the idea of a single table key that is used to decrypt all of the snapshot keys. It would require a structure that allows us to change the content of the snapshot keys (a separate list of them that is mutable) but that wouldn't be too difficult. @ggershinsky, what do you think about that idea? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks, I'll experiment with these suggestions. |
||
| | | _optional_ | **`key-id`** | The ID of the encryption key that encrypts the manifest list key metadata. See [Snapshot](#key-metadata-key-id). | | ||
| | | _optional_ | **`key-wrap`** | Wrapped (encrypted) metadata encryption key. Wrapping can for example be done in a Mey Management Service (KMS). | | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Are these intended to be fields inside of Something like this:
One thing we can do to make this more flexible is to allow each key to have a key ID that encrypted it, or to indicate that it was wrapped by KMS. That would allow using the same
|
||
|
||
For serialization details, see Appendix C. | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After my chat with Russell, I think I would prefer to have all keys in table metadata separate from snapshot. That leaves a lot of flexibility and only requires adding a
key-id
here.