Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spec additions for encryption #12162

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 6 additions & 0 deletions format/spec.md
Original file line number Diff line number Diff line change
Expand Up @@ -53,6 +53,7 @@ Version 3 of the Iceberg spec extends data types and existing metadata structure
* Multi-argument transforms for partitioning and sorting
* Row Lineage tracking
* Binary deletion vectors
* Table encryption


## Goals
Expand Down Expand Up @@ -685,6 +686,8 @@ A snapshot consists of the following fields:
| _optional_ | _optional_ | _optional_ | **`schema-id`** | ID of the table's current schema when the snapshot was created |
| | | _optional_ | **`first-row-id`** | The first `_row_id` assigned to the first row in the first data file in the first manifest, see [Row Lineage](#row-lineage) |
| | | _optional_ | **`added-rows`** | Sum of the [`added_rows_count`](#manifest-lists) from all manifests added in this snapshot. Required if [Row Lineage](#row-lineage) is enabled |
| | | _optional_ | **`encrypted-key-metadata`** | Base64-encoded key metadata of the manifest list file in an encrypted table. The key metadata is encrypted by a metadata encryption key before encoding |
| | | _optional_ | **`key-metadata-key-id`** | The ID of the encryption key that encrypts the manifest list key metadata |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After my chat with Russell, I think I would prefer to have all keys in table metadata separate from snapshot. That leaves a lot of flexibility and only requires adding a key-id here.



The snapshot summary's `operation` field is used by some operations, like snapshot expiration, to skip processing certain snapshots. Possible `operation` values are:
Expand Down Expand Up @@ -889,6 +892,9 @@ Table metadata consists of the following fields:
| _optional_ | _optional_ | _optional_ | **`partition-statistics`** | A list (optional) of [partition statistics](#partition-statistics). |
| | | _optional_ | **`row-lineage`** | A boolean, defaulting to false, setting whether or not to track the creation and updates to rows in the table. See [Row Lineage](#row-lineage). |
| | | _optional_ | **`next-row-id`** | A `long` higher than all assigned row IDs; the next snapshot's `first-row-id`. See [Row Lineage](#row-lineage). |
| | | _optional_ | **`key-cache`** | A list of encryption keys (key-id/key-wrap pairs), used to encrypt the manifest list file key metadata. See [Snapshot](#key-metadata-key-id). |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering, if we really need multiple keys here. This would allow folks to use old keys after a key rotation but is that something we really want to enable? Shouldn't rotating a key wipe out the ability to use a new key and require writing new manifest lists and replacing old ones?

I know we can get similar behavior with a Rotate + Expire, but is that a real use case?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I talked a little about this idea with Russell and I think the trade-off is whether to re-encrypt all of the snapshot keys when rotating the table key. If we re-encrypt the snapshot keys, that could take a while when there are potentially thousands of snapshots in the table. On the other hand, in practice it would be a small set and rotating the table key would be rare.

I think I like the idea of a single table key that is used to decrypt all of the snapshot keys. It would require a structure that allows us to change the content of the snapshot keys (a separate list of them that is mutable) but that wouldn't be too difficult.

@ggershinsky, what do you think about that idea?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I'll experiment with these suggestions.

| | | _optional_ | **`key-id`** | The ID of the encryption key that encrypts the manifest list key metadata. See [Snapshot](#key-metadata-key-id). |
| | | _optional_ | **`key-wrap`** | Wrapped (encrypted) metadata encryption key. Wrapping can for example be done in a Mey Management Service (KMS). |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these intended to be fields inside of key-cache? I think that keys in table metadata should be a list of objects and those objects should have a table defining the structure. That's how we've added other structures, like schemas: "A list of schemas, stored as objects with schema-id".

Something like this:

_optional_ | **`encryption-keys`** | A list of encryption keys, stored as objects. |

...

Each encryption key within the `encryption-keys` table metadata field is a struct with the following fields:
| v1/v2 | v3 | Field name | Type | Description |
|----|----|------------|------|-------------|
|  | _required_ | **`key-id`** | `string` | ID used to refer to this key in table metadata. |
...

One thing we can do to make this more flexible is to allow each key to have a key ID that encrypted it, or to indicate that it was wrapped by KMS. That would allow using the same encryption-keys list for multiple strategies:

  • Wrap each snapshot key using KMS
  • Wrap a table key with KMS that is used to encrypt snapshot keys
    • Re-encrypt with the new KMS key when rotating
    • Leave the existing KMS-wrapped key when rotating


For serialization details, see Appendix C.

Expand Down