-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Spec additions for encryption #12162
base: main
Are you sure you want to change the base?
Conversation
@@ -889,6 +892,9 @@ Table metadata consists of the following fields: | |||
| _optional_ | _optional_ | _optional_ | **`partition-statistics`** | A list (optional) of [partition statistics](#partition-statistics). | | |||
| | | _optional_ | **`row-lineage`** | A boolean, defaulting to false, setting whether or not to track the creation and updates to rows in the table. See [Row Lineage](#row-lineage). | | |||
| | | _optional_ | **`next-row-id`** | A `long` higher than all assigned row IDs; the next snapshot's `first-row-id`. See [Row Lineage](#row-lineage). | | |||
| | | _optional_ | **`key-cache`** | A list of encryption keys (key-id/key-wrap pairs), used to encrypt the manifest list file key metadata. See [Snapshot](#key-metadata-key-id). | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was wondering, if we really need multiple keys here. This would allow folks to use old keys after a key rotation but is that something we really want to enable? Shouldn't rotating a key wipe out the ability to use a new key and require writing new manifest lists and replacing old ones?
I know we can get similar behavior with a Rotate + Expire, but is that a real use case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I talked a little about this idea with Russell and I think the trade-off is whether to re-encrypt all of the snapshot keys when rotating the table key. If we re-encrypt the snapshot keys, that could take a while when there are potentially thousands of snapshots in the table. On the other hand, in practice it would be a small set and rotating the table key would be rare.
I think I like the idea of a single table key that is used to decrypt all of the snapshot keys. It would require a structure that allows us to change the content of the snapshot keys (a separate list of them that is mutable) but that wouldn't be too difficult.
@ggershinsky, what do you think about that idea?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I'll experiment with these suggestions.
Note we'll want to start a dev list email thread once we get this nailed now to vote on the change |
@@ -889,6 +892,9 @@ Table metadata consists of the following fields: | |||
| _optional_ | _optional_ | _optional_ | **`partition-statistics`** | A list (optional) of [partition statistics](#partition-statistics). | | |||
| | | _optional_ | **`row-lineage`** | A boolean, defaulting to false, setting whether or not to track the creation and updates to rows in the table. See [Row Lineage](#row-lineage). | | |||
| | | _optional_ | **`next-row-id`** | A `long` higher than all assigned row IDs; the next snapshot's `first-row-id`. See [Row Lineage](#row-lineage). | | |||
| | | _optional_ | **`key-cache`** | A list of encryption keys (key-id/key-wrap pairs), used to encrypt the manifest list file key metadata. See [Snapshot](#key-metadata-key-id). | | |||
| | | _optional_ | **`key-id`** | The ID of the encryption key that encrypts the manifest list key metadata. See [Snapshot](#key-metadata-key-id). | | |||
| | | _optional_ | **`key-wrap`** | Wrapped (encrypted) metadata encryption key. Wrapping can for example be done in a Mey Management Service (KMS). | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these intended to be fields inside of key-cache
? I think that keys in table metadata should be a list of objects and those objects should have a table defining the structure. That's how we've added other structures, like schemas: "A list of schemas, stored as objects with schema-id
".
Something like this:
_optional_ | **`encryption-keys`** | A list of encryption keys, stored as objects. |
...
Each encryption key within the `encryption-keys` table metadata field is a struct with the following fields:
| v1/v2 | v3 | Field name | Type | Description |
|----|----|------------|------|-------------|
| | _required_ | **`key-id`** | `string` | ID used to refer to this key in table metadata. |
...
One thing we can do to make this more flexible is to allow each key to have a key ID that encrypted it, or to indicate that it was wrapped by KMS. That would allow using the same encryption-keys
list for multiple strategies:
- Wrap each snapshot key using KMS
- Wrap a table key with KMS that is used to encrypt snapshot keys
- Re-encrypt with the new KMS key when rotating
- Leave the existing KMS-wrapped key when rotating
@@ -685,6 +686,8 @@ A snapshot consists of the following fields: | |||
| _optional_ | _optional_ | _optional_ | **`schema-id`** | ID of the table's current schema when the snapshot was created | | |||
| | | _optional_ | **`first-row-id`** | The first `_row_id` assigned to the first row in the first data file in the first manifest, see [Row Lineage](#row-lineage) | | |||
| | | _optional_ | **`added-rows`** | Sum of the [`added_rows_count`](#manifest-lists) from all manifests added in this snapshot. Required if [Row Lineage](#row-lineage) is enabled | | |||
| | | _optional_ | **`encrypted-key-metadata`** | Base64-encoded key metadata of the manifest list file in an encrypted table. The key metadata is encrypted by a metadata encryption key before encoding | | |||
| | | _optional_ | **`key-metadata-key-id`** | The ID of the encryption key that encrypts the manifest list key metadata | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After my chat with Russell, I think I would prefer to have all keys in table metadata separate from snapshot. That leaves a lot of flexibility and only requires adding a key-id
here.
No description provided.