Spec additions for encryption #12162

Open
wants to merge 1 commit into base: main

ggershinsky
Contributor

No description provided.

@@ -889,6 +892,9 @@ Table metadata consists of the following fields:
| _optional_ | _optional_ | _optional_ | **`partition-statistics`** | A list (optional) of [partition statistics](#partition-statistics). |
| | | _optional_ | **`row-lineage`** | A boolean, defaulting to false, setting whether or not to track the creation and updates to rows in the table. See [Row Lineage](#row-lineage). |
| | | _optional_ | **`next-row-id`** | A `long` higher than all assigned row IDs; the next snapshot's `first-row-id`. See [Row Lineage](#row-lineage). |
| | | _optional_ | **`key-cache`** | A list of encryption keys (key-id/key-wrap pairs), used to encrypt the manifest list file key metadata. See [Snapshot](#key-metadata-key-id). |

Member

I was wondering whether we really need multiple keys here. This would allow folks to keep using old keys after a key rotation, but is that something we really want to enable? Shouldn't rotating a key wipe out the ability to use the old key and require writing new manifest lists to replace the old ones?

I know we can get similar behavior with a Rotate + Expire, but is that a real use case?

Contributor

I talked a little about this idea with Russell and I think the trade-off is whether to re-encrypt all of the snapshot keys when rotating the table key. If we re-encrypt the snapshot keys, that could take a while when there are potentially thousands of snapshots in the table. On the other hand, in practice it would be a small set and rotating the table key would be rare.

I think I like the idea of a single table key that is used to decrypt all of the snapshot keys. It would require a structure that allows us to change the content of the snapshot keys (a separate list of them that is mutable) but that wouldn't be too difficult.

@ggershinsky, what do you think about that idea?
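
To make the cost side of this trade-off concrete, here is a minimal sketch of rotating a single table key by re-wrapping every snapshot key. The `toy_wrap` function, the `rotate_table_key` helper, and the `snap-key-N` IDs are assumptions made up for illustration; XOR stands in for a real key-wrapping algorithm and is not secure.

```python
import os


def toy_wrap(wrapping_key: bytes, key: bytes) -> bytes:
    """Stand-in for real key wrapping. XOR is NOT secure; illustration only."""
    return bytes(a ^ b for a, b in zip(key, wrapping_key))


# XOR is its own inverse, so unwrapping is the same operation.
toy_unwrap = toy_wrap


def rotate_table_key(old_table_key, new_table_key, wrapped_snapshot_keys):
    """Re-wrap every snapshot key under the new table key.

    `wrapped_snapshot_keys` maps a (hypothetical) snapshot key ID to its
    wrapped bytes. The work is linear in the number of snapshot keys; the
    snapshot keys themselves are unchanged.
    """
    return {
        key_id: toy_wrap(new_table_key, toy_unwrap(old_table_key, wrapped))
        for key_id, wrapped in wrapped_snapshot_keys.items()
    }


old_table_key = os.urandom(16)
new_table_key = os.urandom(16)
snapshot_keys = {f"snap-key-{i}": os.urandom(16) for i in range(3)}
wrapped = {k: toy_wrap(old_table_key, v) for k, v in snapshot_keys.items()}

rewrapped = rotate_table_key(old_table_key, new_table_key, wrapped)
```

After rotation, each entry in `rewrapped` unwraps to the original snapshot key under `new_table_key`, which is why the list of wrapped snapshot keys has to be mutable.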

Contributor Author

Thanks, I'll experiment with these suggestions.

@RussellSpitzer
Member

Note that once we get this nailed down, we'll want to start a dev list email thread to vote on the change.

@@ -889,6 +892,9 @@ Table metadata consists of the following fields:
| _optional_ | _optional_ | _optional_ | **`partition-statistics`** | A list (optional) of [partition statistics](#partition-statistics). |
| | | _optional_ | **`row-lineage`** | A boolean, defaulting to false, setting whether or not to track the creation and updates to rows in the table. See [Row Lineage](#row-lineage). |
| | | _optional_ | **`next-row-id`** | A `long` higher than all assigned row IDs; the next snapshot's `first-row-id`. See [Row Lineage](#row-lineage). |
| | | _optional_ | **`key-cache`** | A list of encryption keys (key-id/key-wrap pairs), used to encrypt the manifest list file key metadata. See [Snapshot](#key-metadata-key-id). |
| | | _optional_ | **`key-id`** | The ID of the encryption key that encrypts the manifest list key metadata. See [Snapshot](#key-metadata-key-id). |
| | | _optional_ | **`key-wrap`** | Wrapped (encrypted) metadata encryption key. Wrapping can, for example, be done in a Key Management Service (KMS). |

Contributor

Are these intended to be fields inside of key-cache? I think that keys in table metadata should be a list of objects and those objects should have a table defining the structure. That's how we've added other structures, like schemas: "A list of schemas, stored as objects with schema-id".

Something like this:

| | | _optional_ | **`encryption-keys`** | A list of encryption keys, stored as objects. |

...

Each encryption key within the `encryption-keys` table metadata field is a struct with the following fields:
| v1/v2 | v3 | Field name | Type | Description |
|----|----|------------|------|-------------|
|  | _required_ | **`key-id`** | `string` | ID used to refer to this key in table metadata. |
...

One thing we can do to make this more flexible is to allow each key to record the ID of the key that encrypted it, or to indicate that it was wrapped by KMS. That would allow using the same `encryption-keys` list for multiple strategies:

  • Wrap each snapshot key using KMS
  • Wrap a table key with KMS that is used to encrypt snapshot keys
    • Re-encrypt with the new KMS key when rotating
    • Leave the existing KMS-wrapped key when rotating
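
A minimal sketch of how one list could serve these strategies, assuming each entry carries an optional reference to the key that wrapped it. Only `key-id` comes from the proposal above; the `wrapped_key` and `encrypted_by` fields, the `unwrap_chain` helper, and the example key IDs are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class EncryptionKey:
    key_id: str                  # `key-id` from the proposed struct
    wrapped_key: bytes           # hypothetical: the wrapped key material
    encrypted_by: Optional[str]  # hypothetical: wrapping key ID, None = wrapped by KMS


def unwrap_chain(keys: Dict[str, EncryptionKey], key_id: str) -> List[str]:
    """Return the key IDs to unwrap, starting with the KMS-wrapped one."""
    chain: List[str] = []
    seen = set()
    current: Optional[str] = key_id
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle in key references at {current!r}")
        seen.add(current)
        chain.append(current)
        current = keys[current].encrypted_by
    return list(reversed(chain))


keys = {
    k.key_id: k
    for k in [
        # Table key wrapped by KMS, used to encrypt a snapshot key.
        EncryptionKey("table-key-1", b"<wrapped bytes>", None),
        EncryptionKey("snap-key-1", b"<wrapped bytes>", "table-key-1"),
        # Snapshot key wrapped directly by KMS.
        EncryptionKey("snap-key-2", b"<wrapped bytes>", None),
    ]
}
```

Here `unwrap_chain(keys, "snap-key-1")` yields the table key followed by the snapshot key, while `unwrap_chain(keys, "snap-key-2")` yields only the snapshot key, so both strategies can coexist in the same list.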

@@ -685,6 +686,8 @@ A snapshot consists of the following fields:
| _optional_ | _optional_ | _optional_ | **`schema-id`** | ID of the table's current schema when the snapshot was created |
| | | _optional_ | **`first-row-id`** | The first `_row_id` assigned to the first row in the first data file in the first manifest, see [Row Lineage](#row-lineage) |
| | | _optional_ | **`added-rows`** | Sum of the [`added_rows_count`](#manifest-lists) from all manifests added in this snapshot. Required if [Row Lineage](#row-lineage) is enabled |
| | | _optional_ | **`encrypted-key-metadata`** | Base64-encoded key metadata of the manifest list file in an encrypted table. The key metadata is encrypted by a metadata encryption key before encoding |
| | | _optional_ | **`key-metadata-key-id`** | The ID of the encryption key that encrypts the manifest list key metadata |

Contributor

After my chat with Russell, I think I would prefer to keep all keys in table metadata, separate from the snapshot. That leaves a lot of flexibility and only requires adding a `key-id` here.
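
As a rough illustration of that layout, the sketch below models table metadata as a plain dictionary in which each snapshot carries only a `key-id` and the keys live in a top-level `encryption-keys` list. The `key_for_snapshot` helper and the sample IDs are assumptions for illustration, not part of the proposed spec text.

```python
table_metadata = {
    "encryption-keys": [
        {"key-id": "k1"},
        {"key-id": "k2"},
    ],
    "snapshots": [
        {"snapshot-id": 1, "key-id": "k1"},
        {"snapshot-id": 2, "key-id": "k2"},
        {"snapshot-id": 3},  # snapshot without a key-id
    ],
}


def key_for_snapshot(metadata, snapshot_id):
    """Resolve a snapshot's key-id against the table-level key list."""
    snapshot = next(
        s for s in metadata["snapshots"] if s["snapshot-id"] == snapshot_id
    )
    key_id = snapshot.get("key-id")
    if key_id is None:
        return None
    by_id = {k["key-id"]: k for k in metadata["encryption-keys"]}
    return by_id[key_id]
```

Because the snapshot only stores a reference, the entries in `encryption-keys` can be re-wrapped or replaced without touching the snapshot itself.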

Labels
Specification Issues that may introduce spec changes.
3 participants