Describe the bug
DatasetBuilder.to_dataframe() fails if the S3 bucket is encrypted with server-side KMS encryption and a KMS key is supplied.
The S3Uploader.upload() method uses {"SSEKMSKeyId": kms_key, "ServerSideEncryption": "aws:kms"} [ref] when a KMS key is provided, and this is what DatasetBuilder.to_csv_file() uses to upload the objects. In the next step of DatasetBuilder.to_dataframe(), however, S3Downloader.download() is called, and it passes the KMS key as {"SSECustomerKey": kms_key} [ref]. This is incorrect and leads to the error shown in the logs: the download should decrypt with SSEKMSKeyId (not SSECustomerKey), matching the parameter that was originally used to upload the objects.
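For clarity, here is a minimal sketch of the mismatch (paraphrasing the behaviour described above, not the SDK source); the key ARN is a made-up example:

def upload_extra_args(kms_key):
    # What S3Uploader.upload() sends when a KMS key is given:
    # the object is written with SSE-KMS under that key.
    return {"SSEKMSKeyId": kms_key, "ServerSideEncryption": "aws:kms"}

def download_extra_args(kms_key):
    # What S3Downloader.download() sends for the same key:
    # an SSE-C header, which does not match how the object was encrypted.
    return {"SSECustomerKey": kms_key}

key = "arn:aws:kms:us-east-1:111122223333:key/example"  # made-up ARN
print(upload_extra_args(key))    # {'SSEKMSKeyId': ..., 'ServerSideEncryption': 'aws:kms'}
print(download_extra_args(key))  # {'SSECustomerKey': ...}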
To reproduce
See the Screenshots or logs section below.
Expected behavior
Expected behavior is being able to load a query output into a pandas data frame when using DatasetBuilder.to_dataframe() with a KMS key. Note that the KMS key is supplied via feature_store.create_dataset(kms_key_id=<key>), as in the sketch below.
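For reference, this is roughly how the failing path is reached; the bucket, key id, and feature group name are placeholders, not values from the original report:

from sagemaker.session import Session
from sagemaker.feature_store.feature_group import FeatureGroup
from sagemaker.feature_store.feature_store import FeatureStore

session = Session()
feature_store = FeatureStore(sagemaker_session=session)
my_feature_group = FeatureGroup(name="my-feature-group", sagemaker_session=session)

# The bucket behind output_path enforces SSE-KMS with the supplied key.
builder = feature_store.create_dataset(
    base=my_feature_group,
    output_path="s3://my-sse-kms-bucket/output",
    kms_key_id="<a valid kms key for SSE bucket>",
)

# Uploads the query result with SSEKMSKeyId, then tries to download it
# back with SSECustomerKey, which fails as shown in the logs below.
df, query_string = builder.to_dataframe()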
Screenshots or logs
from sagemaker import s3

# csv_file: S3 URI of the CSV written into the SSE-KMS bucket by
# DatasetBuilder.to_csv_file(); this is the download to_dataframe() performs.
s3.S3Downloader.download(
    s3_uri=csv_file,
    local_path="./",
    kms_key='<a valid kms key for SSE bucket>',
    sagemaker_session=feature_group_session.feature_store_session,
)
On doing the above, it fails with the following error.
File ~/anaconda3/envs/pytorch_p310/lib/python3.10/site-packages/botocore/client.py:530, in ClientCreator._create_api_method.<locals>._api_call(self, *args, **kwargs)
526 raise TypeError(
527 f"{py_operation_name}() only accepts keyword arguments."
528 )
529 # The "self" in this scope is referring to the BaseClient.
--> 530 return self._make_api_call(operation_name, kwargs)
File ~/anaconda3/envs/pytorch_p310/lib/python3.10/site-packages/botocore/client.py:964, in BaseClient._make_api_call(self, operation_name, api_params)
962 error_code = parsed_response.get("Error", {}).get("Code")
963 error_class = self.exceptions.from_code(error_code)
--> 964 raise error_class(parsed_response, operation_name)
965 else:
966 return parsed_response
ClientError: An error occurred (400) when calling the HeadObject operation: Bad Request
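For context, this matches plain boto3 behaviour for SSE-KMS objects (sketch only; bucket and key names are made up): HeadObject needs no encryption headers at all for SSE-KMS, while SSE-C headers on such an object are rejected with a 400.

import boto3

s3_client = boto3.client("s3")

# SSE-KMS objects are decrypted server-side; no encryption headers are
# needed as long as the caller has kms:Decrypt on the key.
s3_client.head_object(Bucket="my-sse-kms-bucket", Key="output/dataset.csv")

# Sending SSE-C headers for an object written with SSE-KMS is rejected by
# S3 with a 400, which is the HeadObject error in the traceback above.
s3_client.head_object(
    Bucket="my-sse-kms-bucket",
    Key="output/dataset.csv",
    SSECustomerAlgorithm="AES256",
    SSECustomerKey="<key material>",
)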
System information
- SageMaker Python SDK version: 2.167.0
- Framework name (eg. PyTorch) or algorithm (eg. KMeans): N/A
- Framework version: N/A
- Python version: 3.10.10
- CPU or GPU: CPU
- Custom Docker image (Y/N): N
Workaround
Using S3_URI, _ = DatasetBuilder.to_csv_file() and then calling pd.read_csv(S3_URI) works (see the sketch below). It is also odd that DatasetBuilder.to_dataframe() first downloads the object to disk, loads it into a data frame, and then deletes the local file, when it could simply read the object into a data frame without downloading it.
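A sketch of the workaround, reusing the builder from the snippet under Expected behavior; pd.read_csv on an s3:// URI requires s3fs to be installed:

import pandas as pd

# to_csv_file() uploads with the correct SSE-KMS parameters and returns
# the S3 URI of the result plus the query string.
s3_uri, _query = builder.to_csv_file()

# Decryption happens server-side for SSE-KMS, so reading the object
# directly works without any customer-provided key.
df = pd.read_csv(s3_uri)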