Feature Request / Improvement
Feature: Missing AWS Profile Support in PyIceberg / PyIceberg should support AWS profiles
Description:
AWS profiles are a convenient way to work with multiple AWS configurations and credentials in parallel. Ideally, PyIceberg should support them throughout, which it currently does only in part.
Current state (as of writing, PyIceberg v0.10.0):
- The Glue part of the `GlueCatalog` can be configured to use a profile, either by passing the Glue client explicitly to the `GlueCatalog` or via the `glue.profile-name` config parameter:

      from boto3 import Session
      from pyiceberg.catalog.glue import GlueCatalog
      # ...
      glue_client = Session(profile_name="your_aws_profile").client("glue")
      catalog = GlueCatalog(name="your_glue_catalog", client=glue_client)  # plus any further properties

  or

      catalog = GlueCatalog(
          name="your_glue_catalog",
          **{
              "glue.profile-name": "your_aws_profile",
              # ... further catalog properties
          },
      )

- For `fsspec` backends, AWS profile support is generally available (see https://s3fs.readthedocs.io/en/latest/api.html#s3fs.core.S3FileSystem and https://github.com/fsspec/s3fs/blob/aceea3a4985f667979e4d8a5a5b8eeddaf23b7be/s3fs/core.py#L229), but it is not implemented in `PyIceberg` (see `iceberg-python/pyiceberg/io/fsspec.py`, line 199 in e07296e: `fs = S3FileSystem(anon=anon, client_kwargs=client_kwargs, config_kwargs=config_kwargs)`). To change that, we would need to set the `session` parameter of the `S3FileSystem` explicitly:

      from aiobotocore.session import AioSession
      from s3fs import S3FileSystem
      # ...
      fs = S3FileSystem(session=AioSession(profile="your_aws_profile"))  # plus the existing arguments

- For the `PyArrow` backend, AWS profile support is not yet available, but there is an enhancement ticket for it (see "[Python][C++] Add Profile support to S3FileSystem", arrow#47880). Once AWS profiles are supported in `PyArrow`, support can be implemented in `PyIceberg` as well, I assume.
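To make the `fsspec` change above more concrete, here is a minimal sketch of how the `session` argument could be added only when a profile is configured. The helper name `s3fs_kwargs_with_profile` and the injectable `session_factory` are hypothetical and not part of PyIceberg; the sketch only assumes that `S3FileSystem` accepts a `session` argument and that `AioSession` accepts `profile`, as used in the snippet above.

```python
from typing import Any, Callable, Dict, Optional


def _aio_session_for_profile(profile: str) -> Any:
    # Imported lazily so that aiobotocore is only required when a profile is set.
    from aiobotocore.session import AioSession

    return AioSession(profile=profile)


def s3fs_kwargs_with_profile(
    profile_name: Optional[str],
    base_kwargs: Optional[Dict[str, Any]] = None,
    session_factory: Callable[[str], Any] = _aio_session_for_profile,
) -> Dict[str, Any]:
    """Hypothetical helper: copy the existing S3FileSystem kwargs and add
    `session` only when a profile name is configured."""
    kwargs = dict(base_kwargs or {})  # copy, so the caller's dict is not mutated
    if profile_name:
        kwargs["session"] = session_factory(profile_name)
    return kwargs
```

With something like this, the existing call could become `S3FileSystem(**s3fs_kwargs_with_profile(profile, {"anon": anon, "client_kwargs": client_kwargs, "config_kwargs": config_kwargs}))`, leaving behaviour unchanged when no profile is given.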
Workaround for this feature gap:

    from boto3 import Session
    from pyiceberg.catalog.glue import GlueCatalog

    # Resolve the profile's credentials once and pass them on as static keys.
    session = Session(profile_name="your_aws_profile")
    credentials = session.get_credentials()
    if credentials is None:
        raise ValueError("Could not retrieve credentials for profile")
    catalog = GlueCatalog(
        name="your_glue_catalog",
        **{
            "client.access-key-id": credentials.access_key,
            "client.secret-access-key": credentials.secret_key,
            "client.session-token": credentials.token,
            # ... further catalog properties
        },
    )

To-Be / Expected Behavior:
- `PyIceberg` should have new `client.profile-name` and `s3.profile-name` configuration parameters (next to the existing `glue.profile-name`).
- The new `client.profile-name` should also set `glue.profile-name` (same behaviour as for all the other unified AWS credentials).
- For now, AWS profile support should be implemented for the `fsspec` backend, and `client.profile-name` and `s3.profile-name` should only be supported when using the `fsspec` backend (`"py-io-impl": "pyiceberg.io.fsspec.FsspecFileIO"`).
- Once `PyArrow` supports AWS profile names (see "[Python][C++] Add Profile support to S3FileSystem", arrow#47880), AWS profile support should be implemented for the `PyArrow` backend as well, and `client.profile-name` and `s3.profile-name` should be fully supported.
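The expected behaviour listed above could be sketched roughly as follows. This is only an illustration under assumptions: `client.profile-name` and `s3.profile-name` are the proposed (not yet existing) keys, the function names are hypothetical, and the precedence (service-specific key first, then the unified `client.*` key) is assumed to mirror how the other unified AWS credentials are handled. Raising an error for unsupported backends is also just one possible choice.

```python
from typing import Dict, Optional

CLIENT_PROFILE_NAME = "client.profile-name"  # proposed key, not in PyIceberg v0.10.0
S3_PROFILE_NAME = "s3.profile-name"          # proposed key, not in PyIceberg v0.10.0
GLUE_PROFILE_NAME = "glue.profile-name"      # existing key
PY_IO_IMPL = "py-io-impl"
FSSPEC_FILE_IO = "pyiceberg.io.fsspec.FsspecFileIO"


def resolve_profile_name(properties: Dict[str, str], service_key: str) -> Optional[str]:
    """Assumed precedence: the service-specific key wins, otherwise fall back
    to the unified client key. Returns None when neither is set."""
    return properties.get(service_key) or properties.get(CLIENT_PROFILE_NAME) or None


def check_s3_profile_supported(properties: Dict[str, str]) -> None:
    """Reject an S3 profile unless the fsspec FileIO is explicitly selected."""
    if not resolve_profile_name(properties, S3_PROFILE_NAME):
        return  # no profile requested for S3, nothing to validate
    if properties.get(PY_IO_IMPL) != FSSPEC_FILE_IO:
        raise ValueError(
            "AWS profiles for S3 are only supported with "
            f"'{PY_IO_IMPL}': '{FSSPEC_FILE_IO}'"
        )
```

A warning instead of an exception might be friendlier when `client.profile-name` is set mainly for Glue; that trade-off is left open here.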
Remark: I found this feature gap with the GlueCatalog; it might be that the RestCatalog is equally affected, but not sure.
Issues possibly related to this issue: #570, #1207, #2657
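For anyone using the workaround in the meantime, it can be wrapped in a small helper. This is a sketch, not an official API: the function names are hypothetical, and omitting `client.session-token` for long-lived keys rests on the assumption that a `None` value may not be handled well everywhere. It uses `get_frozen_credentials()` so that access key, secret key and token come from one consistent snapshot. Note that the resulting static keys are not refreshed, so temporary credentials will eventually expire.

```python
from typing import Any, Dict


def static_credential_properties(frozen: Any) -> Dict[str, str]:
    """Map a credentials snapshot (with access_key, secret_key, token
    attributes) to the unified client.* catalog properties."""
    props = {
        "client.access-key-id": frozen.access_key,
        "client.secret-access-key": frozen.secret_key,
    }
    if frozen.token:  # long-lived keys have no session token
        props["client.session-token"] = frozen.token
    return props


def properties_for_profile(profile_name: str) -> Dict[str, str]:
    """Resolve a profile via boto3 and return static credential properties."""
    from boto3 import Session  # lazy import: only needed when actually called

    credentials = Session(profile_name=profile_name).get_credentials()
    if credentials is None:
        raise ValueError(f"Could not retrieve credentials for profile {profile_name!r}")
    return static_credential_properties(credentials.get_frozen_credentials())
```

The result can then be passed as `GlueCatalog(name="your_glue_catalog", **properties_for_profile("your_aws_profile"))` together with any further properties.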