PyArrow S3FileSystem doesn't honor the AWS profile config #570
Comments
@geruh, thanks for highlighting this issue. The confusion largely stems from the naming convention used when the
+1 for unified configurations. I think it may be convenient to introduce other unified configurations, with generic names like
Regarding the
I think it makes sense to have both a "catalog level" configuration and a "file level" configuration. Catalog operations might need a different set of permissions than reading specific tables or files. I like the idea of having specific configurations at each level, plus a generic "fall back" configuration.
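To make the layered idea concrete, here is a minimal sketch of a lookup that prefers a level-specific property and falls back to a generic one. The helper `resolve_property`, the `client.` fallback prefix, and the `glue.region` key are placeholders chosen for illustration, not names confirmed anywhere in this thread.

```python
from typing import Mapping, Optional


def resolve_property(
    properties: Mapping[str, str],
    scope: str,
    name: str,
    fallback_scope: str = "client",  # assumed placeholder prefix, not a confirmed name
) -> Optional[str]:
    """Return the scope-specific value if set, else the generic fallback, else None."""
    for prefix in (scope, fallback_scope):
        value = properties.get(f"{prefix}.{name}")
        if value is not None:
            return value
    return None


properties = {
    "client.region": "us-west-2",  # generic fallback shared by every component
    "s3.region": "us-east-1",      # file-level override
}

# No "glue.region" key is set, so the catalog picks up the generic fallback.
glue_region = resolve_property(properties, "glue", "region")
# The file-level key is set, so it wins over the fallback.
s3_region = resolve_property(properties, "s3", "region")
```

With this shape, a user who sets only the generic key gets consistent behaviour across the catalog and the filesystem, while still being able to override either level explicitly.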
Fixed in #922
Apache Iceberg version
main (development)
Please describe the bug 🐞
When initializing the GlueCatalog with a specific AWS profile, catalog operations work as expected. However, we've hit an issue when working with S3 via the PyArrow S3FileSystem. We allow users to specify a profile for initiating a boto connection, but this preference doesn't carry over to the S3FileSystem. Instead of using the specified AWS profile, we check the catalog configs for S3 properties such as s3.access-key-id, s3.region, and so on. If those aren't passed in, PyArrow's S3FileSystem falls back on its own strategy for inferring credentials.

This workflow leads to inconsistencies. For example, while Glue operations might be using a user-specified profile, S3 operations could end up using a different set of credentials, or even a different region, from what's set in the environment variables or the AWS config files. This is seen in issue #515, where one region (like us-west-2) unexpectedly switches to another (like us-east-1), causing a 301 exception.
For example:
On one hand, we could argue that this profile configuration should only work at the catalog level, and that for filesystems the user must specify the aforementioned configs, like s3.region. On the other hand, it seems reasonable that the AWS profile config should work uniformly across both the catalog and filesystem levels. This unified approach would certainly simplify configuration management for users, and I'm leaning towards this perspective. However, we're currently using PyArrow's S3FileSystem, which doesn't inherently support AWS profiles, so we'd need to bridge that gap manually.

cc: @HonahX @Fokko @kevinjqliu
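One way to picture "bridging the gap manually" is to resolve the profile ourselves and hand the result to the filesystem as explicit s3.* properties. The sketch below is only an illustration under stated assumptions: the helper `profile_to_s3_properties` is hypothetical, it reads the standard shared credentials and config files with the standard library only, and it handles static keys and region but not SSO, assumed roles, or credential processes.

```python
import configparser
from pathlib import Path
from typing import Dict


def profile_to_s3_properties(
    profile_name: str,
    credentials_path: Path = Path.home() / ".aws" / "credentials",
    config_path: Path = Path.home() / ".aws" / "config",
) -> Dict[str, str]:
    """Map a named AWS profile onto s3.* properties (hypothetical helper)."""
    properties: Dict[str, str] = {}

    # The shared credentials file uses the bare profile name as the section header.
    # Interpolation is disabled so secrets containing "%" are read verbatim.
    credentials = configparser.ConfigParser(interpolation=None)
    credentials.read(credentials_path)  # missing files are silently skipped
    if credentials.has_section(profile_name):
        section = credentials[profile_name]
        for aws_key, s3_key in (
            ("aws_access_key_id", "s3.access-key-id"),
            ("aws_secret_access_key", "s3.secret-access-key"),
            ("aws_session_token", "s3.session-token"),
        ):
            if aws_key in section:
                properties[s3_key] = section[aws_key]

    # The config file prefixes every section except "default" with "profile ".
    config = configparser.ConfigParser(interpolation=None)
    config.read(config_path)
    config_section = (
        profile_name if profile_name == "default" else f"profile {profile_name}"
    )
    if config.has_section(config_section) and "region" in config[config_section]:
        properties["s3.region"] = config[config_section]["region"]

    return properties
```

The returned dictionary could be merged into the catalog properties before the filesystem is constructed, with any explicitly passed s3.* values taking precedence. A real fix would more likely lean on boto3's own profile resolution rather than parsing the files by hand, since that covers the full credential chain.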