Skip to content

Conversation

@kevinjqliu
Copy link
Contributor

@kevinjqliu kevinjqliu commented Aug 10, 2025

Rationale for this change

Similar to #2299
This PR adds the rest of the parameters to pyarrow.fs.AzureFileSystem

Note the Azure Data Lake configuration page already has these 3 parameters

Are these changes tested?

Are there any user-facing changes?

@kevinjqliu kevinjqliu requested review from Fokko and Copilot August 10, 2025 21:00
@kevinjqliu kevinjqliu added this to the PyIceberg 0.10.0 milestone Aug 10, 2025
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds support for additional Azure authentication parameters in PyArrow's AzureFileSystem integration. It extends the existing Azure file system initialization to include client ID, client secret, and tenant ID parameters for service principal authentication.

  • Adds three new Azure configuration constants for client credentials authentication
  • Updates Azure file system initialization to pass client authentication parameters to PyArrow
  • Includes reference documentation link for the PyArrow AzureFileSystem API

Comment on lines +542 to +547
if client_id := self.properties.get(ADLS_CLIENT_ID):
client_kwargs["client_id"] = client_id
if client_secret := self.properties.get(ADLS_CLIENT_SECRET):
client_kwargs["client_secret"] = client_secret
if tenant_id := self.properties.get(ADLS_TENANT_ID):
client_kwargs["tenant_id"] = tenant_id
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The client_id, client_secret and tenant_id should be provided together, should we raise a ValueError? This way we hide Arrow exceptions where possible.

image

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense, added!

@Fokko Fokko merged commit 8052652 into apache:main Aug 11, 2025
10 checks passed
@Fokko
Copy link
Contributor

Fokko commented Aug 11, 2025

Nice one @kevinjqliu thanks for adding this 👍

gabeiglio pushed a commit to Netflix/iceberg-python that referenced this pull request Aug 13, 2025
<!--
Thanks for opening a pull request!
-->

<!-- In the case this PR will resolve an issue, please replace
${GITHUB_ISSUE_ID} below with the actual Github issue id. -->
<!-- Closes #${GITHUB_ISSUE_ID} -->

# Rationale for this change
Similar to apache#2299
This PR adds the rest of the parameters to
[`pyarrow.fs.AzureFileSystem`](https://arrow.apache.org/docs/python/generated/pyarrow.fs.AzureFileSystem.html)

Note the [Azure Data Lake configuration
page](https://github.com/apache/iceberg-python/blob/main/mkdocs/docs/configuration.md#azure-data-lake)
already has these 3 parameters

# Are these changes tested?

# Are there any user-facing changes?

<!-- In the case of user-facing changes, please add the changelog label.
-->
@kevinjqliu kevinjqliu deleted the kevinjqliu/pyarrow-adls-args branch August 19, 2025 10:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants