Make s3.request_timeout configurable #1568
Open
As in #218, we see occasional timeout errors when writing data to S3-compatible object storage.

[I don't believe the issue is specific to my use of Backblaze B2 rather than Amazon S3; I saw references to similar error messages with the latter while researching this issue.]
The issue happens when the underlying `PUT` operation takes longer than the request timeout, which defaults to 3 seconds in the AWS C++ SDK used by Arrow via PyArrow. The changes in this PR allow configuration of `s3.request_timeout` when working directly or indirectly with `pyiceberg.io.pyarrow.PyArrowFileIO`, just as #218 allowed configuration of `s3.connect_timeout`.

For example, when creating a catalog:
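A minimal sketch of what this could look like, assuming a REST catalog; the catalog name, URI, endpoint, and credentials are placeholders, and the timeout property keys follow the spellings used in this PR:

```python
from pyiceberg.catalog import load_catalog

# All values below are illustrative placeholders; the two timeout
# properties (in seconds) are the ones this PR and #218 make configurable.
catalog = load_catalog(
    "default",
    **{
        "uri": "https://rest-catalog.example.com",
        "s3.endpoint": "https://s3.us-west-004.backblazeb2.com",
        "s3.access-key-id": "<access-key-id>",
        "s3.secret-access-key": "<secret-access-key>",
        "s3.connect_timeout": "30",  # configurable since #218
        "s3.request_timeout": "30",  # configurable with this PR; SDK default is 3 s
    },
)
```

The same properties can be passed anywhere `PyArrowFileIO` picks up its configuration, so writes that previously failed on slow `PUT` operations can be given a longer request timeout.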