Boto Glue standard retry policy with configuration #1307

Merged
merged 4 commits into from
Nov 20, 2024
20 changes: 11 additions & 9 deletions mkdocs/docs/configuration.md
@@ -331,16 +331,18 @@ catalog:
 
 <!-- markdown-link-check-disable -->
 
-| Key                    | Example                              | Description                                                                     |
-| ---------------------- | ------------------------------------ | ------------------------------------------------------------------------------- |
-| glue.id                | 111111111111                         | Configure the 12-digit ID of the Glue Catalog                                   |
-| glue.skip-archive      | true                                 | Configure whether to skip the archival of older table versions. Default to true |
+| Key                    | Example                                | Description                                                                     |
+|------------------------|----------------------------------------|---------------------------------------------------------------------------------|
+| glue.id                | 111111111111                           | Configure the 12-digit ID of the Glue Catalog                                   |
+| glue.skip-archive      | true                                   | Configure whether to skip the archival of older table versions. Default to true |
 | glue.endpoint          | <https://glue.us-east-1.amazonaws.com> | Configure an alternative endpoint of the Glue service for GlueCatalog to access |
-| glue.profile-name      | default                              | Configure the static profile used to access the Glue Catalog                    |
-| glue.region            | us-east-1                            | Set the region of the Glue Catalog                                              |
-| glue.access-key-id     | admin                                | Configure the static access key id used to access the Glue Catalog              |
-| glue.secret-access-key | password                             | Configure the static secret access key used to access the Glue Catalog          |
-| glue.session-token     | AQoDYXdzEJr...                       | Configure the static session token used to access the Glue Catalog              |
+| glue.profile-name      | default                                | Configure the static profile used to access the Glue Catalog                    |
+| glue.region            | us-east-1                              | Set the region of the Glue Catalog                                              |
+| glue.access-key-id     | admin                                  | Configure the static access key id used to access the Glue Catalog              |
+| glue.secret-access-key | password                               | Configure the static secret access key used to access the Glue Catalog          |
+| glue.session-token     | AQoDYXdzEJr...                         | Configure the static session token used to access the Glue Catalog              |
+| glue.max-retries       | 10                                     | Configure the maximum number of retries for the Glue service calls              |

Review comment (Contributor), on the glue.max-retries row:

nit: should we add a retry property for each catalog, or just use the default value in the table retry property?

commit.retry.num-retries | 4 | Number of times to retry a commit before failing

https://iceberg.apache.org/docs/1.5.2/configuration/#table-behavior-properties

+| glue.retry-mode        | standard                               | Configure the retry mode for the Glue service. Default to standard.             |
 
 <!-- markdown-link-check-enable-->
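As a rough illustration of how the two new keys in the table might be supplied, the sketch below passes them as catalog properties from Python. The region value and the `make_catalog` helper are hypothetical and not part of this PR; `load_catalog` is imported lazily so nothing touches AWS until the helper is actually called.

```python
from typing import Any, Dict

# Property keys taken from the configuration table in this PR; the values are
# illustrative only.
GLUE_PROPERTIES: Dict[str, Any] = {
    "type": "glue",
    "glue.region": "us-east-1",       # assumed region, for illustration
    "glue.max-retries": 10,           # documented example value
    "glue.retry-mode": "standard",    # documented default
}


def make_catalog(name: str = "glue"):
    """Hypothetical helper: build a Glue catalog from the properties above."""
    # Lazy import so this sketch can be read and imported without pyiceberg
    # or AWS credentials being available.
    from pyiceberg.catalog import load_catalog

    return load_catalog(name, **GLUE_PROPERTIES)
```

The same keys could presumably be placed under a catalog entry in the YAML configuration file that this section of `configuration.md` documents.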

22 changes: 21 additions & 1 deletion pyiceberg/catalog/glue.py
@@ -29,6 +29,7 @@
 )
 
 import boto3
+from botocore.config import Config
 from mypy_boto3_glue.client import GlueClient
 from mypy_boto3_glue.type_defs import (
     ColumnTypeDef,
@@ -128,6 +129,14 @@
 GLUE_ACCESS_KEY_ID = "glue.access-key-id"
 GLUE_SECRET_ACCESS_KEY = "glue.secret-access-key"
 GLUE_SESSION_TOKEN = "glue.session-token"
+GLUE_MAX_RETRIES = "glue.max-retries"
+GLUE_RETRY_MODE = "glue.retry-mode"
+
+MAX_RETRIES = 10
+STANDARD_RETRY_MODE = "standard"
+ADAPTIVE_RETRY_MODE = "adaptive"
+LEGACY_RETRY_MODE = "legacy"
+EXISTING_RETRY_MODES = [STANDARD_RETRY_MODE, ADAPTIVE_RETRY_MODE, LEGACY_RETRY_MODE]
 
 
 def _construct_parameters(
@@ -297,6 +306,8 @@ class GlueCatalog(MetastoreCatalog):
     def __init__(self, name: str, **properties: Any):
         super().__init__(name, **properties)
 
+        retry_mode_prop_value = get_first_property_value(properties, GLUE_RETRY_MODE)
+
         session = boto3.Session(
             profile_name=properties.get(GLUE_PROFILE_NAME),
             region_name=get_first_property_value(properties, GLUE_REGION, AWS_REGION),
@@ -305,7 +316,16 @@ def __init__(self, name: str, **properties: Any):
             aws_secret_access_key=get_first_property_value(properties, GLUE_SECRET_ACCESS_KEY, AWS_SECRET_ACCESS_KEY),
             aws_session_token=get_first_property_value(properties, GLUE_SESSION_TOKEN, AWS_SESSION_TOKEN),
         )
-        self.glue: GlueClient = session.client("glue", endpoint_url=properties.get(GLUE_CATALOG_ENDPOINT))
+        self.glue: GlueClient = session.client(
+            "glue",
+            endpoint_url=properties.get(GLUE_CATALOG_ENDPOINT),
+            config=Config(
+                retries={
+                    "max_attempts": properties.get(GLUE_MAX_RETRIES, MAX_RETRIES),
+                    "mode": retry_mode_prop_value if retry_mode_prop_value in EXISTING_RETRY_MODES else STANDARD_RETRY_MODE,
+                }
+            ),
+        )
 
         if glue_catalog_id := properties.get(GLUE_ID):
             _register_glue_catalog_id_with_glue_client(self.glue, glue_catalog_id)
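
The retry resolution in the hunk above can be sketched in isolation as follows. This is a minimal, dependency-free approximation rather than the PR's code: `resolve_retry_config` is a hypothetical helper, it reads properties with plain `dict.get` instead of `get_first_property_value`, and the `int(...)` coercion is an added assumption (the diff passes the property value through unchanged, and values loaded from a configuration file may arrive as strings).

```python
from typing import Any, Dict

# Constants mirrored from the diff above.
GLUE_MAX_RETRIES = "glue.max-retries"
GLUE_RETRY_MODE = "glue.retry-mode"
MAX_RETRIES = 10
STANDARD_RETRY_MODE = "standard"
ADAPTIVE_RETRY_MODE = "adaptive"
LEGACY_RETRY_MODE = "legacy"
EXISTING_RETRY_MODES = [STANDARD_RETRY_MODE, ADAPTIVE_RETRY_MODE, LEGACY_RETRY_MODE]


def resolve_retry_config(properties: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: build the `retries` mapping from catalog properties."""
    mode = properties.get(GLUE_RETRY_MODE)
    # As in the diff, an unset or unrecognised mode falls back to "standard"
    # without raising an error.
    if mode not in EXISTING_RETRY_MODES:
        mode = STANDARD_RETRY_MODE

    # Assumption (not in the diff): coerce to int so that a string value such
    # as "5" is accepted.
    max_attempts = int(properties.get(GLUE_MAX_RETRIES, MAX_RETRIES))

    return {"max_attempts": max_attempts, "mode": mode}


if __name__ == "__main__":
    print(resolve_retry_config({}))
    print(resolve_retry_config({"glue.max-retries": "5", "glue.retry-mode": "adaptive"}))
```

The returned mapping has the shape the diff passes as `Config(retries=...)`. Note that a misspelled mode is silently replaced by `standard`, which may be worth surfacing to the user in a real implementation.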
1 change: 0 additions & 1 deletion tests/catalog/test_glue.py
@@ -442,7 +442,6 @@ def test_list_tables(
     moto_endpoint_url: str,
     table_schema_nested: Schema,
     database_name: str,
-    table_name: str,
     table_list: List[str],
 ) -> None:
     test_catalog = GlueCatalog("glue", **{"s3.endpoint": moto_endpoint_url, "warehouse": f"s3://{BUCKET_NAME}/"})