
Unable to load an iceberg table from aws glue catalog #515

Closed
@arookieds

Description

Question

PyIceberg version: 0.6.0
Python version: 3.11.1

Comments:

  • Iceberg tables are saved in an AWS Glue catalog
  • The catalog, the list of namespaces, and the list of tables are all retrievable through the catalog API

Hi,

I am facing issues loading iceberg tables from AWS Glue.
The code I am using is as follows:

from opensea.resources.resources import *
import pyiceberg.catalog
    
profile_name = "saml2aws_profile_name"
catalog_name = "catalog name"
table_name = "table name"
aws_region = "aws region"

catalog = pyiceberg.catalog.load_catalog(
    catalog_name, **{"type": "glue", "profile_name": profile_name}
)

print(catalog.list_namespaces())

table = catalog.load_table((catalog_name, table_name))

The code allows me to:

  • list namespaces
  • list tables

But load_table throws the following error:

Traceback (most recent call last):
  File "/path/to/the/project/testing.py", line 15, in <module>
    table = catalog.load_table((catalog_name, table_name))
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/path/to/the/project/venv/lib/python3.11/site-packages/pyiceberg/catalog/glue.py", line 473, in load_table
    return self._convert_glue_to_iceberg(self._get_glue_table(database_name=database_name, table_name=table_name))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/path/to/the/project/venv/lib/python3.11/site-packages/pyiceberg/catalog/glue.py", line 296, in _convert_glue_to_iceberg
    metadata = FromInputFile.table_metadata(file)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/path/to/the/project/venv/lib/python3.11/site-packages/pyiceberg/serializers.py", line 112, in table_metadata
    with input_file.open() as input_stream:
         ^^^^^^^^^^^^^^^^^
  File "/path/to/the/project/venv/lib/python3.11/site-packages/pyiceberg/io/pyarrow.py", line 263, in open
    input_file = self._filesystem.open_input_file(self._path)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "pyarrow/_fs.pyx", line 780, in pyarrow._fs.FileSystem.open_input_file
  File "pyarrow/error.pxi", line 154, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 91, in pyarrow.lib.check_status
OSError: When reading information for key 'path/to/s3/table/location/metadata/100000-458c8ffc-de06-4eb5-bc4a-b94c3034a548.metadata.json' in bucket 's3_bucket_name': AWS Error UNKNOWN (HTTP status 400) during HeadObject operation: No response body.
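
One way to narrow down a HeadObject failure like this is to request the same metadata file directly with boto3, using the same profile as the catalog. The sketch below is only a diagnostic aid under stated assumptions: `split_s3_uri` and `head_metadata_file` are hypothetical helper names, the `s3://` URI must be assembled from the bucket and key shown in the error, and the region is something you have to supply yourself. If the boto3 call succeeds while load_table still fails, that would suggest the problem lies in how credentials or region reach the PyArrow filesystem rather than in the permissions themselves.

```python
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> tuple[str, str]:
    """Split an s3:// URI into (bucket, key). Hypothetical helper."""
    parsed = urlparse(uri)
    if parsed.scheme not in ("s3", "s3a", "s3n"):
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def head_metadata_file(uri: str, profile_name: str, region_name: str) -> int:
    """Issue HeadObject through boto3 and return the HTTP status code.

    Raises botocore.exceptions.ClientError if S3 rejects the request.
    """
    import boto3  # imported lazily so split_s3_uri works without boto3

    bucket, key = split_s3_uri(uri)
    # Assumption: the profile is the same one passed to load_catalog.
    session = boto3.Session(profile_name=profile_name, region_name=region_name)
    response = session.client("s3").head_object(Bucket=bucket, Key=key)
    return response["ResponseMetadata"]["HTTPStatusCode"]
```

Calling `head_metadata_file(...)` with the metadata location, the saml2aws profile, and the bucket's region performs the same HeadObject operation that fails in the traceback, but through boto3 instead of PyArrow.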

I have checked that I have the proper access permissions, so that does not appear to be the issue.
I have also tried a few other things, all without success:

  • using load_glue, instead of load_catalog
  • providing access_key and secret_key directly in the load_catalog call
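
For reference, the attempts above could be written along the following lines. This is a minimal sketch, not a confirmed fix: `glue_catalog_properties` is a hypothetical helper, and the property names (`region_name`, `s3.region`, `s3.access-key-id`, `s3.secret-access-key`, `s3.session-token`) are the ones I understand PyIceberg to read from its configuration, so they should be checked against the documentation for the installed version. It also assumes that a saml2aws profile yields temporary credentials, in which case a session token has to accompany the key pair.

```python
from typing import Optional


def glue_catalog_properties(
    profile_name: str,
    region_name: str,
    access_key: Optional[str] = None,
    secret_key: Optional[str] = None,
    session_token: Optional[str] = None,
) -> dict[str, str]:
    """Build keyword properties for load_catalog. Hypothetical helper."""
    props = {
        "type": "glue",
        # Used for the Glue client (assumed property names).
        "profile_name": profile_name,
        "region_name": region_name,
        # Used for reading table files from S3 (assumed property name).
        "s3.region": region_name,
    }
    if access_key and secret_key:
        props["s3.access-key-id"] = access_key
        props["s3.secret-access-key"] = secret_key
    if session_token:
        # Temporary credentials are not valid without the token.
        props["s3.session-token"] = session_token
    return props


# Usage (requires pyiceberg; values are placeholders):
# from pyiceberg.catalog import load_catalog
# catalog = load_catalog(
#     "catalog name",
#     **glue_catalog_properties("saml2aws_profile_name", "aws region"),
# )
```

Keeping the Glue properties and the `s3.*` properties in one dictionary makes it explicit that the catalog client and the file reader are configured separately, which may matter when listing works but reading metadata does not.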

The table was created via Trino, with the following definition:

create table catalog_name.table_name (
          "timestamp" timestamp,
          "type" varchar(20),
          distribution int,
          service int,
          code varchar(20),
          base_id bigint,
          counter_id bigint,
          "category" varchar(50),
          volume double)
        with (
          format = 'PARQUET',
          partitioning = ARRAY['day(timestamp)'],
          location = 's3://s3_bucket/path/to/table/folder/'
        )
