Question
PyIceberg version: 0.6.0
Python version: 3.11.1
Comments:
- The Iceberg tables are stored in an AWS Glue catalog
- The catalog, the list of namespaces, and the list of tables are all retrievable through the catalog API
Hi,
I am having trouble loading Iceberg tables from AWS Glue.
The code I am using is as follows:
from opensea.resources.resources import *
import pyiceberg.catalog
profile_name = "saml2aws_profile_name"
catalog_name = "catalog name"
table_name = "table name"
aws_region = "aws region"
catalog = pyiceberg.catalog.load_catalog(
    catalog_name, **{"type": "glue", "profile_name": profile_name}
)
print(catalog.list_namespaces())
table = catalog.load_table((catalog_name, table_name))
The code allows me to:
- list namespaces
- list tables
But load_table throws the following error:
Traceback (most recent call last):
File "/path/to/the/project/testing.py", line 15, in <module>
table = catalog.load_table((catalog_name, table_name))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/path/to/the/project/venv/lib/python3.11/site-packages/pyiceberg/catalog/glue.py", line 473, in load_table
return self._convert_glue_to_iceberg(self._get_glue_table(database_name=database_name, table_name=table_name))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/path/to/the/project/venv/lib/python3.11/site-packages/pyiceberg/catalog/glue.py", line 296, in _convert_glue_to_iceberg
metadata = FromInputFile.table_metadata(file)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/path/to/the/project/venv/lib/python3.11/site-packages/pyiceberg/serializers.py", line 112, in table_metadata
with input_file.open() as input_stream:
^^^^^^^^^^^^^^^^^
File "/path/to/the/project/venv/lib/python3.11/site-packages/pyiceberg/io/pyarrow.py", line 263, in open
input_file = self._filesystem.open_input_file(self._path)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "pyarrow/_fs.pyx", line 780, in pyarrow._fs.FileSystem.open_input_file
File "pyarrow/error.pxi", line 154, in pyarrow.lib.pyarrow_internal_check_status
File "pyarrow/error.pxi", line 91, in pyarrow.lib.check_status
OSError: When reading information for key 'path/to/s3/table/location/metadata/100000-458c8ffc-de06-4eb5-bc4a-b94c3034a548.metadata.json' in bucket 's3_bucket_name': AWS Error UNKNOWN (HTTP status 400) during HeadObject operation: No response body.
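The failing call is a HeadObject on the metadata file, issued through the PyArrow S3 filesystem (`pyiceberg/io/pyarrow.py`) rather than through the Glue client. As a rough way to narrow this down, the sketch below repeats that request with boto3 under the same profile. It assumes boto3 is installed; `split_s3_uri` and `head_metadata_file` are hypothetical helper names, and the URI, profile, and region have to be supplied by the caller.

```python
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> tuple[str, str]:
    """Split 's3://bucket/key' into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme not in ("s3", "s3a", "s3n") or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def head_metadata_file(uri: str, profile_name: str, region_name: str) -> dict:
    """Issue a HeadObject request with boto3, using the named profile."""
    # Imported lazily so split_s3_uri works even without boto3 installed.
    import boto3

    session = boto3.Session(profile_name=profile_name, region_name=region_name)
    bucket, key = split_s3_uri(uri)
    return session.client("s3").head_object(Bucket=bucket, Key=key)
```

If this request succeeds while load_table still fails, the two clients are probably resolving credentials or the region differently.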
I have checked that I have the proper access rights, so that does not seem to be the issue.
I have also tried a few other things, all unsuccessful:
- using load_glue, instead of load_catalog
- providing access_key and secret_key directly in the load_catalog call
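One more thing that may be worth trying, since `aws_region` is defined in the script but never passed anywhere, is to hand the region and credentials to the S3 FileIO explicitly. This is a minimal sketch, assuming the `s3.region`, `s3.access-key-id`, `s3.secret-access-key`, and `s3.session-token` FileIO properties apply to this setup. `build_glue_properties` is a hypothetical helper, and whether this resolves the error is not confirmed.

```python
from typing import Optional


def build_glue_properties(
    profile_name: str,
    region: str,
    access_key: Optional[str] = None,
    secret_key: Optional[str] = None,
    session_token: Optional[str] = None,
) -> dict:
    """Build load_catalog properties that also configure the S3 FileIO."""
    props = {
        "type": "glue",
        "profile_name": profile_name,  # read by the Glue (boto3) client
        "s3.region": region,           # read by the S3 FileIO
    }
    # Only add explicit credentials when both halves of the key pair are given.
    if access_key and secret_key:
        props["s3.access-key-id"] = access_key
        props["s3.secret-access-key"] = secret_key
    # Temporary (e.g. SAML-derived) credentials also need the session token.
    if session_token:
        props["s3.session-token"] = session_token
    return props


# Usage (requires pyiceberg; names taken from the script above):
# catalog = pyiceberg.catalog.load_catalog(
#     catalog_name, **build_glue_properties(profile_name, aws_region)
# )
```

With a saml2aws profile the credentials are temporary, so passing an access key and secret key without the session token would likely still be rejected.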
The table was created via Trino with the following definition:
create table catalog_name.table_name (
"timestamp" timestamp,
"type" varchar(20),
distribution int,
service int,
code varchar(20),
base_id bigint,
counter_id bigint,
"category" varchar(50),
volume double)
with (
format = 'PARQUET',
partitioning = ARRAY['day(timestamp)'],
location = 's3://s3_bucket/path/to/table/folder/'
)