PyIceberg with Azure Storage Account (500 Internal Server Error) #4
I posted a partial explanation regarding the client side: apache/iceberg-python#939 (comment). To run the REST catalog server from this repo, you'd need to configure the server so it can talk to your storage. For example, if you're trying to use Azure, here are some of the configs required. Another example is running the REST catalog server with MinIO (S3-compatible API).
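As a rough sketch of "the configs required", the Azure-related settings are PyIceberg FileIO properties passed to the catalog. The `adlfs.*` key names below follow PyIceberg's fsspec FileIO convention, but treat the exact keys and values as assumptions to verify against the PyIceberg version you run:

```python
# Hypothetical Azure (adlfs) properties for the server-side catalog.
# Key names and values are assumptions -- check your PyIceberg version's
# FileIO documentation for the exact spelling.
azure_properties = {
    "warehouse": "abfs://landing@sandboxnonprodstorage.dfs.core.windows.net/",
    "adlfs.account-name": "sandboxnonprodstorage",
    # Supply one credential style: a connection string, an account key,
    # or a SAS token (placeholder value below).
    "adlfs.connection-string": "<your-azure-connection-string>",
}
```

Without some such credential reaching the catalog's FileIO, writes to `abfs://` paths cannot authenticate against the storage account.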
While working on this repo, I discovered some bugs in PyIceberg. It was easier to iterate with PyIceberg as a submodule so that I could commit fixes right away. Some of these fixes are already upstreamed (see apache/iceberg-python#864).
To debug your issue above, look at the server log! An HTTP 500 error usually indicates that the server ran into an error.
Regarding the case below: I filled in (almost) all of the fields from the link, except adlfs.sas_token. For some reason unknown to me, the error mentions an "AWS Error NETWORK_CONNECTION", even though it should be using the Azure connection. I didn't find this type of configuration in the Dockerfile or anywhere else, only inside the "tests" and "models" folders. I also added comments to the logs below to make clear which operation I performed at each step. Error from the Docker container:
INFO: Started server process [1]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
INFO: 172.17.0.1:54386 - "GET /v1/config? warehouse=abfs%3A%2F%2Flanding%40sandboxnonprodstorage.dfs.core.windows.net%2F HTTP/1.1" 200 OK <------- FIRST REQUEST (Just to create the namespace and list the tables)
INFO: 172.17.0.1:54396 - "POST /v1/namespaces HTTP/1.1" 200 OK <------- NAMESPACE CREATION
INFO: 172.17.0.1:54396 - "GET /v1/namespaces HTTP/1.1" 200 OK <------- NAMESPACE LIST
INFO: 172.17.0.1:54396 - "GET /v1/namespaces/iceberg_rest/tables HTTP/1.1" 200 OK <----- TABLE'S LIST (Null as expected)
INFO: 172.17.0.1:32978 - "GET /v1/config?warehouse=abfs%3A%2F%2Flanding%40sandboxnonprodstorage.dfs.core.windows.net%2F HTTP/1.1" 200 OK <------ SECOND REQUEST (List namespaces, tables and create_table itself)
INFO: 172.17.0.1:32988 - "GET /v1/namespaces HTTP/1.1" 200 OK
INFO: 172.17.0.1:32988 - "GET /v1/namespaces/iceberg_rest/tables HTTP/1.1" 200 OK
INFO: 172.17.0.1:32988 - "POST /v1/namespaces/iceberg_rest/tables HTTP/1.1" 500 Internal Server Error <----CREATE_TABLE FUNCTION
ERROR: Exception in ASGI application
Traceback (most recent call last):
File "/home/iceberg/iceberg_rest/.venv/lib/python3.11/site-packages/uvicorn/protocols/http/httptools_impl.py", line 399, in run_asgi
result = await app( # type: ignore[func-returns-value]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/iceberg/iceberg_rest/.venv/lib/python3.11/site-packages/uvicorn/middleware/proxy_headers.py", line 70, in __call__
return await self.app(scope, receive, send)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/iceberg/iceberg_rest/.venv/lib/python3.11/site-packages/fastapi/applications.py", line 1054, in __call__
await super().__call__(scope, receive, send)
File "/home/iceberg/iceberg_rest/.venv/lib/python3.11/site-packages/starlette/applications.py", line 123, in __call__
await self.middleware_stack(scope, receive, send)
File "/home/iceberg/iceberg_rest/.venv/lib/python3.11/site-packages/starlette/middleware/errors.py", line 186, in __call__
raise exc
File "/home/iceberg/iceberg_rest/.venv/lib/python3.11/site-packages/starlette/middleware/errors.py", line 164, in __call__
await self.app(scope, receive, _send)
File "/home/iceberg/iceberg_rest/.venv/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 65, in __call__
await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
File "/home/iceberg/iceberg_rest/.venv/lib/python3.11/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
raise exc
File "/home/iceberg/iceberg_rest/.venv/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "/home/iceberg/iceberg_rest/.venv/lib/python3.11/site-packages/starlette/routing.py", line 756, in __call__
await self.middleware_stack(scope, receive, send)
File "/home/iceberg/iceberg_rest/.venv/lib/python3.11/site-packages/starlette/routing.py", line 776, in app
await route.handle(scope, receive, send)
File "/home/iceberg/iceberg_rest/.venv/lib/python3.11/site-packages/starlette/routing.py", line 297, in handle
await self.app(scope, receive, send)
File "/home/iceberg/iceberg_rest/.venv/lib/python3.11/site-packages/starlette/routing.py", line 77, in app
await wrap_app_handling_exceptions(app, request)(scope, receive, send)
File "/home/iceberg/iceberg_rest/.venv/lib/python3.11/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
raise exc
File "/home/iceberg/iceberg_rest/.venv/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "/home/iceberg/iceberg_rest/.venv/lib/python3.11/site-packages/starlette/routing.py", line 72, in app
response = await func(request)
^^^^^^^^^^^^^^^^^^^
File "/home/iceberg/iceberg_rest/.venv/lib/python3.11/site-packages/fastapi/routing.py", line 278, in app
raw_response = await run_endpoint_function(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/iceberg/iceberg_rest/.venv/lib/python3.11/site-packages/fastapi/routing.py", line 193, in run_endpoint_function
return await run_in_threadpool(dependant.call, **values)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/iceberg/iceberg_rest/.venv/lib/python3.11/site-packages/starlette/concurrency.py", line 42, in run_in_threadpool
return await anyio.to_thread.run_sync(func, *args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/iceberg/iceberg_rest/.venv/lib/python3.11/site-packages/anyio/to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/iceberg/iceberg_rest/.venv/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 2177, in run_sync_in_worker_thread
return await future
^^^^^^^^^^^^
File "/home/iceberg/iceberg_rest/.venv/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 859, in run
result = context.run(func, *args)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/iceberg/iceberg_rest/.venv/lib/python3.11/site-packages/iceberg_rest/api/catalog_api.py", line 297, in create_table
return _create_table(catalog, identifier, create_table_request)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/iceberg/iceberg_rest/.venv/lib/python3.11/site-packages/iceberg_rest/api/catalog_api.py", line 343, in _create_table
tbl = catalog.create_table(
^^^^^^^^^^^^^^^^^^^^^
File "/home/iceberg/iceberg_rest/.venv/lib/python3.11/site-packages/pyiceberg/catalog/sql.py", line 208, in create_table
self._write_metadata(metadata, io, metadata_location)
File "/home/iceberg/iceberg_rest/.venv/lib/python3.11/site-packages/pyiceberg/catalog/__init__.py", line 843, in _write_metadata
ToOutputFile.table_metadata(metadata, io.new_output(metadata_path))
File "/home/iceberg/iceberg_rest/.venv/lib/python3.11/site-packages/pyiceberg/serializers.py", line 130, in table_metadata
with output_file.create(overwrite=overwrite) as output_stream:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/iceberg/iceberg_rest/.venv/lib/python3.11/site-packages/pyiceberg/io/pyarrow.py", line 304, in create
if not overwrite and self.exists() is True:
^^^^^^^^^^^^^
File "/home/iceberg/iceberg_rest/.venv/lib/python3.11/site-packages/pyiceberg/io/pyarrow.py", line 248, in exists
self._file_info() # raises FileNotFoundError if it does not exist
^^^^^^^^^^^^^^^^^
File "/home/iceberg/iceberg_rest/.venv/lib/python3.11/site-packages/pyiceberg/io/pyarrow.py", line 230, in _file_info
file_info = self._filesystem.get_file_info(self._path)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "pyarrow/_fs.pyx", line 584, in pyarrow._fs.FileSystem.get_file_info
File "pyarrow/error.pxi", line 154, in pyarrow.lib.pyarrow_internal_check_status
File "pyarrow/error.pxi", line 91, in pyarrow.lib.check_status
OSError: When getting information for key 'rest/iceberg_rest.db/stations2000/metadata/00000-89d73996-40a2-458f-bdb9-1d1eff86a65b.metadata.json' in bucket 'warehouse': AWS Error NETWORK_CONNECTION during HeadObject operation: curlCode: 7, Couldn't connect to server
The REST server is a wrapper around the underlying catalog. It looks like the catalog config is currently hardcoded to use AWS configs: iceberg-rest-catalog/src/iceberg_rest/catalog.py, lines 14 to 24 in 7c55481.
You would need to change this to also take in Azure configs instead. You can quickly verify this by passing the configs directly to that dict.
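A minimal sketch of that quick verification, assuming the hardcoded S3 entries in src/iceberg_rest/catalog.py are swapped for Azure-flavored ones (all key names and values below are placeholders/assumptions to adapt):

```python
# Sketch: properties for the SqlCatalog constructed in
# iceberg_rest/catalog.py, with the hardcoded S3 entries replaced by
# Azure ones. All values are placeholders/assumptions.
catalog_properties = {
    "uri": "sqlite:////tmp/warehouse/pyiceberg_catalog.db",  # catalog metadata DB
    "warehouse": "abfs://landing@sandboxnonprodstorage.dfs.core.windows.net/",
    "adlfs.connection-string": "<your-azure-connection-string>",
}
# Then, roughly: catalog = SqlCatalog("default", **catalog_properties)
```

With only S3 properties present, PyArrow falls back to an S3 filesystem for the metadata write, which matches the "AWS Error NETWORK_CONNECTION" seen in the traceback above.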
It worked fine after inserting the connection-string parameter into this function. But I also saw that SqlCatalog is used rather than RESTCatalog, and I was wondering why. I tried changing it to RESTCatalog but got the server-side error shown below; how could I fix this to properly use RESTCatalog rather than SqlCatalog? Also, I built the Postgres version, but it's trying to use SQLite. Why? Error on the server side:
raise InvalidSchema(f"No connection adapters were found for {url!r}")
requests.exceptions.InvalidSchema: No connection adapters were found for 'sqlite:////tmp/warehouse/pyiceberg_catalog.db/v1/config'
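To illustrate why that swap fails: a REST catalog is an HTTP client that talks to a server like this one, so its `uri` must be an http(s) address, while the SqlCatalog inside the server takes a database URL. A minimal sketch under those assumptions (addresses are illustrative):

```python
# The server side wraps a SQL-backed catalog, configured with a
# database URL:
server_catalog_config = {
    "type": "sql",
    "uri": "sqlite:////tmp/warehouse/pyiceberg_catalog.db",
}
# Clients, in contrast, use a REST catalog pointing at the server's
# HTTP address (illustrative):
client_catalog_config = {
    "type": "rest",
    "uri": "http://localhost:8000",
}
# Handing a RESTCatalog the sqlite:// URL makes it issue HTTP requests
# against "sqlite:////.../v1/config", which the requests library rejects
# with InvalidSchema ("No connection adapters were found").
```

In other words, pointing the server's internal catalog at itself via RESTCatalog would be circular; the SqlCatalog is the actual backing store.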
This repo implements the REST Catalog server, it accepts HTTP requests and then proxies to the underlying catalog. The server needs to get/set table metadata. In this case, the metadata is ultimately saved in the
Don't change it to
Are you using Docker? The
While using PyIceberg I ran into some issues/questions that blocked me, mainly an internal server error 500 after executing a simple create_table call. Since I'm pretty new to Iceberg, I'm probably missing something. Could anyone help me? I created a namespace and listed it, but as soon as I try to create a table on my Azure storage account I get the same 500 error. My credentials are correct, but I'm using the connection string and pointing the "warehouse" parameter to my storage account, such as: "abfs://@<storage_account>.dfs.core.windows.net/".
I was looking at the Dockerfile and didn't see anything I should change, and the only files using the AWS connection were inside the "models" folder (which I believe is used to generate new code from these models) and the "tests" folder. The "vendors" folder is also not clear to me: why do you clone PyIceberg into the container, and how is it used?