Add ObjectStore support via SQL #1930

Open
matthewmturner opened this issue Mar 5, 2022 · 7 comments
Labels
enhancement New feature or request

Comments

@matthewmturner
Contributor

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

I am working towards making datafusion-cli a powerful tool for ad-hoc data analysis on a local machine. The first step was #1875, which enables defining a local "database" that is loaded on startup from a .datafusionrc file. As a second step, I would like to be able to connect to object stores, such as S3, directly from SQL. That will of course require adding S3 as a feature to datafusion-cli, but that feature is useless unless ObjectStores can be registered. Below is the current behaviour:

❯ CREATE EXTERNAL TABLE t STORED AS CSV LOCATION 's3://bucket/t.csv';
Internal("No suitable object store found for s3")
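To illustrate why the statement above fails, here is a minimal sketch of a scheme-keyed registry. `StoreHandle`, `StoreRegistry`, `register`, and `resolve` are hypothetical names chosen for this example and are not DataFusion's actual types; the sketch only assumes that the engine picks a backend from the URI scheme of the location.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for an object store backend (not a DataFusion type).
#[derive(Debug, Clone, PartialEq)]
pub struct StoreHandle {
    pub name: String,
}

/// Maps a URI scheme such as "s3" or "file" to a registered backend.
#[derive(Default)]
pub struct StoreRegistry {
    stores: HashMap<String, StoreHandle>,
}

impl StoreRegistry {
    /// Register a backend under a scheme, returning any backend it replaced.
    pub fn register(&mut self, scheme: &str, store: StoreHandle) -> Option<StoreHandle> {
        self.stores.insert(scheme.to_string(), store)
    }

    /// Resolve the backend for a location such as "s3://bucket/t.csv".
    /// Locations without a scheme are treated as local files (an assumption).
    pub fn resolve(&self, location: &str) -> Result<&StoreHandle, String> {
        let scheme = match location.split_once("://") {
            Some((scheme, _)) => scheme,
            None => "file",
        };
        self.stores
            .get(scheme)
            .ok_or_else(|| format!("No suitable object store found for {}", scheme))
    }
}
```

With only a "file" backend registered, resolving 's3://bucket/t.csv' returns the error; once something is registered under "s3", the same lookup succeeds. That registration step is what this issue asks to expose.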

Describe the solution you'd like

I would like to be able to register an ObjectStore directly from SQL. Given that ObjectStore is a DataFusion concept, I was thinking that we could add a function such as register_object_store rather than a dedicated SQL statement.

So it would look something like:

Default credentials

❯   register_object_store('s3');

Minio

❯   register_object_store('s3', ACCESS_KEY, SECRET_KEY, PROVIDER, ENDPOINT);
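A rough sketch of how the two call shapes above might be interpreted, assuming the arguments have already been evaluated to strings. `ObjectStoreArgs`, `ExplicitSettings`, and `parse_register_args` are hypothetical names for illustration; nothing here exists in DataFusion, and `register_object_store` itself is only the proposal from this issue.

```rust
/// Hypothetical explicit settings, in the order proposed above.
#[derive(Debug, Clone, PartialEq)]
pub struct ExplicitSettings {
    pub access_key: String,
    pub secret_key: String,
    pub provider: String,
    pub endpoint: String,
}

/// Hypothetical result of interpreting a `register_object_store` call.
#[derive(Debug, Clone, PartialEq)]
pub struct ObjectStoreArgs {
    pub scheme: String,
    /// `None` means "fall back to default credentials".
    pub explicit: Option<ExplicitSettings>,
}

/// Accept either one argument (scheme only) or all five arguments.
pub fn parse_register_args(args: &[&str]) -> Result<ObjectStoreArgs, String> {
    match args {
        [scheme] => Ok(ObjectStoreArgs {
            scheme: scheme.to_string(),
            explicit: None,
        }),
        [scheme, access_key, secret_key, provider, endpoint] => Ok(ObjectStoreArgs {
            scheme: scheme.to_string(),
            explicit: Some(ExplicitSettings {
                access_key: access_key.to_string(),
                secret_key: secret_key.to_string(),
                provider: provider.to_string(),
                endpoint: endpoint.to_string(),
            }),
        }),
        _ => Err(format!(
            "register_object_store expects 1 or 5 arguments, got {}",
            args.len()
        )),
    }
}
```

Note that the five-argument form is positional and S3-shaped, so any other arity is rejected here rather than guessed at.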

Describe alternatives you've considered

Additional context

@matthewmturner matthewmturner added the enhancement New feature or request label Mar 5, 2022
@matthewmturner
Contributor Author

@seddonm1 @yjshen @houqp FYI - in case you have thoughts on this.

@matthewmturner
Contributor Author

Actually, I'm not sure how well those parameters in register_object_store will generalize to ObjectStore implementations other than S3, so now I'm not sure whether a general function like that could be used.

@matthewmturner
Contributor Author

Maybe my objective could be achieved with some command-line options instead. For example:

Default credentials

$ datafusion-cli --object-store s3

Minio

$ datafusion-cli --object-store s3 --access-key KEY --secret-key ABC --provider PROVIDER --endpoint ENDPOINT
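As a sketch of how such options could be collected, the following uses only the standard library. The flag names are the ones proposed in this comment, not existing datafusion-cli options, and `ObjectStoreFlags` and `parse_flags` are hypothetical names. It assumes every flag takes exactly one value.

```rust
/// Hypothetical holder for the options proposed above.
#[derive(Debug, Default, PartialEq)]
pub struct ObjectStoreFlags {
    pub object_store: Option<String>,
    pub access_key: Option<String>,
    pub secret_key: Option<String>,
    pub provider: Option<String>,
    pub endpoint: Option<String>,
}

/// Parse `--flag value` pairs; the program name is assumed to be stripped already.
pub fn parse_flags(args: &[&str]) -> Result<ObjectStoreFlags, String> {
    let mut flags = ObjectStoreFlags::default();
    let mut iter = args.iter();
    while let Some(flag) = iter.next() {
        // Pick the field this flag writes to, or reject unknown flags.
        let slot = match *flag {
            "--object-store" => &mut flags.object_store,
            "--access-key" => &mut flags.access_key,
            "--secret-key" => &mut flags.secret_key,
            "--provider" => &mut flags.provider,
            "--endpoint" => &mut flags.endpoint,
            other => return Err(format!("unknown option: {}", other)),
        };
        // Every flag must be followed by its value.
        let value = iter
            .next()
            .ok_or_else(|| format!("missing value for {}", flag))?;
        *slot = Some(value.to_string());
    }
    Ok(flags)
}
```

Passing only `--object-store s3` leaves the credential fields as `None`, which corresponds to the default-credentials case.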

@houqp @yjshen @seddonm1 do you have a view on whether ObjectStore registration can be done via SQL or if this should be part of datafusion-cli?

@houqp
Member

houqp commented Mar 9, 2022

I think it can be done through both, because the secret key credentials and the endpoint can also be provided through environment variables. In that case, the user will only need to provide the S3 path in the SQL query.
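A minimal sketch of the environment-variable route, assuming the standard AWS names `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`. The endpoint variable `S3_ENDPOINT`, the `S3Settings` struct, and both functions are hypothetical and chosen only for illustration.

```rust
use std::env;

/// Hypothetical S3 settings; `None` means the variable was not set.
#[derive(Debug, PartialEq)]
pub struct S3Settings {
    pub access_key: Option<String>,
    pub secret_key: Option<String>,
    pub endpoint: Option<String>,
}

/// Build settings from any name -> value lookup, so the logic can be
/// exercised without touching the real process environment.
pub fn settings_from<F>(lookup: F) -> S3Settings
where
    F: Fn(&str) -> Option<String>,
{
    S3Settings {
        access_key: lookup("AWS_ACCESS_KEY_ID"),
        secret_key: lookup("AWS_SECRET_ACCESS_KEY"),
        // Hypothetical variable name, used here only as a placeholder.
        endpoint: lookup("S3_ENDPOINT"),
    }
}

/// Read the same settings from the process environment.
pub fn settings_from_env() -> S3Settings {
    settings_from(|name| env::var(name).ok())
}
```

With something like this in place, neither the SQL statement nor the command line needs to carry secrets; only the s3:// location appears in the query.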

@turbo1912
Contributor

@matthewmturner any progress on this one? If you are no longer working on it, I would like to take a stab at it.

@seddonm1
Contributor

I think this repo is largely deprecated in favour of https://github.com/apache/arrow-rs/tree/master/object_store

@matthewmturner
Contributor Author

matthewmturner commented Sep 17, 2022

> @matthewmturner any progress on this one? If you are not working on it still, I would like to take a stab at it

@turbo1912 Haven't been able to work on this, go for it!
