Skip to content

Add support for ~/.datafusionrc and cli option for overriding it to datafusion-cli #1875

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Mar 3, 2022

Conversation

matthewmturner
Copy link
Contributor

@matthewmturner matthewmturner commented Feb 23, 2022

Which issue does this PR close?

Closes #1872

Rationale for this change

What changes are included in this PR?

datafusion-cli will now by default look for ~/.datafusionrc file and run that before placing user in REPL.

User can optionally override the default .datafusionrc file using the -r / --rc option when starting datafusion-cli.

Are there any user-facing changes?

Maybe a changed option. If no change to that, then just a new command line option.

@github-actions github-actions bot added the datafusion Changes in the datafusion crate label Feb 23, 2022
@houqp houqp requested a review from jimexist February 23, 2022 06:58
@houqp
Copy link
Member

houqp commented Feb 23, 2022

I would be interested to hear what @jimexist has to say about it. I am cool with file-exit and file-run rename. Other options are:

  1. use bootstrap as the argument name
  2. start using sub commands, then we can have cli run QUERY_FILE and cli console --file QUERY_FILE.

@matthewmturner
Copy link
Contributor Author

Agree - would like @jimexist opinion. I've renamed to file-run and file-exit for now.

@jimexist
Copy link
Member

i'm thinking of how this can be done in a more generic way.

if you think of psql, it will on startup read and execute commands and sql statements from .psqlrc. then it's a good place to put common init setup for your local dev exp.

probably we should read from .datafusionrc by default, and that's where you'd want to put \pset commands and common ddl commands?

@matthewmturner
Copy link
Contributor Author

@jimexist i think thats a great idea. the only thing ill add, which i personally would find useful, is the option to provide a custom .datafusionrc file when starting up (default can of course be ~/.datafusionrc.

To give you some context, im expecting to setup different environments and would like to be able to choose which i want to use without having them all bundled. I.e. I could have one for my main job that has ddl for those datasets, and then i could have another focused on arrow dev where i load db-benchmark, parquet-testing, and tpch data, etc. In that case maybe we could have something like

no option => startup with ~/.datafusionrc
-f / --file => existing use case, exits after running, aligned with psql
-r / --rc => startup with selected file, i dont think the -r option is currently used by psql so shouldnt cause confusion

@matthewmturner
Copy link
Contributor Author

@jimexist any additional thoughts on the above? If you are ok with it then ill move forward the following:

no option => startup with ~/.datafusionrc
-f / --file => existing use case, exits after running, aligned with psql
-r / --rc => startup with selected file, i dont think the -r option is currently used by psql so shouldnt cause confusion

@matthewmturner matthewmturner force-pushed the add_cli_option_for_ddl branch from b2ae6b0 to 5597e4c Compare March 2, 2022 04:59
@matthewmturner
Copy link
Contributor Author

@alamb also interested in your thoughts on this as i know you have mentioned using datafusion-cli

@alamb
Copy link
Contributor

alamb commented Mar 2, 2022

Taking inspiration frompsql , it implements a \i command (not a command line argument) for this usecase. It includes a file:

https://www.postgresql.org/docs/13/app-psql.html

\i or \include filename
Reads input from the file filename and executes it as though it had been typed on the keyboard.

If filename is - (hyphen), then standard input is read until an EOF indication or \q meta-command. This can be used to intersperse interactive input with input from files. Note that Readline behavior will be used only if it is active at the outermost level.

It appears that mysql has a similar command source or \. that does the same thing

https://dev.mysql.com/doc/refman/8.0/en/mysql-batch-commands.html

Thus I would recommend implementing a \i command (and possibly also a source command)

@matthewmturner
Copy link
Contributor Author

I thought we were actually already getting inspiriration from psql, see below:

psqlrc and ~/.psqlrc

    Unless it is passed an -X option, psql attempts to read and execute commands from the system-wide startup file (psqlrc) and then the user's personal startup file (~/.psqlrc), after connecting to the database but before accepting normal commands. These files can be used to set up the client and/or the server to taste, typically with \set and SET commands.

    The system-wide startup file is named psqlrc and is sought in the installation's “system configuration” directory, which is most reliably identified by running pg_config --sysconfdir. By default this directory will be ../etc/ relative to the directory containing the PostgreSQL executables. The name of this directory can be set explicitly via the PGSYSCONFDIR environment variable.

    The user's personal startup file is named .psqlrc and is sought in the invoking user's home directory. On Windows, which lacks such a concept, the personal startup file is named %APPDATA%\postgresql\psqlrc.conf. The location of the user's startup file can be set explicitly via the PSQLRC environment variable.

    Both the system-wide startup file and the user's personal startup file can be made psql-version-specific by appending a dash and the PostgreSQL major or minor release number to the file name, for example ~/.psqlrc-9.2 or ~/.psqlrc-9.2.5. The most specific version-matching file will be read in preference to a non-version-specific file.

Unless your point is that the above is more for options / configuration than it is for actually SQL / DDL. Which I think makes sense... I can make that update.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the idea of -r <file> looks good to me

Perhaps we can update the PR title and description to explain what the actual change is now?

@matthewmturner
Copy link
Contributor Author

Just to ensure were aligned the PR adds both -r <file> option and it defaults to looking for ~/.datafusionrc.

I also think that adding the \i command still makes sense. Since its still helps achieve the intent of the issue i think its ok to just add it to this PR. Lmk if you think otherwise.

@alamb
Copy link
Contributor

alamb commented Mar 2, 2022

I also think that adding the \i command still makes sense. Since its still helps achieve the intent of the issue i think its ok to just add it to this PR. Lmk if you think otherwise.

I think it would also be fine to add as a follow on PR. Updating this PR's description and title I think is important though as a way to communicate the change to others

@matthewmturner matthewmturner changed the title Add new option for running sql file and keeping df-cli open Add support for ~/.datafusionrc and cli option for overriding it to datafusion-cli Mar 2, 2022
@matthewmturner
Copy link
Contributor Author

@alamb thanks much for the review and feedback.

@jimexist are we good to go on your end?

@alamb alamb merged commit c5c65f5 into apache:master Mar 3, 2022
@alamb
Copy link
Contributor

alamb commented Mar 3, 2022

Thanks for sticking with this @matthewmturner

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
datafusion Changes in the datafusion crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add option to datafusion-cli to execute SQL from file and keep session open
4 participants