diff --git a/datafusion-cli/README.md b/datafusion-cli/README.md
index 8f6856cb3358..1d99cfbcb00a 100644
--- a/datafusion-cli/README.md
+++ b/datafusion-cli/README.md
@@ -17,98 +17,12 @@
 under the License.
 -->

-# DataFusion Command-line Interface
-
-[DataFusion](https://github.com/apache/arrow-datafusion) is an extensible query execution framework, written in Rust, that uses Apache Arrow as its in-memory format.
-
-The DataFusion CLI allows SQL queries to be executed by an in-process DataFusion context.
-
-```ignore
-USAGE:
- datafusion-cli [OPTIONS]
-
-OPTIONS:
- -c, --batch-size The batch size of each query, or use DataFusion default
- -f, --file ... Execute commands from file(s), then exit
- --format [default: table] [possible values: csv, tsv, table, json,
- nd-json]
- -h, --help Print help information
- -p, --data-path Path to your data, default to current directory
- -q, --quiet Reduce printing other than the results and work quietly
- -r, --rc ... Run the provided files on startup instead of ~/.datafusionrc
- -V, --version Print version information
-
-```
-
-## Example
-
-Create a CSV file to query.
-
-```bash,ignore
-$ echo "1,2" > data.csv
-```
-
-```sql,ignore
-$ datafusion-cli
-
-DataFusion CLI v12.0.0
-
-> CREATE EXTERNAL TABLE foo (a INT, b INT) STORED AS CSV LOCATION 'data.csv';
-0 rows in set. Query took 0.001 seconds.
+

-> SELECT * FROM foo;
-+---+---+
-| a | b |
-+---+---+
-| 1 | 2 |
-+---+---+
-1 row in set. Query took 0.017 seconds.
-```
-
-## Querying S3 Data Sources
-
-The CLI can query data in S3 if the following environment variables are defined:
-
-- `AWS_REGION`
-- `AWS_ACCESS_KEY_ID`
-- `AWS_SECRET_ACCESS_KEY`
-
-Alternatively, you can supply a profile name via the `AWS_PROFILE` environment variable. When using a [named profile](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-files.html), the CLI obtains credentials from the profile configuration and thus does not require `AWS_ACCESS_KEY_ID` or `AWS_SECRET_ACCESS_KEY` environment variables to be set.
-
-Note that the region must be set to the region where the bucket exists until the following issue is resolved:
-
-- https://github.com/apache/arrow-rs/issues/2795
-
-Example:
-
-```bash
-$ aws s3 cp test.csv s3://my-bucket/
-upload: ./test.csv to s3://my-bucket/test.csv
-
-$ export AWS_REGION=us-east-1
-$ export AWS_SECRET_ACCESS_KEY=***************************
-$ export AWS_ACCESS_KEY_ID=**************
-
-$ ./target/release/datafusion-cli
-DataFusion CLI v12.0.0
-❯ create external table test stored as csv location 's3://my-bucket/test.csv';
-0 rows in set. Query took 0.374 seconds.
-❯ select * from test;
-+----------+----------+
-| column_1 | column_2 |
-+----------+----------+
-| 1 | 2 |
-+----------+----------+
-1 row in set. Query took 0.171 seconds.
-```
-
-## DataFusion-Cli
+# DataFusion Command-line Interface

-Build the `datafusion-cli` by `cd` into the sub-directory:
+[DataFusion](https://arrow.apache.org/datafusion/) is an extensible query execution framework, written in Rust, that uses Apache Arrow as its in-memory format.

-```bash
-cd datafusion-cli
-cargo build
-```
+The DataFusion CLI is a command line utility that runs SQL queries using the DataFusion engine.

-[df]: https://crates.io/crates/datafusion
+See the [`datafusion-cli` documentation](https://arrow.apache.org/datafusion/user-guide/cli.html) for further information.
diff --git a/docs/source/user-guide/cli.md b/docs/source/user-guide/cli.md
index beac0cc44234..5954f0aaebd7 100644
--- a/docs/source/user-guide/cli.md
+++ b/docs/source/user-guide/cli.md
@@ -21,7 +21,7 @@
 The DataFusion CLI is a command-line interactive SQL utility for executing queries against any supported data files. It is a convenient way to
-try DataFusion out with your own data sources, and test out its SQL support.
+try DataFusion's SQL support with your own data.

 ## Example