Skip to content

Add insta / snapshot testing to CLI & set up AWS mock #13672

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 48 commits into from
Mar 11, 2025

Conversation

blaginin
Copy link
Contributor

@blaginin blaginin commented Dec 5, 2024

Which issue does this PR close?

Rationale for this change

We currently don't test whether our external integrations actually work: as a result #13576 has happened.

What changes are included in this PR?

Integration tests for S3.

Are there any user-facing changes?

No.

@blaginin blaginin changed the title Add snap to CLI & set up AWS mock Add snapshot testing to CLI & set up AWS mock Dec 6, 2024
@github-actions github-actions bot added documentation Improvements or additions to documentation sql SQL Planner development-process Related to development process of DataFusion core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) common Related to common crate labels Dec 10, 2024
@blaginin
Copy link
Contributor Author

blaginin commented Mar 5, 2025

Because CLI integration tests don't break very often, I moved them to the extended tests

https://github.com/blaginin/datafusion/actions/runs/13679702615/job/38249126001#step:8:2585

@blaginin
Copy link
Contributor Author

blaginin commented Mar 5, 2025

Also resolved conflicts with main. I remember @alamb highlighted external data sources as a priority for this year, so I think adding tests is important 👊

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @blaginin -- I think this PR is a nice improvement

In my opinion, switching the Ci tests to run on each PR for a while to make sure they are passing is required prior to merge

I think both inline snapshots and moving the docs are important, but we could do them as a follow on PR

rustup toolchain install
- name: Install Protobuf Compiler
run: sudo apt-get install -y protobuf-compiler
- name: Setup Minio - S3-compatible storage
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am worried that an extended test may fail (as it is only run on commits to main). And I have spent a bunch of time recently keeping the extended tests gree.

Maybe we can put this test in the normal rust.yaml that runs on each PR for a while until we are sure it is stable 🤔

Copy link
Contributor Author

@blaginin blaginin Mar 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair! I've added them back to the rust flow. Fells like testing on the existing linux check should be enough (to save time) but happy to add to mac os / move to a new check if preferred


```shell
cargo install cargo-insta
cargo insta review
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general I worry about adding "yet another dependency / test tool" to DataFusion

However, given how painful it is to make plan output changes (e.g. see @wiedld 's PR here #15007) I think it would be really nice to add insta

cargo test
```

## Snapshot testing
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please put these instructions in the existing https://datafusion.apache.org/contributor-guide/testing.html instead?

And then perhaps you can add a link to that page in datafusion-cli/CONTRIBUTING.md

(this is partly so when we start using insta for other tests, we can reuse the same instructions)


glob!("sql/*.sql", |path| {
let input = fs::read_to_string(path).unwrap();
assert_cmd_snapshot!(cli().pass_stdin(input))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is absolutely possible! My idea was to make it closer to slt and separate the code from the data.

FWIW I think .slt keeps the tests setup and expected output together (the slt driver is separate for sure, but most of the tests are self contained in the .slt file

@alamb alamb changed the title Add snapshot testing to CLI & set up AWS mock Add insta / snapshot testing to CLI & set up AWS mock Mar 6, 2025
@github-actions github-actions bot added the documentation Improvements or additions to documentation label Mar 7, 2025
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
source $HOME/.cargo/env
rustup toolchain install
- name: Setup Rust toolchain
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this will fail because (as I found out earlier in the week) the ubuntu builder doens't start with rust


### AWS

To test the S3 integration against [Minio](https://github.com/minio/minio)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested the instructions locally and they worked great 👍

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much @blaginin -- I think this PR looks really good and will make DataFusion testing better and easier

I expect over time the insta style snapshot tests will become widely used (and loved!)

I also verified that

cargo test -p datafusion-cli

works without running docker / minio and I also followed the instructions locally

I will likely try a follow on PR to try and run the datafusion-cli tests separately from the main test run to make things clearer but this looks 👍

Thanks again for sticking with it

First start up a container with Minio and load test files.

```shell
docker run -d \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did we consider having the tests themselves start the containers they need?
In Trino land this is done with testcontainers-java and it works awesomely well from developer experience perspective.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it’s a nice improvement if tests just handle the setup automatically! Created an issue to discuss: #15092

@alamb
Copy link
Contributor

alamb commented Mar 9, 2025

I plan to merge this on Tuesday unless anyone needs additional time to review.

I may then organize some tickets to port existing tests to use the new framework (especially as we seem to be a little short on good first issue tickets)

@alamb
Copy link
Contributor

alamb commented Mar 11, 2025

I merged up to resolve conflicts and plan to merge this when the tests pass

@alamb alamb merged commit 51f0833 into apache:main Mar 11, 2025
28 checks passed
@alamb
Copy link
Contributor

alamb commented Mar 11, 2025

Thanks again @blaginin

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
development-process Related to development process of DataFusion documentation Improvements or additions to documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants