Skip to content

Migrate user_defined tests to insta #15255

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
Mar 18, 2025
Merged

Migrate user_defined tests to insta #15255

merged 13 commits into from
Mar 18, 2025

Conversation

shruti2522
Copy link
Contributor

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

migrate tests in datafusion/core/tests/user_defined to insta

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added the core Core DataFusion crate label Mar 15, 2025
@alamb
Copy link
Contributor

alamb commented Mar 16, 2025

cc @blaginin

@alamb
Copy link
Contributor

alamb commented Mar 16, 2025

Thanks @shruti2522 !

@blaginin
Copy link
Contributor

Hey! I’m a bit worried that now to understand the test you’ll need to look into two places: open the test file and also search for the snapshot file… that’s why in the example pr (#15165) I used inline snapshots.

do you think we should use the same approach here?

@shruti2522
Copy link
Contributor Author

shruti2522 commented Mar 16, 2025

Hey! I’m a bit worried that now to understand the test you’ll need to look into two places: open the test file and also search for the snapshot file… that’s why in the example pr (#15165) I used inline snapshots.

do you think we should use the same approach here?

Hey @blaginin, you are right about this, at first I tried implementing the inline snapshots only, but I ran into errors while running tests for run_and_compare_query in user_defined/user_defined_plan.rs, since multiple tests are using it ,only the first one works because insta doesn't allow duplicate inline snapshots by default. I tried allow_duplicates!(), but it gets a bit tricky with async functions. So, I used snapshot files instead because they make it much easier to run multiple tests using the same query fn, also it makes it much easier to update expected outputs in the future and eliminates the use of #[rustfmt::skip] in some places. Maybe we can discuss some alternative approach to implement inline snapshots 🤔

@blaginin
Copy link
Contributor

blaginin commented Mar 17, 2025

I tried allow_duplicates!(), but it gets a bit tricky with async functions

Can you please explain more on this? I tried modifying async run_and_compare_query and this worked for me:

image

So, I used snapshot files instead because they make it much easier to run multiple tests using the same query fn, also it makes it much easier to update expected outputs in the future and eliminates the use of #[rustfmt::skip] in some places.

Just to give a bit of context, we had a discussion about inline / file snapshots here: #13672 (comment) and #13672 (comment).

To me personally, it feels like we can use snapshot files, but the snapshot itself should give enough information to understand the context. For example, in CLI, you can see program input and tests, and the snapshot file itself doesn't contain a lot of logic. Here I'm not sure if that's the case for ALL the test fixtures? Maybe some of them can be inline?

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @blaginin and @shruti2522

I do think that inline snapshots would be better (and more consistent with the rest of the code). Can you please update the PR as suggested by @blaginin and mark the PR ready for review when done?

Thanks again for this (and all the other recent PRs)

@alamb alamb marked this pull request as draft March 17, 2025 16:08
@shruti2522
Copy link
Contributor Author

I tried allow_duplicates!(), but it gets a bit tricky with async functions

Can you please explain more on this? I tried modifying async run_and_compare_query and this worked for me:

image > So, I used snapshot files instead because they make it much easier to run multiple tests using the same query fn, also it makes it much easier to update expected outputs in the future and eliminates the use of #[rustfmt::skip] in some places.

Just to give a bit of context, we had a discussion about inline / file snapshots here: #13672 (comment) and #13672 (comment).

To me personally, it feels like we can use snapshot files, but the snapshot itself should give enough information to understand the context. For example, in CLI, you can see program input and tests, and the snapshot file itself doesn't contain a lot of logic. Here I'm not sure if that's the case for ALL the test fixtures? Maybe some of them can be inline?

Yes, it worked. I was using insta::allow_duplicates! like a function call:

 insta::allow_duplicates!(|| {
        insta::with_settings!({
            description => description,
        }, {
            insta::assert_snapshot!(actual, @r###"
            +-------------+---------+
            | customer_id | revenue |
            +-------------+---------+
            | paul        | 300     |
            | jorge       | 200     |
            | andy        | 150     |
            +-------------+---------+
            "###);
        });
    })();

On changing it to this it worked

 insta::allow_duplicates! {
        insta::with_settings!({
            description => description,
        }, {
            insta::assert_snapshot!(actual, @r###"
            +-------------+---------+
            | customer_id | revenue |
            +-------------+---------+
            | paul        | 300     |
            | jorge       | 200     |
            | andy        | 150     |
            +-------------+---------+
        "###);
        });
    }

Thanks a lot @blaginin and @alamb ❤️

@shruti2522 shruti2522 marked this pull request as ready for review March 17, 2025 19:26
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @shruti2522 and @blaginin -- thi slooks great to me!

Comment on lines 76 to 82
fn fmt_batches(batches: &[RecordBatch]) -> String {
use arrow::util::pretty::pretty_format_batches;
match pretty_format_batches(batches) {
Ok(formatted) => formatted.to_string(),
Err(e) => format!("Error formatting record batches: {}", e),
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we use batches_to_string for that?

Comment on lines 60 to 67
fn fmt_batches(batches: &[RecordBatch]) -> String {
use arrow::util::pretty::pretty_format_batches;
match pretty_format_batches(batches) {
Ok(formatted) => formatted.to_string(),
Err(e) => format!("Error formatting record batches: {}", e),
}
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think this can be removed as well

Comment on lines 42 to 49
fn fmt_batches(batches: &[RecordBatch]) -> String {
use arrow::util::pretty::pretty_format_batches;
match pretty_format_batches(batches) {
Ok(formatted) => formatted.to_string(),
Err(e) => format!("Error formatting record batches: {}", e),
}
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good call -- I took the liberty of pushing a commit to make the change.

Copy link
Contributor

@blaginin blaginin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you!!

@alamb
Copy link
Contributor

alamb commented Mar 18, 2025

Thanks @shruti2522 and @blaginin 🙏

@alamb alamb merged commit 8c8b245 into apache:main Mar 18, 2025
27 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Migrate user_defined tests to insta
3 participants