Skip to content

Failed optimizations with Int64 type #15291

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
aectaan opened this issue Mar 18, 2025 · 12 comments · May be fixed by #16066
Open

Failed optimizations with Int64 type #15291

aectaan opened this issue Mar 18, 2025 · 12 comments · May be fixed by #16066
Labels
bug Something isn't working

Comments

@aectaan
Copy link

aectaan commented Mar 18, 2025

Describe the bug

Datafusion optimizer produce different behaviour with different types of arguments in request. Also behaviour is dependent on positions of arguments.

To Reproduce

Add this test to optimiser_integrations.rs

#[test]
fn test_prepared_request() {
    let successful = r#"
        PREPARE req(BIGINT) AS
        WITH aggregations_group AS (
            SELECT
                count(col_utf8) FILTER (WHERE $1 - 1 <= col_int32) as foo,
                count(col_utf8) FILTER (WHERE $1 - 2 <= col_int32 AND col_uint32 >= 0) as bar,
                count(col_utf8) FILTER (WHERE $1 - 2 <= col_int32 AND col_uint32 >= 0 AND col_uint32 >= 0) as baz
            FROM test
        )
        SELECT * FROM aggregations_group
        "#;

    test_pgsql(successful).unwrap();

    let failed = r#"
            PREPARE req(BIGINT) AS
            WITH aggregations_group AS (
                SELECT
                    count(col_utf8) FILTER (WHERE $1 - 1 <= col_int64) as foo,
                    count(col_utf8) FILTER (WHERE $1 - 2 <= col_int64 AND col_uint32 >= 0) as bar,
                    count(col_utf8) FILTER (WHERE $1 - 2 <= col_int64 AND col_uint32 >= 0 AND col_uint32 >= 0) as baz
                FROM test
            )
            SELECT * FROM aggregations_group
            "#;

    test_pgsql(failed).unwrap();
}

fn test_pgsql(sql: &str) -> Result<LogicalPlan> {
    // parse the SQL
    let dialect = PostgreSqlDialect {}; // or AnsiDialect, or your own dialect ...
    let ast: Vec<Statement> = Parser::parse_sql(&dialect, sql).unwrap();
    let statement = &ast[0];
    let context_provider = MyContextProvider::default()
        .with_udaf(sum_udaf())
        .with_udaf(count_udaf())
        .with_udaf(avg_udaf())
        .with_expr_planners(vec![
            Arc::new(AggregateFunctionPlanner),
            Arc::new(WindowFunctionPlanner),
        ]);

    let sql_to_rel = SqlToRel::new(&context_provider);
    let plan = sql_to_rel.sql_statement_to_plan(statement.clone())?;

    let config = OptimizerContext::new().with_skip_failing_rules(false);
    let analyzer = Analyzer::new();
    let optimizer = Optimizer::new();
    // analyze and optimize the logical plan
    let plan = analyzer.execute_and_check(plan, config.options(), |_, _| {})?;
    optimizer.optimize(plan, &config, observe)
}

Also add a Field::new("col_int64", DataType::Int64, true) column to test table.

Failed request will execute succesfuly with any of this changes:

  • remove redundant condition
let failed = r#"
            PREPARE req(BIGINT) AS
            WITH aggregations_group AS (
                SELECT
                    count(col_utf8) FILTER (WHERE $1 - 1 <= col_int64) as foo,
                    count(col_utf8) FILTER (WHERE $1 - 2 <= col_int64 AND col_uint32 >= 0) as bar,
                    count(col_utf8) FILTER (WHERE $1 - 2 <= col_int64 AND col_uint32 >= 0) as baz
                FROM test
            )
            SELECT * FROM aggregations_group
            "#;
  • swap order of expressions
let failed = r#"
            PREPARE req(BIGINT) AS
            WITH aggregations_group AS (
                SELECT
                    count(col_utf8) FILTER (WHERE $1 - 1 <= col_int64) as foo,
                    count(col_utf8) FILTER (WHERE $1 - 2 <= col_int64 AND col_uint32 >= 0) as bar,
                    count(col_utf8) FILTER (WHERE col_uint32 >= 0 AND $1 - 2 <= col_int64 AND col_uint32 >= 0) as baz
                FROM test
            )
            SELECT * FROM aggregations_group
            "#;

Expected behavior

Expected to get successful result not dependent on types or expression order

Additional context

No response

@aectaan aectaan added the bug Something isn't working label Mar 18, 2025
@alamb
Copy link
Contributor

alamb commented Mar 18, 2025

Thanks @aectaan -- what is the error message that you get?

@aectaan
Copy link
Author

aectaan commented Mar 18, 2025

@alamb it's related to common subexpr eliminate:

called `Result::unwrap()` on an `Err` value: Context("Optimizer rule 'common_sub_expression_eliminate' failed", SchemaError(FieldNotFound { field: Column { relation: None, name: "count(test.col_utf8) FILTER (WHERE $1 - Int64(1) <= test.col_int64)" }, valid_fields: [Column { relation: None, name: "count(test.col_utf8) FILTER (WHERE __common_expr_1 AS $1 - Int64(1) <= test.col_int64)" }, Column { relation: None, name: "count(test.col_utf8) FILTER (WHERE $1 - Int64(2) <= test.col_int64 AND test.col_uint32 >= Int64(0))" }, Column { relation: None, name: "count(test.col_utf8) FILTER (WHERE $1 - Int64(2) <= test.col_int64 AND test.col_uint32 >= Int64(0) AND test.col_uint32 >= Int64(0))" }] }, Some("")))

@alamb
Copy link
Contributor

alamb commented Mar 18, 2025

Thanks @aectaan

@alamb
Copy link
Contributor

alamb commented Mar 18, 2025

I wonder if you can get a pure SQL (datafusion-cli based) reproducer? Or does it require creating and configuring a custom context / optimizer rules 🤔

@aectaan
Copy link
Author

aectaan commented Mar 18, 2025

Thank you @alamb! No, it doesn't require anything custom.

Unfortunately datafusion-cli parser fails at this request: doesn't like opening brace before WHERE - that's why I made it as a test. Maybe I missing something.

There are two optimisation rules, that involved into bug:

  1. common_sub_expression_eliminate (obviously)
  2. simplify_expr

Order and set of rules is default. Disabling other rules doesn't change anything.

If we swap order of first simplify_expr and common_sub_expression_eliminate - everything is ok

@alamb
Copy link
Contributor

alamb commented Mar 18, 2025

Unfortunately datafusion-cli parser fails at this request: doesn't like opening brace before WHERE - that's why I made it as a test. Maybe I missing something.

Maybe you can use

set datafusion.sql_parser.dialect = 'postgres';
> set datafusion.sql_parser.dialect = 'postgres';
0 row(s) fetched.
Elapsed 0.001 seconds.

>             PREPARE req(BIGINT) AS
            WITH aggregations_group AS (
                SELECT
                    count(col_utf8) FILTER (WHERE $1 - 1 <= col_int64) as foo,
                    count(col_utf8) FILTER (WHERE $1 - 2 <= col_int64 AND col_uint32 >= 0) as bar,
                    count(col_utf8) FILTER (WHERE $1 - 2 <= col_int64 AND col_uint32 >= 0) as baz
                FROM test
            )
            SELECT * FROM aggregations_group;
Error during planning: table 'datafusion.public.test' not found

@qazxcdswe123
Copy link
Contributor

@alamb On head commit.

col_int32,col_int64,col_uint32,col_utf8
1,1,0,a
2,2,1,b
3,3,2,c
4,4,3,d
5,5,4,e
6,6,5,f
7,7,6,g
8,8,7,h
9,9,8,i
10,10,9,j
> PREPARE req(BIGINT) AS
            WITH aggregations_group AS (
                SELECT
                    count(col_utf8) FILTER (WHERE $1 - 1 <= col_int64) as foo,
                    count(col_utf8) FILTER (WHERE $1 - 2 <= col_int64 AND col_uint32 >= 0) as bar,
                    count(col_utf8) FILTER (WHERE $1 - 2 <= col_int64 AND col_uint32 >= 0 AND col_uint32 >= 0) as baz
                FROM test
            )
            SELECT * FROM aggregations_group;
0 row(s) fetched. 
Elapsed 0.010 seconds.

> EXECUTE req(42);
+-----+-----+-----+
| foo | bar | baz |
+-----+-----+-----+
| 0   | 0   | 0   |
+-----+-----+-----+
1 row(s) fetched. 
Elapsed 0.016 seconds.

> EXECUTE req(1);
+-----+-----+-----+
| foo | bar | baz |
+-----+-----+-----+
| 10  | 10  | 10  |
+-----+-----+-----+
1 row(s) fetched. 
Elapsed 0.025 seconds.

@aectaan
Copy link
Author

aectaan commented Mar 19, 2025

That's interesting, same request performed via datafusion-cli returns without error that test have.
It's reproducible with datafusion-cli older than 44.0.0.
Provided test case reproducible with all versions of datafusion.

Behaviour of test and df-cli differs starting from 34d9d3a

@aectaan
Copy link
Author

aectaan commented Mar 19, 2025

repro.patch

Looks like datafusion-cli doesn't apply optimisations at all. After applying attached patch behaviour is the same as with test.

@aectaan
Copy link
Author

aectaan commented Mar 20, 2025

Ok, probably it's related to Analyzer TypeCoercion rule. After disabling it optimisations are ok

@aectaan
Copy link
Author

aectaan commented Mar 21, 2025

@alamb What's the reason to cast numeric columns to i64/u64 and not to smallest compatible type? Why not to try something like this:

        macro_rules! try_parse_num {
            ($num:expr, $ty:ident) => {{
                if let Ok(n) = $num.parse::<$ty>() {
                    return Ok(lit(n));
                }
            }};
        }

        try_parse_num!(signed_number, i8);
        if !negative {
            try_parse_num!(unsigned_number, u8);
        }
        try_parse_num!(signed_number, i16);
        if !negative {
            try_parse_num!(unsigned_number, u16);
        }
        try_parse_num!(signed_number, i32);
        if !negative {
            try_parse_num!(unsigned_number, u32);
        }
        try_parse_num!(signed_number, i64);
        if !negative {
            try_parse_num!(unsigned_number, u64);
        }

I replaced code at

// Try to parse as i64 first, then u64 if negative is false, then decimal or f64
with provided snippet and everything is ok now - no one rule is failing, behaviour is the same, no matter of used column type.

@qazxcdswe123
Copy link
Contributor

@alamb What's the reason to cast numeric columns to i64/u64 and not to smallest compatible type? Why not to try something like this:

    macro_rules! try_parse_num {
        ($num:expr, $ty:ident) => {{
            if let Ok(n) = $num.parse::<$ty>() {
                return Ok(lit(n));
            }
        }};
    }

    try_parse_num!(signed_number, i8);
    if !negative {
        try_parse_num!(unsigned_number, u8);
    }
    try_parse_num!(signed_number, i16);
    if !negative {
        try_parse_num!(unsigned_number, u16);
    }
    try_parse_num!(signed_number, i32);
    if !negative {
        try_parse_num!(unsigned_number, u32);
    }
    try_parse_num!(signed_number, i64);
    if !negative {
        try_parse_num!(unsigned_number, u64);
    }

I replaced code at

datafusion/datafusion/sql/src/expr/value.rs

Line 81 in 74aeb91

// Try to parse as i64 first, then u64 if negative is false, then decimal or f64
with provided snippet and everything is ok now - no one rule is failing, behaviour is the same, no matter of used column type.

I think it makes some sense from a False Sharing perspective but I'm not entirely sure about that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants