
RANK() in CTE with LEFT JOIN + COALESCE fails when filtered and ordered in outer query (SQLStorm) #17771

@2010YOUY01

Description


Describe the bug

datafusion-cli was built from the latest main commit, 5bbdb7e.

The following query runs in DuckDB and PostgreSQL but fails in DataFusion:

> WITH part AS (
  SELECT *
  FROM (VALUES (1, 'A'), (2, 'B')) AS t(partkey, name)
),
RevenueCTE AS (
  SELECT partkey, total_revenue
  FROM (VALUES (1, 10.0), (2, 5.0)) AS t(partkey, total_revenue)
),
SupplierCTE AS (
  SELECT partkey, total_supply_cost
  FROM (VALUES (1, 2.0), (2, 3.0)) AS t(partkey, total_supply_cost)
),
RankedParts AS (
  SELECT
    p.partkey,
    p.name,
    COALESCE(r.total_revenue, 0)       AS total_revenue,
    COALESCE(s.total_supply_cost, 0)   AS total_supply_cost,
    RANK() OVER (
      ORDER BY COALESCE(r.total_revenue, 0) DESC
    )                                   AS revenue_rank
  FROM part AS p
  LEFT JOIN RevenueCTE  AS r ON p.partkey = r.partkey
  LEFT JOIN SupplierCTE AS s ON p.partkey = s.partkey
)
SELECT
  partkey,
  name,
  total_revenue,
  total_supply_cost,
  CASE
    WHEN total_supply_cost > 0 THEN total_revenue / total_supply_cost
    ELSE NULL
  END AS ratio
FROM RankedParts
WHERE revenue_rank <= 10
ORDER BY total_revenue DESC;

SanityCheckPlan
caused by
Error during planning: Plan: ["SortPreservingMergeExec: [total_revenue@2 DESC]", "  ProjectionExec: expr=[partkey@0 as partkey, name@1 as name, total_revenue@2 as total_revenue, total_supply_cost@3 as total_supply_cost, CASE WHEN total_supply_cost@3 > 0 THEN total_revenue@2 / total_supply_cost@3 END as ratio]", "    ProjectionExec: expr=[partkey@0 as partkey, name@1 as name, CASE WHEN total_revenue@2 IS NOT NULL THEN total_revenue@2 ELSE 0 END as total_revenue, CASE WHEN total_supply_cost@3 IS NOT NULL THEN total_supply_cost@3 ELSE 0 END as total_supply_cost]", "      CoalesceBatchesExec: target_batch_size=8192", "        FilterExec: rank() ORDER BY [coalesce(r.total_revenue, Int64(0)) DESC NULLS FIRST] RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW@4 <= 10, projection=[partkey@0, name@1, total_revenue@2, total_supply_cost@3]", "          RepartitionExec: partitioning=RoundRobinBatch(14), input_partitions=1", "            BoundedWindowAggExec: wdw=[rank() ORDER BY [coalesce(r.total_revenue, Int64(0)) DESC NULLS FIRST] RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW: Field { name: \"rank() ORDER BY [coalesce(r.total_revenue, Int64(0)) DESC NULLS FIRST] RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW\", data_type: UInt64, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, frame: RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW], mode=[Sorted]", "              SortPreservingMergeExec: [CASE WHEN total_revenue@2 IS NOT NULL THEN total_revenue@2 ELSE 0 END DESC]", "                SortExec: expr=[CASE WHEN total_revenue@2 IS NOT NULL THEN total_revenue@2 ELSE 0 END DESC], preserve_partitioning=[true]", "                  ProjectionExec: expr=[partkey@1 as partkey, name@2 as name, total_revenue@3 as total_revenue, total_supply_cost@0 as total_supply_cost]", "                    CoalesceBatchesExec: target_batch_size=8192", "                      HashJoinExec: mode=CollectLeft, join_type=Right, on=[(partkey@0, partkey@0)], projection=[total_supply_cost@1, partkey@2, name@3, total_revenue@4]", "                        ProjectionExec: expr=[column1@0 as partkey, column2@1 as total_supply_cost]", "                          DataSourceExec: partitions=1, partition_sizes=[1]", "                        RepartitionExec: partitioning=RoundRobinBatch(14), input_partitions=1", "                          ProjectionExec: expr=[partkey@1 as partkey, name@2 as name, total_revenue@0 as total_revenue]", "                            CoalesceBatchesExec: target_batch_size=8192", "                              HashJoinExec: mode=CollectLeft, join_type=Right, on=[(partkey@0, partkey@0)], projection=[total_revenue@1, partkey@2, name@3]", "                                ProjectionExec: expr=[column1@0 as partkey, column2@1 as total_revenue]", "                                  DataSourceExec: partitions=1, partition_sizes=[1]", "                                ProjectionExec: expr=[column1@0 as partkey, column2@1 as name]", "                                  DataSourceExec: partitions=1, partition_sizes=[1]"] does not satisfy order requirements: [total_revenue@2 DESC]. Child-0 order: []

To Reproduce

Run the query above in datafusion-cli.

Expected behavior

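The query should succeed and return the same result as in DuckDB and PostgreSQL, instead of failing the SanityCheckPlan check.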
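For comparison, here is a minimal sketch that runs an equivalent query through Python's built-in `sqlite3` module. SQLite is not one of the engines named in this report; it is used only as a convenient stand-in, assuming SQLite 3.25 or later (required for `RANK() OVER`). Because SQLite does not accept the `AS t(partkey, name)` column-alias syntax on `VALUES`, each inline table is rewritten as `SELECT ... UNION ALL SELECT ...`; the rest of the query is unchanged. The helper name `expected_rows` is hypothetical.

```python
import sqlite3

# Same query as in the report, except that the VALUES lists are rewritten as
# SELECT ... UNION ALL because SQLite lacks `AS t(col, ...)` column aliases.
# Assumes SQLite 3.25+ so that RANK() OVER (...) is supported.
QUERY = """
WITH part AS (
  SELECT 1 AS partkey, 'A' AS name UNION ALL SELECT 2, 'B'
),
RevenueCTE AS (
  SELECT 1 AS partkey, 10.0 AS total_revenue UNION ALL SELECT 2, 5.0
),
SupplierCTE AS (
  SELECT 1 AS partkey, 2.0 AS total_supply_cost UNION ALL SELECT 2, 3.0
),
RankedParts AS (
  SELECT
    p.partkey,
    p.name,
    COALESCE(r.total_revenue, 0)     AS total_revenue,
    COALESCE(s.total_supply_cost, 0) AS total_supply_cost,
    RANK() OVER (ORDER BY COALESCE(r.total_revenue, 0) DESC) AS revenue_rank
  FROM part AS p
  LEFT JOIN RevenueCTE  AS r ON p.partkey = r.partkey
  LEFT JOIN SupplierCTE AS s ON p.partkey = s.partkey
)
SELECT
  partkey,
  name,
  total_revenue,
  total_supply_cost,
  CASE
    WHEN total_supply_cost > 0 THEN total_revenue / total_supply_cost
    ELSE NULL
  END AS ratio
FROM RankedParts
WHERE revenue_rank <= 10
ORDER BY total_revenue DESC
"""


def expected_rows():
    """Run the rewritten query in a throwaway in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    try:
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in expected_rows():
        print(row)
```

Both parts pass the `revenue_rank <= 10` filter, so the query should return two rows ordered by `total_revenue` descending: part 1 (`total_revenue` 10.0, `ratio` 5.0) followed by part 2 (`total_revenue` 5.0, `ratio` about 1.67).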

Additional context

Found by SQLStorm #17698
