Skip to content

[BUG]: Mismatch between IR.do_evaluate and IR._non_child_args for some cudf-polars IR nodes. #20481

@TomAugspurger

Description

@TomAugspurger

Describe the bug

inherit from the base `IR` node. The evaluation of a plan node is done
by implementing the `do_evaluate` method. This method takes in
the non-child arguments specified in `_non_child_args`, followed by
pre-evaluated child nodes (`DataFrame` objects), and finally a
keyword-only `context` argument (an `IRExecutionContext` object
documents that for cudf-polars IR nodes, the signature of IR.do_evaluate should be

  • _non_child_args (unpacked)
  • pre-evaluated child nodes (DataFrames) (unpacked)
  • A kw-only context

Steps/Code to reproduce bug

This snippet looks at our IR nodes and prints out issues

import inspect
import cudf_polars.dsl.ir
import rich.table

classes = [
    cls for (name, cls) in inspect.getmembers(cudf_polars.dsl.ir) if inspect.isclass(cls) and issubclass(cls, cudf_polars.dsl.ir.IR)
    if name not in ("ErrorNode", "IR", "PythonScan")
]

records = []

for cls in classes:
    non_child = cls._non_child   # this is the source of truth
    do_evaluate_args = list(inspect.signature(cls.do_evaluate).parameters)

    for i, nc in enumerate(non_child):
        if nc not in do_evaluate_args:
            records.append({
                "class": cls.__name__,
                "arg": nc,
                "error": "Missing",
            })
        elif do_evaluate_args.index(nc) != i:
            records.append({
                "class": cls.__name__,
                "arg": nc,
                "error": "Wrong position",
            })

    # Now ensure that all *remaining* args in do_evaluate are 'DataFrame' type

    # Finally, ensure that the only kw-only argument is 'context'

t = rich.table.Table("Class", "Arg", "Error")
for record in records:
    t.add_row(record["class"], record["arg"], record["error"])

rich.print(t)

The output:

❯ python ir_check.py                                                                                                                                                                                                                                                                                                                                                                                                                                                                (base) 
┏━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┓
┃ Class           ┃ Arg                ┃ Error          ┃
┡━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━┩
│ Cache           │ schema             │ Missing        │
│ Cache           │ key                │ Wrong position │
│ Cache           │ refcount           │ Wrong position │
│ ConditionalJoin │ schema             │ Missing        │
│ ConditionalJoin │ predicate          │ Missing        │
│ ConditionalJoin │ options            │ Wrong position │
│ Distinct        │ schema             │ Missing        │
│ Distinct        │ keep               │ Wrong position │
│ Distinct        │ subset             │ Wrong position │
│ Distinct        │ zlice              │ Wrong position │
│ Distinct        │ stable             │ Wrong position │
│ Filter          │ schema             │ Missing        │
│ Filter          │ mask               │ Missing        │
│ GroupBy         │ keys               │ Missing        │
│ HConcat         │ schema             │ Missing        │
│ HConcat         │ should_broadcast   │ Wrong position │
│ HStack          │ schema             │ Missing        │
│ HStack          │ columns            │ Missing        │
│ HStack          │ should_broadcast   │ Wrong position │
│ Join            │ schema             │ Missing        │
│ Join            │ left_on            │ Missing        │
│ Join            │ right_on           │ Missing        │
│ Join            │ options            │ Wrong position │
│ MergeSorted     │ schema             │ Missing        │
│ MergeSorted     │ key                │ Wrong position │
│ Reduce          │ schema             │ Missing        │
│ Reduce          │ exprs              │ Wrong position │
│ Rolling         │ schema             │ Missing        │
│ Rolling         │ index              │ Wrong position │
│ Rolling         │ index_dtype        │ Wrong position │
│ Rolling         │ preceding_ordinal  │ Wrong position │
│ Rolling         │ following_ordinal  │ Wrong position │
│ Rolling         │ closed_window      │ Wrong position │
│ Rolling         │ keys               │ Missing        │
│ Rolling         │ agg_requests       │ Missing        │
│ Rolling         │ zlice              │ Wrong position │
│ Scan            │ cloud_options      │ Missing        │
│ Scan            │ paths              │ Wrong position │
│ Scan            │ with_columns       │ Wrong position │
│ Scan            │ skip_rows          │ Wrong position │
│ Scan            │ n_rows             │ Wrong position │
│ Scan            │ row_index          │ Wrong position │
│ Scan            │ include_file_paths │ Wrong position │
│ Scan            │ predicate          │ Wrong position │
│ Scan            │ parquet_options    │ Wrong position │
│ Select          │ schema             │ Missing        │
│ Select          │ exprs              │ Wrong position │
│ Select          │ should_broadcast   │ Wrong position │
│ Sink            │ cloud_options      │ Missing        │
│ Slice           │ schema             │ Missing        │
│ Slice           │ offset             │ Wrong position │
│ Slice           │ length             │ Wrong position │
│ Sort            │ schema             │ Missing        │
│ Sort            │ by                 │ Wrong position │
│ Sort            │ order              │ Wrong position │
│ Sort            │ null_order         │ Wrong position │
│ Sort            │ stable             │ Wrong position │
│ Sort            │ zlice              │ Wrong position │
│ Union           │ schema             │ Missing        │
│ Union           │ zlice              │ Wrong position │
└─────────────────┴────────────────────┴────────────────┘

Expected behavior

Signature of each IR should match, so that snippet should output nothing.

Additional Context

This can cause issues for tracing, which relies on the convention for IR.do_evaluate's signature:

# By convention, all non-dataframe arguments (non_child) come first.
# Anything remaining is a dataframe, except for 'context' kwarg.
frames: list[cudf_polars.containers.DataFrame] = (
list(args) + [v for k, v in kwargs.items() if k != "context"]
)[len(cls._non_child) :] # type: ignore[assignment]

Metadata

Metadata

Assignees

Labels

bugSomething isn't workingcudf-polarsIssues specific to cudf-polars

Type

No type

Projects

Status

In Progress

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions