-
Notifications
You must be signed in to change notification settings - Fork 989
Open
Labels
bugSomething isn't workingSomething isn't workingcudf-polarsIssues specific to cudf-polarsIssues specific to cudf-polars
Description
Describe the bug
cudf/python/cudf_polars/docs/overview.md
Lines 214 to 218 in c0b4c43
| inherit from the base `IR` node. The evaluation of a plan node is done | |
| by implementing the `do_evaluate` method. This method takes in | |
| the non-child arguments specified in `_non_child_args`, followed by | |
| pre-evaluated child nodes (`DataFrame` objects), and finally a | |
| keyword-only `context` argument (an `IRExecutionContext` object |
IR.do_evaluate should be
_non_child_args(unpacked)- pre-evaluated child nodes (
DataFrames) (unpacked) - A kw-only
context
Steps/Code to reproduce bug
This snippet looks at our IR nodes and prints out issues
import inspect
import cudf_polars.dsl.ir
import rich.table
classes = [
cls for (name, cls) in inspect.getmembers(cudf_polars.dsl.ir) if inspect.isclass(cls) and issubclass(cls, cudf_polars.dsl.ir.IR)
if name not in ("ErrorNode", "IR", "PythonScan")
]
records = []
for cls in classes:
non_child = cls._non_child # this is the source of truth
do_evaluate_args = list(inspect.signature(cls.do_evaluate).parameters)
for i, nc in enumerate(non_child):
if nc not in do_evaluate_args:
records.append({
"class": cls.__name__,
"arg": nc,
"error": "Missing",
})
elif do_evaluate_args.index(nc) != i:
records.append({
"class": cls.__name__,
"arg": nc,
"error": "Wrong position",
})
# Now ensure that all *remaining* args in do_evaluate are 'DataFrame' type
# Finally, ensure that the only kw-only argument is 'context'
t = rich.table.Table("Class", "Arg", "Error")
for record in records:
t.add_row(record["class"], record["arg"], record["error"])
rich.print(t)The output:
❯ python ir_check.py (base)
┏━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┓
┃ Class ┃ Arg ┃ Error ┃
┡━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━┩
│ Cache │ schema │ Missing │
│ Cache │ key │ Wrong position │
│ Cache │ refcount │ Wrong position │
│ ConditionalJoin │ schema │ Missing │
│ ConditionalJoin │ predicate │ Missing │
│ ConditionalJoin │ options │ Wrong position │
│ Distinct │ schema │ Missing │
│ Distinct │ keep │ Wrong position │
│ Distinct │ subset │ Wrong position │
│ Distinct │ zlice │ Wrong position │
│ Distinct │ stable │ Wrong position │
│ Filter │ schema │ Missing │
│ Filter │ mask │ Missing │
│ GroupBy │ keys │ Missing │
│ HConcat │ schema │ Missing │
│ HConcat │ should_broadcast │ Wrong position │
│ HStack │ schema │ Missing │
│ HStack │ columns │ Missing │
│ HStack │ should_broadcast │ Wrong position │
│ Join │ schema │ Missing │
│ Join │ left_on │ Missing │
│ Join │ right_on │ Missing │
│ Join │ options │ Wrong position │
│ MergeSorted │ schema │ Missing │
│ MergeSorted │ key │ Wrong position │
│ Reduce │ schema │ Missing │
│ Reduce │ exprs │ Wrong position │
│ Rolling │ schema │ Missing │
│ Rolling │ index │ Wrong position │
│ Rolling │ index_dtype │ Wrong position │
│ Rolling │ preceding_ordinal │ Wrong position │
│ Rolling │ following_ordinal │ Wrong position │
│ Rolling │ closed_window │ Wrong position │
│ Rolling │ keys │ Missing │
│ Rolling │ agg_requests │ Missing │
│ Rolling │ zlice │ Wrong position │
│ Scan │ cloud_options │ Missing │
│ Scan │ paths │ Wrong position │
│ Scan │ with_columns │ Wrong position │
│ Scan │ skip_rows │ Wrong position │
│ Scan │ n_rows │ Wrong position │
│ Scan │ row_index │ Wrong position │
│ Scan │ include_file_paths │ Wrong position │
│ Scan │ predicate │ Wrong position │
│ Scan │ parquet_options │ Wrong position │
│ Select │ schema │ Missing │
│ Select │ exprs │ Wrong position │
│ Select │ should_broadcast │ Wrong position │
│ Sink │ cloud_options │ Missing │
│ Slice │ schema │ Missing │
│ Slice │ offset │ Wrong position │
│ Slice │ length │ Wrong position │
│ Sort │ schema │ Missing │
│ Sort │ by │ Wrong position │
│ Sort │ order │ Wrong position │
│ Sort │ null_order │ Wrong position │
│ Sort │ stable │ Wrong position │
│ Sort │ zlice │ Wrong position │
│ Union │ schema │ Missing │
│ Union │ zlice │ Wrong position │
└─────────────────┴────────────────────┴────────────────┘
Expected behavior
Signature of each IR should match, so that snippet should output nothing.
Additional Context
This can cause issues for tracing, which relies on the convention for IR.do_evaluate's signature:
cudf/python/cudf_polars/cudf_polars/dsl/tracing.py
Lines 168 to 172 in c0b4c43
| # By convention, all non-dataframe arguments (non_child) come first. | |
| # Anything remaining is a dataframe, except for 'context' kwarg. | |
| frames: list[cudf_polars.containers.DataFrame] = ( | |
| list(args) + [v for k, v in kwargs.items() if k != "context"] | |
| )[len(cls._non_child) :] # type: ignore[assignment] |
Metadata
Metadata
Assignees
Labels
bugSomething isn't workingSomething isn't workingcudf-polarsIssues specific to cudf-polarsIssues specific to cudf-polars
Type
Projects
Status
In Progress