feat(polars): add Intersection and Difference ops #10623
@IndexSeek Thanks for the PR as always 😄 I think it's reasonable to reuse the |
You're welcome! That seems like a good plan. I gave this a try using the
AttributeError: 'Sort' object has no attribute 'name'
This was working for me when using a memtable in an IPython shell, but not on the tests. Do you know if there is a more efficient way than needing to manually register the tables as "left" and "right"? Here's what I was trying where it seems to work using
@translate.register(ops.Difference)
def execute_difference(op, **kw):
    ctx = kw.get("ctx")
    sql = (
        sg.select(STAR)
        .from_(sg.to_identifier(op.left.name, quoted=True))
        .except_(sg.select(STAR).from_(sg.to_identifier(op.right.name, quoted=True)))
    )
    result = ctx.execute(sql.sql(Polars), eager=False)
    if op.distinct is True:
        return result.unique()
    return result
In [1]: from ibis.interactive import *
...:
...: ibis.set_backend("polars")
...:
...: t1 = ibis.memtable({"number_col": [1, 2, 3]})
...: t2 = ibis.memtable({"number_col": [2, 3, 4]})
...: t1.difference(t2)
Out[1]:
┏━━━━━━━━━━━━┓
┃ number_col ┃
┡━━━━━━━━━━━━┩
│ int64 │
├────────────┤
│ 1 │
└────────────┘
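For reference, the two behaviors that `op.distinct` selects between can be sketched in plain Python. This only illustrates standard SQL set semantics (EXCEPT vs. EXCEPT ALL, INTERSECT vs. INTERSECT ALL) and is not the Polars backend's code; the helper names `difference` and `intersection` are hypothetical.

```python
from collections import Counter


def difference(left, right, distinct=True):
    """Rows of `left` that are not in `right` (EXCEPT / EXCEPT ALL)."""
    if distinct:
        right_set = set(right)
        # dict.fromkeys de-duplicates while keeping first-seen order.
        return [row for row in dict.fromkeys(left) if row not in right_set]
    # EXCEPT ALL: each row in `right` cancels one matching row in `left`.
    remaining = Counter(right)
    out = []
    for row in left:
        if remaining[row] > 0:
            remaining[row] -= 1
        else:
            out.append(row)
    return out


def intersection(left, right, distinct=True):
    """Rows present in both inputs (INTERSECT / INTERSECT ALL)."""
    if distinct:
        right_set = set(right)
        return [row for row in dict.fromkeys(left) if row in right_set]
    # INTERSECT ALL: a row is kept as many times as it can be paired up.
    remaining = Counter(right)
    out = []
    for row in left:
        if remaining[row] > 0:
            remaining[row] -= 1
            out.append(row)
    return out


if __name__ == "__main__":
    # Same data as the IPython session above.
    print(difference([1, 2, 3], [2, 3, 4]))
    print(intersection([1, 2, 3], [2, 3, 4]))
```

With duplicate rows the two modes diverge: `difference([1, 1, 2], [1])` drops every `1`, while `difference([1, 1, 2], [1], distinct=False)` removes only one of them.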
cf6bf9c to 7c87189
7c87189 to 78728bc
Nice, thank you for pushing those changes! The |
@IndexSeek Thank you!
Description of changes
Adds support for Intersection and Difference operations for the Polars backend. The approach uses pl.SQLContext to register the dataframes and query using INTERSECT and EXCEPT, respectively.

I saw that a few other functions used a ctx argument, which might be using pl.SQLContext in a similar manner. I wasn't sure if that argument could be used here to avoid some of the boilerplate. Here are some of those functions I am referring to:
ibis/ibis/backends/polars/compiler.py, lines 69 to 72 in 28bafd1
ibis/ibis/backends/polars/compiler.py, lines 1287 to 1297 in 28bafd1
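The register-then-query pattern described above can be sketched with only the standard library. This is a stand-in, not the PR's implementation: it uses `sqlite3` in place of pl.SQLContext so that it runs without Polars installed, and the helper name `run_set_op` and the table names "left" and "right" are assumptions made for illustration.

```python
import sqlite3


def run_set_op(keyword, left_rows, right_rows):
    """Load two single-column tables and combine them with INTERSECT or EXCEPT."""
    if keyword not in {"INTERSECT", "EXCEPT"}:
        raise ValueError(f"unsupported set operation: {keyword!r}")
    con = sqlite3.connect(":memory:")
    try:
        # "left" and "right" are SQL keywords, so the identifiers are quoted.
        con.execute('CREATE TABLE "left" (number_col INTEGER)')
        con.execute('CREATE TABLE "right" (number_col INTEGER)')
        con.executemany('INSERT INTO "left" VALUES (?)', [(v,) for v in left_rows])
        con.executemany('INSERT INTO "right" VALUES (?)', [(v,) for v in right_rows])
        # keyword is validated above, so interpolating it is safe here.
        sql = (
            f'SELECT * FROM "left" {keyword} SELECT * FROM "right" '
            "ORDER BY number_col"
        )
        return [row[0] for row in con.execute(sql)]
    finally:
        con.close()


if __name__ == "__main__":
    print(run_set_op("EXCEPT", [1, 2, 3], [2, 3, 4]))
    print(run_set_op("INTERSECT", [1, 2, 3], [2, 3, 4]))
```

SQLite's INTERSECT and EXCEPT always return distinct rows, so this sketch covers only the distinct case.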