
[SPARK-52551][SQL] Add a new v2 Predicate BOOLEAN_EXPRESSION #51247

Open: wants to merge 2 commits into master

cloud-fan (Contributor) commented Jun 23, 2025

What changes were proposed in this pull request?

This is an extension of #47611. It's impossible to translate every catalyst expression that returns boolean type into a v2 Predicate, because the return type of a catalyst expression can be dynamic. For example, we can't make v2 Cast extend Predicate only when it returns boolean type.

This PR adds a new type of v2 Predicate: BOOLEAN_EXPRESSION. It's a simple wrapper over any expression that returns boolean type. With it, Spark can push down any catalyst expression that returns boolean type as a predicate.
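A minimal sketch of the wrapping idea, using a hypothetical, stripped-down model of v2 expressions (MiniExpression, MiniColumn, MiniScalarExpression, MiniPredicate and BooleanWrapping are illustrative names, not Spark's connector API). An expression that already is a predicate is passed through unchanged; any other boolean-returning expression is wrapped as BOOLEAN_EXPRESSION(expr). The toy model has no data types, so the caller is assumed to have already checked that the expression returns boolean.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

// Hypothetical mini model of v2 expressions; NOT the real
// org.apache.spark.sql.connector.expressions classes.
interface MiniExpression {
  String describe();
}

// A column reference such as "flag".
final class MiniColumn implements MiniExpression {
  private final String name;

  MiniColumn(String name) { this.name = name; }

  public String describe() { return name; }
}

// A generic function call rendered as name(child1, child2, ...).
class MiniScalarExpression implements MiniExpression {
  final String name;
  final MiniExpression[] children;

  MiniScalarExpression(String name, MiniExpression... children) {
    this.name = name;
    this.children = children;
  }

  public String describe() {
    String args = Arrays.stream(children)
        .map(MiniExpression::describe)
        .collect(Collectors.joining(", "));
    return name + "(" + args + ")";
  }
}

// A scalar expression that is statically known to be a predicate.
class MiniPredicate extends MiniScalarExpression {
  MiniPredicate(String name, MiniExpression... children) {
    super(name, children);
  }
}

final class BooleanWrapping {
  // Assumes the caller has verified that expr returns boolean
  // (this toy model carries no type information).
  static MiniPredicate asPredicate(MiniExpression expr) {
    if (expr instanceof MiniPredicate) {
      return (MiniPredicate) expr; // already a predicate: no wrapper needed
    }
    return new MiniPredicate("BOOLEAN_EXPRESSION", expr);
  }
}
```

For example, wrapping new MiniScalarExpression("CAST_TO_BOOLEAN", new MiniColumn("flag")) yields a predicate that describes itself as BOOLEAN_EXPRESSION(CAST_TO_BOOLEAN(flag)); CAST_TO_BOOLEAN is a made-up function name used only for illustration.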

Why are the changes needed?

To push down more v2 predicates.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Updated test cases in PushablePredicateSuite.

Was this patch authored or co-authored using generative AI tooling?

No.

gengliangwang (Member) commented:

@cloud-fan thanks for making the PR.
FYI, @aokolnychyi and I discussed this offline. We are thinking about adding a new method or trait for expressions that can be constant-folded without context:

  • Examples of constant folding without context: 1 > 0, cast(null as boolean)
  • Examples of constant folding with context: current_date(), current_catalog(), cast('2020-01-01 00:00:00' as timestamp) (requires the time zone as context)

We can always evaluate expressions that can be constant-folded without context before passing them to V2. WDYT?
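The proposed split could be sketched as follows. This is a hypothetical mini expression tree, not Catalyst: needsContext() stands in for the proposed (not yet existing) method or trait, and all class names are illustrative. An expression is folded to a literal before pushdown only when it references no columns and needs no session context, so 1 > 0 and cast(null as boolean) fold, while current_date() is left for later handling. The sketch folds only whole expressions and supports integer comparison only.

```java
// Hypothetical mini expression tree; NOT Spark's Catalyst API.
interface Expr {
  // Stand-in for the proposed trait: does evaluation depend on session
  // context such as the time zone, current catalog, or clock?
  boolean needsContext();
  boolean referencesColumns();
  // Only valid for constant expressions that need no context.
  Object eval();
}

final class Lit implements Expr {
  final Object value;
  Lit(Object value) { this.value = value; }
  public boolean needsContext() { return false; }
  public boolean referencesColumns() { return false; }
  public Object eval() { return value; }
}

final class Col implements Expr {
  final String name;
  Col(String name) { this.name = name; }
  public boolean needsContext() { return false; }
  public boolean referencesColumns() { return true; }
  public Object eval() {
    throw new IllegalStateException("column " + name + " has no constant value");
  }
}

// left > right over integers; SQL semantics: null if either side is null.
final class GreaterThan implements Expr {
  final Expr left, right;
  GreaterThan(Expr left, Expr right) { this.left = left; this.right = right; }
  public boolean needsContext() { return left.needsContext() || right.needsContext(); }
  public boolean referencesColumns() { return left.referencesColumns() || right.referencesColumns(); }
  public Object eval() {
    Object l = left.eval(), r = right.eval();
    if (l == null || r == null) return null;
    return (Integer) l > (Integer) r;
  }
}

// CAST(child AS BOOLEAN), restricted here to null and boolean inputs.
final class CastToBoolean implements Expr {
  final Expr child;
  CastToBoolean(Expr child) { this.child = child; }
  public boolean needsContext() { return child.needsContext(); }
  public boolean referencesColumns() { return child.referencesColumns(); }
  public Object eval() { return (Boolean) child.eval(); }
}

// current_date() depends on the session clock and time zone.
final class CurrentDate implements Expr {
  public boolean needsContext() { return true; }
  public boolean referencesColumns() { return false; }
  public Object eval() { throw new IllegalStateException("needs session context"); }
}

final class ContextFreeFolding {
  // Replace the expression with a literal only when it is constant and
  // context-free; otherwise return it untouched.
  static Expr foldBeforePushdown(Expr e) {
    if (!e.referencesColumns() && !e.needsContext()) {
      return new Lit(e.eval());
    }
    return e;
  }
}
```

The design choice the comment hints at is that context-free folding is always safe to do on the Spark side, while anything in the second bullet list still has to be evaluated with the session's context or handled some other way.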

HyukjinKwon (Member) commented:

Oh, I just read the comment. I will leave it to you, @gengliangwang and @cloud-fan.

cloud-fan (Contributor, Author) commented:

@gengliangwang I don't think #51282 can guarantee that every boolean catalyst expression can be translated into a v2 Predicate properly. Cast and the null boolean literal are just examples. What do you think?

if (isPredicate) {
val translated = build()
val translated0 = build()
val conf = SQLConf.get
Review comment (Contributor):

Holding on to the SQLConf means we can't observe real-time changes to SQLConf.

Reply (Contributor, Author):

This is within a single method; I don't think we need to be that dynamic. In practice this runs within the same session, so the conf won't change.

val translated0 = build()
val conf = SQLConf.get
val alwaysCreateV2Predicate = conf.getConf(SQLConf.DATA_SOURCE_ALWAYS_CREATE_V2_PREDICATE)
val translated = if (alwaysCreateV2Predicate && isPredicate && e.dataType == BooleanType) {
Review comment (Contributor):

We can remove isPredicate here.

@@ -145,5 +151,8 @@ public class Predicate extends GeneralScalarExpression {

public Predicate(String name, Expression[] children) {
super(name, children);
if ("BOOLEAN_EXPRESSION".equals(name)) {
Review comment (Contributor):

I think we should declare "BOOLEAN_EXPRESSION" as a final constant.
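A small sketch of this suggestion, assuming "final" here means a static final constant that holds the predicate name so the string literal is written only once. NamedPredicate is a hypothetical stand-in for Predicate, and the one-child check inside the branch is an assumption, since the diff excerpt above is cut off before the body of the if.

```java
// Hypothetical stand-in for Predicate; not the actual Spark source.
class NamedPredicate {
  // Declared once, so the constructor check and callers cannot drift
  // apart through a typo in a repeated string literal.
  static final String BOOLEAN_EXPRESSION = "BOOLEAN_EXPRESSION";

  final String name;
  final Object[] children;

  NamedPredicate(String name, Object... children) {
    this.name = name;
    this.children = children;
    // Assumed validation: the wrapper takes exactly one boolean child.
    if (BOOLEAN_EXPRESSION.equals(name) && children.length != 1) {
      throw new IllegalArgumentException(
          BOOLEAN_EXPRESSION + " expects exactly one child, got " + children.length);
    }
  }
}
```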

beliefer (Contributor) left a comment:

LGTM except for a minor comment.

4 participants