-
Notifications
You must be signed in to change notification settings - Fork 28.6k
[SPARK-52575][SQL] Introduce contextIndependentFoldable attribute for Expressions #51282
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
[SPARK-52575][SQL] Introduce contextIndependentFoldable attribute for Expressions #51282
Conversation
I will create a follow-up to use this new method in the V2 expression conversion
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would be super helpful to simplify expressions before converting them to DSv2. I only have a question about marking expressions context-free by default for all unary and binary expressions.
* | ||
* Default is false to ensure explicit marking of context independence. | ||
*/ | ||
def contextIndependentFoldable: Boolean = false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically, we could consider splitting contextIndependent
and foldable
into separate methods. Are there use cases where we want to know if the expression is context independent without checking it is foldable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't think of such special cases. I think we can always check foldable
before checking contextIndependentFoldable
, which seems very safe to me. The syntax for this method is when it is foldable, does it depend on context?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess some unfoldable expression could be context independent. Such as: 1 + a', the type of attribute a' is int.
But I don't know should we consider these cases.
I'm +1 for @aokolnychyi if we must consider the case I mentioned.
I'm +1 for @gengliangwang if we not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Such as: 1 + a', the type of attribute a' is int
@beliefer in such a case, column a'
is not contextIndependent
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I means the context that is related to the Spark context or Session context. The value of attribute is not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or the time zone etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The key is how to define the context ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Context-independent means that the expression is independent of contexts(current user, current database, current time, etc) when it is foldable.
For such expressions, we can evaluate them before storing in column default values, table constraints, etc.
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala
Outdated
Show resolved
Hide resolved
* @param dataType The data type to check | ||
* @return true if the data type has context dependency | ||
*/ | ||
def hasContextDependency(dataType: DataType): Boolean = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is confusing. How can a data type have context dependency? It's a data type, not an operation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, I renamed the method to containsTimestampOrUDT
What changes were proposed in this pull request?
Introduces a method to determine whether an expression can be folded without relying on any external context (e.g., time zone, session configurations, or catalogs). If an expression is context-independent foldable, it can be safely evaluated during DDL operations such as creating tables, views, or constraints. This enables systems to store the computed value rather than the original expression, simplifying implementation and improving performance. By default, expressions are not considered context-independent foldable to ensure explicit annotation of context independence.
Examples of context-independent foldable:
Examples of not context-independent foldable:
Why are the changes needed?
This allows catalogs and connectors to store the computed value rather than the expression itself, improving both simplicity and performance.
Does this PR introduce any user-facing change?
No
How was this patch tested?
Unit tests
Was this patch authored or co-authored using generative AI tooling?
No