[SPARK-52575][SQL] Introduce contextIndependentFoldable attribute for Expressions #51282

Open: wants to merge 14 commits into base: master

gengliangwang
Member

What changes were proposed in this pull request?

Introduces a method to determine whether an expression can be folded without relying on any external context (e.g., time zone, session configurations, or catalogs). If an expression is context-independent foldable, it can be safely evaluated during DDL operations such as creating tables, views, or constraints. This enables systems to store the computed value rather than the original expression, simplifying implementation and improving performance. By default, expressions are not considered context-independent foldable to ensure explicit annotation of context independence.

Examples of context-independent foldable:

  • 1 + 1: Always produces 2, regardless of context.
  • 1 > 0: Always evaluates to true.
  • CAST('2025-08-01' AS DATE): Always gives the same date value.

Examples of not context-independent foldable:

  • CURRENT_DATE: Result depends on the current date and time zone.
  • CURRENT_USER: Result depends on the session/user context.
  • CAST('2025-08-01 00:00:00' AS TIMESTAMP): Result can vary depending on the session time zone setting.
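
To make the two lists above concrete, here is a minimal sketch of how an opt-in flag of this kind could propagate through an expression tree. It is a toy model, not Spark's Catalyst code: `Expr`, `IntLit`, `StringLit`, `Add`, `Cast`, `CurrentUser`, and the `TargetType` objects are all hypothetical names, and treating `CurrentUser` as foldable is an assumption made only for illustration.

```scala
object ContextIndependenceSketch {
  // Toy expression tree. These are hypothetical classes, NOT Spark's Catalyst classes.
  sealed trait Expr {
    def foldable: Boolean
    // Conservative default, as in the PR description: every node must opt in explicitly.
    def contextIndependentFoldable: Boolean = false
  }

  final case class IntLit(value: Int) extends Expr {
    def foldable: Boolean = true
    override def contextIndependentFoldable: Boolean = true
  }

  final case class StringLit(value: String) extends Expr {
    def foldable: Boolean = true
    override def contextIndependentFoldable: Boolean = true
  }

  // A binary node is context-independent only if both children are.
  final case class Add(left: Expr, right: Expr) extends Expr {
    def foldable: Boolean = left.foldable && right.foldable
    override def contextIndependentFoldable: Boolean =
      left.contextIndependentFoldable && right.contextIndependentFoldable
  }

  sealed trait TargetType
  case object DateT extends TargetType
  case object TimestampT extends TargetType

  // Assumption for this sketch: a cast to TIMESTAMP consults the session time zone,
  // while a cast to DATE does not.
  final case class Cast(child: Expr, to: TargetType) extends Expr {
    def foldable: Boolean = child.foldable
    override def contextIndependentFoldable: Boolean =
      child.contextIndependentFoldable && to != TimestampT
  }

  // Stand-in for CURRENT_USER: treated as foldable here (an assumption),
  // but it keeps the default contextIndependentFoldable = false.
  case object CurrentUser extends Expr {
    def foldable: Boolean = true
  }
}
```

In this model, `Add(IntLit(1), IntLit(1))` and `Cast(StringLit("2025-08-01"), DateT)` report the flag as true, while `CurrentUser` and a cast to `TimestampT` do not.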

Why are the changes needed?

This allows catalogs and connectors to store the computed value rather than the expression itself, improving both simplicity and performance.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Unit tests

Was this patch authored or co-authored using generative AI tooling?

No

@gengliangwang
Member Author

I will create a follow-up to use this new method in the V2 expression conversion:

 class V2ExpressionBuilder(e: Expression, isPredicate: Boolean = false) extends Logging  {
 
-  def build(): Option[V2Expression] = generateExpression(e, isPredicate)
+  def build(): Option[V2Expression] = {
+    val exprToBuild = if (e.foldable && e.contextIndependentFoldable) {
+      ConstantFolding.constantFolding(e)
+    } else {
+      e
+    }
+    generateExpression(exprToBuild, isPredicate)
+  }

cc @cloud-fan @beliefer @aokolnychyi
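
The guard in the diff above can be illustrated with a small self-contained sketch. It does not use Spark: `Expr`, `Lit`, `Add`, `Col`, `simplify`, and `build` are hypothetical names, and the string rendering merely stands in for the conversion to a connector expression. Like the diff, it folds only when the whole expression passes both checks.

```scala
object FoldBeforeConvertSketch {
  // Toy expression tree (hypothetical, not Spark's Catalyst classes).
  sealed trait Expr {
    def foldable: Boolean
    def contextIndependentFoldable: Boolean = false
  }

  final case class Lit(value: Int) extends Expr {
    def foldable: Boolean = true
    override def contextIndependentFoldable: Boolean = true
  }

  final case class Add(left: Expr, right: Expr) extends Expr {
    def foldable: Boolean = left.foldable && right.foldable
    override def contextIndependentFoldable: Boolean =
      left.contextIndependentFoldable && right.contextIndependentFoldable
  }

  // A column reference needs an input row, so it is never foldable.
  final case class Col(name: String) extends Expr {
    def foldable: Boolean = false
  }

  // Evaluates a tree without any input row; only called after the guard in simplify.
  private def eval(e: Expr): Int = e match {
    case Lit(v)    => v
    case Add(l, r) => eval(l) + eval(r)
    case Col(n)    => throw new IllegalArgumentException(s"cannot fold column $n")
  }

  // Same shape as the guard in the diff: fold only when both checks pass.
  def simplify(e: Expr): Expr =
    if (e.foldable && e.contextIndependentFoldable) Lit(eval(e)) else e

  private def render(e: Expr): String = e match {
    case Lit(v)    => v.toString
    case Add(l, r) => s"(${render(l)} + ${render(r)})"
    case Col(n)    => n
  }

  // Hypothetical stand-in for the V2 conversion: render the simplified tree as text.
  def build(e: Expr): String = render(simplify(e))
}
```

With this model, `build(Add(Lit(1), Lit(1)))` yields `2`, while `build(Add(Lit(1), Col("a")))` is left as `(1 + a)` because the expression is not foldable.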

Contributor

@aokolnychyi left a comment

This would be super helpful to simplify expressions before converting them to DSv2. I only have a question about marking expressions context-free by default for all unary and binary expressions.

*
* Default is false to ensure explicit marking of context independence.
*/
def contextIndependentFoldable: Boolean = false

Contributor

Technically, we could consider splitting contextIndependent and foldable into separate methods. Are there use cases where we want to know if the expression is context independent without checking it is foldable?

Member Author

I can't think of such special cases. I think we can always check foldable before checking contextIndependentFoldable, which seems very safe to me. The semantics of this method are: when an expression is foldable, does it depend on context?

Contributor

I guess some unfoldable expressions could be context-independent, such as 1 + a', where the type of attribute a' is int.
But I don't know whether we should consider these cases.
I'm +1 for @aokolnychyi if we must consider the case I mentioned.
I'm +1 for @gengliangwang if we do not.

Member Author

Such as: 1 + a', the type of attribute a' is int

@beliefer In such a case, column a' is not context-independent.

Contributor

I mean the context that is related to the Spark context or session context. The value of an attribute is not part of that context.

Contributor

Or the time zone, etc.

Contributor

The key question is how to define the context.

Member Author

Context-independent means that the expression, when it is foldable, does not depend on any context (current user, current database, current time, etc.).
Such expressions can be evaluated before being stored in column default values, table constraints, etc.
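
As a rough illustration of "evaluate before storing", the sketch below chooses between persisting a computed value and persisting the expression text. Everything in it is hypothetical: `StoredDefault`, `StoredValue`, `StoredExpression`, `toStored`, and the `CurrentYear` node are invented for this example and do not correspond to Spark or catalog APIs.

```scala
object StoredDefaultSketch {
  // Toy expression tree (hypothetical, not Spark's Catalyst classes).
  sealed trait Expr {
    def sql: String
    def foldable: Boolean
    def contextIndependentFoldable: Boolean = false
  }

  final case class Lit(value: Int) extends Expr {
    def sql: String = value.toString
    def foldable: Boolean = true
    override def contextIndependentFoldable: Boolean = true
  }

  final case class Add(left: Expr, right: Expr) extends Expr {
    def sql: String = s"(${left.sql} + ${right.sql})"
    def foldable: Boolean = left.foldable && right.foldable
    override def contextIndependentFoldable: Boolean =
      left.contextIndependentFoldable && right.contextIndependentFoldable
  }

  // Hypothetical node whose value depends on when and where it is evaluated.
  case object CurrentYear extends Expr {
    def sql: String = "year(current_date())"
    def foldable: Boolean = true
  }

  // What a catalog could persist for a column default in this toy model.
  sealed trait StoredDefault
  final case class StoredValue(value: Int) extends StoredDefault
  final case class StoredExpression(sql: String) extends StoredDefault

  // Context-free evaluation; only reached after the guard in toStored.
  private def evalPure(e: Expr): Int = e match {
    case Lit(v)      => v
    case Add(l, r)   => evalPure(l) + evalPure(r)
    case CurrentYear => throw new IllegalStateException("needs session context")
  }

  // Store the value when it cannot change with context; otherwise keep the expression.
  def toStored(e: Expr): StoredDefault =
    if (e.foldable && e.contextIndependentFoldable) StoredValue(evalPure(e))
    else StoredExpression(e.sql)
}
```

Here `toStored(Add(Lit(40), Lit(2)))` produces a stored value, whereas an expression containing `CurrentYear` is kept as text so that it can be re-evaluated later in the right context.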

* @param dataType The data type to check
* @return true if the data type has context dependency
*/
def hasContextDependency(dataType: DataType): Boolean = {

Contributor

This is confusing. How can a data type have context dependency? It's a data type, not an operation.

Member Author

OK, I renamed the method to containsTimestampOrUDT.
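
The body of the renamed method is not shown in this thread. As a guess at what a check with that name might do, the sketch below walks a toy type tree and reports whether a timestamp or user-defined type appears anywhere inside it. The `DType` hierarchy is hypothetical and is not Spark's `DataType` API; only the method name comes from the comment above.

```scala
object DataTypeCheckSketch {
  // Toy type tree (hypothetical, not org.apache.spark.sql.types).
  sealed trait DType
  case object IntT extends DType
  case object StringT extends DType
  case object DateT extends DType
  case object TimestampT extends DType
  final case class UserDefinedT(name: String) extends DType
  final case class ArrayT(element: DType) extends DType
  final case class MapT(key: DType, value: DType) extends DType
  final case class StructT(fields: List[(String, DType)]) extends DType

  // Recursively checks nested types; an assumption about the real method's behavior.
  def containsTimestampOrUDT(dt: DType): Boolean = dt match {
    case TimestampT | UserDefinedT(_) => true
    case ArrayT(element)              => containsTimestampOrUDT(element)
    case MapT(key, value)             => containsTimestampOrUDT(key) || containsTimestampOrUDT(value)
    case StructT(fields)              => fields.exists { case (_, t) => containsTimestampOrUDT(t) }
    case _                            => false
  }
}
```

The name describes what the type contains rather than attributing a dependency to the type itself, which addresses the objection raised in the review.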
