Contributor

@mbutrovich mbutrovich commented Nov 1, 2025

Which issue does this PR close?

Closes #2672.

Rationale for this change

We serialize dataFilters for native_datafusion, but native_datafusion doesn't support subqueries. We already filter subqueries out of pushedDownFilters, so the serialized filters and the filters used at execution don't match.

What changes are included in this PR?

Adds a new method, supportedDataFilters, that consolidates this filtering so both access paths (serialization and execution) use the same set of filters.
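
To illustrate the idea of consolidating the filtering, here is a minimal sketch. It does not reproduce the PR's code: the expression types (Expr, InSubquery, DynamicPruning and so on), the ScanFilters object, and the filtersToSerialize / filtersToExecute helpers are hypothetical stand-ins for illustration, not Spark or Comet classes. Only the name supportedDataFilters comes from this PR, and its real signature may differ.

```scala
// Toy expression model. These types are hypothetical stand-ins,
// NOT Spark's Expression hierarchy or Comet's classes.
sealed trait Expr { def children: Seq[Expr] }
final case class Column(name: String) extends Expr {
  val children: Seq[Expr] = Nil
}
final case class Literal(value: Int) extends Expr {
  val children: Seq[Expr] = Nil
}
final case class GreaterThan(left: Expr, right: Expr) extends Expr {
  val children: Seq[Expr] = Seq(left, right)
}
// Stand-in for a subquery-backed predicate such as a dynamic pruning filter.
final case class InSubquery(value: Expr, subqueryId: Int) extends Expr {
  val children: Seq[Expr] = Seq(value)
}
final case class DynamicPruning(child: Expr) extends Expr {
  val children: Seq[Expr] = Seq(child)
}

object ScanFilters {
  // True if a subquery appears anywhere in the expression tree.
  def containsSubquery(e: Expr): Boolean = e match {
    case _: InSubquery => true
    case other         => other.children.exists(containsSubquery)
  }

  // Single place that decides which data filters the scan can handle.
  def supportedDataFilters(dataFilters: Seq[Expr]): Seq[Expr] =
    dataFilters.filterNot(containsSubquery)

  // Both paths (hypothetical names) go through the same method,
  // so they cannot drift apart.
  def filtersToSerialize(dataFilters: Seq[Expr]): Seq[Expr] =
    supportedDataFilters(dataFilters)

  def filtersToExecute(dataFilters: Seq[Expr]): Seq[Expr] =
    supportedDataFilters(dataFilters)
}
```

With this shape, a filter list containing GreaterThan(Column("a"), Literal(1)) and DynamicPruning(InSubquery(Column("b"), 236)) yields only the first filter on both paths, because the second one wraps a subquery.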

How are these changes tested?

  • Running SPARK_HOME=$(pwd) COMET_PARQUET_SCAN_IMPL=native_datafusion ./mvnw -pl spark -am -Dsuites="org.apache.spark.sql.comet.CometTPCDSV1_4_PlanStabilitySuite" test locally.

@mbutrovich mbutrovich changed the title from "fix: [native_datafusion] Don't serialize unfiltered data filters for native_datafusion" to "fix: [native_datafusion] Don't serialize subquery data filters" Nov 1, 2025

codecov-commenter commented Nov 1, 2025

Codecov Report

❌ Patch coverage is 40.00000% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 59.30%. Comparing base (f09f8af) to head (bbf6091).
⚠️ Report is 653 commits behind head on main.

Files with missing lines Patch % Lines
...ala/org/apache/spark/sql/comet/CometScanExec.scala 25.00% 0 Missing and 3 partials ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #2676      +/-   ##
============================================
+ Coverage     56.12%   59.30%   +3.17%     
- Complexity      976     1450     +474     
============================================
  Files           119      147      +28     
  Lines         11743    13797    +2054     
  Branches       2251     2369     +118     
============================================
+ Hits           6591     8182    +1591     
- Misses         4012     4388     +376     
- Partials       1140     1227      +87     

Member

@andygrove andygrove left a comment

Changes LGTM, but I don't think you can test this by running the current stability suite, since it ignores COMET_PARQUET_SCAN_IMPL=native_datafusion and explicitly tests for auto and native_iceberg_compat. I will merge these changes into #2673 and see if they fix the failures there.

Contributor Author

mbutrovich commented Nov 1, 2025

> Changes LGTM, but I don't think you can test this by running the current stability suite, since it ignores COMET_PARQUET_SCAN_IMPL=native_datafusion and explicitly tests for auto and native_iceberg_compat. I will merge these changes into #2673 and see if they fix the failures there.

Whoops. I just tried it and hit this in q1 now:

check simplified (tpcds-v1.4/q1) - native_datafusion *** FAILED *** (1 second, 24 milliseconds)
  java.lang.IllegalArgumentException: requirement failed: input[0, int, true] IN dynamicpruning#236 has not finished
  at scala.Predef$.require(Predef.scala:281)
  at org.apache.spark.sql.execution.InSubqueryExec.prepareResult(subquery.scala:144)
  at org.apache.spark.sql.execution.InSubqueryExec.doGenCode(subquery.scala:156)
  at org.apache.spark.sql.catalyst.expressions.Expression.$anonfun$genCode$3(Expression.scala:201)
  at scala.Option.getOrElse(Option.scala:189)
  at org.apache.spark.sql.catalyst.expressions.Expression.genCode(Expression.scala:196)
  at org.apache.spark.sql.catalyst.expressions.DynamicPruningExpression.doGenCode(DynamicPruning.scala:105)
  at org.apache.spark.sql.catalyst.expressions.Expression.$anonfun$genCode$3(Expression.scala:201)
  at scala.Option.getOrElse(Option.scala:189)
  at org.apache.spark.sql.catalyst.expressions.Expression.genCode(Expression.scala:196)
  ...

Converting to draft since I don't want it to get accidentally merged.

@mbutrovich mbutrovich marked this pull request as draft November 1, 2025 14:43
@mbutrovich mbutrovich closed this Nov 3, 2025
@mbutrovich mbutrovich deleted the fix_2672 branch November 4, 2025 01:44
Development

Successfully merging this pull request may close these issues.

[native_datafusion] Subquery has not finished
