Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Improve testing for array_remove and fallback to Spark for unsupported types #1308

Open
wants to merge 20 commits into
base: main
Choose a base branch
from

Conversation

andygrove
Copy link
Member

@andygrove andygrove commented Jan 18, 2025

Which issue does this PR close?

Closes #1307

Rationale for this change

This PR fixes a specific bug and provides an approach for solving a general problem where we are not checking for supported types when translating expressions to native.

What changes are included in this PR?

  • Move data generator from fuzz testing tool into Comet jar so that it can also be used in integration tests
  • Add type checks for array_remove and related tests

How are these changes tested?

@andygrove andygrove marked this pull request as draft January 19, 2025 14:57
@andygrove andygrove changed the title fix: Fallback to Spark for unsupported types for ArrayRemove fix: Improve testing for array_remove and fallback to Spark for unsupported types Jan 19, 2025
@andygrove andygrove marked this pull request as ready for review January 19, 2025 16:05
@@ -141,4 +146,154 @@ class DataGenerator(r: Random) {
Range(0, num).map(_ => generateRow(schema, stringGen))
}

private val primitiveTypes = Seq(
Copy link
Member Author

@andygrove andygrove Jan 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is copied moved from the fuzz testing tool

@codecov-commenter
Copy link

codecov-commenter commented Jan 20, 2025

Codecov Report

Attention: Patch coverage is 85.86957% with 13 lines in your changes missing coverage. Please review.

Project coverage is 34.91%. Comparing base (be48839) to head (f8f244c).
Report is 17 commits behind head on main.

Files with missing lines Patch % Lines
...la/org/apache/comet/testing/ParquetGenerator.scala 89.39% 1 Missing and 6 partials ⚠️
...src/main/scala/org/apache/comet/serde/arrays.scala 66.66% 1 Missing and 5 partials ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #1308      +/-   ##
============================================
+ Coverage     34.69%   34.91%   +0.22%     
+ Complexity      991      989       -2     
============================================
  Files           116      119       +3     
  Lines         44891    45204     +313     
  Branches       9864     9959      +95     
============================================
+ Hits          15574    15785     +211     
- Misses        26168    26253      +85     
- Partials       3149     3166      +17     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@@ -2660,4 +2660,19 @@ class CometExpressionSuite extends CometTestBase with AdaptiveSparkPlanHelper {
}
}

test("array_intersect") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR title says array_remove but the test is for array-intersect?

Or other places that we already have tests?

Co-authored-by: Liang-Chi Hsieh <[email protected]>
r,
spark,
s"test$i.parquet",
numRows = conf.generateData.numRows(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

numRows is for all rows from all generated files or for one single file? Do we have definition for it?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

array_remove with structs causes error at runtime
4 participants