fix(sparksql): Default ignoreNulls to true for collect_set backward compatibility #16947
Open
yaooqinn wants to merge 1 commit into facebookincubator:main
Build Impact Analysis
Total affected: 5/555 targets.
Fast path. Graph from main@f7c243e24ac2705f4d69bc87cbcde0259ac6775b
Collaborator
jinchengchenghh
left a comment
Please don't include unrelated changes.
…ompatibility
The ignoreNulls_ field in SparkCollectSetAggregate was defaulting to false (RESPECT NULLS), which breaks backward compatibility when the 1-arg signature is used. In this case, setConstantInputs() does not receive a boolean constant, so the default value is used, and that default must match Spark's default behavior of ignoring nulls.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
565e97c to d07e804
Contributor
Author
Rebased on latest main and removed unrelated changes from the diff. The PR now contains only the one-file fix (CollectSetAggregate.cpp).
rui-mo
reviewed
Mar 30, 2026
Collaborator
rui-mo
left a comment
Can you please add a test to verify the default behavior? Thanks.
Summary
Fixes a backward compatibility bug introduced in PR #16416.
The ignoreNulls_ field in SparkCollectSetAggregate was defaulting to false (RESPECT NULLS). When the 1-arg signature collect_set(T) is used, setConstantInputs() does not receive a boolean constant, so the default value is used. That default must match Spark's default behavior of ignoring nulls (true).
Root cause
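The defaulting behavior described above can be sketched with a small stand-alone model. This is not the Velox source: CollectSetModel, its constructor parameter, and the assumption that the optional flag arrives as the second constant input are all hypothetical, chosen only to show why the member initializer decides what the 1-arg signature does.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical stand-in for SparkCollectSetAggregate (not the Velox class).
class CollectSetModel {
 public:
  // 'defaultIgnoreNulls' models the member initializer of ignoreNulls_.
  explicit CollectSetModel(bool defaultIgnoreNulls)
      : ignoreNulls_(defaultIgnoreNulls) {}

  // Models setConstantInputs(): one entry per function argument, with a
  // value only when that argument is a boolean constant.
  // Assumption: the ignoreNulls flag, when present, is the second argument.
  void setConstantInputs(const std::vector<std::optional<bool>>& constants) {
    if (constants.size() > 1 && constants[1].has_value()) {
      ignoreNulls_ = *constants[1];
    }
    // With the 1-arg signature nothing is assigned, so the default stays.
  }

  bool ignoreNulls() const { return ignoreNulls_; }

 private:
  bool ignoreNulls_;
};

// Usage example: the 1-arg form inherits whatever the default is.
inline void defaultingExample() {
  CollectSetModel beforeFix(/*defaultIgnoreNulls=*/false);
  beforeFix.setConstantInputs({std::nullopt}); // collect_set(T)
  assert(!beforeFix.ignoreNulls()); // behaves as RESPECT NULLS

  CollectSetModel afterFix(/*defaultIgnoreNulls=*/true);
  afterFix.setConstantInputs({std::nullopt}); // collect_set(T)
  assert(afterFix.ignoreNulls()); // matches Spark's default
}
```

In this model an explicit flag still overrides the default, so changing the initializer only affects callers that use the 1-arg form.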
Impact
Without this fix, any downstream consumer (e.g., Gluten) using the native collect_set with the 1-arg signature would get null elements in the output array, causing NullPointerException during Spark's result projection.
Testing
Verified in Gluten with VeloxAggregateFunctionsDefaultSuite: all 16 collect_set/collect_list tests pass after this fix.
Related: Gluten PR apache/gluten#11837
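To make the impact described above concrete, here is a toy model of how the flag changes the output array. The function collectSetModel and the use of std::optional to represent SQL NULL are illustrative assumptions, as is keeping a single null entry in RESPECT NULLS mode. It does not reproduce Velox's actual accumulator.

```cpp
#include <optional>
#include <set>
#include <vector>

// std::nullopt stands in for a SQL NULL element (an assumption of this sketch).
using Element = std::optional<int>;

// Toy collect_set: keeps the first occurrence of each distinct value.
// When ignoreNulls is false, a single null entry is kept in the output.
std::vector<Element> collectSetModel(
    const std::vector<Element>& input,
    bool ignoreNulls) {
  std::set<int> seen;
  bool sawNull = false;
  std::vector<Element> output;
  for (const auto& value : input) {
    if (!value.has_value()) {
      if (!ignoreNulls && !sawNull) {
        sawNull = true;
        output.push_back(std::nullopt);
      }
      continue;
    }
    if (seen.insert(*value).second) {
      output.push_back(value);
    }
  }
  return output;
}
```

With input {1, NULL, 2, 1, NULL}, the model returns two elements when ignoreNulls is true and three (including one null) when it is false. A null element of this kind is what a downstream consumer expecting Spark's default would not be prepared for.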