[core] Fix nested update agg dropping existing-key updates at count limit#8272
[core] Fix nested update agg dropping existing-key updates at count limit#8272QuakeWang wants to merge 3 commits into
Conversation
…imit When a nested-key was configured together with a count limit, `agg` returned the accumulator unchanged as soon as `acc.size() >= countLimit`. This early return was shared with the no-key path, so once the nested array reached the limit, updates to already-existing keys (and sequence-based updates) were silently dropped instead of being applied. Split the two paths: the no-key path keeps the count-limit truncation behavior, while the keyed path merges rows by key into a map. Existing keys are always updated (honoring the nested-sequence-field when set) and the count limit only caps the number of new keys, so in-place updates are no longer lost at the limit. Signed-off-by: QuakeWang <wangfuzheng0814@foxmail.com>
|
Thanks for the fix. I think one edge case still bypasses the new keyed count-limit logic: |
| addNonNullRows(input, rows, remainCount); | ||
| if (keyProjection == null) { | ||
| if (accumulator == null) { | ||
| return inputField; |
There was a problem hiding this comment.
Could we also apply the count limit to the no-key initial input case? This branch returns the whole inputField when accumulator == null, so a first write such as countLimit = 2 with ARRAY[row1, row2, row3] keeps all three rows. The no-key path is append mode, and the existing count-limit test already expects rows beyond the limit to be dropped, so the null-accumulator case should probably build a GenericArray from at most countLimit non-null input rows instead of returning inputField directly.
There was a problem hiding this comment.
@JingsongLi Good catch, thanks. I agree that the no-key append path should apply countLimit even when accumulator == null.
I updated the branch so the initial no-key input now builds a GenericArray from at most countLimit non-null input rows instead of returning inputField directly. I also added a regression test for an oversized first input array with a null element.
Purpose
When a nested-key was configured together with a count limit,
aggshort-circuited withif (acc.size() >= countLimit) return accumulator;. This early return was shared with the no-key path, so once the nested array reached the limit, updates to existing keys (andnested-sequence-fieldupdates) were silently dropped — the limit was meant to bound the number of distinct keys, not block in-place updates.This PR splits the two paths: the no-key path keeps the count-limit truncation, while the keyed path merges by key so existing keys are always updated and the limit only caps the number of new keys.
Tests
mvn test -pl paimon-core -Dtest=FieldAggregatorTest