Skip to content

[core] Fix nested update agg dropping existing-key updates at count limit#8272

Open
QuakeWang wants to merge 3 commits into
apache:masterfrom
QuakeWang:nested-limit-update
Open

[core] Fix nested update agg dropping existing-key updates at count limit#8272
QuakeWang wants to merge 3 commits into
apache:masterfrom
QuakeWang:nested-limit-update

Conversation

@QuakeWang

Copy link
Copy Markdown
Member

Purpose

When a nested-key was configured together with a count limit, agg short-circuited with if (acc.size() >= countLimit) return accumulator;. This early return was shared with the no-key path, so once the nested array reached the limit, updates to existing keys (and nested-sequence-field updates) were silently dropped — the limit was meant to bound the number of distinct keys, not block in-place updates.

This PR splits the two paths: the no-key path keeps the count-limit truncation, while the keyed path merges by key so existing keys are always updated and the limit only caps the number of new keys.

Tests

  • mvn test -pl paimon-core -Dtest=FieldAggregatorTest

…imit

When a nested-key was configured together with a count limit, `agg` returned the accumulator unchanged as soon as `acc.size() >= countLimit`. This early return was shared with the no-key path, so once the nested array reached the limit, updates to already-existing keys (and sequence-based updates) were silently dropped instead of being applied.

Split the two paths: the no-key path keeps the count-limit truncation behavior, while the keyed path merges rows by key into a map. Existing keys are always updated (honoring the nested-sequence-field when set) and the count limit only caps the number of new keys, so in-place updates are no longer lost at the limit.

Signed-off-by: QuakeWang <wangfuzheng0814@foxmail.com>
@JingsongLi

JingsongLi commented Jun 19, 2026

Copy link
Copy Markdown
Contributor

Thanks for the fix. I think one edge case still bypasses the new keyed count-limit logic: agg returns inputField directly when accumulator == null, before the keyed path runs. If the first non-null value for a row already contains more than countLimit distinct nested keys, or contains duplicate nested keys where nested-sequence-field should choose the latest row, that first array is kept as-is. On later merges, addNestedRows(acc, map, false) treats all of those keys as existing, so the count limit never brings it back under the intended distinct-key cap. Could we route the keyed accumulator == null case through the same map/limit/sequence merge path, and add a test with a multi-row first input array?

@QuakeWang QuakeWang requested a review from JingsongLi June 19, 2026 13:28
addNonNullRows(input, rows, remainCount);
if (keyProjection == null) {
if (accumulator == null) {
return inputField;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we also apply the count limit to the no-key initial input case? This branch returns the whole inputField when accumulator == null, so a first write such as countLimit = 2 with ARRAY[row1, row2, row3] keeps all three rows. The no-key path is append mode, and the existing count-limit test already expects rows beyond the limit to be dropped, so the null-accumulator case should probably build a GenericArray from at most countLimit non-null input rows instead of returning inputField directly.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JingsongLi Good catch, thanks. I agree that the no-key append path should apply countLimit even when accumulator == null.

I updated the branch so the initial no-key input now builds a GenericArray from at most countLimit non-null input rows instead of returning inputField directly. I also added a regression test for an oversized first input array with a null element.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants