perf: Fix quadratic behavior of `to_array_of_size` by neilconway · Pull Request #20459 · apache/datafusion

neilconway · 2026-02-20T21:41:39Z

Which issue does this PR close?

Closes #20458 (`ScalarValue::to_array_of_size` is slow for `StringViewArray` with many buffers).
Closes #18159 (Avoid deep copying Utf8View array buffers in `ScalarValue::to_array_of_size()`).

Rationale for this change

When `to_array_of_size(n)` was called on a List-like `ScalarValue` containing a `StringViewArray` with b data buffers, the previous implementation returned a list whose `StringViewArray` had n*b buffers. This leads to catastrophically bad performance once b grows even moderately large.
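To make the n*b growth concrete, here is a minimal sketch using a hypothetical toy model of the view layout (`View`, `ToyStringView`, `concat_n_copies` and `make_array` are illustrative names, not the arrow-rs API). It assumes concatenation appends every copy's buffers and shifts each view's buffer index. The model also copies the bytes for simplicity, but the point is the buffer count, not how buffers are shared.

```rust
// Hypothetical toy model of a StringViewArray (NOT the arrow-rs API):
// each view points at a string by (buffer_index, offset, len).
#[allow(dead_code)]
#[derive(Clone)]
struct View {
    buffer_index: usize,
    offset: usize,
    len: usize,
}

struct ToyStringView {
    views: Vec<View>,
    buffers: Vec<Vec<u8>>,
}

// Build an array with `b` buffers, one 16-byte string per buffer.
fn make_array(b: usize) -> ToyStringView {
    ToyStringView {
        views: (0..b).map(|i| View { buffer_index: i, offset: 0, len: 16 }).collect(),
        buffers: (0..b).map(|_| vec![b'x'; 16]).collect(),
    }
}

// Models "repeat the array n times, then concatenate": every copy
// contributes its own buffers, so the result holds n * b buffers.
fn concat_n_copies(arr: &ToyStringView, n: usize) -> ToyStringView {
    let mut out = ToyStringView { views: Vec::new(), buffers: Vec::new() };
    for _ in 0..n {
        // Views of this copy must point past the buffers already appended.
        let base = out.buffers.len();
        out.buffers.extend(arr.buffers.iter().cloned());
        out.views.extend(arr.views.iter().map(|v| View {
            buffer_index: base + v.buffer_index,
            ..v.clone()
        }));
    }
    out
}

fn main() {
    let repeated = concat_n_copies(&make_array(50), 1024);
    println!("{} buffers", repeated.buffers.len()); // prints "51200 buffers"
}
```

With b = 50 and n = 1024 (the sizes used in the `50_buffers/1024` benchmark), the model ends up with 51,200 buffers for an array whose distinct string data fits in 50.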

What changes are included in this PR?

Instead of using `repeat_n` + `concat` to merge n copies of the `StringViewArray`, we now use `take`, which preserves the same number of buffers as the input `StringViewArray`.
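The `take`-based approach can be sketched with the same kind of toy model. This is a minimal sketch, not the PR's code. `ToyStringView`, `take` and `repeat_with_take` are hypothetical, and the index pattern (0..len repeated n times) is an assumption about how n copies would be gathered. The key property it illustrates is that gathering only rewrites the views, while the buffers are shared (here via `Rc`) rather than multiplied.

```rust
use std::rc::Rc;

// Hypothetical toy model (NOT the arrow-rs API).
#[derive(Clone)]
struct View {
    buffer_index: usize,
    offset: usize,
    len: usize,
}

struct ToyStringView {
    views: Vec<View>,
    buffers: Rc<Vec<Vec<u8>>>, // shared, never duplicated by `take`
}

impl ToyStringView {
    // Resolve element `i` through its view.
    fn value(&self, i: usize) -> &str {
        let v = &self.views[i];
        std::str::from_utf8(&self.buffers[v.buffer_index][v.offset..v.offset + v.len]).unwrap()
    }
}

// Gather views by index; the buffer list is reference-counted, not copied.
fn take(arr: &ToyStringView, indices: &[usize]) -> ToyStringView {
    ToyStringView {
        views: indices.iter().map(|&i| arr.views[i].clone()).collect(),
        buffers: Rc::clone(&arr.buffers),
    }
}

// Assumed pattern for n copies: indices 0..len, 0..len, ... (n times).
fn repeat_with_take(arr: &ToyStringView, n: usize) -> ToyStringView {
    let len = arr.views.len();
    let indices: Vec<usize> = (0..n).flat_map(|_| 0..len).collect();
    take(arr, &indices)
}

fn make_example() -> ToyStringView {
    ToyStringView {
        views: vec![
            View { buffer_index: 0, offset: 0, len: 5 },
            View { buffer_index: 1, offset: 0, len: 6 },
        ],
        buffers: Rc::new(vec![b"hello".to_vec(), b"world!".to_vec()]),
    }
}

fn main() {
    let out = repeat_with_take(&make_example(), 3);
    // prints "6 views, 2 buffers, last = world!"
    println!("{} views, {} buffers, last = {}", out.views.len(), out.buffers.len(), out.value(5));
}
```

The output length still grows with n, but the buffer count stays at b, which is why the optimized timings in the benchmarks barely change between the 1-buffer and 50-buffer cases.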

This PR also adds a new benchmark for this situation.

Are these changes tested?

Yes, and benchmarked.

Are there any user-facing changes?

No.

AI usage

Iterated on the problem with Claude Code; I understand the problem and the solution.

neilconway · 2026-02-20T21:42:16Z

Benchmarks (`array-of-size-opt` is this PR, `array-of-size-vanilla` is the baseline):

group                                    array-of-size-opt                      array-of-size-vanilla
-----                                    -----------------                      ---------------------
list_to_array_of_size/1_buffer/1024      1.00  1870.0±30.72µs        ? ?/sec    3.77      7.1±0.05ms        ? ?/sec
list_to_array_of_size/1_buffer/256       1.00    337.9±4.16µs        ? ?/sec    1.77    599.8±5.17µs        ? ?/sec
list_to_array_of_size/50_buffers/1024    1.00  1945.4±23.28µs        ? ?/sec    65.93   128.3±1.06ms        ? ?/sec
list_to_array_of_size/50_buffers/256     1.00    364.7±4.31µs        ? ?/sec    22.76     8.3±0.07ms        ? ?/sec

neilconway · 2026-02-21T01:38:33Z

I see #18159 already exists for this issue; I'll be optimistic and claim this PR closes it... 😅

neilconway · 2026-02-21T01:41:32Z

We could consider backing out the special-case logic in the nested loop join (NLJ) that was introduced in #18161, but that will require some consideration and benchmarking first.

neilconway added 2 commits February 20, 2026 16:25

Add bench

c740883

Add fix

e6cddaa

github-actions bot added the `common` label (Related to common crate) Feb 20, 2026

Fix clippy

e8b0d6c

Generalize to arrays of any length

4d6c672

neilconway wants to merge 4 commits into apache:main from neilconway:neilc/optimize-to-array-of-size