Skip to content

arrow-select: Implement concat for RunArrays #7487

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

brancz
Copy link
Contributor

@brancz brancz commented May 9, 2025

It's unclear whether the added work and complexity of merging the last run between consecutive arrays with the first one of the next array being concatenanted is worth the potential benefit. For now this implementation focuses on the ability to concatenate RunArrays in the first place.

Part of apache/datafusion#16011

What changes are included in this PR?

Support for concatenating RunArrays.

Are there any user-facing changes?

No, this is strictly additive and doesn't even add any new APIs, just adds support for RunArrays in an existing API.

@alamb

It's unclear whether the added work and complexity of merging the last
run between consecutive arrays with the first one of the next array
being concatenanted is worth the potential benefit. For now this
implementation focuses on the ability to concatenate `RunArray`s in the
first place.
@github-actions github-actions bot added the arrow Changes to the arrow crate label May 9, 2025
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @brancz -- I think this looks quite good to me. With a few more tests I think it will be ready to merge.

It's unclear whether the added work and complexity of merging the last run between consecutive arrays with the first one of the next array being concatenanted is worth the potential benefit.

I agree -- and the approach in this PR has the nice benefit that it is very simply to explain


// Concatenate the arrays - this should now work properly
let result = concat(&[&array1, &array2]).unwrap();
let result_run_array = result
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps you could use the new as_run() methods added elsewhere in this PR

let values1 = Int32Array::from(vec![10, 20]);
let array1 = RunArray::try_new(&run_ends1, &values1).unwrap();

let run_ends2 = Int32Array::from(vec![2, 4]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please change the run ends for the second array so they aren't the same as the first? Also it would be good to include a run with length 1

Something like this perhaps:

Suggested change
let run_ends2 = Int32Array::from(vec![2, 4]);
let run_ends2 = Int32Array::from(vec![1, 4]);

}

#[test]
fn test_concat_run_array_with_nulls() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also please add a test for:

  • calling concat with 1 RunArray
  • Calling concat with 3 Run arrays

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants