Continued: Use Arc<[Buffer]> instead of raw Vec<Buffer> in GenericByteViewArray for faster slice #7773
Conversation
In follow-up work, I want to expose the ViewBuffers representation to the kernels in arrow-select. The Filter is another obvious candidate.
This is looking really nice @ctsk -- thank you
I am not quite sure what to make of the benchmark results so far -- some of them may be noise -- we'll have to look into that further.

I am starting to check this one out in more detail.
I tried to reproduce the benchmarks for the concatenate_kernel. These benchmarks seem to show a 10-20% slowdown.

I used this command: cargo bench --bench concatenate_kernel -- "utf8_view max_str_len=20 null_density=0". I ran it on main and then on this branch and I didn't see any significant difference. Maybe some of the subsequent cleanups since I last ran the benchmarks will make a difference. Let's see what the next round says.
Slice certainly looks faster now 🎉
🤔 it still seems concat shows some regression
However, I can't reproduce it on my Mac M3 -- maybe it is something to do with the benchmark machine (which is an x86) -- I'll try and reproduce there later today.
let len = array.len();
array.buffers.insert(0, array.views.into_inner());
let new_buffers = {
I wonder if this code is doing an extra allocation (namely, it is now making a new Vec::with_capacity rather than reusing the previous buffers as it was before).
I can reproduce the slowdown for the concat kernel on my laptop.
For this allocation, I believe we in turn save doing this allocation during clone() when using Arc<ViewBuffers>.
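The trade-off described above can be sketched in isolation. This is a minimal illustration, not the arrow-rs code: `String` stands in for `Buffer`, and the point is that cloning an `Arc<[_]>` is a single reference-count bump, while getting an owned `Vec` back out (as mutation-heavy code like concat may need) copies the elements into a fresh allocation.

```rust
use std::sync::Arc;

fn main() {
    // Stand-in for the shared buffer list; `String` plays the role of `Buffer`.
    let buffers: Arc<[String]> = vec!["payload".to_string()].into();

    // Cloning the Arc only bumps the refcount: no new allocation for the list.
    let shared = buffers.clone();
    assert_eq!(Arc::strong_count(&buffers), 2);

    // Converting back to a Vec copies every element into a new allocation --
    // this is the extra cost paid when the buffer list must be mutated.
    let owned: Vec<String> = shared.to_vec();
    assert_eq!(owned.len(), 1);

    drop(shared);
    assert_eq!(Arc::strong_count(&buffers), 1);
}
```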
Nevermind, it appears that I can't consistently reproduce the slowdown.
yeah it is really strange
@@ -609,8 +617,9 @@ impl&lt;T: ByteViewType + ?Sized&gt; Array for GenericByteViewArray&lt;T&gt; {

     fn shrink_to_fit(&mut self) {
         self.views.shrink_to_fit();
-        self.buffers.iter_mut().for_each(|b| b.shrink_to_fit());
-        self.buffers.shrink_to_fit();
+        if let Some(buffers) = Arc::get_mut(&mut self.buffers.0) {
I think this changes the semantics slightly -- and it only shrinks the buffers if they aren't shared
Maybe it should be using Arc::make_mut 🤔
The underlying Buffers are reference counted too, and only shrink themselves if the reference count is 1. So while this does change the semantics slightly, I don't think it changes much in practice: when the Arc is shared, the (current) alternative would be to store references to the same buffers in another Vec - thus incrementing the reference counts on the underlying buffers and making their shrink_to_fit a no-op.

That's also why make_mut won't lead to more shrinking.
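The `Arc::get_mut` behavior being discussed can be demonstrated directly. A minimal sketch (names like `try_shrink` are illustrative, not the arrow-rs code): `get_mut` only grants mutable access when the Arc is the sole owner, so a shared buffer list is simply left alone, which mirrors the PR's `shrink_to_fit`.

```rust
use std::sync::Arc;

// Illustrative helper: shrink the inner Vec only when we are the sole owner,
// mirroring the `if let Some(...) = Arc::get_mut(...)` pattern in the diff.
fn try_shrink(buffers: &mut Arc<Vec<i32>>) -> bool {
    match Arc::get_mut(buffers) {
        Some(v) => {
            v.shrink_to_fit(); // unique owner: safe to mutate in place
            true
        }
        None => false, // shared: skip shrinking, as the PR does
    }
}

fn main() {
    let mut bufs: Arc<Vec<i32>> = Arc::new(vec![1, 2, 3]);
    assert!(try_shrink(&mut bufs)); // unique: shrinking happens

    let shared = bufs.clone();
    assert!(!try_shrink(&mut bufs)); // shared: Arc::get_mut returns None

    // Arc::make_mut would clone the Vec here to regain uniqueness, but the
    // cloned inner buffers would still share their allocations, so (per the
    // comment above) their own shrink_to_fit would be a no-op anyway.
    drop(shared);
    assert!(try_shrink(&mut bufs)); // unique again
}
```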
///
/// Similar to `Arc<Vec<Buffer>>` or `Arc<[Buffer]>`
#[derive(Clone, Debug)]
pub struct ViewBuffers(pub(crate) Arc<[Buffer]>);
One idea I had was: instead of Arc<[Buffer]>, what if we left it as Arc<Vec<Buffer>>, so converting back/forth to Vec wasn't as costly 🤔
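The conversion-cost difference behind this suggestion can be sketched as follows (a simplified illustration with `u64` standing in for `Buffer`): wrapping a `Vec` in `Arc<Vec<_>>` reuses the element storage and unwraps cheaply when unique, whereas `Vec -> Arc<[_]>` must move the elements into a new allocation that carries the refcounts, and going back to a `Vec` copies them out again.

```rust
use std::sync::Arc;

fn main() {
    let v: Vec<u64> = (0..4).collect();

    // Arc<Vec<_>>: the Vec is moved behind the Arc; its element storage
    // is reused as-is, and a unique Arc can be unwrapped without copying.
    let arc_vec: Arc<Vec<u64>> = Arc::new(v);
    let back: Vec<u64> = Arc::try_unwrap(arc_vec).expect("unique");
    assert_eq!(back, vec![0, 1, 2, 3]);

    // Vec -> Arc<[_]>: the elements are moved into a fresh allocation that
    // holds the refcounts inline, and slice -> Vec copies them out again.
    let arc_slice: Arc<[u64]> = back.into();
    let back2: Vec<u64> = arc_slice.to_vec();
    assert_eq!(back2.len(), 4);
}
```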
This PR is a continuation of the work by @ShiKaiWi in #6427.

Which issue does this PR close?

StringViewArray::slice() and BinaryViewArray::slice() faster / non allocating #6408. Please consult the description of #6427 for the details.

What changed since #6427

A ViewBuffers abstraction (analogous to Fields in Schema) that hides the internal representation (Arc<[Buffer]>) in the constructors of GenericByteViewArray.

Are there any user-facing changes?
Yes, these are breaking changes.
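Why this change speeds up slice() can be sketched with a toy model (not the arrow-rs types: `Vec<u128>` stands in for the packed views and `Vec<u8>` for `Buffer`). A slice of a view array shares the same buffer list, so cloning that list is the hot operation; with `Arc<[Buffer]>` it is a single refcount bump instead of allocating a new `Vec<Buffer>` and bumping each buffer's refcount.

```rust
use std::sync::Arc;

// Toy stand-in for GenericByteViewArray: views reference data held in a
// shared list of buffers.
struct ViewArray {
    views: Vec<u128>,        // stand-in for the packed view structs
    buffers: Arc<[Vec<u8>]>, // stand-in for ViewBuffers / Arc<[Buffer]>
}

impl ViewArray {
    // Slicing keeps the same buffer list; only the view range changes.
    fn slice(&self, offset: usize, len: usize) -> ViewArray {
        ViewArray {
            views: self.views[offset..offset + len].to_vec(),
            buffers: self.buffers.clone(), // one refcount bump, no Vec alloc
        }
    }
}

fn main() {
    let arr = ViewArray {
        views: vec![0; 8],
        buffers: vec![vec![1u8, 2, 3]].into(),
    };
    let sliced = arr.slice(2, 4);
    assert_eq!(sliced.views.len(), 4);
    // Both arrays now share the one buffer list.
    assert_eq!(Arc::strong_count(&arr.buffers), 2);
}
```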