Continued: Use Arc<[Buffer]> instead of raw Vec<Buffer> in GenericByteViewArray for faster slice #7773
Conversation
In follow-up work, I want to expose the ViewBuffers representation to the kernels in arrow-select. The Filter is another obvious candidate.
This is looking really nice @ctsk -- thank you
I am not quite sure what to make of the benchmark results so far -- some of them may be noise -- we'll have to look into that further.

I am starting to check this one out in more detail.
I tried to reproduce the benchmarks for the concatenate_kernel. These benchmarks seem to show a 10-20% slowdown.

I used this command: cargo bench --bench concatenate_kernel -- "utf8_view max_str_len=20 null_density=0". I ran it on main and then on this branch and I didn't see any significant difference. Maybe some of the subsequent cleanups since I last ran the benchmarks will make a difference. Let's see what the next round says.
Slice certainly looks faster now 🎉
🤔 it still seems concat shows some regression
However, I can't reproduce it on my Mac M3 -- maybe it is something to do with the benchmark machine (which is an x86) -- I'll try and reproduce there later today.
let len = array.len();
array.buffers.insert(0, array.views.into_inner());
let new_buffers = {
I wonder if this code is doing an extra allocation (namely, it is now making a new Vec::with_capacity rather than reusing the previous buffers as it was before).
I can reproduce the slowdown for the concat kernel on my laptop.
For this allocation, I believe we in turn save doing this allocation during clone() when using Arc<ViewBuffers>.
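The trade-off described above can be sketched in isolation. This is a minimal illustration, not the arrow-rs code: `String` stands in for `Buffer`, and the point is that cloning an `Arc<[_]>` is a single reference-count bump, while getting an owned `Vec` back out (as mutation-heavy code like concat may need) copies the elements into a fresh allocation.

```rust
use std::sync::Arc;

fn main() {
    // Stand-in for the shared buffer list; `String` plays the role of `Buffer`.
    let buffers: Arc<[String]> = vec!["payload".to_string()].into();

    // Cloning the Arc only bumps the refcount: no new allocation for the list.
    let shared = buffers.clone();
    assert_eq!(Arc::strong_count(&buffers), 2);

    // Converting back to a Vec copies every element into a new allocation --
    // this is the extra cost paid when the buffer list must be mutated.
    let owned: Vec<String> = shared.to_vec();
    assert_eq!(owned.len(), 1);

    drop(shared);
    assert_eq!(Arc::strong_count(&buffers), 1);
}
```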
Nevermind, it appears that I can't consistently reproduce the slowdown.
yeah it is really strange
@@ -609,8 +617,9 @@ impl&lt;T: ByteViewType + ?Sized&gt; Array for GenericByteViewArray&lt;T&gt; {

     fn shrink_to_fit(&mut self) {
         self.views.shrink_to_fit();
-        self.buffers.iter_mut().for_each(|b| b.shrink_to_fit());
-        self.buffers.shrink_to_fit();
+        if let Some(buffers) = Arc::get_mut(&mut self.buffers.0) {
I think this changes the semantics slightly -- and it only shrinks the buffers if they aren't shared
Maybe it should be using Arc::make_mut 🤔
The underlying Buffers are reference counted too, and only shrink themselves if the reference count is 1. So while this does change the semantics slightly, I don't think it changes much in practice: when the Arc is shared, the (current) alternative would be to store references to the same buffers in another Vec - thus incrementing the reference counts on the underlying buffers and making their shrink_to_fit a no-op.

That's also why make_mut won't lead to more shrinking.
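The `Arc::get_mut` behavior being discussed can be demonstrated directly. A minimal sketch (names like `try_shrink` are illustrative, not the arrow-rs code): `get_mut` only grants mutable access when the Arc is the sole owner, so a shared buffer list is simply left alone, which mirrors the PR's `shrink_to_fit`.

```rust
use std::sync::Arc;

// Illustrative helper: shrink the inner Vec only when we are the sole owner,
// mirroring the `if let Some(...) = Arc::get_mut(...)` pattern in the diff.
fn try_shrink(buffers: &mut Arc<Vec<i32>>) -> bool {
    match Arc::get_mut(buffers) {
        Some(v) => {
            v.shrink_to_fit(); // unique owner: safe to mutate in place
            true
        }
        None => false, // shared: skip shrinking, as the PR does
    }
}

fn main() {
    let mut bufs: Arc<Vec<i32>> = Arc::new(vec![1, 2, 3]);
    assert!(try_shrink(&mut bufs)); // unique: shrinking happens

    let shared = bufs.clone();
    assert!(!try_shrink(&mut bufs)); // shared: Arc::get_mut returns None

    // Arc::make_mut would clone the Vec here to regain uniqueness, but the
    // cloned inner buffers would still share their allocations, so (per the
    // comment above) their own shrink_to_fit would be a no-op anyway.
    drop(shared);
    assert!(try_shrink(&mut bufs)); // unique again
}
```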
///
/// Similar to `Arc<Vec<Buffer>>` or `Arc<[Buffer]>`
#[derive(Clone, Debug)]
pub struct ViewBuffers(pub(crate) Arc<[Buffer]>);
One idea I had was: instead of Arc<[Buffer]>, what if we left it as Arc<Vec<Buffer>>, so converting back/forth to Vec wasn't as costly 🤔
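The conversion-cost difference behind this suggestion can be sketched as follows (a simplified illustration with `u64` standing in for `Buffer`): wrapping a `Vec` in `Arc<Vec<_>>` reuses the element storage and unwraps cheaply when unique, whereas `Vec -> Arc<[_]>` must move the elements into a new allocation that carries the refcounts, and going back to a `Vec` copies them out again.

```rust
use std::sync::Arc;

fn main() {
    let v: Vec<u64> = (0..4).collect();

    // Arc<Vec<_>>: the Vec is moved behind the Arc; its element storage
    // is reused as-is, and a unique Arc can be unwrapped without copying.
    let arc_vec: Arc<Vec<u64>> = Arc::new(v);
    let back: Vec<u64> = Arc::try_unwrap(arc_vec).expect("unique");
    assert_eq!(back, vec![0, 1, 2, 3]);

    // Vec -> Arc<[_]>: the elements are moved into a fresh allocation that
    // holds the refcounts inline, and slice -> Vec copies them out again.
    let arc_slice: Arc<[u64]> = back.into();
    let back2: Vec<u64> = arc_slice.to_vec();
    assert_eq!(back2.len(), 4);
}
```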
This PR is a continuation of the work by @ShiKaiWi in #6427.

Which issue does this PR close?

StringViewArray::slice() and BinaryViewArray::slice() faster / non allocating #6408. Please consult the description of #6427 for the details.

What changed since #6427

A ViewBuffers abstraction (analogous to Fields in Schema) that hides the internal representation (Arc<[Buffer]>) in the constructors of GenericByteViewArray.

Are there any user-facing changes?
Yes, these are breaking changes.
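Why this change speeds up slice() can be sketched with a toy model (not the arrow-rs types: `Vec<u128>` stands in for the packed views and `Vec<u8>` for `Buffer`). A slice of a view array shares the same buffer list, so cloning that list is the hot operation; with `Arc<[Buffer]>` it is a single refcount bump instead of allocating a new `Vec<Buffer>` and bumping each buffer's refcount.

```rust
use std::sync::Arc;

// Toy stand-in for GenericByteViewArray: views reference data held in a
// shared list of buffers.
struct ViewArray {
    views: Vec<u128>,        // stand-in for the packed view structs
    buffers: Arc<[Vec<u8>]>, // stand-in for ViewBuffers / Arc<[Buffer]>
}

impl ViewArray {
    // Slicing keeps the same buffer list; only the view range changes.
    fn slice(&self, offset: usize, len: usize) -> ViewArray {
        ViewArray {
            views: self.views[offset..offset + len].to_vec(),
            buffers: self.buffers.clone(), // one refcount bump, no Vec alloc
        }
    }
}

fn main() {
    let arr = ViewArray {
        views: vec![0; 8],
        buffers: vec![vec![1u8, 2, 3]].into(),
    };
    let sliced = arr.slice(2, 4);
    assert_eq!(sliced.views.len(), 4);
    // Both arrays now share the one buffer list.
    assert_eq!(Arc::strong_count(&arr.buffers), 2);
}
```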