Continued: Use Arc<[Buffer]> instead of raw Vec<Buffer> in GenericByteViewArray for faster slice #7773


Open
wants to merge 11 commits into main from arc-buffers

Conversation

ctsk (Contributor) commented Jun 25, 2025

This PR is a continuation of the work by @ShiKaiWi in #6427.

Which issue does this PR close?

Please consult the description of #6427 for the details.

What changed since #6427

  • I've rebased the PR on the current main branch and fixed incompatibilities.
  • I added a ViewBuffers abstraction (analogous to Fields in Schema) that hides the internal representation (Arc<[Buffer]>) in the constructors of GenericByteViewArray.
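For illustration, a rough sketch of the shape of such a wrapper (the trait impls and method names here are assumptions, not the PR's exact API; the actual definition appears in the review discussion further down):

use std::sync::Arc;
use arrow_buffer::Buffer;

/// Holds the data buffers of a GenericByteViewArray behind a shared,
/// cheaply clonable handle (cloning is a refcount bump, not a Vec copy).
#[derive(Clone, Debug)]
pub struct ViewBuffers(Arc<[Buffer]>);

impl From<Vec<Buffer>> for ViewBuffers {
    // Constructors can keep accepting Vec<Buffer> and convert it once here,
    // so callers never deal with the Arc<[Buffer]> representation directly.
    fn from(buffers: Vec<Buffer>) -> Self {
        Self(buffers.into())
    }
}

impl std::ops::Deref for ViewBuffers {
    type Target = [Buffer];

    fn deref(&self) -> &[Buffer] {
        &self.0
    }
}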

Are there any user-facing changes?

Yes, these are breaking changes.

github-actions bot added the arrow (Changes to the arrow crate) label on Jun 25, 2025
ctsk (Contributor, Author) commented Jun 25, 2025

In follow-up work, I want to expose the ViewBuffers representation to the kernels in arrow-select. The take kernel can then avoid allocating a new vector for the data buffers - I imagine other kernels will benefit as well.
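As an illustration of that follow-up idea (a sketch, not code from this PR, assuming the kernel's output keeps the input's data buffers unchanged):

use std::sync::Arc;
use arrow_buffer::Buffer;

// Hypothetical helper showing what a kernel like `take` could do with the
// data buffers of its input once they are behind an Arc.
fn reuse_data_buffers(input: &Arc<[Buffer]>) -> Arc<[Buffer]> {
    // With Vec<Buffer> this required something like `input.to_vec()`: a new
    // Vec allocation plus one refcount bump per Buffer. With Arc<[Buffer]>
    // it is a single refcount bump on the whole list.
    Arc::clone(input)
}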

alamb added the api-change (Changes to the arrow API) label on Jun 25, 2025
alamb (Contributor) commented Jun 25, 2025

In follow-up work, I want to expose the ViewBuffers representation to the kernels in arrow-select. The take kernel can then avoid allocating a new vector for the data buffers - I imagine other kernels will benefit as well.

Filter is another obvious candidate

alamb (Contributor) left a comment

This is looking really nice @ctsk -- thank you

alamb added the next-major-release (the PR has API changes and is waiting on the next major version) label on Jun 25, 2025

alamb (Contributor) commented Jun 25, 2025

I am not quite sure what to make of the benchmark results so far -- some of them may be noise -- we'll have to look into that further

alamb (Contributor) commented Jun 27, 2025

I am starting to check this one out in more detail

alamb (Contributor) commented Jun 27, 2025

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.11.0-1015-gcp #15~24.04.1-Ubuntu SMP Thu Apr 24 20:41:05 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing arc-buffers (1bb37b2) to e42df82 diff
BENCH_NAME=view_types
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench view_types
BENCH_FILTER=
BENCH_BRANCH_NAME=arc-buffers
Results will be posted here when complete

alamb (Contributor) commented Jun 27, 2025

🤖: Benchmark completed

Details

group                       arc-buffers                            main
-----                       -----------                            ----
gc view types all           1.01    869.0±4.44µs        ? ?/sec    1.00    857.4±3.13µs        ? ?/sec
gc view types slice half    1.01    432.2±1.25µs        ? ?/sec    1.00    426.5±1.04µs        ? ?/sec
view types slice            1.00    637.7±2.32ns        ? ?/sec    1.11    706.0±1.09ns        ? ?/sec

alamb (Contributor) commented Jun 27, 2025

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.11.0-1015-gcp #15~24.04.1-Ubuntu SMP Thu Apr 24 20:41:05 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing arc-buffers (1bb37b2) to e42df82 diff
BENCH_NAME=concatenate_kernel
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench concatenate_kernel
BENCH_FILTER=
BENCH_BRANCH_NAME=arc-buffers
Results will be posted here when complete

alamb (Contributor) commented Jun 27, 2025

I tried to reproduce the benchmarks for the concatenate_kernel

These benchmarks seem to show 10-20% slowdown:

group                                                          arc-buffers                            main
-----                                                          -----------                            ----
...
concat utf8_view  max_str_len=20 null_density=0                1.15     88.5±0.40µs        ? ?/sec    1.00     77.2±0.45µs        ? ?/sec
concat utf8_view  max_str_len=20 null_density=0.2              1.13     95.1±0.28µs        ? ?/sec    1.00     83.9±0.52µs        ? ?/sec
concat utf8_view all_inline max_str_len=12 null_density=0      1.18     47.4±3.25µs        ? ?/sec    1.00     40.1±3.28µs        ? ?/sec
concat utf8_view all_inline max_str_len=12 null_density=0.2    1.17     56.3±3.89µs        ? ?/sec    1.00     48.2±3.63µs        ? ?/sec

I used this command:

cargo bench --bench concatenate_kernel -- "utf8_view  max_str_len=20 null_density=0"

I ran it on main and then on this branch and didn't see any significant difference. Maybe the cleanups made since I last ran the benchmarks changed things.

Let's see what the next round says.

Details of run on `main`

andrewlamb@Andrews-MacBook-Pro-3:~/Software/arrow-rs$ cargo bench --bench concatenate_kernel -- "utf8_view  max_str_len=20 null_density=0"
    Finished `bench` profile [optimized] target(s) in 0.11s
     Running benches/concatenate_kernel.rs (target/release/deps/concatenate_kernel-b6004627e93b2be2)
concat utf8_view  max_str_len=20 null_density=0
                        time:   [36.989 µs 37.027 µs 37.072 µs]
                        change: [−2.1520% −1.4034% −0.7385%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 14 outliers among 100 measurements (14.00%)
  7 (7.00%) high mild
  7 (7.00%) high severe

concat utf8_view  max_str_len=20 null_density=0.2
                        time:   [39.854 µs 39.939 µs 40.047 µs]
                        change: [−0.0378% +0.4927% +0.9618%] (p = 0.05 > 0.05)
                        No change in performance detected.
Found 25 outliers among 100 measurements (25.00%)
  15 (15.00%) low mild
  2 (2.00%) high mild
  8 (8.00%) high severe

Details of run on `arc-buffers`

andrewlamb@Andrews-MacBook-Pro-3:~/Software/arrow-rs$ cargo bench --bench concatenate_kernel -- "utf8_view  max_str_len=20 null_density=0"
    Finished `bench` profile [optimized] target(s) in 0.09s
     Running benches/concatenate_kernel.rs (target/release/deps/concatenate_kernel-b6004627e93b2be2)
concat utf8_view  max_str_len=20 null_density=0
                        time:   [37.007 µs 37.062 µs 37.120 µs]
                        change: [−3.5033% −2.6414% −1.9181%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe

concat utf8_view  max_str_len=20 null_density=0.2
                        time:   [39.346 µs 39.433 µs 39.530 µs]
                        change: [−2.0976% −1.4141% −0.8057%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 15 outliers among 100 measurements (15.00%)
  4 (4.00%) high mild
  11 (11.00%) high severe

alamb (Contributor) commented Jun 27, 2025

🤖: Benchmark completed

Details

group                                                          arc-buffers                            main
-----                                                          -----------                            ----
concat 1024 arrays boolean 4                                   1.00     27.7±0.07µs        ? ?/sec    1.04     28.8±0.03µs        ? ?/sec
concat 1024 arrays i32 4                                       1.00     12.9±0.02µs        ? ?/sec    1.21     15.6±0.05µs        ? ?/sec
concat 1024 arrays str 4                                       1.00     54.2±0.37µs        ? ?/sec    1.02     55.4±0.42µs        ? ?/sec
concat boolean 1024                                            1.00    425.8±0.66ns        ? ?/sec    1.05    448.6±0.53ns        ? ?/sec
concat boolean 8192 over 100 arrays                            1.00     44.1±0.20µs        ? ?/sec    1.15     50.9±0.06µs        ? ?/sec
concat boolean nulls 1024                                      1.00    722.1±1.25ns        ? ?/sec    1.08    781.9±1.03ns        ? ?/sec
concat boolean nulls 8192 over 100 arrays                      1.00     96.3±0.20µs        ? ?/sec    1.14    109.5±0.21µs        ? ?/sec
concat fixed size lists                                        1.08   771.5±22.38µs        ? ?/sec    1.00   713.1±28.32µs        ? ?/sec
concat i32 1024                                                1.00    435.3±1.01ns        ? ?/sec    1.01    441.2±3.74ns        ? ?/sec
concat i32 8192 over 100 arrays                                1.00    221.5±8.00µs        ? ?/sec    1.07    237.0±8.16µs        ? ?/sec
concat i32 nulls 1024                                          1.00    728.7±1.99ns        ? ?/sec    1.05    762.1±5.92ns        ? ?/sec
concat i32 nulls 8192 over 100 arrays                          1.00    280.1±8.15µs        ? ?/sec    1.05    293.6±5.01µs        ? ?/sec
concat str 1024                                                1.11     14.2±1.19µs        ? ?/sec    1.00     12.8±0.78µs        ? ?/sec
concat str 8192 over 100 arrays                                1.01    108.0±0.90ms        ? ?/sec    1.00    107.4±0.77ms        ? ?/sec
concat str nulls 1024                                          1.07      6.6±0.60µs        ? ?/sec    1.00      6.2±0.62µs        ? ?/sec
concat str nulls 8192 over 100 arrays                          1.02     54.4±0.47ms        ? ?/sec    1.00     53.4±0.30ms        ? ?/sec
concat str_dict 1024                                           1.00      2.9±0.01µs        ? ?/sec    1.06      3.1±0.01µs        ? ?/sec
concat str_dict_sparse 1024                                    1.01      6.9±0.04µs        ? ?/sec    1.00      6.9±0.03µs        ? ?/sec
concat struct with int32 and dicts size=1024 count=2           1.00      6.9±0.04µs        ? ?/sec    1.01      6.9±0.08µs        ? ?/sec
concat utf8_view  max_str_len=128 null_density=0               1.01     77.9±0.42µs        ? ?/sec    1.00     77.6±0.30µs        ? ?/sec
concat utf8_view  max_str_len=128 null_density=0.2             1.00     83.3±0.38µs        ? ?/sec    1.01     84.3±0.85µs        ? ?/sec
concat utf8_view  max_str_len=20 null_density=0                1.15     88.9±0.30µs        ? ?/sec    1.00     77.5±0.38µs        ? ?/sec
concat utf8_view  max_str_len=20 null_density=0.2              1.13     94.6±0.47µs        ? ?/sec    1.00     83.6±0.35µs        ? ?/sec
concat utf8_view all_inline max_str_len=12 null_density=0      1.00     46.9±2.68µs        ? ?/sec    1.02     48.0±4.02µs        ? ?/sec
concat utf8_view all_inline max_str_len=12 null_density=0.2    1.00     53.8±3.10µs        ? ?/sec    1.02     55.0±3.70µs        ? ?/sec

alamb (Contributor) commented Jun 27, 2025

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.11.0-1015-gcp #15~24.04.1-Ubuntu SMP Thu Apr 24 20:41:05 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing arc-buffers (1bb37b2) to e42df82 diff
BENCH_NAME=coalesce_kernels
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench coalesce_kernels
BENCH_FILTER=
BENCH_BRANCH_NAME=arc-buffers
Results will be posted here when complete

alamb (Contributor) commented Jun 27, 2025

Slice certainly looks faster now 🎉

view types slice            1.00    637.7±2.32ns        ? ?/sec    1.11    706.0±1.09ns        ? ?/sec

🤔 it still seems concat shows some regression

concat utf8_view  max_str_len=20 null_density=0                1.15     88.9±0.30µs        ? ?/sec    1.00     77.5±0.38µs        ? ?/sec
concat utf8_view  max_str_len=20 null_density=0.2              1.13     94.6±0.47µs        ? ?/sec    1.00     83.6±0.35µs        ? ?/sec

However, I can't reproduce it on my Mac M3 -- maybe it is something to do with the benchmark machine (which is an x86) -- I'll try and reproduce there later today

let len = array.len();
array.buffers.insert(0, array.views.into_inner());
let new_buffers = {
Contributor:

I wonder if this code is doing an extra allocation (namely, it now makes a new Vec::with_capacity rather than reusing the previous buffers as it did before).

ctsk (Contributor, Author) commented Jun 28, 2025

I can reproduce the slowdown for the concat kernel on my laptop.

As for this allocation: I believe we in turn save an allocation during clone() when using Arc<ViewBuffers>...

ctsk (Contributor, Author):

Never mind -- it appears that I can't consistently reproduce the slowdown.

Contributor:

yeah it is really strange

@@ -609,8 +617,9 @@ impl<T: ByteViewType + ?Sized> Array for GenericByteViewArray<T> {

     fn shrink_to_fit(&mut self) {
         self.views.shrink_to_fit();
-        self.buffers.iter_mut().for_each(|b| b.shrink_to_fit());
-        self.buffers.shrink_to_fit();
+        if let Some(buffers) = Arc::get_mut(&mut self.buffers.0) {
Contributor:

I think this changes the semantics slightly -- and it only shrinks the buffers if they aren't shared

Maybe it should be using Arc::make_mut 🤔

ctsk (Contributor, Author) commented Jun 28, 2025

The underlying Buffers are reference counted too, and only shrink themselves if the reference count is 1. So while this does change the semantics slightly, I don't think it changes much in practice: when the Arc is shared, the (current) alternative would be to store references to the same buffers in another Vec, thus incrementing the reference counts on the underlying buffers and making their shrink_to_fit a no-op.

That's also why make_mut won't lead to more shrinking.
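A minimal sketch of that reasoning, assuming arrow_buffer::Buffer (illustrative only, not the PR's exact code):

use std::sync::Arc;
use arrow_buffer::Buffer;

fn shrink_view_buffers(buffers: &mut Arc<[Buffer]>) {
    // Arc::get_mut yields a mutable slice only when nothing else owns the
    // buffer list.
    if let Some(bufs) = Arc::get_mut(buffers) {
        bufs.iter_mut().for_each(|b| b.shrink_to_fit());
    }
    // When the Arc is shared, the contained Buffers are shared too, so
    // Buffer::shrink_to_fit would be a no-op for them anyway -- cloning the
    // list first (the make_mut approach) would not shrink anything more.
}

fn main() {
    let mut buffers: Arc<[Buffer]> = Arc::from(vec![Buffer::from(vec![0u8; 1024])]);
    shrink_view_buffers(&mut buffers); // sole owner: buffers may shrink

    let other = Arc::clone(&buffers);
    shrink_view_buffers(&mut buffers); // shared: nothing to do
    drop(other);
}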

alamb (Contributor) commented Jun 27, 2025

🤖: Benchmark completed

Details

group                                                                                arc-buffers                            main
-----                                                                                -----------                            ----
filter: mixed_dict, 8192, nulls: 0, selectivity: 0.001                               1.00    255.8±2.58ms        ? ?/sec    1.01    258.6±1.94ms        ? ?/sec
filter: mixed_dict, 8192, nulls: 0, selectivity: 0.01                                1.00      8.7±0.12ms        ? ?/sec    1.01      8.8±0.10ms        ? ?/sec
filter: mixed_dict, 8192, nulls: 0, selectivity: 0.1                                 1.00      4.2±0.08ms        ? ?/sec    1.01      4.3±0.10ms        ? ?/sec
filter: mixed_dict, 8192, nulls: 0, selectivity: 0.8                                 1.01      3.5±0.12ms        ? ?/sec    1.00      3.5±0.03ms        ? ?/sec
filter: mixed_dict, 8192, nulls: 0.1, selectivity: 0.001                             1.00    237.9±2.33ms        ? ?/sec    1.03    245.5±2.07ms        ? ?/sec
filter: mixed_dict, 8192, nulls: 0.1, selectivity: 0.01                              1.00      9.9±0.08ms        ? ?/sec    1.05     10.3±0.07ms        ? ?/sec
filter: mixed_dict, 8192, nulls: 0.1, selectivity: 0.1                               1.00      4.6±0.14ms        ? ?/sec    1.08      4.9±0.10ms        ? ?/sec
filter: mixed_dict, 8192, nulls: 0.1, selectivity: 0.8                               1.00      4.5±0.02ms        ? ?/sec    1.03      4.7±0.01ms        ? ?/sec
filter: mixed_utf8, 8192, nulls: 0, selectivity: 0.001                               1.00     58.3±1.37ms        ? ?/sec    1.08     63.1±1.32ms        ? ?/sec
filter: mixed_utf8, 8192, nulls: 0, selectivity: 0.01                                1.00     11.8±0.16ms        ? ?/sec    1.01     11.9±0.09ms        ? ?/sec
filter: mixed_utf8, 8192, nulls: 0, selectivity: 0.1                                 1.00      9.9±0.19ms        ? ?/sec    1.00      9.9±0.40ms        ? ?/sec
filter: mixed_utf8, 8192, nulls: 0, selectivity: 0.8                                 1.00      8.5±0.20ms        ? ?/sec    1.00      8.6±0.29ms        ? ?/sec
filter: mixed_utf8, 8192, nulls: 0.1, selectivity: 0.001                             1.00     78.9±0.68ms        ? ?/sec    1.05     83.2±0.37ms        ? ?/sec
filter: mixed_utf8, 8192, nulls: 0.1, selectivity: 0.01                              1.00     13.3±0.09ms        ? ?/sec    1.05     13.9±0.13ms        ? ?/sec
filter: mixed_utf8, 8192, nulls: 0.1, selectivity: 0.1                               1.00     10.0±0.31ms        ? ?/sec    1.06     10.6±0.40ms        ? ?/sec
filter: mixed_utf8, 8192, nulls: 0.1, selectivity: 0.8                               1.03     10.0±0.18ms        ? ?/sec    1.00      9.7±0.15ms        ? ?/sec
filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0, selectivity: 0.001      1.00     47.6±0.08ms        ? ?/sec    1.07     51.1±0.31ms        ? ?/sec
filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0, selectivity: 0.01       1.00      6.0±0.04ms        ? ?/sec    1.06      6.3±0.02ms        ? ?/sec
filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0, selectivity: 0.1        1.02      5.0±0.21ms        ? ?/sec    1.00      4.9±0.29ms        ? ?/sec
filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0, selectivity: 0.8        1.05      3.3±0.04ms        ? ?/sec    1.00      3.2±0.06ms        ? ?/sec
filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0.1, selectivity: 0.001    1.00     60.7±0.22ms        ? ?/sec    1.08     65.3±0.38ms        ? ?/sec
filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0.1, selectivity: 0.01     1.00      7.7±0.04ms        ? ?/sec    1.15      8.9±0.04ms        ? ?/sec
filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0.1, selectivity: 0.1      1.00      5.5±0.26ms        ? ?/sec    1.06      5.8±0.20ms        ? ?/sec
filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0.1, selectivity: 0.8      1.00      3.9±0.02ms        ? ?/sec    1.04      4.0±0.02ms        ? ?/sec
filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0, selectivity: 0.001       1.00     41.7±0.08ms        ? ?/sec    1.08     45.2±0.19ms        ? ?/sec
filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0, selectivity: 0.01        1.00      4.7±0.02ms        ? ?/sec    1.07      5.0±0.03ms        ? ?/sec
filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0, selectivity: 0.1         1.00      2.6±0.19ms        ? ?/sec    1.01      2.7±0.17ms        ? ?/sec
filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0, selectivity: 0.8         1.02      2.4±0.01ms        ? ?/sec    1.00      2.4±0.02ms        ? ?/sec
filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0.1, selectivity: 0.001     1.00     51.2±0.14ms        ? ?/sec    1.14     58.4±0.22ms        ? ?/sec
filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0.1, selectivity: 0.01      1.00      6.9±0.02ms        ? ?/sec    1.12      7.7±0.02ms        ? ?/sec
filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0.1, selectivity: 0.1       1.00      3.6±0.18ms        ? ?/sec    1.13      4.1±0.19ms        ? ?/sec
filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0.1, selectivity: 0.8       1.00      4.7±0.02ms        ? ?/sec    1.00      4.7±0.02ms        ? ?/sec
filter: single_utf8view, 8192, nulls: 0, selectivity: 0.001                          1.00     64.5±0.10ms        ? ?/sec    1.05     68.0±0.64ms        ? ?/sec
filter: single_utf8view, 8192, nulls: 0, selectivity: 0.01                           1.04      8.2±0.02ms        ? ?/sec    1.00      7.9±0.04ms        ? ?/sec
filter: single_utf8view, 8192, nulls: 0, selectivity: 0.1                            1.06      4.6±0.15ms        ? ?/sec    1.00      4.4±0.20ms        ? ?/sec
filter: single_utf8view, 8192, nulls: 0, selectivity: 0.8                            1.00      3.8±0.02ms        ? ?/sec    1.02      3.9±0.02ms        ? ?/sec
filter: single_utf8view, 8192, nulls: 0.1, selectivity: 0.001                        1.00     82.2±0.21ms        ? ?/sec    1.14     93.8±0.60ms        ? ?/sec
filter: single_utf8view, 8192, nulls: 0.1, selectivity: 0.01                         1.00     11.5±0.05ms        ? ?/sec    1.03     11.8±0.03ms        ? ?/sec
filter: single_utf8view, 8192, nulls: 0.1, selectivity: 0.1                          1.02      6.1±0.09ms        ? ?/sec    1.00      6.0±0.18ms        ? ?/sec
filter: single_utf8view, 8192, nulls: 0.1, selectivity: 0.8                          1.00      6.4±0.06ms        ? ?/sec    1.01      6.4±0.02ms        ? ?/sec

alamb (Contributor) commented Jun 27, 2025

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.11.0-1015-gcp #15~24.04.1-Ubuntu SMP Thu Apr 24 20:41:05 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing arc-buffers (1bb37b2) to e42df82 diff
BENCH_NAME=filter_kernels
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench filter_kernels
BENCH_FILTER=
BENCH_BRANCH_NAME=arc-buffers
Results will be posted here when complete

alamb (Contributor) commented Jun 27, 2025

🤖: Benchmark completed

Details

group                                                                         arc-buffers                            main
-----                                                                         -----------                            ----
filter context decimal128 (kept 1/2)                                          1.10     47.7±8.83µs        ? ?/sec    1.00     43.2±6.19µs        ? ?/sec
filter context decimal128 high selectivity (kept 1023/1024)                   1.00     49.7±1.42µs        ? ?/sec    1.00     49.7±1.23µs        ? ?/sec
filter context decimal128 low selectivity (kept 1/1024)                       1.00    242.1±0.25ns        ? ?/sec    1.01    243.7±0.32ns        ? ?/sec
filter context f32 (kept 1/2)                                                 1.00     97.7±0.14µs        ? ?/sec    1.00     98.0±0.19µs        ? ?/sec
filter context f32 high selectivity (kept 1023/1024)                          1.00     13.6±0.51µs        ? ?/sec    1.02     13.9±0.46µs        ? ?/sec
filter context f32 low selectivity (kept 1/1024)                              1.20    578.0±1.32ns        ? ?/sec    1.00    482.4±0.58ns        ? ?/sec
filter context fsb with value length 20 (kept 1/2)                            1.00     70.7±0.60µs        ? ?/sec    1.00     70.9±0.18µs        ? ?/sec
filter context fsb with value length 20 high selectivity (kept 1023/1024)     1.00     70.6±0.09µs        ? ?/sec    1.01     71.0±0.21µs        ? ?/sec
filter context fsb with value length 20 low selectivity (kept 1/1024)         1.00     70.6±0.12µs        ? ?/sec    1.00     70.9±0.08µs        ? ?/sec
filter context fsb with value length 5 (kept 1/2)                             1.00     70.7±0.10µs        ? ?/sec    1.00     70.9±0.08µs        ? ?/sec
filter context fsb with value length 5 high selectivity (kept 1023/1024)      1.00     70.7±0.18µs        ? ?/sec    1.00     70.9±0.11µs        ? ?/sec
filter context fsb with value length 5 low selectivity (kept 1/1024)          1.00     70.7±0.15µs        ? ?/sec    1.00     70.9±0.09µs        ? ?/sec
filter context fsb with value length 50 (kept 1/2)                            1.00     70.7±0.08µs        ? ?/sec    1.00     70.9±0.12µs        ? ?/sec
filter context fsb with value length 50 high selectivity (kept 1023/1024)     1.00     70.6±0.09µs        ? ?/sec    1.00     70.9±0.08µs        ? ?/sec
filter context fsb with value length 50 low selectivity (kept 1/1024)         1.00     70.7±0.15µs        ? ?/sec    1.00     70.8±0.06µs        ? ?/sec
filter context i32 (kept 1/2)                                                 1.00     22.6±0.04µs        ? ?/sec    1.00     22.6±0.05µs        ? ?/sec
filter context i32 high selectivity (kept 1023/1024)                          1.04      6.7±0.48µs        ? ?/sec    1.00      6.4±0.33µs        ? ?/sec
filter context i32 low selectivity (kept 1/1024)                              1.00    244.7±0.30ns        ? ?/sec    1.02    248.4±0.47ns        ? ?/sec
filter context i32 w NULLs (kept 1/2)                                         1.00     93.9±0.16µs        ? ?/sec    1.00     94.1±0.15µs        ? ?/sec
filter context i32 w NULLs high selectivity (kept 1023/1024)                  1.03     14.1±0.74µs        ? ?/sec    1.00     13.7±0.50µs        ? ?/sec
filter context i32 w NULLs low selectivity (kept 1/1024)                      1.01    478.4±0.55ns        ? ?/sec    1.00    472.9±0.88ns        ? ?/sec
filter context mixed string view (kept 1/2)                                   1.04    118.1±7.48µs        ? ?/sec    1.00    113.9±5.88µs        ? ?/sec
filter context mixed string view high selectivity (kept 1023/1024)            1.00     58.3±1.37µs        ? ?/sec    1.00     58.1±1.24µs        ? ?/sec
filter context mixed string view low selectivity (kept 1/1024)                1.02    688.7±3.66ns        ? ?/sec    1.00    677.0±0.80ns        ? ?/sec
filter context short string view (kept 1/2)                                   1.00    116.7±1.17µs        ? ?/sec    1.03    120.0±7.20µs        ? ?/sec
filter context short string view high selectivity (kept 1023/1024)            1.00     57.5±1.42µs        ? ?/sec    1.02     58.8±1.29µs        ? ?/sec
filter context short string view low selectivity (kept 1/1024)                1.04    507.3±0.87ns        ? ?/sec    1.00    486.4±0.55ns        ? ?/sec
filter context string (kept 1/2)                                              1.00   590.4±11.72µs        ? ?/sec    1.01   596.2±13.59µs        ? ?/sec
filter context string dictionary (kept 1/2)                                   1.01     23.4±0.12µs        ? ?/sec    1.00     23.3±0.05µs        ? ?/sec
filter context string dictionary high selectivity (kept 1023/1024)            1.00      7.1±0.29µs        ? ?/sec    1.08      7.7±0.42µs        ? ?/sec
filter context string dictionary low selectivity (kept 1/1024)                1.01    819.6±2.14ns        ? ?/sec    1.00    813.5±1.62ns        ? ?/sec
filter context string dictionary w NULLs (kept 1/2)                           1.00     94.7±0.21µs        ? ?/sec    1.00     94.9±0.30µs        ? ?/sec
filter context string dictionary w NULLs high selectivity (kept 1023/1024)    1.00     14.4±0.52µs        ? ?/sec    1.03     14.7±0.43µs        ? ?/sec
filter context string dictionary w NULLs low selectivity (kept 1/1024)        1.00   1060.4±1.27ns        ? ?/sec    1.00   1062.4±6.53ns        ? ?/sec
filter context string high selectivity (kept 1023/1024)                       1.00   626.6±15.05µs        ? ?/sec    1.00   624.9±18.95µs        ? ?/sec
filter context string low selectivity (kept 1/1024)                           1.00   1089.6±2.57ns        ? ?/sec    1.00   1091.5±1.19ns        ? ?/sec
filter context u8 (kept 1/2)                                                  1.00     18.8±0.03µs        ? ?/sec    1.01     18.9±0.02µs        ? ?/sec
filter context u8 high selectivity (kept 1023/1024)                           1.00   1805.7±8.20ns        ? ?/sec    1.00  1802.6±10.97ns        ? ?/sec
filter context u8 low selectivity (kept 1/1024)                               1.00    232.1±0.28ns        ? ?/sec    1.04    240.3±0.35ns        ? ?/sec
filter context u8 w NULLs (kept 1/2)                                          1.00     89.8±0.07µs        ? ?/sec    1.00     90.2±0.16µs        ? ?/sec
filter context u8 w NULLs high selectivity (kept 1023/1024)                   1.00      8.6±0.02µs        ? ?/sec    1.03      8.9±0.02µs        ? ?/sec
filter context u8 w NULLs low selectivity (kept 1/1024)                       1.19    558.3±0.76ns        ? ?/sec    1.00    470.0±0.35ns        ? ?/sec
filter decimal128 (kept 1/2)                                                  1.02     97.7±0.37µs        ? ?/sec    1.00     96.2±0.40µs        ? ?/sec
filter decimal128 high selectivity (kept 1023/1024)                           1.00     51.7±1.03µs        ? ?/sec    1.02     52.7±1.35µs        ? ?/sec
filter decimal128 low selectivity (kept 1/1024)                               1.25      3.0±0.00µs        ? ?/sec    1.00      2.4±0.00µs        ? ?/sec
filter f32 (kept 1/2)                                                         1.01    198.7±0.42µs        ? ?/sec    1.00    196.7±0.23µs        ? ?/sec
filter fsb with value length 20 (kept 1/2)                                    1.01    150.1±0.71µs        ? ?/sec    1.00    149.2±0.18µs        ? ?/sec
filter fsb with value length 20 high selectivity (kept 1023/1024)             1.01     70.5±1.78µs        ? ?/sec    1.00     70.1±2.56µs        ? ?/sec
filter fsb with value length 20 low selectivity (kept 1/1024)                 1.21      3.2±0.01µs        ? ?/sec    1.00      2.6±0.00µs        ? ?/sec
filter fsb with value length 5 (kept 1/2)                                     1.00    152.6±0.60µs        ? ?/sec    1.00    152.7±0.20µs        ? ?/sec
filter fsb with value length 5 high selectivity (kept 1023/1024)              1.00     10.7±0.65µs        ? ?/sec    1.10     11.7±0.65µs        ? ?/sec
filter fsb with value length 5 low selectivity (kept 1/1024)                  1.23      3.1±0.01µs        ? ?/sec    1.00      2.5±0.00µs        ? ?/sec
filter fsb with value length 50 (kept 1/2)                                    1.00    188.5±4.41µs        ? ?/sec    1.03    193.9±4.42µs        ? ?/sec
filter fsb with value length 50 high selectivity (kept 1023/1024)             1.02    213.7±8.33µs        ? ?/sec    1.00    209.8±7.52µs        ? ?/sec
filter fsb with value length 50 low selectivity (kept 1/1024)                 1.27      3.2±0.00µs        ? ?/sec    1.00      2.5±0.01µs        ? ?/sec
filter i32 (kept 1/2)                                                         1.01     92.2±0.12µs        ? ?/sec    1.00     91.6±0.11µs        ? ?/sec
filter i32 high selectivity (kept 1023/1024)                                  1.00      8.8±0.39µs        ? ?/sec    1.02      9.0±0.38µs        ? ?/sec
filter i32 low selectivity (kept 1/1024)                                      1.26      3.1±0.00µs        ? ?/sec    1.00      2.4±0.01µs        ? ?/sec
filter optimize (kept 1/2)                                                    1.02     84.2±0.21µs        ? ?/sec    1.00     82.7±0.17µs        ? ?/sec
filter optimize high selectivity (kept 1023/1024)                             1.00      2.6±0.01µs        ? ?/sec    1.15      3.0±0.00µs        ? ?/sec
filter optimize low selectivity (kept 1/1024)                                 1.28      2.8±0.00µs        ? ?/sec    1.00      2.2±0.00µs        ? ?/sec
filter run array (kept 1/2)                                                   1.19    443.0±2.50µs        ? ?/sec    1.00    373.6±0.82µs        ? ?/sec
filter run array high selectivity (kept 1023/1024)                            1.15    395.5±0.89µs        ? ?/sec    1.00    342.9±2.95µs        ? ?/sec
filter run array low selectivity (kept 1/1024)                                1.36    333.5±1.34µs        ? ?/sec    1.00    246.0±1.61µs        ? ?/sec
filter single record batch                                                    1.00     92.7±0.16µs        ? ?/sec    1.00     92.3±0.14µs        ? ?/sec
filter u8 (kept 1/2)                                                          1.03     94.3±0.16µs        ? ?/sec    1.00     91.9±0.13µs        ? ?/sec
filter u8 high selectivity (kept 1023/1024)                                   1.00      3.8±0.01µs        ? ?/sec    1.08      4.1±0.01µs        ? ?/sec
filter u8 low selectivity (kept 1/1024)                                       1.25      3.1±0.01µs        ? ?/sec    1.00      2.4±0.01µs        ? ?/sec

///
/// Similar to `Arc<Vec<Buffer>>` or `Arc<[Buffer]>`
#[derive(Clone, Debug)]
pub struct ViewBuffers(pub(crate) Arc<[Buffer]>);
Contributor:

One idea I had: instead of Arc<[Buffer]>, what if we left it as Arc<Vec<Buffer>>, so converting back and forth to Vec wasn't as costly 🤔
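A sketch of the trade-off being suggested here (illustrative only): Vec<Buffer> -> Arc<Vec<Buffer>> simply boxes the Vec and it can be recovered without copying when the Arc is not shared, whereas Vec<Buffer> -> Arc<[Buffer]> moves the elements into a new allocation and going back to a Vec always clones.

use std::sync::Arc;
use arrow_buffer::Buffer;

// Arc<[Buffer]> -> Vec<Buffer>: clones every Buffer into a freshly
// allocated Vec, regardless of whether the Arc is shared.
fn to_vec_from_slice(buffers: &Arc<[Buffer]>) -> Vec<Buffer> {
    buffers.to_vec()
}

// Arc<Vec<Buffer>> -> Vec<Buffer>: hands the original Vec back without
// copying when this is the only owner; only the shared case clones.
fn to_vec_from_arc_vec(buffers: Arc<Vec<Buffer>>) -> Vec<Buffer> {
    Arc::try_unwrap(buffers).unwrap_or_else(|shared| shared.as_ref().clone())
}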

Labels: api-change (Changes to the arrow API), arrow (Changes to the arrow crate), next-major-release (the PR has API changes and is waiting on the next major version)

Successfully merging this pull request may close these issues:

Make StringViewArray::slice() and BinaryViewArray::slice() faster / non allocating