@IvoDD IvoDD commented Dec 4, 2025

Reference Issues/PRs

Monday ref: 18053251438

What does this implement or fix?

After the MemBlock refactor we can now store packed bits inside the ChunkedBuffer in arrow_data_to_segment.

We later unpack the bools inside WriteToSegmentTask which runs in parallel for all segment slices.
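To illustrate the idea, here is a minimal Python sketch of unpacking Arrow-style packed booleans (bit `i` lives in byte `i // 8`, least-significant bit first) slice by slice. The names `unpack_bits` and `unpack_slices` are hypothetical and are not ArcticDB functions; the thread pool is only a stand-in for the per-slice tasks, and pure Python will not show a real speedup because of the GIL.

```python
from concurrent.futures import ThreadPoolExecutor


def unpack_bits(packed: bytes, bit_offset: int, num_bits: int) -> bytearray:
    """Expand num_bits LSB-first packed bits, starting at bit_offset, into one byte per value."""
    out = bytearray(num_bits)
    for i in range(num_bits):
        bit = bit_offset + i
        # Byte index is bit // 8, position inside the byte is bit % 8.
        out[i] = (packed[bit >> 3] >> (bit & 7)) & 1
    return out


def unpack_slices(packed: bytes, num_bits: int, slice_rows: int, max_workers: int = 4) -> bytearray:
    """Unpack each row slice independently (hypothetical stand-in for per-slice tasks)."""
    starts = range(0, num_bits, slice_rows)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        parts = pool.map(
            lambda start: unpack_bits(packed, start, min(slice_rows, num_bits - start)),
            starts,
        )
        # pool.map preserves input order, so the slices concatenate back in row order.
        return bytearray().join(parts)
```

Because each slice only reads its own bit range and writes its own output, slices need no coordination beyond knowing their starting bit offset.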

Any other comments?

Timings to write a table of 10 million rows x 10 bool columns:

| Number of threads | Before | After |
|---|---|---|
| 1 | 214.94 ms | 213.27 ms |
| 2 | 214.14 ms | 208.27 ms |
| 4 | 215.44 ms | 199.38 ms |
| 8 | 214.11 ms | 198.96 ms |

Note that there was still a large variance of ±20 ms on my machine, but the numbers above are averaged over 200 runs.

Dummy benchmark script, used because the asv benchmarks are unreliable:

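import timeit

import pyarrow as pa
import arcticdb as adb
import arcticdb.util.test  # makes adb.util.test.config_context reachable
import arcticdb_ext.cpp_async as adb_async  # assumed import path for adb_async

ac = adb.Arctic("lmdb:///tmp/test")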
ac.delete_library("benchmark")
lib = ac.create_library("benchmark", output_format="PYARROW")
lib._nvs._set_allow_arrow_input()

num_rows = 10_000_000
num_cols = 10
table = pa.table({f"bool_{i}": pa.array([j % (i + 1) == 0 for j in range(num_rows)]) for i in range(num_cols)})

for num_threads in [1, 2, 4, 8]:
    with adb.util.test.config_context("VersionStore.NumCPUThreads", num_threads):
        adb_async.reinit_task_scheduler()
        time = timeit.timeit(lambda: lib.write("sym", table), number=50) / 50
        print(f"Time for {num_threads} is {time*1000:.2f} ms")
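Given the run-to-run variance mentioned above, it can help to report the spread alongside the mean. The following is a minimal sketch using only the standard library; `time_ms` is a hypothetical helper, and the summed range is a placeholder workload standing in for `lib.write("sym", table)`.

```python
import statistics
import timeit


def time_ms(fn, repeat=5, number=10):
    """Return (mean, sample standard deviation) of per-call time in milliseconds.

    timeit.repeat runs `number` calls per sample and returns `repeat` totals;
    repeat must be at least 2 for statistics.stdev to be defined.
    """
    totals = timeit.repeat(fn, repeat=repeat, number=number)
    samples = [total / number * 1000 for total in totals]
    return statistics.mean(samples), statistics.stdev(samples)


if __name__ == "__main__":
    # Placeholder workload; swap in the real write call when benchmarking.
    mean_ms, stdev_ms = time_ms(lambda: sum(range(10_000)))
    print(f"{mean_ms:.3f} ms (stdev {stdev_ms:.3f} ms)")
```

Printing the standard deviation next to each thread count makes it easier to judge whether a difference between two columns is larger than the noise.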

Checklist

Checklist for code changes...
  • Have you updated the relevant docstrings, documentation and copyright notice?
  • Is this contribution tested against all ArcticDB's features?
  • Do all exceptions introduced raise appropriate error messages?
  • Are API changes highlighted in the PR description?
  • Is the PR labelled as enhancement or bug so it appears in autogenerated release notes?

@IvoDD IvoDD added the patch (small change, should increase patch version) and no-release-notes (this PR shouldn't be added to release notes) labels Dec 4, 2025
@IvoDD IvoDD force-pushed the arrow-parallel-bool-unpacking branch from 596235c to a98c46f Compare December 4, 2025 14:23
@IvoDD IvoDD force-pushed the mem-block-refactor branch from 7822686 to 21acd7d Compare December 4, 2025 14:24
@IvoDD IvoDD changed the title from "Parallelise bool unpacking for arrow write" to "[18053251438] Parallelise bool unpacking for arrow write" Dec 4, 2025
@IvoDD IvoDD marked this pull request as ready for review December 4, 2025 14:33
@IvoDD IvoDD force-pushed the arrow-parallel-bool-unpacking branch from a98c46f to 1c974c7 Compare December 8, 2025 08:08
@IvoDD IvoDD force-pushed the mem-block-refactor branch from 21acd7d to 1753b0d Compare December 8, 2025 08:08

@alexowens90 alexowens90 left a comment

Have you done any benchmarking to see what the performance improvement is compared to the previous implementation?

@IvoDD IvoDD force-pushed the mem-block-refactor branch 2 times, most recently from 9d9b0bc to 0662547 Compare December 9, 2025 13:07

// ExternalPackedMemBlock implementation
ExternalPackedMemBlock::ExternalPackedMemBlock(
const uint8_t* data, size_t size, size_t shift, size_t offset, entity::timestamp ts, bool owning

Collaborator:

Does size mean number of bits? I was left with the impression that all block constructors take in the number of bytes and this confused me.

Collaborator (author):

On ExternalMemBlocks the size is the logical_size. Renamed in previous PR to reflect that.

util::check(
block->get_type() == MemBlockType::EXTERNAL_PACKED,
"Expected to see a packed external block but got: {}",
static_cast<int8_t>(block->get_type())

Collaborator:

This won't mean much to people reading the error. Can you add an fmt formatter override for the enum values?

"Expected to see a packed external block but got: {}",
static_cast<int8_t>(block->get_type())
);
const auto packed_block = dynamic_cast<ExternalPackedMemBlock*>(block);

Collaborator:

We've already established that it's an ExternalPackedMemBlock, so a static_cast will be safe if performance is crucial. If you insist on using dynamic_cast, you can then check that the result is not nullptr, which avoids the virtual call to block->get_type() on the happy path. If the result is nullptr you'd still want to call block->get_type() for the error message, but at that point we're throwing, so the virtual call will be dwarfed by the cost of the exception.

Collaborator (author):

Switched to static cast

static_cast<int8_t>(block->get_type())
);
const auto packed_block = dynamic_cast<ExternalPackedMemBlock*>(block);
auto num_bits = std::min(packed_block->logical_bytes() - offset_in_block, bytes - pos_in_res);

Collaborator:

Shouldn't this be multiplied by 8 to be the number of bits?

Collaborator (author):

Renamed in previous PR to logical_size to reflect that this is indeed the number of bits.
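The bits-versus-bytes distinction discussed here can be sketched in Python. Both helpers are hypothetical illustrations rather than ArcticDB code, and they assume that `shift` is the bit offset of the first value inside the first byte of the packed buffer.

```python
def packed_byte_span(logical_bits: int, shift: int = 0) -> int:
    """Bytes needed to hold logical_bits packed bits that begin `shift` bits into the first byte."""
    return (shift + logical_bits + 7) // 8


def bits_to_copy(block_logical_bits: int, offset_in_block: int, wanted_bits: int, pos_in_res: int) -> int:
    """Bits to take from the current block: whatever is left in the block or still wanted, whichever is smaller.

    Every argument is a count of bits (equivalently, of unpacked bools), so no factor of 8 is needed.
    """
    return min(block_logical_bits - offset_in_block, wanted_bits - pos_in_res)
```

For example, 10 packed bools occupy 2 bytes, and 8 bools starting 1 bit into a byte also straddle 2 bytes, which is why a logical size in bits cannot be read as a byte count.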

@IvoDD IvoDD force-pushed the mem-block-refactor branch 3 times, most recently from a739aec to d3efb32 Compare December 12, 2025 08:03
@IvoDD IvoDD force-pushed the arrow-parallel-bool-unpacking branch 2 times, most recently from 949528b to b5978cb Compare December 12, 2025 16:43
@IvoDD IvoDD force-pushed the mem-block-refactor branch from d3efb32 to 1aa9c4f Compare December 23, 2025 11:12
@IvoDD IvoDD force-pushed the arrow-parallel-bool-unpacking branch from b5978cb to 650e430 Compare December 23, 2025 11:13
@IvoDD IvoDD force-pushed the mem-block-refactor branch from 1aa9c4f to 5ebe971 Compare December 23, 2025 12:49
@IvoDD IvoDD force-pushed the arrow-parallel-bool-unpacking branch from 650e430 to 7283525 Compare December 23, 2025 12:50
Labels

no-release-notes (this PR shouldn't be added to release notes), patch (small change, should increase patch version)
