[18343808662] MemBlock refactor #2792
  pos_ += type_size_;

- if (pos_ >= block_->bytes()) {
+ if (pos_ >= block_->logical_bytes()) {
Highlighting a tiny behavior change: Column::Iterator was previously broken for blocks with extra_bytes > 0.
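A minimal sketch of why the end-of-block check matters, assuming the old `bytes()` counted the trailing padding as well as the payload. `ToyBlock` and `count_values` are hypothetical names for illustration and are not part of the ArcticDB code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a block whose allocation is larger than its payload.
struct ToyBlock {
    std::vector<uint8_t> storage;  // logical payload followed by extra_bytes of padding
    std::size_t extra_bytes;

    std::size_t physical_bytes() const { return storage.size(); }
    std::size_t logical_bytes() const { return storage.size() - extra_bytes; }
};

// Counts how many fixed-size values an iterator visits before it treats the block as exhausted.
inline std::size_t count_values(const ToyBlock& block, std::size_t type_size, bool use_logical) {
    const std::size_t limit = use_logical ? block.logical_bytes() : block.physical_bytes();
    std::size_t count = 0;
    for (std::size_t pos = 0; pos < limit; pos += type_size) {
        ++count;
    }
    return count;
}
```

With 32 bytes of payload and 8 bytes of padding, bounding the loop by the physical size would visit one value more than the block logically holds.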
virtual void resize(size_t bytes) = 0;
virtual void check_magic() const = 0;
// External block specific methods
[[nodiscard]] virtual uint8_t* release() = 0;
virtual void abandon() = 0;
It's a bit of a code smell to have interface methods for specific subclasses. The alternative is to have callers of these methods check the return value of get_type and do a reinterpret_cast as appropriate though, which also feels clunky
Yeah, only about 90% of the interface is common, so the choices are either to have these as methods on the interface or to do a dynamic_cast after checking get_type. The other option would be to not use inheritance but rather std::variant, which would be clunkier.
The main reason I chose to have them in the interface is that it will presumably be faster to read the vtable once than to call get_type, which also reads the vtable, and then dynamic_cast, which needs to read the RTTI.
The compiler might optimize some of this away, though.
Will add a comment explaining why I think this is presumably faster than dynamic_cast. Happy to be overruled though.
Note that reinterpret_cast would probably be undefined behavior.
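For readers following the thread, here is a rough sketch of the two options being compared. All names (`IBlock`, `BlockKind`, `take`, `release_via_interface`, `release_via_cast`) are hypothetical and simplified; this is not the PR's actual interface.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

enum class BlockKind { Dynamic, External };

struct IBlock {
    virtual ~IBlock() = default;
    virtual BlockKind get_type() const = 0;
    // Option 1: the external-only operation is part of the common interface.
    virtual uint8_t* release() = 0;
};

struct DynamicBlock : IBlock {
    BlockKind get_type() const override { return BlockKind::Dynamic; }
    uint8_t* release() override { throw std::logic_error("release() is only valid for external blocks"); }
};

struct ExternalBlock : IBlock {
    explicit ExternalBlock(uint8_t* data) : data_(data) {}
    BlockKind get_type() const override { return BlockKind::External; }
    uint8_t* release() override { return take(); }
    // Non-virtual helper that option 2 reaches through a cast.
    uint8_t* take() {
        uint8_t* released = data_;
        data_ = nullptr;
        return released;
    }

private:
    uint8_t* data_;
};

// Option 1: a single virtual dispatch.
inline uint8_t* release_via_interface(IBlock& block) { return block.release(); }

// Option 2: one virtual call for the type tag, then an RTTI-checked downcast.
inline uint8_t* release_via_cast(IBlock& block) {
    if (block.get_type() != BlockKind::External) {
        throw std::logic_error("not an external block");
    }
    return dynamic_cast<ExternalBlock&>(block).take();
}
```

Both paths return the same pointer; the difference is only in how many indirections the caller pays for and where the "wrong block type" error is raised.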
My two cents:
- Both are ugly
- We have something that is similar to the current proposal e.g. not all clauses support all clause methods and they throw exceptions. Thus I think this is ok
- Reinterpret cast won't be UB https://en.cppreference.com/w/cpp/language/reinterpret_cast.html 5) handles this case.
- Checking the vtable is not that slow especially if it's not in a hot loop
- Adding `final` to the derived classes would help the compiler optimize vtables if possible
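A small sketch of the `final` suggestion, using hypothetical names (`ISized`, `FixedBlock`). When the static type of the object is a `final` class, the compiler is allowed to resolve the virtual call at compile time; whether it actually does so depends on the compiler and optimization level.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

struct ISized {
    virtual ~ISized() = default;
    virtual std::size_t physical_bytes() const = 0;
};

// `final` guarantees that no further class can override physical_bytes().
struct FixedBlock final : ISized {
    explicit FixedBlock(std::size_t bytes) : bytes_(bytes) {}
    std::size_t physical_bytes() const override { return bytes_; }

private:
    std::size_t bytes_;
};

// Static type is the final class: the call may be devirtualized and inlined.
inline std::size_t bytes_direct(const FixedBlock& block) { return block.physical_bytes(); }

// Static type is the interface: the call generally goes through the vtable.
inline std::size_t bytes_via_interface(const ISized& block) { return block.physical_bytes(); }

static_assert(std::is_final_v<FixedBlock>);
```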
Ah thanks for clarifying. Will add final to all methods that it makes sense for.
folly::Poly doesn't have a concept of a base class implementation, which is why we have to explicitly throw from the derived classes for clauses. We could make the base class methods non-virtual and throw an exception, and only override in derived classes where the methods make sense.
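One way to read the suggestion above, sketched with hypothetical names (`BlockBase`, `DynamicLike`, `ExternalLike`). It assumes the base methods stay virtual but gain a default throwing body instead of being pure virtual, since a non-virtual method could not be overridden.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

struct BlockBase {
    virtual ~BlockBase() = default;
    // Default body throws; only block types that support the operation override it.
    virtual uint8_t* release() { throw std::logic_error("release() is not supported by this block type"); }
};

// Inherits the throwing default without writing any boilerplate.
struct DynamicLike : BlockBase {};

struct ExternalLike : BlockBase {
    explicit ExternalLike(uint8_t* data) : data_(data) {}
    uint8_t* release() override {
        uint8_t* released = data_;
        data_ = nullptr;
        return released;
    }

private:
    uint8_t* data_;
};
```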
Worth running all the Arrow tests and a smoke-test subset of the non-Arrow tests with valgrind to make sure we haven't introduced any leaks.
ASSERT_EQ(buffer.blocks()[idx]->physical_bytes(), rows_per_batch[idx] * sizeof(TypeParam));
ASSERT_EQ(buffer.blocks()[idx]->logical_size(), rows_per_batch[idx] * sizeof(TypeParam));
nit: These two names look odd side by side. Does it make sense to either rename physical_bytes to physical_size or logical_size to logical_bytes?
I explicitly named it logical_size to highlight that it does not necessarily refer to a number of bytes. For the ExternalPackedBuffer it refers to the number of bits.
On the other hand, physical_bytes does refer to the actual size of the memory occupied, in bytes. Thus I think the distinction is useful.
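To make the distinction concrete, here is a hedged sketch of how the two quantities could relate for a bit-packed block. The helper name `packed_physical_bytes` and the `bit_offset` parameter are assumptions for illustration; the PR's actual computation is not shown in this thread.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: bytes needed to hold `logical_bits` bits that start
// `bit_offset` bits into the first byte (rounded up to whole bytes).
constexpr std::size_t packed_physical_bytes(std::size_t logical_bits, std::size_t bit_offset = 0) {
    return (bit_offset + logical_bits + 7) / 8;
}

// For a plain byte block the two quantities coincide.
constexpr std::size_t plain_physical_bytes(std::size_t logical_size) { return logical_size; }

static_assert(packed_physical_bytes(10) == 2, "10 bits need 2 bytes");
static_assert(plain_physical_bytes(10) == 10, "10 bytes stay 10 bytes");
```

So a packed block with a logical size of 10 would occupy 2 physical bytes, while a byte-addressed block with the same logical size occupies 10.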
cpp/arcticdb/column_store/block.hpp
Outdated
[[nodiscard]] virtual uint8_t& operator[](size_t pos);
[[nodiscard]] const uint8_t* ptr(size_t pos) const;
[[nodiscard]] uint8_t* ptr(size_t pos);
[[nodiscard]] uint8_t* end() const;
ExternalPackedMemBlock's override of operator[] throws. On the other hand you can call ptr() on it as it's part of the base class. But they are mostly doing the same thing. I'd remove the virtual operator[].
Ah thanks, the idea was for ptr to use operator[], but I missed that. Will remove operator[] as it's not used anywhere meaningfully.
cpp/arcticdb/column_store/block.cpp
Outdated
ExternalMemBlock::ExternalMemBlock(
        uint8_t* data, size_t size, size_t offset, entity::timestamp ts, bool owning, size_t extra_bytes
) :
    bytes_(size),
    offset_(offset),
    timestamp_(ts),
    external_data_(data),
    owns_external_data_(owning),
    extra_bytes_(extra_bytes) {}
We can remove this. The other constructor's doing the same thing.
cpp/arcticdb/column_store/block.cpp
Outdated
}

void ExternalMemBlock::abandon() {
    free_detachable_memory(external_data_, physical_bytes());
Shouldn't we check if this is owning the external data?
yes, good catch. Previously we were only doing an assert here, but now we use it in abandon_block, so using an if.
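A minimal sketch of the guarded version, under stated assumptions: `ExternalBlockSketch` is a hypothetical class, `std::free` stands in for `free_detachable_memory`, and clearing the pointer after abandoning is a choice made for the sketch rather than something confirmed by the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

class ExternalBlockSketch {
public:
    ExternalBlockSketch(uint8_t* data, std::size_t bytes, bool owning)
        : data_(data), bytes_(bytes), owns_(owning) {}
    ExternalBlockSketch(const ExternalBlockSketch&) = delete;
    ExternalBlockSketch& operator=(const ExternalBlockSketch&) = delete;
    ~ExternalBlockSketch() { abandon(); }

    // Frees the memory only when this block owns it; safe to call more than once.
    void abandon() {
        if (owns_) {
            std::free(data_);  // stand-in for free_detachable_memory(data_, bytes_)
            owns_ = false;
        }
        data_ = nullptr;
    }

    bool owns() const { return owns_; }
    const uint8_t* data() const { return data_; }
    std::size_t bytes() const { return bytes_; }

private:
    uint8_t* data_;
    std::size_t bytes_;
    bool owns_;
};
```

A non-owning view over caller-managed memory can then be abandoned without touching that memory, which an unconditional free would not allow.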
cpp/arcticdb/column_store/block.cpp
Outdated
void ExternalMemBlock::resize(size_t bytes) { util::raise_rte("Can't resize a non dynamic block. Bytes: {}", bytes); }

void ExternalMemBlock::check_magic() const {}
I know it's not related to your changes, but I think we should either get rid of the magic or start using it. AFAIK it was added for the V2 encoding, which used native structures instead of protobuf. The magic was placed at the end of the buffer containing some object. This way you could
- Inspect the raw byte buffer and get a sense of what's in it
- Verify that something was loaded correctly
I don't think the mem blocks need it
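For context, a generic sketch of the trailing-magic technique described above. The constant value and the function names are made up for illustration and do not correspond to ArcticDB's V2 encoding.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical marker value; any recognisable constant would do.
constexpr uint32_t kTrailerMagic = 0xB10CFEEDu;

// Appends the marker after the serialized object.
inline void append_magic(std::vector<uint8_t>& buffer) {
    uint8_t raw[sizeof(kTrailerMagic)];
    std::memcpy(raw, &kTrailerMagic, sizeof(raw));
    buffer.insert(buffer.end(), raw, raw + sizeof(raw));
}

// Returns true when the last four bytes of the buffer match the marker.
inline bool has_magic(const std::vector<uint8_t>& buffer) {
    if (buffer.size() < sizeof(kTrailerMagic)) {
        return false;
    }
    uint32_t found = 0;
    std::memcpy(&found, buffer.data() + buffer.size() - sizeof(found), sizeof(found));
    return found == kTrailerMagic;
}
```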
agreed, removing entirely
private:
    size_t logical_size_ = 0UL;
    size_t shift_ = 0UL;
Discussed offline, but a comment about what shift is will be helpful
static_assert(sizeof(BlockType) == BlockType::Align + BlockType::MinSize);
static_assert(DefaultBlockSize >= BlockType::MinSize);
static_assert(sizeof(DynamicMemBlock) == DynamicMemBlock::Align + DynamicMemBlock::MinSize);
It makes more sense for this to be near the DynamicMemBlock implementation rather than here
This is only a refactor to prepare for parallel bool unpacking and should not change existing behavior. It splits up the `MemBlock` into two separate classes inheriting a common interface `IMemBlock`:
- `DynamicMemBlock` is responsible for `AllocationType::DYNAMIC` and has a capacity which can gradually be filled up with resizes. This is the type of memory block we use for in-memory operations with `ChunkedBuffer`s, e.g. when tick streaming or performing appends.
- `ExternalMemBlock` is responsible for `AllocationType::DETACHABLE` and for external pointers to views. It just holds a pointer to external memory, which can be owning or not. This is what we use either when returning Segments to Python or when receiving views from Python.
This change also adds the scaffolding needed for the packed bits buffer via `ExternalPackedMemBlock`. It will be used in a follow-up PR to parallelise bool unpacking.
The checks, in particular getting the `physical_bytes`, introduced a slowdown in `BM_iteration_via_scalar_at` because:
- Previously the `bytes()` call could be inlined; it can no longer be inlined because of the runtime polymorphism.
- The non-inlined function calls take up a big part of the time.
Removing the check in `operator[]` removes two function calls in release builds. This brings performance roughly in line with what it was before the refactor.
Reference Issues/PRs
Monday ref: 18343808662
What does this implement or fix?
This is only a refactor to prepare for parallel bool unpacking and should not change existing behavior.
It splits up the `MemBlock` into two separate classes inheriting a common interface `IMemBlock`:
- `DynamicMemBlock` is responsible for the `AllocationType::DYNAMIC` and has a capacity which can gradually be filled up with resizes. This is the type of memory blocks we use for in memory operations with `ChunkedBuffer`s e.g. when tick streaming or when performing appends etc.
- `ExternalMemBlock` is responsible for the `AllocationType::DETACHABLE` and for external pointers to views. It just holds a pointer to external memory which can be owning or not. This is what we use either when returning Segments to python or when receiving views from python.
This change also adds the scaffolding needed for the packed bits buffer via `ExternalPackedMemBlock`. It will be used in a follow up PR to parallelise bool unpacking.
Any other comments?
Also introduced a few micro benchmarks from which the results are:
The benchmarks are run with:
I've not included the larger chunk size because that is identical for before and after. The only statistically significant difference is the improvement in allocation on a `DETACHABLE` segment. That is because for `ExternalMemBlock`s we no longer need to allocate the intrinsic data that was previously always included in the `MemBlock`.
Checklist
Checklist for code changes...