Implement merge update for timeseries matching on the index #2781
Open
vasil-pashov wants to merge 53 commits into vasil.pashov/feature/merge from vasil.pashov/merge-update-using-write-clause
Commits
167cb2f
Add basic structure for read_modify_write
vasil-pashov 580cf14
Working version of read modify write
vasil-pashov abe52e3
Implement write clause
vasil-pashov 59af346
Make the write clause store future components so that it does not dea…
vasil-pashov 97549b6
Fix a bug with empty dataframes. Test all filtering
vasil-pashov dfda400
Fix col range for multiindex
vasil-pashov 0473e65
wip
vasil-pashov 3490e74
Compact rows function
vasil-pashov b613264
Add comments
vasil-pashov 769e271
Fix compilation errors
vasil-pashov 83a8065
Fix compilation errors
vasil-pashov c57fdc0
Fix compilation errors
vasil-pashov 27cbe29
Apply comments
vasil-pashov 5f94216
Add C++ stress test to check for deadlocks in write clause
vasil-pashov 12e2c28
Add overload for async_write that performs the encoding in the curren…
vasil-pashov 658e2dc
Structure write clause by row slice
vasil-pashov 056b8d8
Fix resampling tests
vasil-pashov c28903b
Address review comments
vasil-pashov 6c9438a
Extract the variant match in async write to a separate function
vasil-pashov 809d961
Fix resampling tests
vasil-pashov 6615c97
Prepare for merge update
vasil-pashov 556fdce
Add merge clause skeleton
vasil-pashov 7c861b2
Implement structure for processing
vasil-pashov fca60b9
WIP
vasil-pashov 38ac5b5
Unify iteration over source columns
vasil-pashov ab71903
Add utility functions for generating columns, native tensors and segm…
vasil-pashov 9b80b7a
Add one more utility function for generating a dense segment in memory
vasil-pashov 0e1921c
Add more testing C++ utils
vasil-pashov 8691d88
WIP on test
vasil-pashov 3d02c50
Fixes to C++ tests
vasil-pashov 40610a6
Passing unit test
vasil-pashov b1052de
Fix input_frame_from_tensors
vasil-pashov 25eea67
Merge branch 'master' into vasil.pashov/merge-update-using-write-clause
vasil-pashov dfc8496
Make python API for merge propagate to C++
vasil-pashov 437fee0
Merge branch 'vasil.pashov/feature/merge' into vasil.pashov/merge-upd…
vasil-pashov 0687e03
vcpkg
vasil-pashov 29a28e5
Fix custom formatters for gtest
vasil-pashov c88363d
Split utils from formatters
vasil-pashov 967b70c
Fix compilation issues on the CI
vasil-pashov 51f8854
Fix unreachable code
vasil-pashov 07580e9
Fix windows build issues
546ccb7
Fix use after stack free
vasil-pashov 336bae9
Fix out of bounds access
vasil-pashov c5f0516
Fix column selection when the source is SegmentInMemory
vasil-pashov ba437a8
Python tests for string column pass
vasil-pashov 8431069
Refactor
vasil-pashov 051af2c
Add comments
vasil-pashov 686d8d7
Fix concept
vasil-pashov 87ac39d
Address comments
vasil-pashov 993139e
Handle sequences of segments having a single index value
vasil-pashov f184883
Fix split descriptor
vasil-pashov a4b5844
Address comments
vasil-pashov 607ff55
Merge branch 'vasil.pashov/merge-update-string-columns' into vasil.pa…
vasil-pashov
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,61 @@ | ||
| /* Copyright 2025 Man Group Operations Limited | ||
| * | ||
| * Use of this software is governed by the Business Source License 1.1 included in the file licenses/BSL.txt. | ||
| * | ||
| * As of the Change Date specified in that file, in accordance with the Business Source License, use of this software | ||
| * will be governed by the Apache License, version 2.0. | ||
| */ | ||
| | ||
| #include <arcticdb/column_store/segment_utils.hpp> | ||
| #include <arcticdb/column_store/column.hpp> | ||
| #include <arcticdb/util/configs_map.hpp> | ||
| #include <arcticdb/column_store/column_algorithms.hpp> | ||
| | ||
| namespace arcticdb { | ||
| | ||
| ankerl::unordered_dense::set<entity::position_t> unique_values_for_string_column(const Column& column) { | ||
| ankerl::unordered_dense::set<entity::position_t> output_set; | ||
| // Guessing that unique values is a third of the column length | ||
| // TODO would be useful to have actual unique count here from stats | ||
| static auto map_reserve_ratio = ConfigsMap::instance()->get_int("UniqueColumns.AllocationRatio", 3); | ||
| output_set.reserve(column.row_count() / map_reserve_ratio); | ||
| | ||
| details::visit_type(column.type().data_type(), [&](auto col_desc_tag) { | ||
| using type_info = ScalarTypeInfo<decltype(col_desc_tag)>; | ||
| if constexpr (is_sequence_type(type_info::data_type)) { | ||
| arcticdb::for_each<typename type_info::TDT>(column, [&output_set](auto value) { | ||
| output_set.emplace(value); | ||
| }); | ||
| } else { | ||
| util::raise_rte("Column is not a string type column"); | ||
| } | ||
| }); | ||
| return output_set; | ||
| } | ||
| Comment on lines +16 to +34 (Collaborator, Author): This was moved from | ||
| | ||
| std::vector<StreamDescriptor> split_descriptor(const StreamDescriptor& descriptor, const size_t cols_per_segment) { | ||
| if (descriptor.fields().size() <= cols_per_segment) { | ||
| return std::vector{descriptor}; | ||
| } | ||
| const size_t num_segments = (descriptor.fields().size() + cols_per_segment - 1) / cols_per_segment; | ||
| std::vector<StreamDescriptor> res; | ||
| res.reserve(num_segments); | ||
| | ||
| const unsigned field_count = descriptor.field_count(); | ||
| for (size_t i = 0, source_field = descriptor.index().field_count(); i < num_segments; ++i) { | ||
| StreamDescriptor partial(descriptor.id()); | ||
| if (descriptor.index().field_count() > 0) { | ||
| partial.set_index(descriptor.index()); | ||
| for (unsigned index_field = 0; index_field < descriptor.index().field_count(); ++index_field) { | ||
| partial.add_field(descriptor.field(index_field)); | ||
| } | ||
| } | ||
| for (size_t field = 0; field < cols_per_segment && source_field < field_count; ++field) { | ||
| partial.add_field(descriptor.field(source_field++)); | ||
| } | ||
| res.push_back(std::move(partial)); | ||
| } | ||
| return res; | ||
| } | ||
| | ||
| } // namespace arcticdb | ||
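
For review purposes, here is a minimal standalone sketch of the chunking that `split_descriptor` performs. `ToyDescriptor` and `split_toy_descriptor` are hypothetical stand-ins, not ArcticDB types: a descriptor is modelled as a list of field names whose first `index_count` entries are index fields. One deliberate difference from the diff: the sketch derives the number of chunks from the non-index fields only. In the diff, `num_segments` comes from `descriptor.fields().size()`; if that count includes the index fields (an assumption, suggested by `descriptor.field(index_field)` but not verified here), the last descriptor could end up holding only the index.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a stream descriptor:
// the first `index_count` names in `fields` are index fields.
struct ToyDescriptor {
    std::vector<std::string> fields;
    std::size_t index_count;
};

// Split into descriptors holding at most `cols_per_segment` non-index fields,
// repeating the index fields at the front of every resulting descriptor.
std::vector<ToyDescriptor> split_toy_descriptor(const ToyDescriptor& d, std::size_t cols_per_segment) {
    assert(cols_per_segment > 0);
    assert(d.index_count <= d.fields.size());
    const std::size_t data_count = d.fields.size() - d.index_count;
    if (data_count <= cols_per_segment) {
        return {d};
    }
    // Ceiling division over the non-index fields only (differs from the diff).
    const std::size_t num_segments = (data_count + cols_per_segment - 1) / cols_per_segment;
    std::vector<ToyDescriptor> res;
    res.reserve(num_segments);
    const auto first = d.fields.begin();
    std::size_t source = d.index_count;
    for (std::size_t i = 0; i < num_segments; ++i) {
        ToyDescriptor partial;
        partial.index_count = d.index_count;
        // Copy the index fields first.
        partial.fields.assign(first, first + static_cast<std::ptrdiff_t>(d.index_count));
        // Then copy the next slice of data fields.
        const std::size_t end = std::min(source + cols_per_segment, d.fields.size());
        partial.fields.insert(
                partial.fields.end(),
                first + static_cast<std::ptrdiff_t>(source),
                first + static_cast<std::ptrdiff_t>(end)
        );
        source = end;
        res.push_back(std::move(partial));
    }
    return res;
}
```

With one index field and four data fields, a limit of two columns per segment gives two descriptors that each carry the index plus two data fields. A unit test along these lines for the real `split_descriptor` might be worth adding to confirm it never emits an index-only descriptor.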
This should just be in the test sources