-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
scheduling_group: improve scheduling group creation exception safety #2617
base: master
Are you sure you want to change the base?
Conversation
cb8d016
to
88c4e8a
Compare
The CI fails in Seastar.unit.rpc with timeout |
@@ -37,17 +37,66 @@ namespace seastar { | |||
namespace internal { | |||
|
|||
struct scheduling_group_specific_thread_local_data { | |||
using val_ptr = std::unique_ptr<void, void (*)(void*) noexcept>; | |||
using cfg_ptr = std::shared_ptr<scheduling_group_key_config>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can it be seastar::lw_shared_ptr<scheduling_group_key_config>
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@@ -37,17 +37,66 @@ namespace seastar { | |||
namespace internal { | |||
|
|||
struct scheduling_group_specific_thread_local_data { | |||
using val_ptr = std::unique_ptr<void, void (*)(void*) noexcept>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the point in making it smart-pointer if you track construction/destruction by hand anyway?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The point is to manage the dynamic memory allocation and free it automatically
inline auto& get_sg_data(unsigned sg_id) { | ||
return _scheduling_group_specific_data.per_scheduling_group_data[sg_id]; | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no need in those helpers, AFAICS, the existing get_scheduling_group_specific_thread_local_data()
(and Co) already provide access to the array of per_scheduling_group_data-s
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added it because in several places we access the sg data and we do
auto& sg_data = _scheduling_group_specific_data;
auto& this_sg = sg_data.per_scheduling_group_data[sg._id];
which I thought is a little cumbersome and I didn't find another method for this
88c4e8a
to
cd957c2
Compare
|
specific_val(const specific_val& other) = delete; | ||
specific_val& operator=(const specific_val& other) = delete; | ||
|
||
specific_val(specific_val&& other) : valp(std::move(other.valp)), cfg(other.cfg) {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not cfg(std::move(other.cfg))
? In general, the lifetime of specific_val::cfg
is unclear.
Here other.cfg
is kept non-nullptr, what for? Presumably (but maybe I'm wrong) this is to make ~specific_val()
avoid checking if cfg
exists or not? But the default constructor specific_val()
sets cfg to nullptr, so it looks like cfg
can be null, so why not std::move() it here as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done, I changed to std::move
there was no special reason to copy instead of move, I thought it doesn't matter much, but I suppose it's better to move
}; | ||
std::array<per_scheduling_group, max_scheduling_groups()> per_scheduling_group_data; | ||
std::map<unsigned long, scheduling_group_key_config> scheduling_group_key_configs; | ||
std::map<unsigned long, cfg_ptr> scheduling_group_key_configs; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this map is no longer needed? The very came config can be obtained via per_scheduling_group_data[context.sg_id].specific_vals[key_id].cfg
? I'm not proposing to change anything right now, just checking if my understanding is correct
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's still used in init_scheduling_group
when we init a new scheduling group and go over all the keys to allocate them for the new sg
val_ptr valp; | ||
cfg_ptr cfg; | ||
|
||
specific_val() : valp(nullptr, &free), cfg(nullptr) {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't free
defaulted by compiler for unique-ptr?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the default uses delete
and it doesn't work because can't delete pointer to incomplete type
using val_ptr = internal::scheduling_group_specific_thread_local_data::val_ptr; | ||
using specific_val = internal::scheduling_group_specific_thread_local_data::specific_val; | ||
|
||
val_ptr valp(aligned_alloc(cfg->alignment, cfg->allocation_size), &free); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't free
defaulted by compiler for unique-ptr?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
answered above
|
||
val_ptr valp(aligned_alloc(cfg->alignment, cfg->allocation_size), &free); | ||
if (!valp) { | ||
throw std::runtime_error("memory allocation failed"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was std::abort()
before this patch. Why did you change that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because the intention of the patch is to improve error handling and allow the node to continue to function in face of unexpected errors
so if we can handle exceptions in this path I don't think there's reason anymore to abort in this specific case
}); | ||
} | ||
} | ||
return make_ready_future(); | ||
}).then([this, key_id, cfgp] () { | ||
_scheduling_group_specific_data.scheduling_group_key_configs[key_id] = std::move(cfgp); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You seem to mention this change in patch comment:
We also reorder the initialization order to make it safer. For
example, when creating a scheduling group, first allocate all data and
then swap it into the scheduling group's data structure.
but don't explain the motivation. Why can't scheduling_group_key_config[key_id] be assigned where it was before this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is to avoid the case of a partially initialized scheduling group.
if we allocate some of the values, transfer the ownership to the sg, and then fail, then we remain with a partially initialized scheduling which could cause problems, like if someone assumes it's all be initialized, or if these allocations "leak" because the operation is considered to have failed but we still store the data.
so instead we either allocate all keys or none
Move the function allocate_scheduling_group_specific_data from reactor class to an internal static function. Change it to handle only the allocation and construction of the data object, while the caller handles the assignment of it.
Improve handling of exceptions during scheduling group and scheduling group key creation, where a user-provided constructor for the keys may fail, for example. We use a new struct `specific_val` and smart pointers to manage memory allocation, construction and destruction of scheduling group data in a safe manner. We also reorder the initialization order to make it safer. For example, when creating a scheduling group, first allocate all data and then swap it into the scheduling group's data structure. Fixes scylladb#2222
cd957c2
to
a9615eb
Compare
|
Improve handling of exceptions during scheduling group and scheduling group key creation, where a user-provided constructor for the keys may fail, for example.
We use a new struct
specific_val
and smart pointers to manage memory allocation, construction and destruction of scheduling group data in a safe manner.We also reorder the initialization order to make it safer. For example, when creating a scheduling group, first allocate all data and then swap it into the scheduling group's data structure.
Fixes #2222