scheduling_group: improve scheduling group creation exception safety #2617

mlitvk · 2025-01-15T13:49:41Z

Improve handling of exceptions during scheduling group and scheduling group key creation, where a user-provided constructor for the keys may fail, for example.

We use a new struct specific_val and smart pointers to manage memory allocation, construction and destruction of scheduling group data in a safe manner.

We also reorder the initialization order to make it safer. For example, when creating a scheduling group, first allocate all data and then swap it into the scheduling group's data structure.

Fixes #2222

mlitvk · 2025-01-16T07:31:20Z

CI fails in Seastar.unit.rpc with a timeout.
This is a known issue: #2620

xemul · 2025-01-16T09:54:37Z

include/seastar/core/scheduling_specific.hh

@@ -37,17 +37,66 @@ namespace seastar {
 namespace internal {

 struct scheduling_group_specific_thread_local_data {
+    using val_ptr = std::unique_ptr<void, void (*)(void*) noexcept>;
+    using cfg_ptr = std::shared_ptr<scheduling_group_key_config>;


Can it be seastar::lw_shared_ptr<scheduling_group_key_config>?

xemul · 2025-01-16T09:56:32Z

include/seastar/core/scheduling_specific.hh

@@ -37,17 +37,66 @@ namespace seastar {
 namespace internal {

 struct scheduling_group_specific_thread_local_data {
+    using val_ptr = std::unique_ptr<void, void (*)(void*) noexcept>;


What's the point of making it a smart pointer if you track construction/destruction by hand anyway?

The point is to manage the dynamically allocated memory and free it automatically.

xemul · 2025-01-16T10:01:56Z

include/seastar/core/reactor.hh

+    inline auto& get_sg_data(unsigned sg_id) {
+        return _scheduling_group_specific_data.per_scheduling_group_data[sg_id];
+    }
+


There's no need for those helpers, AFAICS; the existing get_scheduling_group_specific_thread_local_data() (and Co) already provides access to the array of per_scheduling_group_data entries.

I added it because in several places we access the sg data by doing

    auto& sg_data = _scheduling_group_specific_data;
    auto& this_sg = sg_data.per_scheduling_group_data[sg._id];

which I thought was a little cumbersome, and I didn't find another method for this.

mlitvk · 2025-01-16T11:52:44Z

changed to lw_shared_ptr instead of std::shared_ptr
split the commit and added a preliminary commit to change allocate_scheduling_group_specific_data into an internal function
rebase

xemul · 2025-01-21T09:08:39Z

include/seastar/core/scheduling_specific.hh

+        specific_val(const specific_val& other) = delete;
+        specific_val& operator=(const specific_val& other) = delete;
+
+        specific_val(specific_val&& other) : valp(std::move(other.valp)), cfg(other.cfg) {}


Why not cfg(std::move(other.cfg))? In general, the lifetime of specific_val::cfg is unclear.

Here other.cfg is kept non-nullptr, what for? Presumably (but maybe I'm wrong) this is to make ~specific_val() avoid checking if cfg exists or not? But the default constructor specific_val() sets cfg to nullptr, so it looks like cfg can be null, so why not std::move() it here as well?

Done, I changed it to std::move.
There was no special reason to copy instead of move; I thought it didn't matter much, but I suppose it's better to move.
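
For reference, a minimal standalone sketch of a move constructor that moves both members, so the moved-from object ends up in the same state as a default-constructed one. The `key_config` type, the `release` helper, the two-argument constructor and the destructor body are assumptions made for illustration, and `std::shared_ptr` stands in for `seastar::lw_shared_ptr`; this is not the code from the patch.

```cpp
#include <cassert>
#include <cstdlib>
#include <functional>
#include <memory>
#include <utility>

// Hypothetical, simplified stand-in for scheduling_group_key_config.
struct key_config {
    std::function<void(void*)> destructor; // destroys the object living in the storage
};

struct specific_val {
    using val_ptr = std::unique_ptr<void, void (*)(void*)>;
    // Assumption: std::shared_ptr replaces seastar::lw_shared_ptr to keep the sketch standalone.
    using cfg_ptr = std::shared_ptr<key_config>;

    val_ptr valp;
    cfg_ptr cfg;

    // Hypothetical deleter for the raw storage.
    static void release(void* p) noexcept { std::free(p); }

    specific_val() : valp(nullptr, &release), cfg(nullptr) {}
    specific_val(val_ptr v, cfg_ptr c) : valp(std::move(v)), cfg(std::move(c)) {
        // A value without a config could not be destroyed properly.
        assert(!valp || cfg);
    }

    specific_val(const specific_val&) = delete;
    specific_val& operator=(const specific_val&) = delete;

    // Both members are moved, so `other` is left with null valp and null cfg.
    specific_val(specific_val&& other) noexcept
        : valp(std::move(other.valp)), cfg(std::move(other.cfg)) {}

    ~specific_val() {
        // Run the user-provided destructor only if this object still owns a value.
        if (valp && cfg && cfg->destructor) {
            cfg->destructor(valp.get());
        }
        // The storage itself is freed afterwards by valp's deleter.
    }
};
```

With this shape, the destructor has to tolerate a null `cfg` anyway (because of the default constructor), so there is nothing to gain from keeping `other.cfg` alive after a move.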

xemul · 2025-01-21T09:12:24Z

include/seastar/core/scheduling_specific.hh

    };
    std::array<per_scheduling_group, max_scheduling_groups()> per_scheduling_group_data;
-    std::map<unsigned long, scheduling_group_key_config> scheduling_group_key_configs;
+    std::map<unsigned long, cfg_ptr> scheduling_group_key_configs;


It looks like this map is no longer needed? The very same config can be obtained via per_scheduling_group_data[context.sg_id].specific_vals[key_id].cfg? I'm not proposing to change anything right now, just checking if my understanding is correct.

It's still used in init_scheduling_group when we init a new scheduling group and go over all the keys to allocate them for the new sg

xemul · 2025-01-21T09:13:10Z

include/seastar/core/scheduling_specific.hh

+        val_ptr valp;
+        cfg_ptr cfg;
+
+        specific_val() : valp(nullptr, &free), cfg(nullptr) {}


Isn't free defaulted by compiler for unique-ptr?

The default deleter uses delete, which doesn't work here because you can't delete a pointer to an incomplete type (void).
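
To illustrate the point, here is a small standalone sketch of an owning pointer to untyped storage with an explicit deleter. The names `free_storage` and `allocate_storage` are hypothetical and not part of Seastar; the alias also omits the `noexcept` from the function pointer type used in the patch, and the sketch assumes a C++17 toolchain that provides `std::aligned_alloc`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <new>

// Simplified stand-in for the val_ptr alias: owns raw, untyped storage and
// releases it through a plain function pointer.
using val_ptr = std::unique_ptr<void, void (*)(void*)>;

// Hypothetical wrapper around std::free, used as the deleter.
inline void free_storage(void* p) noexcept {
    std::free(p);
}

// Hypothetical helper: allocate aligned storage and hand ownership to val_ptr.
// A unique_ptr<void> with the default deleter could not be destroyed, because
// std::default_delete<void> would have to apply `delete` to a void*.
inline val_ptr allocate_storage(std::size_t alignment, std::size_t size) {
    // Alignment must be a non-zero power of two.
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    // std::aligned_alloc expects the size to be a multiple of the alignment.
    std::size_t rounded = (size + alignment - 1) / alignment * alignment;
    val_ptr p(std::aligned_alloc(alignment, rounded), &free_storage);
    if (!p) {
        throw std::bad_alloc();
    }
    return p;
}
```

An empty instance still needs the deleter spelled out, for example `val_ptr empty(nullptr, &free_storage);`, because a function-pointer deleter type disables the default constructor of `std::unique_ptr`.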

xemul · 2025-01-21T09:13:28Z

src/core/reactor.cc

+    using val_ptr = internal::scheduling_group_specific_thread_local_data::val_ptr;
+    using specific_val = internal::scheduling_group_specific_thread_local_data::specific_val;
+
+    val_ptr valp(aligned_alloc(cfg->alignment, cfg->allocation_size), &free);


Isn't free defaulted by compiler for unique-ptr?

answered above

xemul · 2025-01-21T09:14:29Z

src/core/reactor.cc

+
+    val_ptr valp(aligned_alloc(cfg->alignment, cfg->allocation_size), &free);
+    if (!valp) {
+        throw std::runtime_error("memory allocation failed");


It was std::abort() before this patch. Why did you change that?

Because the intention of the patch is to improve error handling and allow the node to continue functioning in the face of unexpected errors.
If we can handle exceptions on this path, I don't think there's any reason to abort in this specific case anymore.

xemul · 2025-01-21T09:17:04Z

src/core/reactor.cc

                    });
                }
            }
            return make_ready_future();
+        }).then([this, key_id, cfgp] () {
+            _scheduling_group_specific_data.scheduling_group_key_configs[key_id] = std::move(cfgp);


You seem to mention this change in patch comment:

We also reorder the initialization order to make it safer. For
example, when creating a scheduling group, first allocate all data and
then swap it into the scheduling group's data structure.

but don't explain the motivation. Why can't scheduling_group_key_config[key_id] be assigned where it was before this change?

It is to avoid the case of a partially initialized scheduling group.
If we allocate some of the values, transfer their ownership to the sg, and then fail, we are left with a partially initialized scheduling group. That could cause problems, for example if someone assumes it has been fully initialized, or if these allocations "leak" because the operation is considered to have failed but we still store the data.
So instead we either allocate all keys or none.
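
The all-or-nothing approach can be sketched in isolation as follows. All types and names here (`key_config`, `group_data`, `init_group`, `make_value`) are simplified, hypothetical stand-ins rather than the Seastar definitions: values are built into a local container first, and only a non-throwing swap publishes them to the group.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <vector>

// Hypothetical per-key configuration; make_value models a user-provided
// constructor that may throw.
struct key_config {
    std::function<std::unique_ptr<int>()> make_value;
};

// Hypothetical per-group storage for the key values.
struct group_data {
    std::vector<std::unique_ptr<int>> specific_vals;
};

// Build every value in a local vector first. If any constructor throws, the
// local vector is unwound and `group` is left exactly as it was.
inline void init_group(group_data& group,
                       const std::map<unsigned long, key_config>& configs) {
    std::vector<std::unique_ptr<int>> staged;
    staged.reserve(configs.size());
    for (const auto& entry : configs) {
        staged.push_back(entry.second.make_value()); // may throw
    }
    assert(staged.size() == configs.size());
    // Commit step: swapping vectors does not throw, so the group becomes
    // fully initialized in one step.
    group.specific_vals.swap(staged);
}
```

A caller that catches the exception sees a group whose `specific_vals` is unchanged, so a failed creation can be retried or reported without cleaning up half-built state.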

Move the function allocate_scheduling_group_specific_data from reactor class to an internal static function. Change it to handle only the allocation and construction of the data object, while the caller handles the assignment of it.

Improve handling of exceptions during scheduling group and scheduling group key creation, where a user-provided constructor for the keys may fail, for example. We use a new struct `specific_val` and smart pointers to manage memory allocation, construction and destruction of scheduling group data in a safe manner. We also reorder the initialization order to make it safer. For example, when creating a scheduling group, first allocate all data and then swap it into the scheduling group's data structure. Fixes scylladb#2222

mlitvk · 2025-01-21T11:41:21Z

in move constructor of specific_val, changed it to move the cfg shared ptr instead of copy
rebase

mlitvk marked this pull request as ready for review January 15, 2025 13:54

mlitvk requested a review from piodul January 15, 2025 13:54

mlitvk force-pushed the sg_exception_safety branch 4 times, most recently from cb8d016 to 88c4e8a Compare January 15, 2025 18:21

xemul reviewed Jan 16, 2025

mlitvk force-pushed the sg_exception_safety branch from 88c4e8a to cd957c2 Compare January 16, 2025 11:48

xemul reviewed Jan 21, 2025

mlitvk added 2 commits January 21, 2025 13:39

mlitvk force-pushed the sg_exception_safety branch from cd957c2 to a9615eb Compare January 21, 2025 11:39