Fix seastar::resource::allocate() error on EC2 m7gd.16xlarge instance #2624
base: master
On the Fedora 41 AMI, on some aarch64 instances such as m7gd.16xlarge, a Seastar program such as Scylla fails to start up with the following error message:

```
$ /opt/scylladb/bin/scylla --log-to-stdout 1
WARNING: debug mode. Not for benchmarking or production
hwloc/linux: failed to find sysfs cpu topology directory, aborting linux discovery.
scylla: seastar/src/core/resource.cc:683: resources seastar::resource::allocate(configuration &): Assertion `!remain' failed.
```

It seems that hwloc fails to initialize because /sys/devices/system/cpu/cpu0/topology/ is not available on the instance.

I debugged src/core/resource.cc to find out why the assertion fired, and found that alloc_from_node() fails because node->total_memory is 0. This is likely a consequence of the hwloc initialization failure described above.

To avoid the error in such environments, we should stop using hwloc in resource.cc. The hwloc initialization function does not return an error code even when an error message is printed, so we need to check ourselves whether the "topology" directory is available under /sys. Since resource.cc already has code for building Seastar without libhwloc, we need to call that code if the "topology" directory is not available.

Fixes scylladb/scylladb#22382
Related scylladb/scylla-pkg#4797
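For illustration, the directory check described above could look roughly like the sketch below. This is not the code from the patch: the function name `cpu_topology_dir_exists` and the `sysfs_cpu_root` parameter are hypothetical, and the parameter exists only so the check can be exercised against a fake directory tree.

```cpp
#include <filesystem>
#include <string>
#include <system_error>

// Hypothetical helper (not part of Seastar or hwloc): returns true if the
// per-CPU "topology" directory exists for the given CPU id.
// sysfs_cpu_root is an assumed parameter that defaults to the real sysfs path.
bool cpu_topology_dir_exists(unsigned cpu_id,
                             const std::filesystem::path& sysfs_cpu_root = "/sys/devices/system/cpu") {
    // Use the non-throwing overload: a missing path simply yields false.
    std::error_code ec;
    auto dir = sysfs_cpu_root / ("cpu" + std::to_string(cpu_id)) / "topology";
    return std::filesystem::is_directory(dir, ec);
}
```

A caller would pick the non-hwloc allocation path whenever this returns false for the CPU it probes.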
I was asked to review, so I did. I'm not familiar with this code, but from a high level it looks good to me.
Someone more familiar with this should also review.
LGTM in general, in addition to the inline comments:
- Could you please use a more specific prefix in the title of the commit message? For example: "resource: ".
- And use a more specific title, like "fall back to single io group if hwloc fails to work".
BTW, you could even split the commit into two: one for moving the non-hwloc code up, the other for using it when hwloc fails to tell the CPU topology.
// cannot receive error from the API.
// Therefore, we have to detect cpu topology availability in our code.
static bool is_hwloc_available() {
    const std::string cpux_properties[] = {
nit: std::string_view would suffice.
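A small sketch of what the nit amounts to: a constant table of `std::string_view` needs no heap allocation and can be `constexpr`. The entries below are placeholders chosen for illustration; the actual list in the patch is not shown in this excerpt, and the name `example_properties` is hypothetical.

```cpp
#include <array>
#include <string_view>

// Placeholder entries only: the real contents of cpux_properties are not
// visible in this excerpt, so these names are illustrative assumptions.
constexpr std::array<std::string_view, 2> example_properties = {
    "topology/core_siblings",
    "topology/thread_siblings",
};

// With string_view the table can be inspected at compile time.
static_assert(example_properties.size() == 2);
static_assert(example_properties[0].substr(0, 9) == "topology/");
```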
It would also be better to reference the related hwloc function, so that posterity understands why we use this logic to determine whether hwloc is able to identify the CPU topology.
}

// cpu0 might be offline, try to check first online cpu.
auto online = read_first_line_as<std::string>("/sys/devices/system/cpu/online");
I noticed we could potentially use read_first_line_as<unsigned>("/sys/devices/system/cpu/online") and handle the exception. If this approach wouldn't significantly simplify the implementation, feel free to keep the current solution.
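As a point of comparison, here is a minimal sketch of pulling the first CPU id out of the string form without exceptions. It assumes the file holds a CPU list such as "0-63" or "2,4-7" in ascending order; the helper name `first_online_cpu` is hypothetical and is not part of the patch.

```cpp
#include <charconv>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical helper: parse the leading decimal number of a CPU list string.
// Assumes the list is in ascending order, so the first number is the
// lowest online CPU. Returns std::nullopt if no number is found.
std::optional<unsigned> first_online_cpu(std::string_view cpulist) {
    unsigned id = 0;
    // from_chars stops at the first non-digit ('-' or ','), which is what we want.
    auto [ptr, ec] = std::from_chars(cpulist.data(), cpulist.data() + cpulist.size(), id);
    if (ec != std::errc{}) {
        return std::nullopt;
    }
    return id;
}
```

Either form works; this one only avoids relying on a conversion failure for control flow.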
I think this is too extreme. Here's hwloc-ls output on m7gd.16xlarge:
So it aborted linux discovery, but was still able to keep going. Maybe hwloc-ls is telling hwloc to fall back to alternative methods if needed, and we are not.
I think the problem is that hwloc doesn't report the memory as belonging to any NUMA node (on a normal machine the NUMA nodes have memory counts).

    // Divide local memory to cpus
    for (auto&& cs : cpu_sets()) {
        auto cpu_id = hwloc_bitmap_first(cs);
        assert(cpu_id != -1);
        auto node = cpu_to_node.at(cpu_id);
        cpu this_cpu;
        this_cpu.cpu_id = cpu_id;
        size_t remain = mem_per_proc - alloc_from_node(this_cpu, node, topo_used_mem, mem_per_proc);
        remains.emplace_back(std::move(this_cpu), remain);
    }
    // Divide the rest of the memory
    auto depth = hwloc_get_type_or_above_depth(topology, HWLOC_OBJ_NUMANODE);
    for (auto&& [this_cpu, remain] : remains) {
        auto node = cpu_to_node.at(this_cpu.cpu_id);
        auto obj = node;
        while (remain) {
            remain -= alloc_from_node(this_cpu, obj, topo_used_mem, remain);
            do {
                obj = hwloc_get_next_obj_by_depth(topology, depth, obj);
            } while (!obj);
            if (obj == node)
                break;
        }
        assert(!remain);
        ret.cpus.push_back(std::move(this_cpu));
    }

We need to add a third loop that allocates non-NUMA memory if the second loop fails.
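To make the three-loop idea concrete, here is a simplified, self-contained model of it. It does not use hwloc or Seastar types: `node_model`, `take_from`, `allocate_memory`, and `machine_pool` are all hypothetical names, and `machine_pool` stands in for memory that is not attributed to any NUMA node.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Simplified stand-in for a memory source (hypothetical, not an hwloc object).
struct node_model {
    std::size_t total_memory;
    std::size_t used = 0;
};

// Take up to `want` bytes from `n`; returns the number of bytes actually taken.
std::size_t take_from(node_model& n, std::size_t want) {
    std::size_t got = std::min(want, n.total_memory - n.used);
    n.used += got;
    return got;
}

// cpu_to_node[i] is the index of CPU i's local node in `nodes`.
// Returns the number of bytes assigned to each CPU.
std::vector<std::size_t> allocate_memory(std::vector<node_model>& nodes,
                                         const std::vector<std::size_t>& cpu_to_node,
                                         std::size_t mem_per_proc,
                                         node_model& machine_pool) {
    std::vector<std::size_t> got(cpu_to_node.size(), 0);
    // Loop 1: take memory from each CPU's local node.
    for (std::size_t i = 0; i < cpu_to_node.size(); ++i) {
        got[i] += take_from(nodes[cpu_to_node[i]], mem_per_proc);
    }
    // Loop 2: take the remainder from the other NUMA nodes, in order.
    for (std::size_t i = 0; i < cpu_to_node.size(); ++i) {
        for (std::size_t k = 1; k < nodes.size() && got[i] < mem_per_proc; ++k) {
            auto& n = nodes[(cpu_to_node[i] + k) % nodes.size()];
            got[i] += take_from(n, mem_per_proc - got[i]);
        }
    }
    // Loop 3 (the proposed addition): fall back to memory that is not
    // attributed to any NUMA node.
    for (std::size_t i = 0; i < cpu_to_node.size(); ++i) {
        if (got[i] < mem_per_proc) {
            got[i] += take_from(machine_pool, mem_per_proc - got[i]);
        }
    }
    return got;
}
```

With a single node reporting 0 bytes, as on the failing instance, loops 1 and 2 assign nothing and loop 3 covers the whole request from the pool instead of tripping the assertion.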
Bad NUMA detection:
Good NUMA detection:
So we just need to detect the case where the detected NUMA memory (here 0) is less than the memory we want to allocate, and handle that case too.
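The detection itself is a simple comparison. A minimal sketch, assuming the per-node memory counts have already been collected into a vector (the function name `numa_memory_insufficient` and its parameters are hypothetical):

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical predicate: true when the memory reported across all NUMA
// nodes cannot cover the amount we want to allocate.
bool numa_memory_insufficient(const std::vector<std::size_t>& node_memory,
                              std::size_t wanted) {
    std::size_t detected = std::accumulate(node_memory.begin(), node_memory.end(),
                                           std::size_t{0});
    return detected < wanted;
}
```

On the failing instance the detected total is 0, so any non-zero request takes the fallback path.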