Scylla fails to startup on Fedora aarch64 AMI #22382
On the Fedora 41 AMI, on some aarch64 instances such as m7gd.16xlarge, Seastar programs such as Scylla fail to start up with the following error message:

```
$ /opt/scylladb/bin/scylla --log-to-stdout 1
WARNING: debug mode. Not for benchmarking or production
hwloc/linux: failed to find sysfs cpu topology directory, aborting linux discovery.
scylla: seastar/src/core/resource.cc:683: resources seastar::resource::allocate(configuration &): Assertion `!remain' failed.
```

hwloc appears to fail to initialize because /sys/devices/system/cpu/cpu0/topology/ is not available on the instance.

I debugged src/core/resource.cc to find out why the assertion fired, and found that alloc_from_node() fails because node->total_memory is 0. This is likely a consequence of the hwloc initialization failure described above.

To avoid the error in such environments, we should stop using hwloc in resource.cc. The hwloc initialization function does not return an error code even when the error message is printed, so we need to check whether the "topology" directory is available under /sys. Since resource.cc already has code for building Seastar without libhwloc, we should call that code when the "topology" directory is not available.

Fixes scylladb/scylladb#22382
Related scylladb/scylla-pkg#4797
Sounds like an issue that should be reported to Amazon?
I forgot to mention in my previous post: the problem does not occur on the Amazon Linux 2023 and Ubuntu 24.04 AMIs.
As I just described above, it works on the official AMIs (Amazon Linux, Ubuntu), so we probably need to report it to Fedora, not Amazon.
That's fine too. As long as we do report it.
One more note: it seems that not all m7gd instance sizes are affected by this problem.
The issue was originally reported in the thread at scylladb/scylla-pkg#4797
On the Fedora 41 AMI, on some aarch64 instances such as m7gd.16xlarge, Scylla fails to start up with the following error message:
It seems that hwloc fails to initialize and returns incorrect hardware information to Seastar.
The error "hwloc/linux: failed to find sysfs cpu topology directory, aborting linux discovery." also occurs with hwloc commands such as hwloc-ls.
The error message comes from check_sysfs_cpu_path(), and it occurs when /sys/devices/system/cpu/cpuX/topology/ is not available.
This can also be verified from the shell:
This is probably a kernel driver problem for this CPU.
I debugged seastar/src/core/resource.cc to find out why the assertion fired, and found that alloc_from_node() fails because node->total_memory is 0.
This is likely a consequence of the hwloc initialization failure described above.
It can be reproduced with the hwloc-ls command.
On a normal x86_64 machine, hwloc-ls prints the memory size and CPU cache sizes, but on m7gd.16xlarge nothing shows up:
To avoid the Scylla startup failure in such environments, we should stop using hwloc in the seastar/src/core/resource.cc code.
Since resource.cc already has code for building Seastar without libhwloc, we can use that code to fix the problem.
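As a rough illustration of the proposed check, here is a minimal C++ sketch that tests for the sysfs "topology" directory and picks a code path accordingly. It is not Seastar's actual code: `cpu_topology_available()`, `choose_backend()`, the `topology_backend` enum, and the `sysfs_root` parameter are hypothetical names introduced for this example, and probing only cpu0 is an assumption.

```cpp
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper, not part of Seastar or hwloc.
// Returns true when the per-CPU "topology" directory that hwloc's Linux
// backend looks for is present. Only cpu0 is probed (an assumption).
// sysfs_root is a parameter so the check can be pointed at a fake tree.
bool cpu_topology_available(const fs::path& sysfs_root = "/sys") {
    std::error_code ec;  // use the non-throwing overload
    const fs::path topo =
        sysfs_root / "devices" / "system" / "cpu" / "cpu0" / "topology";
    return fs::is_directory(topo, ec);  // false if missing or on error
}

// Hypothetical selector between the hwloc-based path and the path that
// resource.cc already provides for builds without libhwloc.
enum class topology_backend { hwloc, fallback };

topology_backend choose_backend(const fs::path& sysfs_root = "/sys") {
    return cpu_topology_available(sysfs_root) ? topology_backend::hwloc
                                              : topology_backend::fallback;
}
```

Calling `choose_backend()` with no argument probes the real /sys. On an affected instance, where the topology directory is missing, it would select `topology_backend::fallback`. How the existing non-hwloc code in resource.cc would actually be reached at runtime is left open here.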