
kernel and sme #1199

@mgxhhg

I compiled llama with SME manually enabled. ggml uses KleidiAI, which provides the SME kernels. My device has a Snapdragon 8 Elite (SM8750) CPU. The first run worked and was incredibly fast, but every run after that crashed. Disabling SME fixed the problem.

cpuinfo:

processor : 0
BogoMIPS : 38.40
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 asimdfhm dit uscat ilrcpc flagm ssbs sb paca pacg dcpodp flagm2 frint i8mm bf16 rng bti ecv afp rpres
CPU implementer : 0x51
CPU architecture: 8
CPU variant : 0x4
CPU part : 0x001
CPU revision : 4

processor : 1
BogoMIPS : 38.40
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 asimdfhm dit uscat ilrcpc flagm ssbs sb paca pacg dcpodp flagm2 frint i8mm bf16 rng bti ecv afp rpres
CPU implementer : 0x51
CPU architecture: 8
CPU variant : 0x4
CPU part : 0x001
CPU revision : 4

processor : 2
BogoMIPS : 38.40
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 asimdfhm dit uscat ilrcpc flagm ssbs sb paca pacg dcpodp flagm2 frint i8mm bf16 rng bti ecv afp rpres
CPU implementer : 0x51
CPU architecture: 8
CPU variant : 0x4
CPU part : 0x001
CPU revision : 4

processor : 3
BogoMIPS : 38.40
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 asimdfhm dit uscat ilrcpc flagm ssbs sb paca pacg dcpodp flagm2 frint i8mm bf16 rng bti ecv afp rpres
CPU implementer : 0x51
CPU architecture: 8
CPU variant : 0x4
CPU part : 0x001
CPU revision : 4

processor : 4
BogoMIPS : 38.40
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 asimdfhm dit uscat ilrcpc flagm ssbs sb paca pacg dcpodp flagm2 frint i8mm bf16 rng bti ecv afp rpres
CPU implementer : 0x51
CPU architecture: 8
CPU variant : 0x4
CPU part : 0x001
CPU revision : 4

processor : 5
BogoMIPS : 38.40
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 asimdfhm dit uscat ilrcpc flagm ssbs sb paca pacg dcpodp flagm2 frint i8mm bf16 rng bti ecv afp rpres
CPU implementer : 0x51
CPU architecture: 8
CPU variant : 0x4
CPU part : 0x001
CPU revision : 4

processor : 6
BogoMIPS : 38.40
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 asimdfhm dit uscat ilrcpc flagm ssbs sb paca pacg dcpodp flagm2 frint i8mm bf16 rng bti ecv afp rpres
CPU implementer : 0x51
CPU architecture: 8
CPU variant : 0x3
CPU part : 0x001
CPU revision : 4

processor : 7
BogoMIPS : 38.40
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 asimdfhm dit uscat ilrcpc flagm ssbs sb paca pacg dcpodp flagm2 frint i8mm bf16 rng bti ecv afp rpres
CPU implementer : 0x51
CPU architecture: 8
CPU variant : 0x3
CPU part : 0x001
CPU revision : 4

gpt:

Armv9 brought SVE2 and, from Armv9.2, SME (Scalable Matrix Extension), but the kernel requires explicit support for SME context management (state save and restore, lazy activation, etc.).

Currently (as of Linux 6.9-6.10):

SM8750 vendor kernels (Qualcomm/OnePlus kernels) generally do not fully enable SME context preservation;

Some kernels (especially Android 6.6/6.7 branches) only enable SVE support, but the SME ZA/SM enable registers are not preserved;

This can result in:

First run → SME status is all zeros, normal;

Second run → SME registers contain garbage values or are prohibited from access by the kernel → SIGILL (illegal instruction) or SIGSEGV (segmentation fault) is triggered.

🧩 Your current CPU information analysis

The "Features" field in your /proc/cpuinfo is as follows:

Features: ... i8mm bf16 rng bti ecv afp rpres

⚠️ Note:

There is no "sve" or "sme" text.


This indicates that although the hardware appears to support SME (the cores report CPU implementer 0x51, which is Qualcomm, so they are not Cortex-X4/A720/A520 designs),
the kernel does not expose SME to userland (perhaps it is disabled or lazily gated).

In other words:
Your kernel does not advertise SME to userland, so programs should not assume they can execute SME instructions. The same Features list shows no SVE either.
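The Features check described above can be sketched in a few lines of Python. This is only an illustration: the helper names `cpuinfo_flags` and `advertised_on_all_cores` are made up for this example, and `SAMPLE` is the Features line from the dump in this issue. Flags are compared as whole tokens, so `sm3` and `sm4` are not mistaken for `sme`.

```python
# Features line copied from the /proc/cpuinfo dump in this issue (one core).
SAMPLE = (
    "processor : 0\n"
    "Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp "
    "asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp "
    "sha512 asimdfhm dit uscat ilrcpc flagm ssbs sb paca pacg dcpodp flagm2 "
    "frint i8mm bf16 rng bti ecv afp rpres\n"
)


def cpuinfo_flags(text):
    """Return one set of feature flags per 'Features' line in cpuinfo text."""
    per_core = []
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Features":
            per_core.append(set(value.split()))
    return per_core


def advertised_on_all_cores(text, flag):
    """True only if at least one core is listed and every core has the flag."""
    per_core = cpuinfo_flags(text)
    return bool(per_core) and all(flag in flags for flags in per_core)


if __name__ == "__main__":
    for flag in ("sve", "sme", "i8mm", "bf16"):
        print(flag, advertised_on_all_cores(SAMPLE, flag))
```

To check a live device instead of the pasted sample, pass the contents of /proc/cpuinfo as `text`.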
🧯 Crash Cause Summary

| Condition | Result |
| --- | --- |
| CPU supports SME hardware | ✅ |
| Kernel SME feature flag is enabled | ❌ (not exported to /proc/cpuinfo) |
| ggml/KleidiAI SME backend attempts to execute SME instructions | 💥 SIGILL (illegal instruction) |

The first run succeeds, possibly because the SME code path is lazy-loaded or cached incorrectly. The second run crashes because the kernel does not allow userland to use SME, which triggers an exception.
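A more direct way to ask the kernel whether userland may use SME is the hwcap bits it passes to each process in the auxiliary vector. The sketch below decodes /proc/self/auxv and tests `HWCAP2_SME`. The constants are the values I believe the arm64 Linux headers define (`AT_HWCAP` = 16, `AT_HWCAP2` = 26, `HWCAP_SVE` = bit 22, `HWCAP2_SME` = bit 23); treat them as assumptions and check them against your kernel headers. The function names are hypothetical, and this is not how ggml or KleidiAI performs its own detection.

```python
import struct

# Assumed values from <elf.h> and the arm64 uapi <asm/hwcap.h>; verify locally.
AT_NULL = 0
AT_HWCAP = 16
AT_HWCAP2 = 26
HWCAP_SVE = 1 << 22
HWCAP2_SME = 1 << 23


def parse_auxv(data):
    """Decode a 64-bit auxiliary vector (pairs of native u64) into a dict."""
    usable = len(data) // 16 * 16  # ignore any trailing partial entry
    entries = {}
    for a_type, a_val in struct.iter_unpack("=QQ", data[:usable]):
        if a_type == AT_NULL:
            break
        entries[a_type] = a_val
    return entries


def kernel_advertises_sme(auxv):
    """True if the HWCAP2_SME bit is set (only meaningful on arm64 Linux)."""
    return bool(auxv.get(AT_HWCAP2, 0) & HWCAP2_SME)


def kernel_advertises_sve(auxv):
    """True if the HWCAP_SVE bit is set (only meaningful on arm64 Linux)."""
    return bool(auxv.get(AT_HWCAP, 0) & HWCAP_SVE)


if __name__ == "__main__":
    try:
        with open("/proc/self/auxv", "rb") as f:
            auxv = parse_auxv(f.read())
    except OSError:
        auxv = {}
    print("SVE advertised:", kernel_advertises_sve(auxv))
    print("SME advertised:", kernel_advertises_sme(auxv))
```

If the SME bit is clear, the safer choice is to build or run with the SME backend disabled, which matches the workaround reported in this issue.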
