Description
I am experiencing sudden network disconnects on a cluster node running on a Lenovo M720s. The network interface hangs completely and requires a physical cable replug or a reboot to recover. This happens specifically under heavy load (likely triggered by storage replication traffic/DRBD).
The kernel logs show the notorious e1000e hardware unit hang:
kernel: e1000e 0000:00:1f.6 _p2cf05d501bfb: Detected Hardware Unit Hang:
TDH <6d>
TDT
next_to_use
next_to_clean <6c>
buffer_info[next_to_clean]:
time_stamp <1003c5316>
next_to_watch <6d>
jiffies <1003fde80>
next_to_watch.status <0>
MAC Status <40080083>
PHY Status <796d>
PHY 1000BASE-T Status <3c00>
PHY Extended Status <3000>
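As a rough aid for reading the dump above, here is a minimal Python sketch that extracts the `<...>` fields and computes how many jiffies passed between the stuck descriptor's time_stamp and the moment the hang was reported. It assumes the values are hexadecimal, which is how the e1000e driver appears to print them. The function names are hypothetical, and fields with no value in the pasted log (TDT, next_to_use) are skipped rather than guessed.

```python
import re

# Fields copied from the log excerpt in this report. TDT and next_to_use
# have no value in the pasted log, so they stay empty here.
HANG_DUMP = """\
TDH <6d>
TDT
next_to_use
next_to_clean <6c>
buffer_info[next_to_clean]:
time_stamp <1003c5316>
next_to_watch <6d>
jiffies <1003fde80>
next_to_watch.status <0>
MAC Status <40080083>
PHY Status <796d>
PHY 1000BASE-T Status <3c00>
PHY Extended Status <3000>
"""

# A field name followed by a hex value in angle brackets, e.g. "TDH <6d>".
FIELD_RE = re.compile(r"^\s*([\w.\[\]\- ]+?)\s*<([0-9a-fA-F]+)>\s*$")


def parse_hang_dump(text):
    """Return {field name: int} for every line that carries a <hex> value."""
    fields = {}
    for line in text.splitlines():
        match = FIELD_RE.match(line)
        if match:
            # Assumption: all values are hexadecimal.
            fields[match.group(1)] = int(match.group(2), 16)
    return fields


def stalled_jiffies(fields):
    """Jiffies elapsed between the descriptor's time_stamp and the report."""
    return fields["jiffies"] - fields["time_stamp"]


fields = parse_hang_dump(HANG_DUMP)
print("TDH:", fields["TDH"], "next_to_clean:", fields["next_to_clean"])
print("stalled for", stalled_jiffies(fields), "jiffies")
```

Converting the jiffies difference to seconds requires dividing by the kernel's HZ value, which is not stated in this report.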
Steps Taken / Troubleshooting
BIOS Configuration: Disabled C-States and enabled Performance Mode in BIOS. The issue persisted.
IncusOS Network Config: I attempted to disable offloading via the system network edit API (setting disable_ipv4_tso, disable_gso, and disable_gro to true). This did not resolve the hang.
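One way to confirm that the offload settings actually took effect would be to check the interface's feature list. The sketch below parses text in the format printed by `ethtool -k <interface>` and lists the offloads that are still on. It assumes such output can be obtained for the interface at all, which may not be possible on IncusOS. The sample text is made up for illustration and is not from the affected machine, and the mapping from the disable_* settings to ethtool feature names is an assumption.

```python
# Illustrative sample in the format printed by `ethtool -k`.
# NOT captured from the affected machine.
SAMPLE_FEATURES = """\
Features for eth0:
tcp-segmentation-offload: on
\ttx-tcp-segmentation: on
generic-segmentation-offload: off
generic-receive-offload: off
"""

# Assumed mapping of the IncusOS settings to ethtool feature names.
OFFLOADS_TO_CHECK = (
    "tx-tcp-segmentation",           # disable_ipv4_tso (assumed)
    "generic-segmentation-offload",  # disable_gso (assumed)
    "generic-receive-offload",       # disable_gro (assumed)
)


def parse_features(text):
    """Return {feature name: True if on, False otherwise}."""
    features = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        value = value.strip()
        if not sep or not value:
            continue  # skips the "Features for ...:" header and blank lines
        # The value may carry a suffix such as "[fixed]"; keep the first word.
        features[name.strip()] = value.split()[0] == "on"
    return features


def still_enabled(features, names=OFFLOADS_TO_CHECK):
    """List the checked offloads that are reported as on."""
    return [name for name in names if features.get(name)]


features = parse_features(SAMPLE_FEATURES)
print("offloads still enabled:", still_enabled(features))
```

An empty list would suggest the settings were applied, in which case the hang is probably not caused by those offloads alone.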
Metadata
Incus: 6.20
IncusOS: 202601220238
Hardware: Lenovo ThinkCentre M720s
NIC: Intel Ethernet Connection I219-V
Driver: e1000e