We have a Supermicro system with two power supplies, and there are two distinct issues, one of which can possibly be blamed on Supermicro. To test this plugin, we had our datacenter staff run two separate tests on one of our servers: first physically pulling a PSU out of the chassis, and then leaving the PSU inserted but removing its AC plug.
When a power supply is removed (pulled out of the system), it is shown as a not-available sensor, and the Supermicro web UI doesn't treat this as an error. However, there's no option in this plugin to say "two power supplies must read as ok".
Here's what I see in the output for that:
However, when the power supply is present but has no input voltage, it shows like this:
This is absolutely not "nominal". This machine is beeping in a data center, the web UI reports a critical error, and I believe the red LED on the front is illuminated. This could be helped by allowing me to specify that "ok" is not a valid state for a given class of device. (In this case, "presence detected" is the good state, as `ok` can mean "not there but that's fine".)
I suspect that it is reported as "nominal" because there are no thresholds reported by the BMC for this value.
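To make the request concrete, here is a rough sketch of the kind of per-sensor-type policy I have in mind. It is not the plugin's code: the `Reading` class, the `POLICY` table, the `evaluate` function, and the exact event strings are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    name: str         # sensor name, e.g. "PS1 Status" (illustrative)
    sensor_type: str  # sensor class, e.g. "Power Supply"
    event: str        # event text reported for the sensor (strings assumed)


# Hypothetical policy: for each sensor type, which event texts count as
# healthy, and how many sensors of that type must be healthy.
POLICY = {
    "Power Supply": {"good_events": {"Presence detected"}, "min_good": 2},
}


def evaluate(readings, policy=POLICY):
    """Return a list of problem descriptions; an empty list means all good."""
    problems = []
    for sensor_type, rule in policy.items():
        matching = [r for r in readings if r.sensor_type == sensor_type]
        good = [r for r in matching if r.event in rule["good_events"]]
        # Any state outside the allow-list is a problem, even a plain "OK".
        for r in matching:
            if r.event not in rule["good_events"]:
                problems.append(f"{r.name}: '{r.event}' is not an accepted state")
        # A sensor that vanished entirely is caught by the minimum count.
        if len(good) < rule["min_good"]:
            problems.append(
                f"{sensor_type}: {len(good)} healthy, need {rule['min_good']}"
            )
    return problems
```

With a policy like this, a PSU that reads "OK" or "N/A" instead of "Presence detected" would be flagged, and so would a PSU that is missing from the sensor list altogether.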
The power-supply problem does show up in the SEL:
ID | Date | Time | Name | Type | State | Event
1 | Nov-25-2024 | 21:47:47 | System Chassis Chassis Intru | Physical Security | Critical | General Chassis Intrusion
2 | Dec-09-2024 | 20:45:38 | System Chassis Chassis Intru | Physical Security | Critical | General Chassis Intrusion
3 | Dec-09-2024 | 21:12:29 | Power Supply 2 PS2 Status | Power Supply | Critical | Power Supply Failure detected
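As an illustration of why this entry goes unnoticed, here is a small sketch that parses the pipe-separated SEL listing above and filters it by sensor type. The helper names `parse_sel` and `critical_entries` are hypothetical, and this is not how the plugin itself processes the SEL.

```python
SEL_TEXT = """\
ID | Date | Time | Name | Type | State | Event
1 | Nov-25-2024 | 21:47:47 | System Chassis Chassis Intru | Physical Security | Critical | General Chassis Intrusion
2 | Dec-09-2024 | 20:45:38 | System Chassis Chassis Intru | Physical Security | Critical | General Chassis Intrusion
3 | Dec-09-2024 | 21:12:29 | Power Supply 2 PS2 Status | Power Supply | Critical | Power Supply Failure detected
"""


def parse_sel(text):
    """Turn the pipe-separated listing into a list of dicts keyed by header."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = [h.strip() for h in lines[0].split("|")]
    return [
        dict(zip(header, (cell.strip() for cell in ln.split("|"))))
        for ln in lines[1:]
    ]


def critical_entries(entries, sensor_types):
    """Keep only critical entries whose Type is in the given set."""
    return [
        e for e in entries
        if e["State"] == "Critical" and e["Type"] in sensor_types
    ]


entries = parse_sel(SEL_TEXT)
# Filtering on Memory and Processor only hides the PSU failure.
print(critical_entries(entries, {"Memory", "Processor"}))
# Including "Power Supply" surfaces entry 3.
print(critical_entries(entries, {"Memory", "Processor", "Power Supply"}))
```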
However, the plugin only calls the SEL with `--sensor-types=Memory,Processor` (hardcoded, not override-able on the command line).
The easiest fix to this second failure mode would be to not hardcode the sensor-types, but that would not solve the first case, where a power supply was simply removed. (That doesn't even show in the SEL).
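A minimal sketch of what "not hardcoded" could look like, written in Python purely for illustration. The `--sel-sensor-types` option name, the `build_parser` and `sel_command` helpers, and the assumption that the SEL is read through an `ipmi-sel` binary are mine, not taken from the plugin; only the `--sensor-types=Memory,Processor` value comes from the behaviour described above.

```python
import argparse

# Current behaviour, kept as the default so existing setups do not change.
DEFAULT_SEL_SENSOR_TYPES = "Memory,Processor"


def build_parser():
    parser = argparse.ArgumentParser(description="SEL sensor-type override sketch")
    # Hypothetical option name; it only illustrates the override.
    parser.add_argument(
        "--sel-sensor-types",
        default=DEFAULT_SEL_SENSOR_TYPES,
        help="comma-separated sensor types to include when reading the SEL",
    )
    return parser


def sel_command(sensor_types):
    """Build the argument list for the SEL call (binary name is assumed)."""
    types = [t.strip() for t in sensor_types.split(",") if t.strip()]
    return ["ipmi-sel", f"--sensor-types={','.join(types)}"]


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(sel_command(args.sel_sensor_types))
```

Keeping the current list as the default means nothing changes unless the option is passed, while a site like ours could add `Power Supply` to the list.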