#2214 adds code to the psc_seq task for generating ereports when a rectifier's presence or POWER_GOOD states change. Ereports represent edge-triggered notifications of an event. As discussed in RFD 589, we must also provide health endpoints that expose the same component health information in a manner that the control plane can poll. This is necessary in the event that an ereport is lost, either due to an SP reset or a full ereport buffer in the SP. If the control plane detects either of those conditions, it will poll the health of the SP's components to determine whether there may be a fault (or state change, more broadly) for which an ereport may have been lost.
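The fallback described above can be sketched as a small decision function on the control plane side. This is only an illustration, not the actual ereport protocol: the `Batch` type, its `restart_id` and sequence-number fields, and the `next_action` function are all hypothetical names, and it assumes that an SP reset shows up as a changed restart ID and that a full buffer shows up as a gap in sequence numbers.

```rust
/// Hypothetical summary of one batch of ereports received from an SP.
/// Field names and semantics are assumptions made for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Batch {
    /// Assumed to change every time the SP resets.
    restart_id: u64,
    /// Assumed monotonically increasing sequence number of the first ereport.
    first_seq: u64,
    /// Sequence number of the last ereport in the batch.
    last_seq: u64,
}

#[derive(Debug, PartialEq, Eq)]
enum Action {
    /// No loss detected: edge-triggered ereports are sufficient.
    TrustEreports,
    /// Ereports may have been lost: poll component health instead.
    PollHealth,
}

/// Decide whether the collector must fall back to polling health.
fn next_action(prev: Option<Batch>, cur: Batch) -> Action {
    match prev {
        // No baseline yet, so the current state is unknown.
        None => Action::PollHealth,
        // The SP reset: anything buffered before the reset is gone.
        Some(p) if p.restart_id != cur.restart_id => Action::PollHealth,
        // A gap in sequence numbers: ereports were dropped (e.g. full buffer).
        Some(p) if cur.first_seq > p.last_seq.saturating_add(1) => Action::PollHealth,
        Some(_) => Action::TrustEreports,
    }
}
```

The point of the sketch is that polling is only needed on the two loss conditions; in the common case the ereport stream alone is enough.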
Currently, we do expose presence information for rectifiers over the control-plane-agent inventory interface. The validate task detects presence by reading the rectifier's model number over I2C and returning an error if the read fails:
hubris/drv/i2c-devices/src/mwocp68.rs
Lines 764 to 770 in b50c3c7
    impl Validate<Error> for Mwocp68 {
        fn validate(device: &I2cDevice) -> Result<bool, Error> {
            let expected = b"MWOCP68-3600-D-RM";
            pmbus_validate(device, CommandCode::MFR_MODEL, expected)
                .map_err(Into::into)
        }
    }
This is different from the PSC sequencer's understanding of presence, which it determines by reading the rectifier presence GPIO pins. In most cases, though, the two should largely agree.
What isn't currently known to control-plane-agent is whether the rectifier is healthy or has faulted, and whether the psc_seq task has enabled or disabled the rectifier. The sequencer determines whether a rectifier is faulted based on whether POWER_GOOD is asserted, and enables and disables rectifiers by toggling GPIOs. The validate task could potentially read PMBus status registers to determine if the rectifier is faulted (and we may want to do that anyway to expose PMBus status information to the control plane). However, this is a different mechanism from the psc_seq task's understanding of faults, and it's the sequencer that actually controls whether a rectifier is turned on or not. Therefore, I think it might be better to treat the sequencer as the authoritative source for this and just ask it whether it believes a rectifier is okay or not.
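As a rough illustration of what "asking the sequencer" could return, here is a minimal sketch that derives a health value from the three signals mentioned above: the presence pin, the enable GPIO, and POWER_GOOD. The `PsuPins` and `PsuHealth` types and the `psu_health` function are hypothetical names invented for this sketch; the real psc_seq state machine may track more states than this.

```rust
/// Hypothetical snapshot of the sequencer's view of one rectifier.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct PsuPins {
    /// Presence GPIO indicates a rectifier is installed.
    present: bool,
    /// The sequencer is currently driving the enable GPIO.
    enabled: bool,
    /// The rectifier is asserting POWER_GOOD.
    power_good: bool,
}

/// Hypothetical health summary that a poll endpoint could report.
#[derive(Debug, PartialEq, Eq)]
enum PsuHealth {
    NotPresent,
    /// Present, but the sequencer has turned it off.
    Disabled,
    Ok,
    /// Present and enabled, but POWER_GOOD is not asserted.
    Faulted,
}

/// Derive health purely from the sequencer's signals (no PMBus reads).
fn psu_health(pins: PsuPins) -> PsuHealth {
    if !pins.present {
        PsuHealth::NotPresent
    } else if !pins.enabled {
        PsuHealth::Disabled
    } else if pins.power_good {
        PsuHealth::Ok
    } else {
        PsuHealth::Faulted
    }
}
```

Reporting `Disabled` separately from `Faulted` matters here because a rectifier that the sequencer has deliberately turned off would not be expected to assert POWER_GOOD.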
However, this might be a bit tricky to wire into the existing control-plane-agent inventory interface, which basically expects that all devices are accessed through validate. We could probably special-case the PSUs by putting them in control-plane-agent's "devices with static validation" list, but presently, that would mean always treating them as present, which is actually worse. What we really need is a way to tell control-plane-agent that "this thing needs special validation behavior".
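One possible shape for "special validation behavior" is a per-device tag that selects how presence is determined. The sketch below is an assumption about how that might look, not the existing control-plane-agent code: the `Validation` enum, its variants, and the `is_present` function are hypothetical, and the two closures stand in for an I2C probe via the validate task and a query to the sequencer.

```rust
/// Hypothetical per-device tag describing how presence is determined.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Validation {
    /// Probe the device over I2C through the validate task.
    I2c,
    /// Statically known device: always reported as present.
    Static,
    /// Ask the sequencer about the rectifier in the given slot.
    Sequencer { slot: u8 },
}

/// Resolve presence for one device according to its validation tag.
/// `i2c_probe` and `seq_present` are stand-ins for the real task calls.
fn is_present(
    kind: Validation,
    i2c_probe: impl Fn() -> bool,
    seq_present: impl Fn(u8) -> bool,
) -> bool {
    match kind {
        Validation::I2c => i2c_probe(),
        Validation::Static => true,
        Validation::Sequencer { slot } => seq_present(slot),
    }
}
```

With a tag like `Sequencer`, the PSUs would avoid both problems described above: they would not depend on an I2C read, and they would not be unconditionally reported as present the way statically validated devices are.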