This summarizes work done by Alan, John, and Laura (among others).
After a Nexus-driven update on Berlin, one of the SPs is unreachable by the control plane, which talks to the SP's `control-plane-agent` task.

Other tasks are still reachable over the network, so Alan got a dump (at `/staff/alan/berlin-sp-undiscovered/hubris.core.0`).
Here's the task list:
```
john@castle ~ $ humility -d /staff/alan/berlin-sp-undiscovered/hubris.core.0 tasks
humility: attached to dump
system time = 10797477
ID TASK                 GEN PRI STATE
 0 jefe                   0   0 recv, notif: fault timer(T+23)
 1 net                    0   5 recv, notif: eth-irq(irq61) wake-timer(T+300)
 2 sys                    0   1 recv, notif: exti-wildcard-irq(irq6/irq7/irq8/irq9/irq10/irq23/irq40)
 3 spi2_driver            0   3 recv
 4 i2c_driver             0   3 notif: i2c2-irq(irq33/irq34)
 5 spd                    0   2 notif: i2c1-irq(irq31/irq32)
 6 packrat                0   1 recv
 7 thermal                0   5 wait: reply from i2c_driver/gen0
 8 power                  0   6 wait: send to i2c_driver/gen0
 9 hiffy                  0   5 notif: bit31(T+190)
10 gimlet_seq             0   4 recv, notif: timer(T+28) vcore
11 gimlet_inspector       0   6 notif: socket
12 hash_driver            0   2 recv
13 rng_driver             0   6 recv
14 hf                     0   3 recv, notif: timer
15 update_server          0   3 recv
16 sensor                 0   4 recv
17 host_sp_comms          0   8 recv, notif: jefe-state-change usart-irq(irq82) multitimer control-plane-agent
18 udpecho                0   6 notif: socket
19 udpbroadcast           0   6 notif: bit31(T+350)
20 control_plane_agent    0   7 wait: reply from validate/gen0
21 sprot                  0   4 notif: rot-irq timer(T+993)
22 validate               0   5 wait: send to i2c_driver/gen0
23 vpd                    0   4 recv
24 user_leds              0   2 recv, notif: timer
25 dump_agent             0   6 wait: reply from sprot/gen0
26 snitch                 0   6 notif: socket
27 sbrmi                  0   4 recv
28 idle                   0   9 RUNNING
```
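Note that `control-plane-agent` is waiting on `validate`, which is waiting on `i2c_driver`. Indeed, many tasks (`validate`, `thermal`, `power`) are waiting on `i2c_driver`: `thermal` is waiting for a reply, while `validate` and `power` are blocked trying to send. `i2c_driver` itself is waiting for a hardware IRQ (`i2c2-irq`).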
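The blocking chain can be read mechanically off the task list by following each `wait: send to` / `wait: reply from` edge until reaching a task that is not blocked on another task. The sketch below is a hypothetical helper (`wait_chain` is not a humility feature); it works on task states transcribed by hand from the listing above rather than parsing humility's output.

```rust
use std::collections::HashMap;

/// Hypothetical helper: follow "wait: send to X" / "wait: reply from X" edges
/// from `start` until reaching a task whose state is not an IPC wait.
fn wait_chain<'a>(states: &HashMap<&'a str, &'a str>, start: &'a str) -> Vec<&'a str> {
    let mut chain = vec![start];
    let mut cur = start;
    while let Some(&state) = states.get(cur) {
        // Accept both kinds of IPC wait; anything else (recv, notif, RUNNING) ends the chain.
        let target = state
            .strip_prefix("wait: send to ")
            .or_else(|| state.strip_prefix("wait: reply from "));
        let Some(target) = target else { break };
        // humility appends "/genN" to the peer task's name; drop it.
        let next = target.split('/').next().unwrap_or(target);
        if chain.contains(&next) {
            break; // guard against a cycle in the wait graph
        }
        chain.push(next);
        cur = next;
    }
    chain
}

fn main() {
    // Task states copied from the `humility tasks` output
    // (only the tasks involved in the blocking path are included).
    let states: HashMap<&str, &str> = HashMap::from([
        ("control_plane_agent", "wait: reply from validate/gen0"),
        ("validate", "wait: send to i2c_driver/gen0"),
        ("thermal", "wait: reply from i2c_driver/gen0"),
        ("power", "wait: send to i2c_driver/gen0"),
        ("i2c_driver", "notif: i2c2-irq(irq33/irq34)"),
    ]);
    let chain = wait_chain(&states, "control_plane_agent");
    println!("{}", chain.join(" -> "));
}
```

Starting from `control_plane_agent`, this prints `control_plane_agent -> validate -> i2c_driver`; the chain stops at `i2c_driver` because its state is a notification wait on `i2c2-irq` rather than an IPC wait on another task.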
This is the 1.0.47 release, git commit `87124929`:
```
871249297 (tag: all-sp-v1.0.47) host-sp-messages: don't copy that floppy^H^H^H^H^H^HInventoryData (#2253)
674e868ef Update lockfile from #2250 (#2254)
e2ecf169e ADM1273 support (#2250)
9c40dcce1 Update sparse registry hash for new toolchain (#2248)
e744ccf5d stm32xx-gpio-common: fix another glitch
e2ad457b0 stm32xx-i2c: flush txdr on NACK
43e13b955 jefe: fix comparison inversion in timeout handling
7b66c2227 stm32xx-sys: fix glitch on initial pin configuration
2a5c2b9d4 Disable SWD pins when debugger is connected (#2228)
caf0cd888 Use `NotificationBits` in Idol (#2233)
ef8e9acbb Add `NotificationBits` type; use it in `sys_recv_notification` (#2232)
e19947055 stm32xx-i2c: remove soft timeout, LostInterrupt
c07b417b1 bump `toml` and `toml_edit` (#2234)
0d9a6615b Bump idol to incorporate lease count check fix.
6affe395a Fix code that assumes timer notification => timer fired (#2230)
74a9279a0 psc_seq: rectifier ereports (#2214)
7ccf45982 cpu_seq: standardize ereport naming (#2231)
```
Notably, this release has a handful of I2C changes, such as `e2ad457b0` (flush TXDR on NACK) and `e19947055` (remove soft timeout, `LostInterrupt`)!
Here are the I2C ringbufs:
```
humility: ring buffer drv_stm32xx_i2c::__RINGBUF in i2c_driver:
TOTAL VARIANT
33588 Read
21366 Write
11308 WriteWait
 8606 Wait
 6036 ReadWait
    4 Reset
NDX LINE  GEN COUNT PAYLOAD
 33  578 1033     3 WriteWait(ISR, 0x8021)
 34  578 1033     1 WriteWait(ISR, 0x8061)
 35  645 1033     2 Read(ISR, 0x8021)
 36  645 1033     1 Read(ISR, 0x8025)
 37  691 1033     3 ReadWait(ISR, 0x8021)
 38  691 1033     1 ReadWait(ISR, 0x8061)
 39  461 1033     1 Wait(ISR, 0x21)
 40  546 1033     1 Write(ISR, 0x21)
 41  546 1033     2 Write(ISR, 0x8021)
 42  546 1033     1 Write(ISR, 0x8023)
 43  578 1033     1 WriteWait(ISR, 0x8020)
 44  578 1033     2 WriteWait(ISR, 0x8021)
 45  578 1033     1 WriteWait(ISR, 0x8061)
 46  461 1033     9 Wait(ISR, 0x8021)
 47  461 1033     1 Wait(ISR, 0x21)
  0  546 1034     3 Write(ISR, 0x21)
  1  546 1034     1 Write(ISR, 0x8023)
  2  578 1034     1 WriteWait(ISR, 0x8020)
  3  578 1034     2 WriteWait(ISR, 0x8021)
  4  578 1034     1 WriteWait(ISR, 0x8061)
  5  645 1034     2 Read(ISR, 0x8021)
  6  645 1034     1 Read(ISR, 0x8025)
  7  645 1034     2 Read(ISR, 0x8021)
  8  645 1034     1 Read(ISR, 0x8025)
  9  645 1034     2 Read(ISR, 0x8021)
 10  645 1034     1 Read(ISR, 0x8025)
 11  645 1034     2 Read(ISR, 0x8021)
 12  645 1034     1 Read(ISR, 0x8025)
 13  645 1034     2 Read(ISR, 0x8021)
 14  645 1034     1 Read(ISR, 0x8025)
 15  645 1034     2 Read(ISR, 0x8021)
 16  645 1034     1 Read(ISR, 0x8025)
 17  645 1034     2 Read(ISR, 0x8021)
 18  645 1034     1 Read(ISR, 0x8025)
 19  645 1034     2 Read(ISR, 0x8021)
 20  645 1034     1 Read(ISR, 0x8025)
 21  691 1034     2 ReadWait(ISR, 0x8021)
 22  691 1034     1 ReadWait(ISR, 0x8061)
 23  461 1034     1 Wait(ISR, 0x21)
 24  546 1034     1 Write(ISR, 0x21)
 25  546 1034     2 Write(ISR, 0x8021)
 26  546 1034     1 Write(ISR, 0x8023)
 27  578 1034     1 WriteWait(ISR, 0x8020)
 28  578 1034     2 WriteWait(ISR, 0x8021)
 29  578 1034     1 WriteWait(ISR, 0x8061)
 30  461 1034    10 Wait(ISR, 0x8021)
 31  461 1034     1 Wait(ISR, 0x21)
 32  546 1034     3 Write(ISR, 0x8021)
```
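Most `drv_stm32xx_i2c` ringbuf entries carry an ISR value. Assuming that payload is the raw `I2C_ISR` register and that the part uses the STM32H7 register layout from the reference manual (both are assumptions; the dump itself does not say so), the values can be decoded bit by bit. `decode_isr` and `ISR_BITS` are hypothetical names used only for illustration.

```rust
/// (bit, name) pairs for the STM32H7 I2C_ISR register (assumed layout).
/// Bit 14 is reserved, and bits 17..=23 (ADDCODE) only matter in target
/// mode, so they are not decoded here.
const ISR_BITS: &[(u32, &str)] = &[
    (0, "TXE"),
    (1, "TXIS"),
    (2, "RXNE"),
    (3, "ADDR"),
    (4, "NACKF"),
    (5, "STOPF"),
    (6, "TC"),
    (7, "TCR"),
    (8, "BERR"),
    (9, "ARLO"),
    (10, "OVR"),
    (11, "PECERR"),
    (12, "TIMEOUT"),
    (13, "ALERT"),
    (15, "BUSY"),
    (16, "DIR"),
];

/// Return the names of the flags set in an I2C_ISR value, lowest bit first.
/// Assumption: the ringbuf payload is the raw I2C_ISR register value.
fn decode_isr(isr: u32) -> Vec<&'static str> {
    ISR_BITS
        .iter()
        .filter(|&&(bit, _)| (isr & (1u32 << bit)) != 0)
        .map(|&(_, name)| name)
        .collect()
}

fn main() {
    // The distinct ISR values that appear in the drv_stm32xx_i2c ringbuf.
    for isr in [0x21, 0x8020, 0x8021, 0x8023, 0x8025, 0x8061] {
        println!("{isr:#06x}: {}", decode_isr(isr).join(" | "));
    }
}
```

Under that layout, `0x8021` decodes to `TXE | STOPF | BUSY`, `0x8061` adds `TC`, and `0x21` is `TXE | STOPF` with `BUSY` clear.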
```
humility: ring buffer drv_stm32xx_i2c_server::__RINGBUF in i2c_driver:
NDX LINE GEN COUNT PAYLOAD
  0  670   1     4 Wiggles(0x0)
  1  484   1     1 Error(0x6a, BusLocked)
  2  487   1     1 SegmentOnError((M1, S1))
  3  670   1     1 Wiggles(0x0)
  4  251   1     1 Reset((I2C2, PortIndex(0x0)))
  5  261   1     1 ResetMux(0x73)
  6  189   1     1 MuxUnknownRecover((I2C2, PortIndex(0x0)))
  7  484   1     1 Error(0x6a, BusLocked)
  8  487   1     1 SegmentOnError((M1, S2))
  9  670   1     1 Wiggles(0x0)
 10  251   1     1 Reset((I2C2, PortIndex(0x0)))
 11  261   1     1 ResetMux(0x73)
 12  189   1     1 MuxUnknownRecover((I2C2, PortIndex(0x0)))
```
Nothing is changing here; connecting remotely now (hours later) shows the same ringbuf values.
I'm not yet sure why the I2C task is hanging; more to come...