in CI for #9204: https://github.com/oxidecomputer/omicron/pull/9204/checks?check_run_id=52638510286
the immediate issue is here, but is a bit downstream of what went wrong:
https://buildomat.eng.oxide.computer/wg/0/details/01K7FEEGS766GRVD9CVQN335B2/Y6RoWuCOWY9T52pA6RzMDnb6PwDV7bNwme0VY4fo1QMJQg0E/01K7FEF1PJQFT98MRGX5C1JHNG#S1369
Excerpt from the log showing the failure:
1367 2025-10-13T19:55:00.509Z 2025-10-13 19:55:00.497596603 UTC: attempting to log into API
1368 2025-10-13T19:55:15.541Z 2025-10-13 19:55:15.529168673 UTC: login failed: logging in: error sending request for url (https://recovery.sys.oxide.test/v1/login/recovery/local)
1369 2025-10-13T19:55:16.546Z Error: logging in
1370 2025-10-13T19:55:16.546Z
1371 2025-10-13T19:55:16.546Z Caused by:
1372 2025-10-13T19:55:16.822Z timed out after 609.319694462s
So on the surface, we failed to log into the API after ten minutes. This looks a lot like #6772: as other folks saw there, the deploy script is just reporting that Nexus isn't reachable after ten minutes.
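For anyone unfamiliar with why this error is "downstream": a minimal sketch of a retry-until-deadline loop, assuming the deploy script does something along these lines. The helper name `retry_until` and its signature are hypothetical and not taken from omicron. The point is that on timeout only the elapsed time and the last error come back, so the root cause has to be found elsewhere.

```rust
use std::time::{Duration, Instant};

/// Hypothetical helper (not omicron's actual code): call `op` repeatedly
/// until it succeeds or `timeout` has elapsed. On timeout, return only the
/// elapsed time and the last error seen.
fn retry_until<T, E>(
    timeout: Duration,
    interval: Duration,
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, (Duration, E)> {
    let start = Instant::now();
    loop {
        match op() {
            Ok(value) => return Ok(value),
            Err(err) => {
                let elapsed = start.elapsed();
                // Give up once the deadline has passed; the caller sees
                // "timed out after <elapsed>" plus the final error only.
                if elapsed >= timeout {
                    return Err((elapsed, err));
                }
                std::thread::sleep(interval);
            }
        }
    }
}
```

With a ten-minute `timeout` and a login attempt that itself takes several seconds to fail, the reported elapsed time can run a little past the deadline, which would be consistent with the 609s in the log above.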
Simultaneously, one of the external DNS zones failed to get a route set up: https://buildomat.eng.oxide.computer/wg/0/artefact/01K7FEEGS766GRVD9CVQN335B2/Y6RoWuCOWY9T52pA6RzMDnb6PwDV7bNwme0VY4fo1QMJQg0E/01K7FEF1PJQFT98MRGX5C1JHNG/01K7FJD9ASSQVHKDNH9QG5QZGP/oxide-opte-interface-setup:default.log?format=x-bunyan
zone-setup: failed to ensure OPTE gateway route on interface opte2 with gateway 172.30.1.1 and IP 172.30.1.5
Caused by:
Command [/usr/sbin/route add -host 172.30.1.1 172.30.1.5 -interface -ifp opte2] executed and failed with status: exit status: 128 stdout: add host 172.30.1.1: gateway 172.30.1.5: Network is unreachable
Since the other external DNS zone seems fine (19386011-2bd0-4d7f-bcf9-f34d6cc33633, with logs 19386011-2bd0-4d7f-bcf9-f34d6cc33633/root/var/svc/log/oxide-external_dns:default.log), I'm not really sure how this ends up with the CLI not being able to talk to the partially-up control plane. I also can't see why adding that route would fail. It's all local??
A re-run did succeed: https://github.com/oxidecomputer/omicron/runs/52650447795