
helios is unintentionally pinned #8176


Open
iximeow opened this issue May 16, 2025 · 5 comments · May be fixed by #8185

@iximeow
Member

iximeow commented May 16, 2025

looking at a dozen TUFs built from main since #8092 landed, Helios has been at 361574b73e for all of them, even though commits have been landing on stlouis in the meantime. #8092 includes:

We do not pin Helios on the main branch of Omicron by design,

but i think we're pinning Helios everywhere by not-design. this has resulted in (i'm sure among other things) a very funny and broken TUF after merging #8160: what worked in the branch, but depended on newer OS bits, now yields sled panics from the main build, which has older and unfixed OS bits. in #8160, commit 7089202's TUF has OS commit 25eefe2 from the last few days, but when i pulled main in 8fa48d4, the host OS commit switched back to 361574b, which is also where it sits for builds off of main.

iximeow changed the title from "helios is unintentionally PINNED" to "helios is unintentionally pinned" on May 16, 2025
@jclulow
Collaborator

jclulow commented May 16, 2025

I suspect this is some artefact of the incorporation generation stuff that landed recently. Looking at the latest TUF repository built on main (9 hours ago): https://github.com/oxidecomputer/omicron/runs/42323592703

The incorporation locks to osnet-incorporation version:

$ curl -s https://buildomat.eng.oxide.computer/wg/0/artefact/01JVB1J6G0YX14QJER9TMMRV78/WJQPIeyMvlPfUuwNWhbdId5bSgRZApiW95RyVNMuW3tfnKqL/01JVB1JT4257HSFC347R8FQSX3/01JVB5JGK18RR83S43KXWGXAX7/incorporation.p5m | grep osnet-incorp
depend type=incorporate fmri=pkg:/consolidation/osnet/osnet-incorporation@0.5.11-2.0.23359:20250425T223722Z

That lines up with the commit you mention:

$ BRANCH=stlouis findver 23359
commit 361574b73ec81f3b08cbdf59d8fd191bc652dd5c
Author:     Andy Fiddaman <[email protected]>
AuthorDate: Fri Apr 25 22:06:38 2025 +0000
Commit:     Andy Fiddaman <[email protected]>
CommitDate: Fri Apr 25 22:06:38 2025 +0000

    Update .gitignore

The actual latest version of the incorporation in the repository, however, is:

$ pkgrepo list -s https://pkg.oxide.computer/helios/2/dev osnet-incorporation@latest
PUBLISHER  NAME                                          O VERSION
helios-dev consolidation/osnet/osnet-incorporation         0.5.11-2.0.23386:20250513T214436Z

I'm not sure how the new tooling is arriving at an incorporation that contains that older version, but that's what I think we should investigate first.
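
The comparison above reads the trailing number of the incorporation version (23359 versus 23386) as the commit count that gets passed to findver. A minimal Rust sketch of that extraction follows. The function name is hypothetical, and the reading of the version layout is inferred from this thread rather than taken from IPS documentation.

```rust
/// Extract the trailing build number from an IPS version string such as
/// "0.5.11-2.0.23386:20250513T214436Z".
///
/// Assumption (inferred from the thread): the last dot-separated component
/// of the branch version is the commit count that `findver` expects.
fn commit_count_from_version(version: &str) -> Option<u64> {
    // Drop the ":<timestamp>" suffix, if present.
    let without_timestamp = version.split(':').next()?;
    // Keep the branch version, i.e. everything after the last '-'.
    let (_, branch) = without_timestamp.rsplit_once('-')?;
    // The commit count is the final dot-separated component.
    branch.rsplit('.').next()?.parse().ok()
}
```

Given the version from the repository listing above, this returns the number that would be handed to findver on the stlouis branch; a string with no branch component yields `None`.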

@jclulow
Collaborator

jclulow commented May 16, 2025

Poking briefly at this, it seems like this is where we generate the incorporation:

let stdout = Command::new("pkg")
    .args(["list", "-g", HELIOS_REPO, "-F", "json"])
    .args(["-o", "fmri", "*@latest"])
    .ensure_stdout(&logger)
    .await?;

Running this on a machine I have that is pinned to an older Helios, I get:

$ pkg list -g https://pkg.oxide.computer/helios/2/dev -F json -o fmri '*@latest' > /tmp/sigh.json
$ jq < /tmp/sigh.json | grep osnet-incorp
    "fmri": "pkg://helios-dev/consolidation/osnet/[email protected]:20240820T102318Z"

Notably, that's the version I happen to have installed on that machine:

$ pkg info osnet-incorporation | grep FMRI
             FMRI: pkg://helios-dev/consolidation/osnet/[email protected]:20240820T102318Z

I would also note that this command is not what I was using to generate the release incorporations earlier. In that case, I was using pkgrepo, which only interacts directly with the repository you name by URL, and does not get confused by locally installed packages or cached catalogues, as it appears that pkg may be doing. The original command was (as appears in /staff/rel/v12/mkincorp.sh):

pkgrepo -s https://pkg.oxide.computer/helios/2/dev list -F tsv '*@latest' |
awk -v "ver=$ver" -v "pub=$pub" 'NR > 1 {
        fmri = $NF;
        sub("pkg://" pub "/", "", fmri);
        sub(":.*", "", fmri);

        if (fmri ~ "opte") {
                next;
        }

        if (fmri ~ "consolidation/oxide/omicron-release-incorporation@") {
                next;
        }

        printf("depend fmri=pkg:/%s type=incorporate\n", fmri);
}' | pkgfmt -f v2

This would, if run now, produce the correct version:

$ pkgrepo -s https://pkg.oxide.computer/helios/2/dev list -F tsv '*@latest' | grep osnet-incorp
helios-dev      consolidation/osnet/osnet-incorporation         0.5.11  5.11    2.0.23386       20250513T214436Z        pkg://helios-dev/consolidation/osnet/osnet-incorporation@0.5.11,5.11-2.0.23386:20250513T214436Z

That seems to be the correct latest version that we want:

$ BRANCH=stlouis findver 23386
commit 25eefe2a09aa06255f5f00df0f9688dce5f6a731 (HEAD -> stlouis, origin/stlouis, origin/HEAD)
Author:     Patrick Mooney <[email protected]>
AuthorDate: Fri May 2 13:49:01 2025 +0000
Commit:     Patrick Mooney <[email protected]>
CommitDate: Tue May 13 21:12:58 2025 +0000

    stlouis#733 clean up duplicate viona

    Change-Id: I6a6a636c7ce3cd5c3e6f63f04c305ff8a8eac204

I suspect we should probably change the releng tool to just do what I had originally been doing in mkincorp.sh. It's true that the output of the tool is TSV, not JSON, but that doesn't seem like a huge impediment.
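
As a rough illustration that the TSV output need not be an impediment, here is a minimal Rust sketch of the same filtering that the awk script in mkincorp.sh performs. The function name and signature are hypothetical and are not taken from the releng tool; the sketch only handles text that has already been captured from pkgrepo, and it leaves out the final pkgfmt step.

```rust
/// Turn the output of `pkgrepo list -F tsv '*@latest'` into incorporate
/// lines, mirroring the awk script above.
/// `publisher` is the publisher name to strip, e.g. "helios-dev".
fn incorporation_lines(tsv: &str, publisher: &str) -> Vec<String> {
    let prefix = format!("pkg://{publisher}/");
    tsv.lines()
        // Skip the header row, like `NR > 1` in awk.
        .skip(1)
        // Take the last whitespace-separated field, like `$NF`.
        .filter_map(|line| line.split_whitespace().last())
        // Remove the "pkg://<publisher>/" prefix when it is present.
        .map(|fmri| fmri.strip_prefix(prefix.as_str()).unwrap_or(fmri))
        // Drop the ":<timestamp>" suffix, like `sub(":.*", "", fmri)`.
        .map(|fmri| fmri.split(':').next().unwrap_or(fmri))
        // Skip opte packages and the release incorporation itself.
        .filter(|fmri| !fmri.contains("opte"))
        .filter(|fmri| {
            !fmri.contains("consolidation/oxide/omicron-release-incorporation@")
        })
        .map(|fmri| format!("depend fmri=pkg:/{fmri} type=incorporate"))
        .collect()
}
```

The returned lines would still need to go through `pkgfmt -f v2`, or an equivalent, to match what the shell pipeline produces.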

@jclulow
Collaborator

jclulow commented May 16, 2025

NB: If it helps, findver (used above) is:

#!/bin/bash

BRANCH=${BRANCH:-master}

want=$1
if [[ -z $want ]]; then
	printf 'ERROR: which commit number?\n' >&2
	exit 1
fi

#
# Find the current commit depth in this clone:
#
max=$(git rev-list --count "$BRANCH")

if (( want > max )); then
	printf 'ERROR: %d is higher than found maximum version %d\n' \
	    "$want" "$max" >&2
	exit 1
fi

(( diameter = max - want ))

found=$(git rev-list --count "$BRANCH~$diameter")
if (( found == want )); then
	GIT_PAGER= git log --pretty=fuller -n 1 "$BRANCH~$diameter"
	exit 0
fi

#
# If the simple thing doesn't work, it may be because of the craziness of git
# merge commits.  Just walk backwards one at a time until we find the one we
# want:
#
# XXX could binary search obviously!
#

for (( diameter = 1; diameter < max; diameter++ )); do
	found=$(git rev-list --count "$BRANCH~$diameter")
	if (( found == want )); then
		GIT_PAGER= git log --pretty=fuller -n 1 "$BRANCH~$diameter"
		exit 0
	fi
done

printf 'ERROR: could not find version %s\n' "$want" >&2
exit 1

You would run it in a clone of oxidecomputer/illumos-gate that includes an up-to-date stlouis branch.
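
The XXX note in the script mentions that the fallback loop could be a binary search. A minimal Rust sketch of that idea follows. It is not part of findver: the function name is hypothetical, and `count_at` is a stand-in for running `git rev-list --count "$BRANCH~d"`. The search relies on the count never increasing as the depth grows, which should hold because `BRANCH~(d+1)` is the first parent of `BRANCH~d`. The caller is assumed to supply a `max_depth` for which every depth up to it is valid.

```rust
/// Find the depth `d` in `0..=max_depth` with `count_at(d) == want`.
///
/// `count_at(d)` stands in for `git rev-list --count "$BRANCH~d"` and is
/// assumed to be non-increasing in `d`. Returns `None` when no first-parent
/// ancestor has exactly `want` commits, which can happen around merges.
fn find_depth(
    want: u64,
    max_depth: u64,
    mut count_at: impl FnMut(u64) -> u64,
) -> Option<u64> {
    let (mut lo, mut hi) = (0u64, max_depth);
    while lo <= hi {
        let mid = lo + (hi - lo) / 2;
        let found = count_at(mid);
        if found == want {
            return Some(mid);
        } else if found > want {
            // Still too many commits: look deeper into history.
            lo = mid + 1;
        } else {
            // Too few commits: look closer to the branch tip.
            if mid == 0 {
                return None;
            }
            hi = mid - 1;
        }
    }
    None
}
```

This needs on the order of log2(max_depth) invocations of git instead of one per commit, at the cost of only ever inspecting first-parent ancestors, as the original loop also does.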

@citrus-it
Contributor

citrus-it commented May 16, 2025

I suggested using pkg as it has the parsable output formats etc. but in place of *@latest I think it should use -n *.

atrium:helios:cosmo% pkg list -g https://pkg.oxide.computer/helios/2/dev -F json -o fmri '*@latest' | jq | grep osnet-incorp
    "fmri": "pkg://helios-dev/consolidation/osnet/[email protected]:20240820T102318Z"

atrium:helios:cosmo% pkg list -g https://pkg.oxide.computer/helios/2/dev -F json -o fmri -n '*' | jq | grep osnet-incorp
    "fmri": "pkg://helios-dev/consolidation/osnet/osnet-incorporation@0.5.11-2.0.23386:20250513T214436Z"
           -n      Display the newest versions of all known packages,
                   regardless of installed state.
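
Applied to the Rust snippet quoted earlier in the thread, the suggestion amounts to swapping the `*@latest` pattern for `-n '*'`. The sketch below only builds the command with the standard library and does not run it. The function name is hypothetical, the project-specific `ensure_stdout` helper is left out, and this is not necessarily the change made in #8185.

```rust
use std::process::Command;

/// Build (but do not run) a `pkg list` invocation that asks for the newest
/// version of every known package, regardless of installed state.
/// `repo` is the repository URL passed to `-g`.
fn pkg_list_newest(repo: &str) -> Command {
    let mut cmd = Command::new("pkg");
    cmd.args(["list", "-g", repo, "-F", "json"])
        // `-n '*'` replaces the earlier `'*@latest'` pattern.
        .args(["-o", "fmri", "-n", "*"]);
    cmd
}
```

The caller would then execute the returned command and parse its JSON output as before.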

@iximeow
Member Author

iximeow commented May 17, 2025

i wanted to understand what pkg list -n does vs the implicit pkg list -a and ended up at pkg5's __get_pkg_list, where -n arrives with self.LIST_NEWEST, and -a arrives with self.LIST_INSTALLED_NEWEST. i don't understand everything that's going on in there, but it seems notable that inst_newest installs a filtering callback, whereas newest instead sets use_last. the behavior these two control is described in entry_actions. so i think what happens is that the filtering callback returns True for the first installed version of a package (probably not the latest version) and that's the listed version, whereas with newest we simply get the last version, which is the latest version.

this is to say, in a very immediate sense, it seems more like a docs bug for pkg that it's not super clear what happens in pkg list -a for installed packages with newer versions, than a thing to go change in pkg. though you two would know better and may think otherwise!
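
To make the suspected difference concrete, here is a toy Rust model of the two selection modes as described above. It is not pkg5's implementation: every name in it is hypothetical, and the fallback to the newest version for packages that are not installed is an assumption based on the name LIST_INSTALLED_NEWEST.

```rust
/// Toy stand-ins for pkg5's LIST_INSTALLED_NEWEST and LIST_NEWEST.
#[derive(Clone, Copy, Debug, PartialEq)]
enum ListMode {
    InstalledNewest,
    Newest,
}

/// One known version of a package, ordered oldest to newest in a slice.
struct Candidate<'a> {
    version: &'a str,
    installed: bool,
}

/// Pick the version that would be listed under each mode in this model.
fn listed_version<'a>(
    versions: &[Candidate<'a>],
    mode: ListMode,
) -> Option<&'a str> {
    match mode {
        // `-n`: always take the last (newest) entry.
        ListMode::Newest => versions.last().map(|c| c.version),
        // `-a`: prefer the installed entry; otherwise fall back to the
        // newest one (assumption, see the lead-in).
        ListMode::InstalledNewest => versions
            .iter()
            .find(|c| c.installed)
            .or(versions.last())
            .map(|c| c.version),
    }
}
```

With an installed entry stamped 20240820T102318Z and a newer uninstalled entry stamped 20250513T214436Z, the model lists the older one under `InstalledNewest` and the newer one under `Newest`, which matches the two outputs shown earlier in the thread.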

before putting up #8185 i wanted to make sure that we'd still correctly pin packages if there was an incorporation.p5p available, but after reading through dev-tools/releng/ i do not see a way it wouldn't work, and i am also not wholly sure how i'd include an incorporation.p5p to limit package versions. currently i'd have to hack in old versions; i don't think there's a really straightforward way to intentionally get older packages (locally, everything's just the versions i've built from source).
