Skip to content

Conversation

@bhearsum
Copy link
Contributor

@bhearsum bhearsum commented Apr 13, 2021

@bhearsum bhearsum force-pushed the generic-worker-pools branch from d9858ff to c1c9b7e Compare April 13, 2021 19:31
* A new Trust Domain (`mozilla`) that is not tied to a specific project or product
* New Workers for builds on Linux, macOS 11.0, and Windows Server 2012
* These will be created under a new `mozilla-1` provisioner
* New Workers for tests on Linux (through developer provided Docker images), macOS 11.0, and Windows 10
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about these separated build & test pools a bit more, specifically in the context of the lack of separation of these things proposed in #36. Specifically, I'm wondering if we should do away with that distinction here and just share general purpose pools. Something like:

Pool ID                   | Purpose
====================================================================================
mozilla-1/linux           | Linux jobs
mozilla-1/linux-highcpu   | Linux jobs requiring more CPU resources
mozilla-1/win2012         | Windows Server 2012 jobs
mozilla-1/win2012-highcpu | Windows Server 2012 jobs requiring more CPU resources
mozilla-1/win10           | Windows 10 jobs
mozilla-1/macos-bigsur    | macOS Big Sur jobs

I can see why we keep them separate for Gecko (it lets us share test machines for level 1 & level 3 jobs), but I can't think of a reason why we need to do it here. @escapewindow - do you have any thoughts on this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to push a change with the proposed pools; this comment may disappear when I do so, but we can re-open or talk in a new thread.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see why we keep them separate for Gecko (it lets us share test machines for level 1 & level 3 jobs), but I can't think of a reason why we need to do it here. @escapewindow - do you have any thoughts on this?

Another reason for the separate pools is so we can trust that builders haven't been corrupted by a rogue test run. At least in the case of "release builds" (whatever that means in this context), it would be good to have a high level of trust in the builder.

I know there are other ways to achieve that goal than a separate pool, but if there's a spot we can capture that requirement, that would be good.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can "rogue builds" corrupt in the same way that "rogue tests" can? Or is it about minimizing the things that can influencing builds?

Either way, separating these out is not terribly difficult, so I don't strong feelings here.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch! we do care about rogue builds in the "accept outside PRs" world. If there is a concept of a "release build", it should include purging the cache of non-release-build artifacts.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, keeping them compute pools rather than build/test pools may make sense for now, as long as we're limited to level 1 (in level 3, we need to guard against tests poisoning release build caches and workers). If we need separation in the future, we can add b- and t- prefixes as needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it sounds like we're all OK with shared pools for level-1, but (almost?) certainly not for level-3 -- which are not part of this proposal.

@bhearsum bhearsum force-pushed the generic-worker-pools branch from 5f33541 to 5c21de6 Compare April 30, 2021 14:56
@bhearsum
Copy link
Contributor Author

I think we're getting close to consensus here. I think the two things left are:

  1. Making sure the v3 index format is sensible, fleshed out enough, and solves the problem it's meant to. @hwine, @escapewindow - what do you think here?
  2. Making sure the new LDAP group is doable and accepable. I'm pretty sure it's fine on our side (please speak up if you disagree), and I'm going to have a chat with IT about their side.

If there's anything else I'm missing, please let me know.

@hwine
Copy link

hwine commented May 11, 2021

I think we're getting close to consensus here. I think the two things left are:

1. Making sure the `v3` index format is sensible, fleshed out enough, and solves the problem it's meant to. @hwine, @escapewindow - what do you think here?

@ajvb -- I'll defer to you on the v3 index - that's deeper TaskCluster stuff than I'm comfortable with.

@ajvb
Copy link

ajvb commented May 11, 2021

@bhearsum In regards to the v3 index format, are there specific security concerns around that format that you have?

My (little) familiarity with indexes in Taskcluster leads me to believe there isn't much of security concern unless we are taking about one project polluting another project, which isn't really related to the index format.

@ajvb
Copy link

ajvb commented May 11, 2021

@bhearsum Ok, Hal gave me some more context about this over slack. As to "indexes being used to create segmentation" from a security context, I don't think I know enough to answer whether that's good enough. I'd imagine it'd be scopes + the v3 index format, but I'd want to loop in someone from the TC team.

@bhearsum
Copy link
Contributor Author

My (little) familiarity with indexes in Taskcluster leads me to believe there isn't much of security concern unless we are taking about one project polluting another project, which isn't really related to the index format.

The v3 format is specifically being proposed to prevent this from happening. v2 only contains repo names, so if you have multiple repos with the same names (but different orgs) there's potential for pollution.

@hwine
Copy link

hwine commented May 11, 2021

v2 only contains repo names, so if you have multiple repos with the same names (but different orgs) there's potential for pollution.

Oh I may have misunderstood then. BTW, you should capture the v4id for repos -- both org & repo names can be changed, allowing for takeovers.

@escapewindow
Copy link
Contributor

I think we're getting close to consensus here. I think the two things left are:

1. Making sure the `v3` index format is sensible, fleshed out enough, and solves the problem it's meant to. @hwine, @escapewindow - what do you think here?

I think so. I think this will have multiple parts:

  • what does the v3 index look like? how will it fit our future use cases?
  • are we migrating gecko, or just standalone taskgraph? if we're migrating, what does that look like? is there a watershed for e.g. mozregression?
  • are we migrating all standalone taskgraph projects (e.g. mobile) or are we somehow limiting this to the Mozilla trust domain? if we're migrating, what does that look like?
  • if we're supporting multiple index formats, what does that look like? how do we keep this from growing too complex, especially if there's a tool that needs to query multiple projects with different index formats?
2. Making sure the new LDAP group is doable and accepable. I'm pretty sure it's fine on our side (please speak up if you disagree), and I'm going to have a chat with IT about their side.

As mentioned elsewhere, I think we could have a similar process to the level 1 commit privileges agreement that allows us to know what github usernames should be enabled.

@bhearsum
Copy link
Contributor Author

v2 only contains repo names, so if you have multiple repos with the same names (but different orgs) there's potential for pollution.

Oh I may have misunderstood then. BTW, you should capture the v4id for repos -- both org & repo names can be changed, allowing for takeovers.

This is not something I particularly want to include in an index, as they are meant to human readable. Is it a requirement?

@hwine
Copy link

hwine commented May 11, 2021

This is not something I particularly want to include in an index, as they are meant to human readable. Is it a requirement?

It's a race condition -- the "full name" (org/repo) is only good until the name changes, then it may be impossible to relink. So "requirement" depends on your use cases.

@bhearsum
Copy link
Contributor Author

Putting some answers inline to make sure we're generally aligned before updating the RFC.

  1. Making sure the v3 index format is sensible, fleshed out enough, and solves the problem it's meant to. @hwine, @escapewindow - what do you think here?

I think so. I think this will have multiple parts:

* what does the v3 index look like? how will it fit our future use cases?

index.{trust_domain}.v3.{host}.{repo-with-path}.branch.{branch}
Two examples:

  • index.mozilla.v3.github.com.mozilla-mobile.mozilla-vpn-client.branch.main.latest.taskgraph.decision
  • index.gecko.v3.hg.mozilla.org.releases.mozilla-beta.latest.firefox.decision
* are we migrating gecko, or just standalone taskgraph? if we're migrating, what does that look like? is there a watershed for e.g. mozregression?

Explicitly not migrating Gecko. There's no reason to, so it would just be scope creep IMO. We should do it at some point, when it becomes important.

* are we migrating all standalone taskgraph projects (e.g. mobile) or are we somehow limiting this to the Mozilla trust domain? if we're migrating, what does that look like?

Same as Gecko - not migrating until there's a need to.

* if we're supporting multiple index formats, what does that look like? how do we keep this from growing too complex, especially if there's a tool that needs to query multiple projects with different index formats?

I'm a little unsure how to respond to this part, so let me know if I'm missing something...

We're talking about adding one new index format, not N, so I don't see this as terribly complex. Any tool that needs to generally look for things will need to:

  • Add support for v3 indexes, without removing v2 support
  • Either: a) maintain a mapping of project -> index format, or b) look for a project in one, and fall back to the other if not found

Even if we were migrating everything to v3 indexes, we'd still need support for both to do a graceful transition. I imagine we'll eventually move everything over to v3 and drop v2 support - I just don't think it makes sense to include it in the scope of this.

2. Making sure the new LDAP group is doable and accepable. I'm pretty sure it's fine on our side (please speak up if you disagree), and I'm going to have a chat with IT about their side.

As mentioned elsewhere, I think we could have a similar process to the level 1 commit privileges agreement that allows us to know what github usernames should be enabled.

I agree that we should do whatever we currently do for L1 here. For some reason I thought that we automatically granted L1 to new engineers, but I could be wrong about that. I'll look into it.

@bhearsum
Copy link
Contributor Author

This is not something I particularly want to include in an index, as they are meant to human readable. Is it a requirement?

It's a race condition -- the "full name" (org/repo) is only good until the name changes, then it may be impossible to relink. So "requirement" depends on your use cases.

IMO this is a caveat we can live with. If a repo moves or renames, I think this sort of "bustage" is to be expected. The benefits of having a user-readable index (which is the point of the index in the first place) outweigh the benefits of stability in a rename/move. (If you disagree with this, please say so!)

@escapewindow
Copy link
Contributor

Putting some answers inline to make sure we're generally aligned before updating the RFC.

Ah, I thought we were going to spin the specific v3 index format out into its own RFC. I have concerns with the v3 index as proposed in this comment.

This is not something I particularly want to include in an index, as they are meant to human readable. Is it a requirement?

It's a race condition -- the "full name" (org/repo) is only good until the name changes, then it may be impossible to relink. So "requirement" depends on your use cases.

IMO this is a caveat we can live with. If a repo moves or renames, I think this sort of "bustage" is to be expected. The benefits of having a user-readable index (which is the point of the index in the first place) outweigh the benefits of stability in a rename/move. (If you disagree with this, please say so!)

I agreed with you here, Ben, in that I wanted human-readable indexes, and then I realized we could make it less human-unfriendly with tooling and a map. The index format is worth further discussion, since every version adds complexity we have to maintain or migrate to. Do we want to continue discussing that here, or resolve this point with "we'll decide what that index looks like in FUTURE-RFC"?

@bhearsum
Copy link
Contributor Author

Putting some answers inline to make sure we're generally aligned before updating the RFC.

Ah, I thought we were going to spin the specific v3 index format out into its own RFC. I have concerns with the v3 index as proposed in this comment.

This is not something I particularly want to include in an index, as they are meant to human readable. Is it a requirement?

It's a race condition -- the "full name" (org/repo) is only good until the name changes, then it may be impossible to relink. So "requirement" depends on your use cases.

IMO this is a caveat we can live with. If a repo moves or renames, I think this sort of "bustage" is to be expected. The benefits of having a user-readable index (which is the point of the index in the first place) outweigh the benefits of stability in a rename/move. (If you disagree with this, please say so!)

I agreed with you here, Ben, in that I wanted human-readable indexes, and then I realized we could make it less human-unfriendly with tooling and a map. The index format is worth further discussion, since every version adds complexity we have to maintain or migrate to. Do we want to continue discussing that here, or resolve this point with "we'll decide what that index looks like in FUTURE-RFC"?

I think it's pretty clear that a separate RFC makes sense for this. It's a big enough thing on its (more than I expected), and has a separate set of stakeholders. Do you want to block resolution of this on that being resolved? I could go either way.

@escapewindow
Copy link
Contributor

I think it's pretty clear that a separate RFC makes sense for this. It's a big enough thing on its (more than I expected), and has a separate set of stakeholders. Do you want to block resolution of this on that being resolved? I could go either way.

I think I'd be happy here if we say, "we need a new index format, schema and specifics TBD, but our requirements here are that we don't have index collisions between repositories of the same name but different orgs" or something. It sounds like we may need to open the v3 index discussion to allow for standalone taskgraph and gecko taskgraph to live together, and other consumers may have use cases that help us decide what that thing looks like. Hopefully we can get this hashed out soonish, but we may want to take enough time to get a v3 index format that will stick for a while, and not have to resort to a v3.1 or v4 soon after.

I'm hoping it's clear that I'm raising concerns that will help improve the project, and I'm not trying to throw arbitrary obstacles in your way. I'd love for this to be a simple problem. Thank you for tackling this!

@bhearsum
Copy link
Contributor Author

I think it's pretty clear that a separate RFC makes sense for this. It's a big enough thing on its (more than I expected), and has a separate set of stakeholders. Do you want to block resolution of this on that being resolved? I could go either way.

I think I'd be happy here if we say, "we need a new index format, schema and specifics TBD, but our requirements here are that we don't have index collisions between repositories of the same name but different orgs" or something. It sounds like we may need to open the v3 index discussion to allow for standalone taskgraph and gecko taskgraph to live together, and other consumers may have use cases that help us decide what that thing looks like. Hopefully we can get this hashed out soonish, but we may want to take enough time to get a v3 index format that will stick for a while, and not have to resort to a v3.1 or v4 soon after.

OK, let's not block moving this along in that case, and I will make sure that I push along index discussions elsewhere in short order.

I'm hoping it's clear that I'm raising concerns that will help improve the project, and I'm not trying to throw arbitrary obstacles in your way. I'd love for this to be a simple problem. Thank you for tackling this!

You're just pointing out obstacles that are already there, it's super helpful!

@escapewindow
Copy link
Contributor

Some thoughts I had last night on indexes:

Aiui, mshal spent multiple months and/or quarters refining and implementing the v2 index format for Gecko. This was a complex problem. We may be able to bug spelunk to determine his requirements and solutions, though we may have shoehorned additional use cases on top of Gecko v2 index since then.

However, we then also implemented another v2 index format for standalone taskgraph, and I suspect that the two v2 index formats are not fully compatible, which can cause issues when we use standalone taskgraph in Gecko. We may want to define the requirements here: finding decision tasks, cached tasks, toolchain + docker-image formats, treeherder, etc. as well as existing indexes for current standalone taskgraph projects. Plus, with the new Release for Mozilla RFCs, we want to avoid index collisions between projects of the same name but different organizations.

I suspect if we gather all of the above, we'll have a good idea of the requirements for a consolidated v3 index format. If we look at the audience involved in the v2 Gecko index format, we'll have part of the answer of "who is affected by an index format change", although we'll need to answer that question for standalone taskgraph projects as well. I suspect we'll want that shared and standardized v3 index format to solve all the above use cases. We probably want to send this proposal out to a wider audience, since this will affect downstream consumers.

However, I understand that while a consolidated v3 index format that solves all our combined use cases would be preferable, it's also a lot of work. I could also see proposing adding a v3 index format specifically for the mozilla trust domain, and dealing with the incompatible v2/v3 split across trust domains in build-decision, ci-config, ci-admin, standalone taskgraph, and any other tooling that queries indexes: shipit, CoT, ? If that is a preferred solution for the intermediate-term, we could do that, and plan to reach a consolidated v4 index format later down the road.

@bhearsum
Copy link
Contributor Author

Thanks for all of this, it's great to have some of the historical perspective here.

Some thoughts I had last night on indexes:

Aiui, mshal spent multiple months and/or quarters refining and implementing the v2 index format for Gecko. This was a complex problem. We may be able to bug spelunk to determine his requirements and solutions, though we may have shoehorned additional use cases on top of Gecko v2 index since then.

Regardless of what we do here, looking into what we did for the last change to the index format seems like a good idea if we're proposing a new one :).

However, I understand that while a consolidated v3 index format that solves all our combined use cases would be preferable, it's also a lot of work. I could also see proposing adding a v3 index format specifically for the mozilla trust domain, and dealing with the incompatible v2/v3 split across trust domains in build-decision, ci-config, ci-admin, standalone taskgraph, and any other tooling that queries indexes: shipit, CoT, ? If that is a preferred solution for the intermediate-term, we could do that, and plan to reach a consolidated v4 index format later down the road.

IMO, the only viable path forward in the medium-term is a new that's only used by either the mozilla trust domain, or everything that uses out-of-tree taskgraph. I can't find any justification for blocking this work on a new index format for Gecko (especially when it doesn't really provide clear benefit for it). I think we should make an effort to make the new format Gecko-compatible, but I don't think we need to go to the extent of consulting all of the Gecko stakeholders at this time (that's something we can do when we we're ready to have Gecko upgrade). I don't think this is conflicting with anything you're saying, but please let me know if it is so we can resolve it.

There's also another option that we haven't given serious consideration, which is: don't change the index format at all. The only driver for it is avoiding collisions, and when I step back and look at it, the likelyhood of two repos having the exact same name is not terribly likely, so it may be worth thinking a bit more about the cost/benefit, and whether or not this is something we must deal with now.

@escapewindow
Copy link
Contributor

escapewindow commented May 13, 2021

Regardless of what we do here, looking into what we did for the last change to the index format seems like a good idea if we're proposing a new one :).

+1 :)

IMO, the only viable path forward in the medium-term is a new that's only used by either the mozilla trust domain, or everything that uses out-of-tree taskgraph. I can't find any justification for blocking this work on a new index format for Gecko (especially when it doesn't really provide clear benefit for it). I think we should make an effort to make the new format Gecko-compatible, but I don't think we need to go to the extent of consulting all of the Gecko stakeholders at this time (that's something we can do when we we're ready to have Gecko upgrade). I don't think this is conflicting with anything you're saying, but please let me know if it is so we can resolve it.

Sure, I mentioned that's a possible stopgap v3, and then @ahal can do the v4 index compatibility work as a blocker to using standalone taskgraph in Gecko. Unless the standalone and gecko indexes are already compatible, which I think we're all hoping for.

There's also another option that we haven't given serious consideration, which is: don't change the index format at all. The only driver for it is avoiding collisions, and when I step back and look at it, the likelyhood of two repos having the exact same name is not terribly likely, so it may be worth thinking a bit more about the cost/benefit, and whether or not this is something we must deal with now.

This is interesting, and I gave it some thought.

We're helped here by the fact that the current proposal only targets level 1, and we've historically been ok with, say, Try tasks being capable of polluting other Try tasks, as long as Release tasks are secure.

Some thoughts here:

First, this would be easier to accept if, say, we knew that there were a level-1 string in all the indexes that the mozilla trust domain has access to. I'm not certain that will be the case, e.g. this block seems to have some index templates that don't refer to level at all. I'm a bit worried that later on down the line, we may want to add some level-3 indexes in this namespace, and those will be pollutable from the level-1 security projects: escalation of privilege. Some checks that verify that we don't have any level > 1 stuff in the mozilla namespace may be good here.

Second, I'm not sure mozilla is a descriptive enough name now. Given that a) we know this is only for level-1, and b) we already use project:mozilla: as a prefix for some scopes (search for project:mozilla: in this file, for example), we may already have some level-3 scopes that may be affected by this new project. Maybe mozilla-level-1, mozilla-try, mozilla-ci, mozilla-auto-tc, github-auto, ?

Third, I'm not sure how uncommon name collisions will be. Maybe we don't care from a security perspective if github.com/org1/persona can pollute indexes from github.com/org2/persona, because it's level 1. However, from a usability standpoint, going to a latest index for your build or test and going to a task from another repo may end up wasting time and causing frustration or confusion. Even if we think that we will rarely have a github.com/org{1,2}/persona collision for two completely different projects, Github forks will automatically have the same project name as the original repo, and it's not far fetched to think that someone will want to enable Taskcluster on their fork of a Taskcluster-enabled original repo. If both are in the mozilla trust domain, then the fork will pollute the indexes of the main repo, and someone may think that they're downloading the latest bits from the main repo, but they're downloading the bits from the fork.

[EDIT] Automation may make this worse: if we go find the decision task via the latest index, then grab the parameters, and schedule some tasks, we may actually fire cron tasks on the wrong repo, schedule actions on the wrong repo, etc. etc. If we have a toolchain or docker-image task with the same name on both repos, then we may end up using the wrong dependencies in our tasks and get completely busted builds and tests ([Edit 2] maybe not for this example, since these may be pinned by sha? [Edit 3] depends - if we look at the sha index, we're good. If we go to a decision task and grab the label-to-taskid.json, and we have overlapping labels and overlapping project names, then possibly?). We may still decide that we don't want to overrotate on this and move forward, but this will likely become a problem that gets worse once the mozilla trust domain gains popularity.

@escapewindow
Copy link
Contributor

escapewindow commented May 14, 2021

I'll note that this collision problem predates this RFC: When enabling the staging repos for fenix, android-components, xpi-manifest, and adhoc-signing, I intentionally added a staging- prefix to these forks because otherwise the staging tasks would pollute the production index namespaces. [Edit 3] This meant that until this happened, shipit-prod might get confused and try to promote a staging fenix build, or shipit-stage might get confused and try to promote a production fenix build. Scopes and CoT may save us here, but it causes noise and unhappiness and such.

However, the mozilla trust domain will remove the human reviewer and automate adding these, so we accelerate the process with no human saying "um, we already have 5 persona repos, and that's causing a problem, so maybe a different name"? I gave some thought to the site or integration somehow checking a master list and saying "you can't install me on your org2/persona repo because there's already an org1/persona repo", but then it would have to catch repo renames and deleted repos with old indexes lying around and I'm not sure how complex we want our workaround solution to be.

[EDIT] I suppose we could put in a check that verifies the repo names in the auto-projects module are unique, and fails out, and we prevent an autoland with a failed check... could work? I'm not sure that's a good user experience, especially having to rename your fork, but that might be a short- or intermediate-term solution. [Edit 2] I'm not sure what happens here on repo rename.

(Sorry for thinking out loud in Github comments, rubber ducky)

@escapewindow
Copy link
Contributor

escapewindow commented May 14, 2021

Another thought, sorry: we could limit the mozilla trust domain to a single org, no user repos, no 2nd org, which prevents name collisions. This reduces its functionality in that it's useful to enable TC on your fork to test out automation changes before creating a PR, but this could be a solution.

[Edit, because of course] Then I thought, maybe a mozilla trust domain per org? So auto-org1 trust domain, auto-org2 trust domain. That means we need a set of worker pools per, which is probably a hard no for hardware workers unless we get some cross-trust-domain-worker stuff set up.

@ahal
Copy link
Contributor

ahal commented May 14, 2021 via email

@escapewindow
Copy link
Contributor

For the collision problem couldn't we make sure the org is part of the index? GitHub projects are always identified by <org>/<repo> everywhere that I've seen.

Yes :) This echoes what I said in April :)


## `v3` Taskcluster index format

The current `v2` index format only includes repository names as an identifier (not user or organization). This is generally not an issue for any of current trust domains (because they generally only support one project), but in this new pool where we have N projects, it introduces the potential for collisions or pollution between them. To ensure this isn't an issue we will introduce a new `v3` index format that includes the repository location in its path as well - including both domain and path. Examples include:
Copy link
Contributor

@ahal ahal Feb 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The mobile repos define a trust_project in their project.yml:
https://hg.mozilla.org/ci/ci-configuration/file/73a8d7921adb15a8ca3b726225655fc9114524a7/projects.yml#l485

This is then used in the index route. I think we could simply define this as org/repo for any projects that use the mozilla trust domain instead of a creating a new format?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds fine to me - good idea!

* `index.gecko.v3.hg.mozilla.org.releases.mozilla-beta.latest.firefox.decision`

This will require changes to a few things to support the new format:
* build-decision
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What build-decision changes were you anticipating? I'd guess any projects using this will be on Github.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I...I have no idea at this point. Maybe I was thinking that some new projects might not be on Git. We can probably remove/ignore this.

* New Workers for tests on Linux, macOS 10.15, and Windows 10
* A RelEng maintained Docker image will be provided for Linux
* These will be created under a new `mozilla-t` provisioner
* New Scriptworker instances for signing and mac-signing
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit on the fence about this. On the one hand, I realize that the intent here is to make it easier for us to stand things up, especially for things like mac signers. On the other, maybe when a project reaches the point of needing scriptworkers, then that is a signal that it is time for them to graduate to their own trust domain.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, if we're only supporting L1 scriptworkers on this pool, then is it really saving us any work? We'll still need to wrangle mac signers for the L3 pool anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this loses a fair amount of value if we can't have signing as part of this. We'd have to do one of the following pretty quickly:

  • Manually sign (avoiding this was part of the movitation behind this idea)
  • Move projects very quickly from the shared pool to their own exclusive one
  • Have automation to sign outside of scriptworkers (eg: a task that signs through autograph-edge). I don't even know if SecEng would allow this though :).

A potentially crazier idea could be to allow most tasks to run in the shared pool, but somehow put signing in its own pool (which means tasks in the same graph would be have two trust domains...which would probably means CoT changes). This would at least mean that signing would be purely adding to existing things, rather than getting blocked on a migration to a different pool.

* Workers will be prefixed with `mozilla-t-`
* mac-signing will run 10.14, like our other mac-signing workers (there's no known reason to upgrade)

Notably, we are only concerned with level 1 workers at this time, which means we can ignore things like scriptworkers that are only used when shipping. Level 3 workers will be dealt with at a later stage, and most likely will not use a shared trust domain or workers across projects.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a sort of follow-on to my previous comment, what if instead of disallowing L3 on the shared pools, we instead disallow scriptworkers? Then once the project is a bit more mature, we would graduate them to their own trust domain and add the scriptworker tasks.

This way they could get started with Taskcluster right away, and have a proper PR == L1 and push == L3 setup. Security wise, the setup wouldn't be ideal.. but it would still be better than GH Actions, or doing signing on personal dev machines.

It kind of reminds me of the autonomous driving quote. It doesn't have to be perfect, it just has to be better than manual.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a sort of follow-on to my previous comment, what if instead of disallowing L3 on the shared pools, we instead disallow scriptworkers?

I think this is a great idea, independent of what we do with any other part of this! I can't think of any downsides to separating PRs and pushes.

@bhearsum
Copy link
Contributor Author

bhearsum commented Sep 7, 2023

@ahal - is this RFC still relevant given your recent work on this?

@ahal
Copy link
Contributor

ahal commented Sep 7, 2023

Hmm, yeah I guess I should have merged this first. There might still be some open issues around non-level 1 / scriptworker pools.. but yeah, it's mostly implemented already.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants