Skip to content

Conversation

@bhearsum
Copy link
Contributor

@bhearsum bhearsum commented Apr 16, 2021

Implementing this will involve small to medium sized changes to the ci-configuration repository, the ci-admin tool, taskcluster-github, and hg.mozilla.org hooks. It will also require a small new tool to make the actual changes.

## Overview
When a user installs the Firefox CI Taskcluster integration, Taskcluster-Github will receive either a `installation` or `installation_repository` event. In response it will create a Pulse message ontaining repository, user or organization, and sender information. A new tool, `auto-ci-config`, will receive these messages and update ci-configuration in response to them (more on that below). Finally, CloudOps' Jenkins instance will apply the ci-configuration change, as it does for any other ci-configuration change.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also discussed pre-populating orgs and users in here. Pre-population may be more rough and less elegant than your proposed solution, but we might be able to get it working before we have all the automation done for this RFC, and there would be no delay for someone already allowlisted to add a new repo with scopes. Still an option, or are we going the proposed direction fully?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm less inclined to go the prepopulate route now that I'd dug into how much work this would be (some, but not a crazy amount). If the number of project repos we had was static I'd be more inclined to, but I suspect it won't be long before a repo we haven't prepopulated needs support.

repositories:
https://github.com/mozilla/mozilla-vpn-client:
default_branch: main
features:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would even consider removing features from the file, and adding a hardcoded set when we process the file. This way if the autoland creds are compromised, they're limited to adding, removing, or editing repositories in this list, but they can't grant extra scopes.

If we do this, we may want to rename projects-automanaged.yml to something like mozilla_level_1.yml to be more precise about what the file contains.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may want to think about private repos at this point.
Either we detect this by https://github.com/ vs [email protected], or we could have a private-repo boolean, or both. I imagine that would mean its artifacts would be private artifacts in the project/mozilla/ space, and all mozilla-level-1 repos and taskcluster_users users would have scopes to download those. Also, we may have to have github token(s) with read access to the private repos to enable cloning.

(Sadly, this may mean we have to invite a bot user to each private repo, which requires releng to log in as that user and accept. It's easier if we're in an existing org, where the bot is already part a team in the org, and we just invite the team to read the repo.)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think some of those issues may be smoother if this is handled by the GitHub App, which could have the required perms to do the setup.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would even consider removing features from the file, and adding a hardcoded set when we process the file. This way if the autoland creds are compromised, they're limited to adding, removing, or editing repositories in this list, but they can't grant extra scopes.

Yeah, I think this is probably for the best. Let's do that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either we detect this

Good news, it's part of the installation event payload, so we can just pass it along :):

...
  "repositories_added": [
    {
      "id": 186853007,
      "node_id": "MDEwOlJlcG9zaXRvcnkxODY4NTMwMDc=",
      "name": "Space",
      "full_name": "Codertocat/Space",
      "private": false
    }
  ],
...

You've highlighted a lot of other things about private repos that I haven't thought about though - I'll investigate those, and see if there's any other things that need addressing too. In some ways I'm tempted to descope them, but let's see how much work we think it is first (there's definitely going to be projects that start off as private in the future).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you saying the Github App should have hg.m.o ssh creds with access to push to ci-configuration without review? I would lean more towards the Github App sending a pulse message that kicks off a taskcluster hook that triggers a treescript scriptworker task that has creds to push to ci-configuration, but we do have to make sure that the task can't push any changes that breaks treescript.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I must have misinterpreted some of the prior statements. It sounded as if there were additional GitHub permissions that needed to be established ("we have to invite a bot user to each private repo"). The GitHub App might be a choice there. I'm not intending to add to any discussion around hg.m.o.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure that part is "private repos need to grant additional permissions to the Firefox-CI Taskcluster integration" (please correct me if I'm wrong @escapewindow), which obviously isn't something tc-gh could do itself.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we're getting in the weeds here on this implementation detail -- I suggest a zoom if we want to continue :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. Inviting the bot is so the github token that we use to read/write the repo has perms when we clone or make any repo queries or whatnot from scriptworker or decision tasks or builds or whatever. Not sure we want to use the app for that? and +1 to the zoom call

* Hal was OK with the general idea of this.
* Is an LDAP group the right thing to use? Or should we use a Mozillians group? Some other way of authorizing the request?
* Which `features` will we enable for automanaged projects?
* Should `features` be hardcode in `ci-admin` like `level` and `trust_domain` are?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking yes, although we could do something like:

automanaged-projects.yml: contains a map of filepath to features, e.g.

automanaged_projects:
   mozilla-level-1:
       path: automanaged-projects/mozilla_level_1.yml
       features:
           - ... 
       trust-domain: mozilla
       level: 1

That means we can configure these in ci-configuration rather than hardcode them in ci-admin. And since this is a separate file, the autopush creds can't modify them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like this idea!

@djmitche
Copy link

With apologies for a drive-by comment here, but:

From a high level, this proposal seems to be implementing a sync from the state on GitHub to state in an hg repository, using delta events (repo-added, repo-removed). That's trusting that anything that adds the GH app is OK if the github performing the add is in the taskcluster_users Mozilla access group.

Rather than synchronize things, perhaps consider calling GitHub or TC-GH APIs directly from ci-admin to get the same data straight from the authoritative source. That eliminates having push credentials, and avoids any issues from missed messages or the like (such as if GH fails to deliver a webhook).

The tricky bit would be the permission model, since I don't think anything stores who added a particular app to a repo. If that's the desired access model, we could potentially store that and make it available from TC-GH.

@bhearsum
Copy link
Contributor Author

With apologies for a drive-by comment here, but:

No worries at all, I appreciate your input!

From a high level, this proposal seems to be implementing a sync from the state on GitHub to state in an hg repository, using delta events (repo-added, repo-removed). That's trusting that anything that adds the GH app is OK if the github performing the add is in the taskcluster_users Mozilla access group.

Rather than synchronize things, perhaps consider calling GitHub or TC-GH APIs directly from ci-admin to get the same data straight from the authoritative source. That eliminates having push credentials, and avoids any issues from missed messages or the like (such as if GH fails to deliver a webhook).

The tricky bit would be the permission model, since I don't think anything stores who added a particular app to a repo. If that's the desired access model, we could potentially store that and make it available from TC-GH.

I'm open to this idea, and if it truly does eliminate the need for hg.mozilla.org credentials, it seems like a very obvious improvement.

Just to be 100% sure I understand, it sounds like you're suggesting not adding automanaged-projects to ci-configuration at all, and pulling it each time ci-admin` runs.

If that's the case, we'll still need a mechanism to trigger ci-admin runs on installs (we could do that as described in this RFC, and just not pull anything from the payload), or run it on a schedule.

@tomprince
Copy link
Contributor

I think having the information on installed repos (and installer) available, and using that information, rather than just the change information makes sense. However, I think one of the strengths of ci-admin/ci-config is that it doesn't[1] pull information from other sources to determine what to generate, so that the ci-config repository reflects the history of what the cluster configuration is. Thus, I think it makes sense to include the configuration in the repo, but ci-admin check could verify it against the taskcluster github app (much like it verifies repository level for hgmo repositories).

[1] The only exceptions I am aware of are:

  • the tacluster.yml used in action tasks
  • historical action hooks, which are kept around for 60 days (which is needed because of the former.
    I suspect both of these could potentially be removed by using build-decision for generating action tasks.

@bhearsum
Copy link
Contributor Author

I think having the information on installed repos (and installer) available, and using that information, rather than just the change information makes sense. However, I think one of the strengths of ci-admin/ci-config is that it doesn't[1] pull information from other sources to determine what to generate, so that the ci-config repository reflects the history of what the cluster configuration is.

Hey Tom! Thanks for your comment and insight here - I didn't realize that this was an explicit design decision until now. I can certainly see how relying on the state of tc-gh at some point in time may be less than ideal (and also opens up ci-admin to an additional mode of failure).

Thus, I think it makes sense to include the configuration in the repo, but ci-admin check could verify it against the taskcluster github app (much like it verifies repository level for hgmo repositories).

IMO, the main reason we'd want to look at tc-gh instead of embedded into ci-config is to avoid the need for some system to have hg credentials to push changes to ci-config. I don't think this addresses that, but maybe we can find a way to get the config in the repo without having those credentials. Eg: maybe we create a phab revision instead, and land it post-facto.

@djmitche
Copy link

I agree with @tomprince that my suggestion (which you accurately summarized. @bhearsum) would lose history, and that history is an important feature.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants