75 changes: 75 additions & 0 deletions rfcs/0034-automatic-ci-configuration.md
@@ -0,0 +1,75 @@
# RFC 34 - Automatically update ci-configuration for new repositories
* Comments: [#34](https://api.github.com/repos/mozilla-releng/releng-rfcs/issues/34)
* Proposed by: @bhearsum

# Summary

Automatically configure the Firefox CI cluster for new repositories when they install the Firefox CI Taskcluster integration.

## Motivation

Currently, ci-configuration changes must be made manually for every new repository that wants to use the Firefox CI Taskcluster instance. These changes are not typically difficult to make, but they can only be done by a small set of authorized people. This is a barrier to entry for people using Taskcluster (especially when compared to bootstrapping something like CircleCI or GitHub Actions).

# Details

Implementing this will involve small- to medium-sized changes to the ci-configuration repository, the ci-admin tool, taskcluster-github, and hg.mozilla.org hooks. It will also require a small new tool to make the actual changes.

## Overview
When a user installs the Firefox CI Taskcluster integration, Taskcluster-Github will receive either an `installation` or an `installation_repositories` event. In response, it will create a Pulse message containing repository, user or organization, and sender information. A new tool, `auto-ci-config`, will receive these messages and update ci-configuration in response to them (more on that below). Finally, CloudOps' Jenkins instance will apply the ci-configuration change, as it does for any other ci-configuration change.
Contributor

We also discussed pre-populating orgs and users in here. Pre-population may be more rough and less elegant than your proposed solution, but we might be able to get it working before we have all the automation done for this RFC, and there would be no delay for someone already allowlisted to add a new repo with scopes. Still an option, or are we going the proposed direction fully?

Contributor Author

I'm less inclined to go the prepopulate route now that I've dug into how much work this would be (some, but not a crazy amount). If the number of project repos we had was static I'd be more inclined to, but I suspect it won't be long before a repo we haven't prepopulated needs support.


## auto-ci-config

This tool will listen for events on a Pulse queue that receives messages whenever a new Firefox-CI Taskcluster installation happens. In response, it will:
* Find the sender in the Mozilla Person API (via queries like https://people.mozilla.org/api/v4/search/simple/?q=sciurus&w=staff)
* Verify that they are in the `taskcluster_users` LDAP group, and exit if they are not
* Find the default branch of the repository by querying the Github API
* Create and push the necessary ci-configuration change
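
The steps above can be sketched roughly as follows. This is only an illustration of the decision flow, not the tool's actual design: the `InstallationMessage` fields, the `plan_change` function, and the two lookup callables are hypothetical names, and the Person API and Github API calls are replaced by in-memory stand-ins. Creating and pushing the change is left out.

```python
from dataclasses import dataclass
from typing import Callable, Optional

REQUIRED_GROUP = "taskcluster_users"  # LDAP group named in the RFC


@dataclass
class InstallationMessage:
    """Hypothetical shape of the Pulse message; field names are assumptions."""
    repository: str  # e.g. "mozilla/mozilla-vpn-client"
    sender: str      # login of whoever installed the integration


def plan_change(
    message: InstallationMessage,
    lookup_groups: Callable[[str], Optional[set]],
    lookup_default_branch: Callable[[str], str],
) -> Optional[dict]:
    """Return the entry to add to ci-configuration, or None to do nothing."""
    # Stand-in for the Person API query: returns the sender's groups, or None.
    groups = lookup_groups(message.sender)
    if groups is None or REQUIRED_GROUP not in groups:
        return None  # unknown or unauthorized sender: exit without changes
    # Stand-in for the Github API query for the default branch.
    branch = lookup_default_branch(message.repository)
    return {f"https://github.com/{message.repository}": {"default_branch": branch}}


# In-memory stand-ins for the two lookups (illustrative data only).
people = {"alice": {"taskcluster_users"}, "bob": {"some_other_group"}}
branches = {"mozilla/mozilla-vpn-client": "main"}

msg = InstallationMessage("mozilla/mozilla-vpn-client", "alice")
print(plan_change(msg, people.get, branches.__getitem__))
```

Passing the lookups in as callables keeps the authorization decision testable without network access.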

## ci-configuration
A new file (`projects-automanaged.yml`) will be added that is used for automanaged Github projects. It will look similar to projects.yml except that it will contain a top-level key of `github-projects`. Additionally, a key called `repositories` will be required, which will contain a list of github repositories that should be configured with the noted parameters. In essence, this is an inverted and limited form of what is already used in `projects.yml`. This structure implies that all projects listed in it will share the same configuration. Projects that need separate configuration may graduate to `projects.yml` later.

Proposed format:
```
github-projects:
  repositories:
    https://github.com/mozilla/mozilla-vpn-client:
      default_branch: main
  features:
Contributor

I would even consider removing features from the file, and adding a hardcoded set when we process the file. This way if the autoland creds are compromised, they're limited to adding, removing, or editing repositories in this list, but they can't grant extra scopes.

If we do this, we may want to rename projects-automanaged.yml to something like mozilla_level_1.yml to be more precise about what the file contains.

Contributor

We may want to think about private repos at this point.
Either we detect this by https://github.com/ vs git@github.com, or we could have a private-repo boolean, or both. I imagine that would mean its artifacts would be private artifacts in the project/mozilla/ space, and all mozilla-level-1 repos and taskcluster_users users would have scopes to download those. Also, we may have to have github token(s) with read access to the private repos to enable cloning.

(Sadly, this may mean we have to invite a bot user to each private repo, which requires releng to log in as that user and accept. It's easier if we're in an existing org, where the bot is already part of a team in the org, and we just invite the team to read the repo.)

I think some of those issues may be smoother if this is handled by the GitHub App, which could have the required perms to do the setup.

Contributor Author

> I would even consider removing features from the file, and adding a hardcoded set when we process the file. This way if the autoland creds are compromised, they're limited to adding, removing, or editing repositories in this list, but they can't grant extra scopes.

Yeah, I think this is probably for the best. Let's do that.

Contributor Author

> Either we detect this

Good news, it's part of the installation event payload, so we can just pass it along :):

...
  "repositories_added": [
    {
      "id": 186853007,
      "node_id": "MDEwOlJlcG9zaXRvcnkxODY4NTMwMDc=",
      "name": "Space",
      "full_name": "Codertocat/Space",
      "private": false
    }
  ],
...

You've highlighted a lot of other things about private repos that I haven't thought about though - I'll investigate those, and see if there are any other things that need addressing too. In some ways I'm tempted to descope them, but let's see how much work we think it is first (there are definitely going to be projects that start off as private in the future).

Contributor

Are you saying the Github App should have hg.m.o ssh creds with access to push to ci-configuration without review? I would lean more towards the Github App sending a pulse message that kicks off a taskcluster hook that triggers a treescript scriptworker task that has creds to push to ci-configuration, but we do have to make sure that the task can't push any changes that break treescript.

Okay, I must have misinterpreted some of the prior statements. It sounded as if there were additional GitHub permissions that needed to be established ("we have to invite a bot user to each private repo"). The GitHub App might be a choice there. I'm not intending to add to any discussion around hg.m.o.

Contributor Author

I'm pretty sure that part is "private repos need to grant additional permissions to the Firefox-CI Taskcluster integration" (please correct me if I'm wrong @escapewindow), which obviously isn't something tc-gh could do itself.

we're getting in the weeds here on this implementation detail -- I suggest a zoom if we want to continue :)

Contributor

Yeah. Inviting the bot is so the github token that we use to read/write the repo has perms when we clone or make any repo queries or whatnot from scriptworker or decision tasks or builds or whatever. Not sure we want to use the app for that? and +1 to the zoom call

    taskgraph-actions: true
    github-taskgraph: true
    trust-domain-scopes: true
```

We will need a couple of small changes to grants to:
* Allow decision tasks in the automanaged projects to have the scopes they need to run tasks for automanaged projects
* Allow anyone in the `taskcluster_users` LDAP group to create, rerun, etc. tasks for automanaged projects

Notably, this means that users from project A can cancel, rerun, etc. tasks for project B. This is considered acceptable for level 1 access.

## ci-admin

ci-admin will need support added for processing the new `projects-automanaged.yml` file. This will include hardcoding `level` to `1` and `trust_domain` to `mozilla` to ensure that all automanaged projects use these values in their scopes and grants (which are the only valid values).
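
As a rough illustration of the hardcoding described above, the sketch below expands already-parsed automanaged data into per-project entries. It is not ci-admin's real code: `expand_automanaged`, the output keys, and the way the alias is derived from the repository URL are all assumptions made for the example, and the input is a plain dict rather than a parsed YAML file.

```python
# Values the RFC says are the only valid ones for automanaged projects.
AUTOMANAGED_LEVEL = 1
AUTOMANAGED_TRUST_DOMAIN = "mozilla"


def expand_automanaged(config: dict) -> dict:
    """Turn the `github-projects` section into per-project entries.

    `level` and `trust_domain` are always set from the constants above,
    so values supplied in the file (if any) are ignored.
    """
    projects = {}
    for repo_url, params in config["github-projects"]["repositories"].items():
        # Hypothetical alias: last path component of the repository URL.
        # A real implementation would need to handle name collisions.
        alias = repo_url.rstrip("/").rsplit("/", 1)[-1]
        projects[alias] = {
            "repo": repo_url,
            "default_branch": params["default_branch"],
            "level": AUTOMANAGED_LEVEL,
            "trust_domain": AUTOMANAGED_TRUST_DOMAIN,
        }
    return projects


example = {
    "github-projects": {
        "repositories": {
            "https://github.com/mozilla/mozilla-vpn-client": {
                "default_branch": "main",
                "level": 3,  # ignored: automanaged projects are always level 1
            },
        },
    },
}
print(expand_automanaged(example))
```

Setting the values in code rather than reading them from the file means that an edit to `projects-automanaged.yml` alone cannot raise a project's level.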

## hg.mozilla.org

We will need a hook that prevents the account that will be pushing these changes from modifying files other than `projects-automanaged.yml`, so that a compromise of that system cannot change the Firefox, Fenix, or other higher-risk parts of ci-configuration. There is existing prior art on this sort of limitation, so this is mostly configuration, not development.
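
The core check such a hook performs might look like the sketch below. This is only the decision logic, written as plain Python. It does not use Mercurial's hook API, and the account name `auto-ci-config` and both function names are hypothetical.

```python
ALLOWED_FILES = frozenset({"projects-automanaged.yml"})
RESTRICTED_USER = "auto-ci-config"  # hypothetical name of the pushing account


def disallowed_changes(changed_files, allowed=ALLOWED_FILES):
    """Return the changed files that fall outside the allowed set."""
    return sorted(set(changed_files) - allowed)


def push_is_allowed(user, changed_files):
    """Only the restricted account is limited; other users are unaffected."""
    if user != RESTRICTED_USER:
        return True
    return not disallowed_changes(changed_files)


print(push_is_allowed(RESTRICTED_USER, ["projects-automanaged.yml"]))
print(push_is_allowed(RESTRICTED_USER, ["projects-automanaged.yml", "grants.yml"]))
```

An allowlist is used here instead of a list of protected files, so files added to the repository later are protected by default.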

## taskcluster-github

Taskcluster-Github will be updated to support receiving `installation` and `installation_repositories` events, and creating Pulse messages in response to them.
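
A minimal sketch of the event-to-message translation is shown below. It is in Python for illustration only and does not reflect how Taskcluster-Github is implemented. The output message shape and the `build_messages` name are assumptions. The input fields follow GitHub's webhook payloads, where the repository-change event is named `installation_repositories` and lists new repositories under `repositories_added`.

```python
def build_messages(event_name: str, payload: dict) -> list:
    """Build one message per newly installed repository.

    The returned dicts are a hypothetical message format carrying the
    repository, the user or organization, and the sender.
    """
    action = payload.get("action")
    if event_name == "installation" and action == "created":
        repos = payload.get("repositories", [])
    elif event_name == "installation_repositories" and action == "added":
        repos = payload.get("repositories_added", [])
    else:
        return []  # removals and unrelated events produce no messages

    account = payload["installation"]["account"]["login"]  # user or organization
    sender = payload["sender"]["login"]
    return [
        {
            "repository": repo["full_name"],
            "private": repo["private"],
            "account": account,
            "sender": sender,
        }
        for repo in repos
    ]


sample = {
    "action": "added",
    "installation": {"account": {"login": "Codertocat"}},
    "repositories_added": [
        {"id": 186853007, "name": "Space", "full_name": "Codertocat/Space", "private": False}
    ],
    "sender": {"login": "Codertocat"},
}
print(build_messages("installation_repositories", sample))
```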

# Open Questions

* Are we OK with something pushing ci-config changes fully automatically, with it restricted to one particular file?
  * Hal was OK with the general idea of this.
* Is an LDAP group the right thing to use? Or should we use a Mozillians group? Some other way of authorizing the request?
* Which `features` will we enable for automanaged projects?
* Should `features` be hardcoded in `ci-admin` like `level` and `trust_domain` are?
Contributor

I'm thinking yes, although we could do something like:

automanaged-projects.yml: contains a map of filepath to features, e.g.

automanaged_projects:
   mozilla-level-1:
       path: automanaged-projects/mozilla_level_1.yml
       features:
           - ... 
       trust-domain: mozilla
       level: 1

That means we can configure these in ci-configuration rather than hardcode them in ci-admin. And since this is a separate file, the autopush creds can't modify them.

Contributor Author

I really like this idea!


# Implementation

<once the RFC is decided, these links will provide readers a way to track the
implementation through to completion>

* <link to tracker bug, issue, etc.>
* <...>