RFC 35: Easy hg/github → hg.mozilla.org syncing #35
# RFC 35 - Easy hg/github → hg.mozilla.org syncing

* Comments: [#35](https://api.github.com/repos/mozilla-releng/releng-rfcs/issues/35)
* Proposed by: @bhearsum

# Summary

Support one-way syncing from existing hg or GitHub repositories into Phabricator-enabled repositories.

## Motivation

A handful of projects (most notably PDF.js and NSS) regularly need to sync code into Gecko. In all cases, this is currently done by hand on laptops, making it annoying, error-prone, and low bus factor. Supporting this as a first-class operation would make things better and easier for existing projects, and would allow other projects to adopt it much more easily, enabling other use cases.

## Out of Scope

This RFC is intentionally tightly scoped to the currently known use cases. In particular, we are specifically _not_ supporting two-way syncs or syncing things _into_ a Git or GitHub repository as part of this work. These things may be considered in the future, but are explicitly out of scope for now.

# Details

Because this RFC is scoped to only support Phabricator-enabled repositories, we can ignore some of the usual security concerns (we won't hold high-value credentials, just a Phabricator token). This also means we don't need to use something as complex as a scriptworker -- docker-worker should be perfectly fine for this use case. Most of the implementation work will be gluing existing things together. At a high level, we need a Task that:

* Takes the artifact from an upstream Task
* Unpacks it
* Clones the target repository
* Copies all or some of the contents to the target repository
* Runs `moz-phab` to create a Phabricator revision
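The steps above can be sketched as a small dry-run helper. This is a minimal illustration, not the real helper script; every name here (the function, the artifact filename, the use of `rsync` for copying) is an assumption:

```python
# Hypothetical sketch of the helper script's pipeline. It builds the
# commands the sync task would run, without executing them, so the
# shape of the workflow is easy to see. All names are illustrative.
def build_sync_commands(artifact_url, target_repo, mappings, commit_msg):
    cmds = [
        # Take the artifact from the upstream Task and unpack it
        ["curl", "-sSfL", "-o", "artifact.tar.bz2", artifact_url],
        ["tar", "-xjf", "artifact.tar.bz2", "-C", "artifact"],
        # Clone the target repository
        ["hg", "clone", target_repo, "target"],
    ]
    # Copy all or some of the contents into the target repository
    for src, dst in mappings.items():
        cmds.append(["rsync", "-a", "--delete",
                     f"artifact/{src}", f"target/{dst}"])
    # Commit, then create a Phabricator revision
    cmds += [
        ["hg", "--cwd", "target", "addremove"],
        ["hg", "--cwd", "target", "commit", "-m", commit_msg],
        ["moz-phab", "submit"],
    ]
    return cmds
```

A real implementation would also need error handling and a clean working directory per run; the point here is only the ordering of the steps.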
> **Contributor:** A key safety feature here is that none of the just-modified code can be executed during the landing process. It's not clear to me if … Related, I'd like to see some hard coding as to what parts of the repository are allowed to be updated via this process. This type of process should not be able to update any of the build code (including taskgraphs), nor modify any file that will be passed to … If these precautions are possible, please document them as requirements. If not possible, how else can we protect against that?
>
> **Author:** I'm not sure this is practical, if I'm understanding correctly. As an example, NSS imports into mozilla-central, including files like https://hg.mozilla.org/mozilla-central/file/tip/security/nss/build.sh, which can obviously eval and run arbitrary code. Is this something you would want to be denied landing through this system?
>
> **Contributor:** Yes, unless we can think of sufficient "other layers" to mitigate the risk. Combined with the SaC & approve-own-patch issue, I'm feeling less comfortable. :/ You may want to ask for an early RRA/consultation -- I'm unsure how the "risks of the new approach" compare to the "risks of the current system".

Choosing to take an artifact from an upstream Task, rather than simply copying files from a source repository clone, is notable: it provides the flexibility for projects to run build or other preprocessing steps before syncing is completed (a known use case for both PDF.js and NSS). If use cases for simple copies present themselves in the future, we may add support for the simpler operation as well, but it is not necessary at this time.
> **Contributor:** As long as this artifact is reproducible, and there's a good way for us to reproduce it in autoland/reviewbot, I'm OK with that. If it isn't reproducible, we may be looking at a CoT-like solution to verify the original artifact was built on trusted workers from a trusted repo and revision.
>
> **Author:** Is this really a requirement, or a nice-to-have? The alternative we're comparing to is people doing similar things on their laptops.
>
> **Contributor:** I think if we're doing it properly, not allowing a single bad actor to pwn m-c is preferred.
>
> **Contributor:** I think if we do updatebot or similar from the gecko trust domain, cloning the source repo and running a script, then we can use CoT to verify the artifact is from a trusted tree/worker, and we can worry a bit less about reproducibility.
## Taskgraph

### New Docker Image

A new Docker image will be created that provides Mercurial (including any necessary tools to do fast clones), `moz-phab`, and a simple helper script that wraps the process described above. (This helper script primarily exists to ensure we never run into command line length limits.)

### Kind & Transforms

A new `kind` will be created that can create sync jobs. Sensible defaults will be used for descriptions, Treeherder symbols, etc. It will run on docker-worker with the aforementioned image.

This kind will need a new transform that takes the information provided by a project repository (see below) and uses it to construct the right command to execute.
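One way such a transform could fill in defaults and build the command, shown as a plain-Python sketch rather than a real taskgraph transform (the field names and the `run-repo-sync` helper are hypothetical):

```python
# Hypothetical sketch of the transform's job: apply sensible defaults and
# turn a project's sync configuration into the worker command. A real
# taskgraph transform would do this inside a transform sequence, with a
# proper schema; this just shows the shape of the mapping.
def fill_sync_task(task):
    target = task["target-repository"]
    task.setdefault("description", f"Sync upstream artifact into {target}")
    task.setdefault("treeherder", {"symbol": "sync", "tier": 2})
    # One --map argument per source->destination pair; passing these as
    # individual arguments via a helper script keeps us clear of
    # command line length limits.
    mapping_args = []
    for src, dst in task["mappings"].items():
        mapping_args += ["--map", f"{src}={dst}"]
    task["command"] = [
        "run-repo-sync",  # hypothetical helper baked into the Docker image
        "--target", target,
        "--upstream-task", task["upstream-task"],
        "--artifact", task["artifact"],
    ] + mapping_args
    return task
```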
## Project Repositories

Project repositories will need to provide:

* The target repository URL
* The upstream task name and artifact name
* A source → target mapping, relative to the artifact root and the target repository root, e.g.:

```yaml
syncMappings:
  targetRepository: https://hg.mozilla.org/mozilla-central
  sourceArtifacts:
    - {buildTask}/buildArtifact.tar.bz2:
        src/: toolkit/components/pdfjs/content/build/
        web/: toolkit/components/pdfjs/content/web/
```

This information will be provided in the simplified Taskcluster config described by RFC XXX. (This needs updating after that RFC is opened.)
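Assuming the config parses into the nested structure shown above, the mapping expands into concrete (artifact, source, destination) triples. A minimal sketch (the function name and the plain-dict representation are assumptions):

```python
# Expand a parsed syncMappings structure (represented here as a plain
# dict, as a YAML loader might produce) into (artifact, src, dst)
# triples that the sync task can act on one by one.
def expand_mappings(sync_config):
    triples = []
    for entry in sync_config["sourceArtifacts"]:
        for artifact, mapping in entry.items():
            for src, dst in mapping.items():
                triples.append((artifact, src, dst))
    return triples

# The example configuration from above, pre-parsed:
example = {
    "targetRepository": "https://hg.mozilla.org/mozilla-central",
    "sourceArtifacts": [
        {"{buildTask}/buildArtifact.tar.bz2": {
            "src/": "toolkit/components/pdfjs/content/build/",
            "web/": "toolkit/components/pdfjs/content/web/",
        }},
    ],
}
```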
# Open Questions

* Does this enable single-actor attacks? I.e., a privileged person in the PDF.js repo gets automation to create a Phab revision, and then reviews it themselves and lands it?
  * I think people can already land without review, so maybe it's effectively no different?
> **Contributor:** I think so, as written now, before my comments: …
>
> **Contributor:** If they do so, it probably looks like they landed their own patch with an … If they follow this approach, it looks like they reviewed a valid vendoring patch, and these patches can be large enough that finding the badness can be a needle in a haystack without the verification task. Also, people can land without review if they're level 4, which is a restricted list. A leaked phab token could essentially open this up to all level 3 users.
>
> **Author:** It's a good point that the look of someone landing their own patch is very different here. I definitely agree that it's less likely to be noticed as something alarming.
> How will use of this feature be restricted? It's one thing to have it used in vetted situations; it's quite another to allow general usage. The "only in this subtree" restriction suggested above is one level of protection, but are any more planned? I'm assuming this would only run on level 3 builders "for real", but I don't know how that impacts testing, etc.
# Implementation

<once the RFC is decided, these links will provide readers a way to track the
implementation through to completion>

* <link to tracker bug, issue, etc.>
* <...>
> If we lose this token, then something or someone can create phab patches that look like valid sync task output, but contain something arbitrary or malicious. That doesn't necessarily mean we need to guard the token as tightly as release signing secrets, but we may want something to verify that the phab patch matches the vendored code it's supposed to come from.
>
> We already have a workflow like this in updatebot; we may want to look at reusing some of the same workflows. AIUI, updatebot creates a vendored-code patch in phab, someone reviews and merges that phab patch, and tests run against that revision to verify that if we rerun the automation to generate the patch from the original source repo and revision, we get the exact same patch.
>
> Also, AIUI, updatebot is restricted to only change code inside of a vendored subtree, and is not able to change the code that verifies the vendored code.
>
> However, I think there are still things we'd have to do in order to use updatebot, namely have a scripted process where updatebot can generate the patches and verify them, at least. I don't know enough about it to say how fully automated it is, and how much is tjr and company pushing it along. Worth looking into.

> This is a really good point; I somehow managed to forget about updatebot when writing this 🙃