# GCP backup

## Summary

In GCP (Google Cloud Platform) we keep offsite backups for both Rust releases and crates
to protect us against security threats that could involve losing crates or releases.
These threats were identified in the [threat model].

## Motivation

While we have multiple measures in place to prevent accidental deletion of Rust releases or crates in AWS,
e.g. bucket replication to a different region and restricted access, our current setup does not sufficiently protect us against a few threats:

1. _AWS account compromise_. The [threat model] for Rust's infrastructure, created by the Rust Foundation's security engineer, highlights the risk of an AWS account compromise.
   If a malicious actor were able to gain administrator access to the AWS account of one of the [infra-admins],
   they could bypass many safeguards and delete data.
2. _AWS account deletion_. AWS could accidentally delete our account, resulting in the possible deletion of data and backups.
   For example, something similar happened at [Google](https://arstechnica.com/gadgets/2024/05/google-cloud-accidentally-nukes-customer-account-causes-two-weeks-of-downtime/) recently.

- To mitigate threat 1, the new backup needs to have separate admin access.
- To mitigate threat 2, the new backup needs to be in a separate cloud environment.

## Implementation overview

These new backups are hosted in a dedicated GCP account, with access controls entirely separate from AWS.
Specifically, none of the current `infra-admins` have admin access to this separate environment, to protect against an account compromise.
This GCP account is not used for anything else (just for backups).

The backups are automatically copied daily by GCP.

### Access 👤

We limit admin access to the GCP backups to two members of the Rust Foundation, for the following reasons:

- _Ensure a strong separation of access_: as explained in the first [motivation](#motivation), the GCP admins should be different from the AWS admins.
  This means we can't give admin access to any of the `infra-admins`.
- _Accountability_: the Rust Foundation employees are accountable for their actions, so intentionally deleting the backups would mean breaking the law.

People with admin access to the GCP account:

- Joel Marcey (Director of Technology @ Rust Foundation)
- Walter Pearce (Security Engineer @ Rust Foundation)

People with read-only access to the GCP project:

- `infra-admins`

Admin access to the GCP account is bound to the admins' `@rustfoundation.org` Google Workspace accounts.
This means that if an employee leaves the Rust Foundation, they lose access to the GCP account.
In this case, we need to add a new admin to the GCP account.
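
As an illustration, adding a replacement admin could look like the following `gcloud` invocation. The user email is hypothetical and `roles/owner` stands in for whichever admin role the project actually uses; only the `backup-prod` project name comes from this document.

```sh
# Hypothetical example: bind a new employee's @rustfoundation.org Workspace
# identity to the production backup project. The email is illustrative and
# roles/owner is a placeholder for the actual admin role.
gcloud projects add-iam-policy-binding backup-prod \
  --member="user:new-admin@rustfoundation.org" \
  --role="roles/owner"
```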

> [!NOTE]
> The `infra-admins` team can have admin access to the GCP staging project if needed.

### In case of emergency 🧯

- In case our data in AWS is deleted, the `infra-admins` team can restore it by:
  - copying the data from GCP to AWS using the GCP read-only access (see the sketch after this list).
  - restoring the `crates-io-index` bucket from the `db-dump` stored in the `crates-io` bucket. Use [this](https://github.com/rust-lang/crates.io/blob/e0bb0049daa12f5362def463b04febd6c036d315/src/worker/jobs/git.rs#L19-L129) code.
- If the GCP synchronization mechanism breaks, the infra team can raise a PR to fix the Terraform configuration and a GCP admin can apply it.
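
A minimal sketch of the copy back to AWS, assuming `gsutil` is authenticated with the read-only GCP credentials and AWS keys are configured in `~/.boto` (the GCS bucket name is illustrative):

```sh
# Mirror the GCP backup bucket back into the re-created AWS bucket.
# -m parallelizes the transfer; the gs:// bucket name is an assumption.
gsutil -m rsync -r gs://rust-crates-io-backup s3://crates-io
```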

### New threat model 🦹

To delete our data, an attacker would need to compromise both:

- one AWS admin account (an `infra-admin`)
- one GCP admin account (Joel or Walter)

This improves our security posture because compromising two accounts is harder than compromising one.

Accidental account deletion is no longer a threat, because if either AWS or GCP deletes our account, we can restore the data from the other provider.

## Implementation details

The account where we store the backup is called `rust-backup`. It contains two GCP projects: `backup-prod` and `backup-staging`.
Here we have one Google [Object Storage](https://cloud.google.com/storage?hl=en) bucket in the `us-west1` region for each of the following AWS S3 buckets:

- `crates-io`. CloudFront URL: `cloudfront-static.crates.io`. It contains the crates published by the Rust community.
- `static-rust-lang-org`. CloudFront URL: `cloudfront-static.rust-lang.org`. Among other things, it contains the Rust releases.

The [storage class](https://cloud.google.com/storage/docs/storage-classes) is set to "archive" for both buckets. This is the cheapest class for infrequent access.
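
For illustration, creating one of these buckets could look like the following; the bucket name is assumed, and object versioning is enabled because the recovery tests below rely on restoring previous versions:

```sh
# Create an Archive-class bucket in us-west1 and turn on object versioning
# so that overwritten or deleted objects can be recovered. The bucket name
# is illustrative.
gsutil mb -c archive -l us-west1 gs://rust-crates-io-backup
gsutil versioning set on gs://rust-crates-io-backup
```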

We use [Storage Transfer](https://cloud.google.com/storage-transfer/docs/overview) to automatically transfer the content of the S3 buckets into the Google object storage.
This is a service managed by Google. We use it to download the S3 buckets through CloudFront and perform a daily incremental transfer: each run only moves files that were added, updated, or deleted since the last transfer, minimizing the amount of data that needs to be transferred.
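
The job itself is managed through the Terraform configuration mentioned above, but as a rough sketch it could also be created with `gcloud`; the bucket names and credentials file here are assumptions:

```sh
# Create a Storage Transfer job that copies the S3 bucket into the GCS
# backup bucket. Names and the AWS credentials file are illustrative.
gcloud transfer jobs create \
  s3://static-rust-lang-org gs://rust-static-rust-lang-org-backup \
  --source-creds-file=aws-creds.json
```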

### Monitoring 🕵️

To check that the backups are working:

- Ensure that the number of files and the total size of each GCP bucket match those of the respective AWS bucket, by looking at the metrics.
- Ensure that only the authorized people have access to the account.
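
Both checks can also be performed from the command line; a sketch, reusing the (partly assumed) bucket and project names from above:

```sh
# Compare object counts and total sizes between the AWS bucket and its
# GCP counterpart (the gs:// name is illustrative).
aws s3 ls s3://crates-io --recursive --summarize | tail -n 2
gsutil du -sc gs://rust-crates-io-backup

# Audit who has access to the production backup project.
gcloud projects get-iam-policy backup-prod
```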

You can also run the following test:

- Upload a file to an AWS S3 bucket and check that it appears in GCP.
- Edit the file in AWS and check that you can recover the previous version from GCP.
- Delete the file in AWS and check that you can recover all previous versions from GCP.
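
Recovering previous versions relies on object versioning in the GCS bucket; a sketch of inspecting and restoring old generations (the object path and generation number are illustrative):

```sh
# List every stored generation of the object, including overwritten and
# deleted ones (-a), then copy a specific generation back out.
gsutil ls -a gs://rust-crates-io-backup/test-file.txt
gsutil cp 'gs://rust-crates-io-backup/test-file.txt#1712345678901234' ./recovered.txt
```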

In the future, we might want to create alerts in Datadog that notify us if the transfer job fails.

### FAQ 🤔

#### Do we need a multi-region backup for the object storage?

No. [Multi-region](https://cloud.google.com/storage/docs/availability-durability#cross-region-redundancy) storage only helps if we want to serve this data in real time and need a fallback mechanism in case a GCP region fails. We just need this object storage for backup purposes, so we don't need to pay more 👍

#### Why did you choose the `us-west1` GCP region?

It's the same region where the AWS S3 buckets are located. In this way, we reduce the latency of the transfer job.
Also, the cost calculator indicates that this region has a "Low CO2" footprint and is among the cheapest regions.

#### Why GCP?

Both the Rust Foundation and the Rust project have a good working relationship with Google, and it is where the Rust Foundation's Security Initiative hosts its infrastructure.
Due to this good collaboration, we expect to be able to cover the costs of the backup with credits provided by Google.

[infra-admins]: https://github.com/rust-lang/team/blob/master/teams/infra-admins.toml
[threat model]: https://docs.google.com/document/d/10Qlf8lk7VbpWhA0wHqJj4syYuUVr8rkGVM-k2qkb0QE