Commit a1657f1 ("docs: document gcp backup", parent 0372cf7)

1 file changed: service-catalog/gcp-backup/README.md (+118 lines)
# GCP backup

## Summary

In GCP (Google Cloud Platform) we keep offsite backups of both Rust releases and crates
to protect us against security threats that could involve losing crates or releases.
These threats were identified in the [threat model].

## Motivation

While we have multiple measures in place to prevent accidental deletion of Rust releases or crates in AWS
(e.g. bucket replication to a different region and restricted access), our current setup does not sufficiently protect us against a few threats:

1. _AWS account compromise_. The [threat model] for Rust's infrastructure, created by the Rust Foundation's security engineer, highlights the risk of an AWS account compromise.
   If a malicious actor gained administrator access to the AWS account of one of the [infra-admins],
   they could bypass many safeguards and delete data.
2. _AWS account deletion_. AWS could accidentally delete our account, resulting in the possible deletion of data and backups.
   For example, something similar happened at [Google](https://arstechnica.com/gadgets/2024/05/google-cloud-accidentally-nukes-customer-account-causes-two-weeks-of-downtime/) recently.

- To mitigate threat 1, the new backup needs separate admin access.
- To mitigate threat 2, the new backup needs to live in a separate cloud environment.

## Implementation overview

These new backups are hosted in a dedicated GCP account with completely separate access controls from AWS.
Specifically, none of the current `infra-admins` have admin access to this separate environment, which protects against an account compromise.
This GCP account is not used for anything else (just for backups).

The backups are automatically copied daily by GCP.

### Access 👤

We limit admin access to the GCP backups to two members of the Rust Foundation, for the following reasons:

- _Ensure a strong separation of access_: as explained in the first [motivation](#motivation), the GCP admins should be different from the AWS admins.
  This means we can't give admin access to any of the `infra-admins`.
- _Accountability_: the Rust Foundation employees are accountable for their actions, which means they can't intentionally delete the backups without facing legal consequences.

People with admin access to the GCP account:

- Joel Marcey (Director of Technology @ Rust Foundation)
- Walter Pearce (Security Engineer @ Rust Foundation)

People with read-only access to the GCP project:

- `infra-admins`

Admin access to the GCP account is bound to the admins' `@rustfoundation.org` Google Workspace accounts.
This means that if an employee leaves the Rust Foundation, they lose access to the GCP account.
In that case we need to add a new admin to the GCP account.

> [!NOTE]
> The `infra-admins` team can have admin access to the GCP staging project if needed.

### In case of emergency 🧯

- In case our data in AWS is deleted, the `infra-admins` team can restore it by:
  - copying the data from GCP to AWS using the GCP read-only access;
  - restoring the `crates-io-index` bucket from the `db-dump` stored in the `crates-io` bucket, using [this](https://github.com/rust-lang/crates.io/blob/e0bb0049daa12f5362def463b04febd6c036d315/src/worker/jobs/git.rs#L19-L129) code.
- If the GCP synchronization mechanism breaks, the Infra team can raise a PR to fix the Terraform configuration and a GCP admin can apply it.
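The copy-back step can be sketched as follows. This is a minimal illustration with in-memory stand-ins and hypothetical object names, not the real procedure (which would use the `gcloud`/`aws` CLIs or client libraries with the GCP read-only credentials):

```python
def restore_bucket(list_backup_objects, download, upload):
    """Copy every object from the GCP backup into the AWS bucket.

    The three callables are injected so the sketch stays runnable
    without cloud credentials: list_backup_objects() yields object
    names, download(name) returns bytes, and upload(name, data)
    writes to the destination. Returns the number of restored objects.
    """
    restored = 0
    for name in list_backup_objects():
        upload(name, download(name))
        restored += 1
    return restored

# In-memory stand-ins for the GCP backup and the (emptied) AWS bucket:
gcp_backup = {"crates/serde/serde-1.0.0.crate": b"...", "db-dump.tar.gz": b"..."}
aws_bucket = {}
count = restore_bucket(
    list_backup_objects=gcp_backup.keys,
    download=gcp_backup.__getitem__,
    upload=aws_bucket.__setitem__,
)
```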

### New threat model 🦹

To delete our data, an attacker would need to compromise both:

- one AWS admin account (an `infra-admin`)
- one GCP admin account (Joel or Walter)

This improves our security posture because compromising two accounts is harder than compromising one.

Accidental account deletion is no longer a threat: if either AWS or GCP deletes our account, we can restore the data from the other provider.

## Implementation details

The account where we store the backup is called `rust-backup`. It contains two GCP projects: `backup-prod` and `backup-staging`.
In each project we have one Google [Cloud Storage](https://cloud.google.com/storage?hl=en) bucket in the `europe-west1` (Belgium) region for each of the following AWS S3 buckets:

- `crates-io`. CloudFront URL: `cloudfront-static.crates.io`. It contains the crates published by the Rust community.
- `static-rust-lang-org`. CloudFront URL: `cloudfront-static.rust-lang.org`. Among other things, it contains the Rust releases.

The [storage class](https://cloud.google.com/storage/docs/storage-classes) is set to "archive" for both buckets. This is the cheapest class for infrequently accessed data.

We use [Storage Transfer](https://cloud.google.com/storage-transfer/docs/overview) to automatically copy the content of the S3 buckets into the Google Cloud Storage buckets.
This is a service managed by Google. We use it to download the S3 buckets through CloudFront and perform a daily incremental transfer. The transfers only propagate files that are new, updated, or deleted since the last transfer, minimizing the amount of data that needs to be moved.
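Storage Transfer's change detection is internal to Google, but the shape of an incremental sync can be sketched as a diff of two bucket manifests (hypothetical object names and hashes):

```python
def plan_incremental_transfer(source, destination):
    """Decide what a daily incremental run needs to touch.

    Both arguments map object name -> content hash (e.g. an ETag).
    Objects that are new or changed in the source are copied; objects
    that disappeared from the source are removed from the destination.
    """
    to_copy = sorted(name for name, digest in source.items()
                     if destination.get(name) != digest)
    to_delete = sorted(name for name in destination if name not in source)
    return to_copy, to_delete

# Example: one unchanged file, one updated file, one new file, one deletion.
s3 = {"a.crate": "h1", "b.crate": "h2-new", "d.crate": "h4"}
gcs = {"a.crate": "h1", "b.crate": "h2-old", "c.crate": "h3"}
to_copy, to_delete = plan_incremental_transfer(s3, gcs)
# to_copy == ["b.crate", "d.crate"], to_delete == ["c.crate"]
```

Only the changed and removed objects are touched, which is why the daily runs stay cheap even though the buckets are large.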

### Monitoring 🕵️

To check that the backups are working:

- Ensure that the number of files and the total size of the GCP buckets match the respective AWS buckets by looking at the metrics.
- Ensure that only the authorized people have access to the account.
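The first check above can be expressed as a small comparison over the bucket metrics; a sketch with hypothetical metric snapshots:

```python
def check_backup_metrics(aws_objects, gcp_objects):
    """Compare object count and total size between mirrored buckets.

    Both arguments map object name -> size in bytes, as reported by
    the bucket metrics. Returns one boolean per check.
    """
    return {
        "count_matches": len(aws_objects) == len(gcp_objects),
        "size_matches": sum(aws_objects.values()) == sum(gcp_objects.values()),
    }

# Hypothetical metric snapshots for one bucket pair:
aws = {"a.crate": 100, "b.crate": 250}
gcp = {"a.crate": 100, "b.crate": 250}
result = check_backup_metrics(aws, gcp)
```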

You can also run the following test:

- Upload a file to an AWS S3 bucket and check that it appears in GCP.
- Edit the file in AWS and check that you can recover the previous version from GCP.
- Delete the file in AWS and check that you can recover all previous versions from GCP.
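The last two steps rely on object versioning in the backup bucket: overwrites and deletes keep noncurrent versions recoverable. A toy model of that behaviour, with a hypothetical object name (the real mechanism is GCS object versioning, not this code):

```python
class VersionedBucket:
    """Toy model of a versioned backup bucket: every write appends a
    new version, and a delete only hides the object, so noncurrent
    versions stay recoverable."""

    def __init__(self):
        self._versions = {}  # name -> list of byte strings, oldest first
        self._live = set()   # names that currently exist

    def put(self, name, data):
        self._versions.setdefault(name, []).append(data)
        self._live.add(name)

    def delete(self, name):
        self._live.discard(name)  # noncurrent versions are kept

    def exists(self, name):
        return name in self._live

    def versions(self, name):
        return list(self._versions.get(name, []))

backup = VersionedBucket()
backup.put("dist/rust-1.80.0.tar.gz", b"v1")  # step 1: upload appears in backup
backup.put("dist/rust-1.80.0.tar.gz", b"v2")  # step 2: previous version is kept
backup.delete("dist/rust-1.80.0.tar.gz")      # step 3: all versions still recoverable
```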

In the future, we might want to create alerts in Datadog to monitor whether the transfer job fails.

### FAQ 🤔

#### Do we need a multi-region backup for the object storage?

No. [Multi-region](https://cloud.google.com/storage/docs/availability-durability#cross-region-redundancy) storage only helps if we want to serve this data in real time with a fallback mechanism in case a GCP region fails. We only need this storage for backup purposes, so we don't need to pay more 👍

#### Why did you choose the `us-west1` GCP region?

It's the same region where the AWS S3 buckets are located. In this way, we reduce the latency of the transfer job.
Also, the cost calculator indicates that this region has a "Low CO2" rating, and it's among the cheapest regions.

#### Why GCP?

Both the Rust Foundation and the Rust project have a good working relationship with Google, and it is where the Rust Foundation's Security Initiative hosts its infrastructure.
Because of this good collaboration, we expect to be able to cover the costs of the backup with credits provided by Google.

[infra-admins]: https://github.com/rust-lang/team/blob/master/teams/infra-admins.toml
[threat model]: https://docs.google.com/document/d/10Qlf8lk7VbpWhA0wHqJj4syYuUVr8rkGVM-k2qkb0QE
