From b16ab330c57bd2507cdd4e922597f407fcdebcd2 Mon Sep 17 00:00:00 2001
From: MarcoIeni <11428655+MarcoIeni@users.noreply.github.com>
Date: Tue, 20 Aug 2024 12:17:03 +0200
Subject: [PATCH 01/10] docs: document gcp backup

---
 service-catalog/gcp-backup/README.md | 132 +++++++++++++++++++++++++++
 1 file changed, 132 insertions(+)
 create mode 100644 service-catalog/gcp-backup/README.md

diff --git a/service-catalog/gcp-backup/README.md b/service-catalog/gcp-backup/README.md
new file mode 100644
index 0000000..bbe12fa
--- /dev/null
+++ b/service-catalog/gcp-backup/README.md
@@ -0,0 +1,132 @@
+# GCP backup

## Summary

In GCP (Google Cloud Platform) we keep offsite backups for both Rust releases and crates
to protect us against security threats that could involve losing crates or releases.
These threats were identified in the [threat model].

## Motivation

While we have multiple measures in place to prevent accidental deletion of Rust releases or crates in AWS,
e.g. bucket replication to a different region and restricted access, our current setup does not sufficiently protect us against a few threats:

1. _AWS Account compromise_. The [threat model] for Rust's infrastructure, created by the Rust Foundation's security engineer, highlights the risk of an AWS account compromise.
   If a malicious actor was able to gain administrator access to the AWS account of one of the [infra-admins],
   they could bypass a lot of safeguards and delete data.
2. _AWS Account deletion_. AWS could accidentally delete our account, resulting in the possible deletion of data and backups.
   For example, something similar happened at [Google](https://arstechnica.com/gadgets/2024/05/google-cloud-accidentally-nukes-customer-account-causes-two-weeks-of-downtime/) recently.

- To mitigate threat 1, the new backup needs to have separate admin access.
- To mitigate threat 2, the new backup needs to be in a separate cloud environment.

## Implementation overview

These new backups are hosted in a dedicated GCP account and have completely separate access controls from AWS.
Specifically, none of the current `infra-admins` have admin access to this separate environment, to protect against an account compromise.
This GCP account is not used for anything else (just for backups).

The backups are copied automatically every day by a transfer service managed by Google.

### Access ๐Ÿ‘ค

We limit admin access to the GCP backups to two members of the Rust Foundation for the following reasons:

- _ensure a strong separation of access_: as explained in the first [motivation](#motivation), the GCP admins should be different from the AWS admins.
  This means we can't give admin access to any of the `infra-admins`.
- _accountability_: the Rust Foundation employees are accountable for their actions, which means they can't delete the backups intentionally without breaking the law.

People with admin access to the GCP account:

- Joel Marcey (Director of Technology @ Rust Foundation)
- Walter Pearce (Security Engineer @ Rust Foundation)

People with read-only access to the GCP project:

- `infra-admins`

Admin access to the GCP account is bound to the admins' `@rustfoundation.org` Google Workspace accounts.
This means that if an employee leaves the Rust Foundation, they lose access to the GCP account.
In this case we need to add a new admin to the GCP account.

> [!NOTE]
> The `infra-admins` team can have admin access to the GCP staging project if needed.
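To make the access expectations above easy to verify, an audit could compare the backup bucket's IAM policy against the principals listed in this section. The following is a minimal sketch only: the bucket name and principal addresses are hypothetical placeholders, not the real project configuration.

```python
# Hypothetical audit sketch: flag IAM bindings on a backup bucket that
# grant access to anyone outside the expected set of principals.
# Bucket name and principal addresses below are illustrative placeholders.
from google.cloud import storage

EXPECTED_PRINCIPALS = {
    "user:joel@rustfoundation.org",      # hypothetical admin address
    "user:walter@rustfoundation.org",    # hypothetical admin address
    "group:infra-admins@rust-lang.org",  # hypothetical read-only group
}


def audit_bucket_iam(bucket_name: str) -> None:
    client = storage.Client()
    bucket = client.bucket(bucket_name)
    # Version 3 policies include conditional role bindings, if any exist.
    policy = bucket.get_iam_policy(requested_policy_version=3)
    for binding in policy.bindings:
        unexpected = set(binding["members"]) - EXPECTED_PRINCIPALS
        if unexpected:
            print(f"{binding['role']} granted to unexpected members: {unexpected}")


if __name__ == "__main__":
    audit_bucket_iam("rust-backup-crates-io")  # hypothetical bucket name
```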
### In case of emergency ๐Ÿงฏ

- In case our data in AWS is deleted, the `infra-admin` team can restore it by:
  - copying the data from GCP to AWS using the GCP read-only access.
  - restoring the `crates-io-index` bucket from the `db-dump` stored in the `crates-io` bucket. Use [this](https://github.com/rust-lang/crates.io/blob/e0bb0049daa12f5362def463b04febd6c036d315/src/worker/jobs/git.rs#L19-L129) code.
- If the GCP synchronization mechanism breaks, the Infra team can raise a PR to fix the terraform configuration and a GCP admin can apply it.

### New threat model ๐Ÿฆน

To delete our data, an attacker would need to compromise both:

- one AWS admin account (an `infra-admin`)
- one GCP admin account (Joel or Walter)

This improves our security posture because compromising two accounts is harder than compromising one.

The accidental account deletion is not a threat anymore because if either AWS or GCP deletes our account, we can restore the data from the other provider.

## Implementation details

The account where we store the backup is called `rust-backup`. It contains two GCP projects: `backup-prod` and `backup-staging`.
Here we have one Google [Object Storage](https://cloud.google.com/storage?hl=en) in the `europe-west1` (Belgium) region for the following AWS S3 buckets:

- `crates-io`. Cloudfront URL: `cloudfront-static.crates.io`. It contains the crates published by the Rust community.
- `static-rust-lang-org`. Cloudfront Url: `cloudfront-static.rust-lang.org`. Among other things, it contains the Rust releases.

For the objects:

- Set the [storage class](https://cloud.google.com/storage/docs/storage-classes) to "archive" for all buckets.
  This is the cheapest class for infrequent access.
- Enable [object-versioning](https://cloud.google.com/storage/docs/object-versioning) and [soft-delete](https://cloud.google.com/storage/docs/soft-delete),
  so that we can recover updates and deletes so that we can recover updates and deletes.

We use [Storage Transfer](https://cloud.google.com/storage-transfer/docs/overview) to automatically transfer the content of the S3 buckets into the Google Object Storage.
This is a service managed by Google. We'll use it to download the S3 buckets from cloudfront to perform a daily incremental transfer. The transfers only move files that are new, updated, or deleted since the last transfer, minimizing the amount of data that needs to be transferred.

### Monitoring ๐Ÿ•ต๏ธ

To check that the backups are working:

- Ensure the number of files and the size of the GCP buckets match those of the respective AWS buckets by looking at the metrics
- Ensure that only the authorized people have access to the account

You can also run the following test:

- Upload a file to an AWS S3 bucket and check that it appears in GCP.
- Edit the file in AWS and check that you can recover the previous version from GCP.
- Delete the file in AWS and check that you can recover all previous versions from GCP.

In the future, we might want to create alerts in:

- _Datadog_: to monitor if the transfer job fails.
- _Wiz_: to monitor if the access control changes.

### Backup maintenance ๐Ÿงน

If a crate version is deleted from the crates-io bucket (e.g. for GDPR reasons), an admin needs to delete it from the GCP bucket as well.
Even though the delete will propagate to GCP, the `soft-delete` feature will preserve the data, so we need to delete it manually.

### FAQ ๐Ÿค”

#### Do we need a multi-region backup for the object storage?
No. [Multi-region](https://cloud.google.com/storage/docs/availability-durability#cross-region-redundancy) only helps if we want to serve this data in real time and we want to have a fallback mechanism if a GCP region fails. We just need this object storage for backup purposes, so we don't need to pay more ๐Ÿ‘

#### Why did you choose the `europe-west1` GCP region?

It's far from the `us-west-1` region where the AWS S3 buckets are located. This protects us from geographical disasters.
The con is that the latency of the transfer job is higher than it would be for a US region.
Also, the cost calculator indicates that this region has a "Low CO2" label and is among the cheapest regions.

#### Why GCP?

Both the Rust Foundation and the Rust project have a good working relationship with Google, and it is where the Rust Foundation's Security Initiative hosts its infrastructure.
Due to the good collaboration with Google, we expect that we can cover the costs of the backup with credits provided by Google.

[infra-admins]: https://github.com/rust-lang/team/blob/master/teams/infra-admins.toml
[threat model]: https://docs.google.com/document/d/10Qlf8lk7VbpWhA0wHqJj4syYuUVr8rkGVM-k2qkb0QE

From 9d9ca4c88a74ba8dd8c456d26982ed4a13cce660 Mon Sep 17 00:00:00 2001
From: Marco Ieni <11428655+MarcoIeni@users.noreply.github.com>
Date: Wed, 25 Sep 2024 15:18:08 +0200
Subject: [PATCH 02/10] Update service-catalog/gcp-backup/README.md

Co-authored-by: Jan David
---
 service-catalog/gcp-backup/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/service-catalog/gcp-backup/README.md b/service-catalog/gcp-backup/README.md
index bbe12fa..2989e2d 100644
--- a/service-catalog/gcp-backup/README.md
+++ b/service-catalog/gcp-backup/README.md
@@ -4,7 +4,7 @@
 
 In GCP (Google Cloud Platform) we keep offsite backups for both Rust releases and crates
 to protect us against security threats that could involve losing crates or releases.
-These threats were identified in the [threat model].
+These threats were identified in a [threat model] for the project's infrastructure, created by the Rust Foundation's security engineer Walter.
 
 ## Motivation

From 89405a1e3bec407c67ec5dd88b065f5301055346 Mon Sep 17 00:00:00 2001
From: Marco Ieni <11428655+MarcoIeni@users.noreply.github.com>
Date: Wed, 25 Sep 2024 15:18:28 +0200
Subject: [PATCH 03/10] Update service-catalog/gcp-backup/README.md

Co-authored-by: Jan David
---
 service-catalog/gcp-backup/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/service-catalog/gcp-backup/README.md b/service-catalog/gcp-backup/README.md
index 2989e2d..f30d086 100644
--- a/service-catalog/gcp-backup/README.md
+++ b/service-catalog/gcp-backup/README.md
@@ -11,7 +11,7 @@ These threats were identified in a [threat model] for the project's infrastructu
 While we have multiple measures in place to prevent accidental deletion of Rust releases or crates in AWS,
 e.g. bucket replication to a different region and restricted access, our current setup does not sufficiently protect us against a few threats:
 
-1. _AWS Account compromise_. The [threat model] for Rust's infrastructure, created by the Rust Foundation's security engineer, highlights the risk of an AWS account compromise.
+1. _AWS Account compromise_. The [threat model] highlights the risk of an AWS account compromise.
    If a malicious actor was able to gain administrator access to the AWS account of one of the [infra-admins],
    they could bypass a lot of safeguards and delete data.
 2. _AWS Account deletion_. AWS could accidentally delete our account, resulting in the possible deletion of data and backups.
   For example, something similar happened at [Google](https://arstechnica.com/gadgets/2024/05/google-cloud-accidentally-nukes-customer-account-causes-two-weeks-of-downtime/) recently.

From 35909e81ad40d35adb2d1cdea386d254cca0ef65 Mon Sep 17 00:00:00 2001
From: Marco Ieni <11428655+MarcoIeni@users.noreply.github.com>
Date: Wed, 25 Sep 2024 15:18:48 +0200
Subject: [PATCH 04/10] Update service-catalog/gcp-backup/README.md

Co-authored-by: Jan David
---
 service-catalog/gcp-backup/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/service-catalog/gcp-backup/README.md b/service-catalog/gcp-backup/README.md
index f30d086..39b6e61 100644
--- a/service-catalog/gcp-backup/README.md
+++ b/service-catalog/gcp-backup/README.md
@@ -15,7 +15,7 @@ e.g. bucket replication to a different region and restricted access, our current
 If a malicious actor was able to gain administrator access to the AWS account of one of the [infra-admins],
 they could bypass a lot of safeguards and delete data.
 2. _AWS Account deletion_. AWS could accidentally delete our account, resulting in the possible deletion of data and backups.
-   For example, something similar happened at [Google](https://arstechnica.com/gadgets/2024/05/google-cloud-accidentally-nukes-customer-account-causes-two-weeks-of-downtime/) recently.
+   Something similar happened to a customer on [GCP](https://arstechnica.com/gadgets/2024/05/google-cloud-accidentally-nukes-customer-account-causes-two-weeks-of-downtime/) in 2024.
 
 - To mitigate threat 1, the new backup needs to have separate admin access.
 - To mitigate threat 2, the new backup needs to be in a separate cloud environment.

From 1df769ff43a48d75ae85228bcb56268266c59be6 Mon Sep 17 00:00:00 2001
From: Marco Ieni <11428655+MarcoIeni@users.noreply.github.com>
Date: Wed, 25 Sep 2024 15:19:11 +0200
Subject: [PATCH 05/10] Update service-catalog/gcp-backup/README.md

Co-authored-by: Jan David
---
 service-catalog/gcp-backup/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/service-catalog/gcp-backup/README.md b/service-catalog/gcp-backup/README.md
index 39b6e61..a08ddde 100644
--- a/service-catalog/gcp-backup/README.md
+++ b/service-catalog/gcp-backup/README.md
@@ -34,7 +34,7 @@ We limit admin access to the GCP backups to two members of the Rust Foundation f
 
 - _ensure a strong separation of access_: as explained in the first [motivation](#motivation), the GCP admins should be different from the AWS admins.
   This means we can't give admin access to any of the `infra-admins`.
-- _accountability_: the Rust Foundation employees are accountable for their actions, which means they can't delete the backups intentionally without breaking the law.
+- _accountability_: The Rust Foundation employees have signed an employment contract and can be legally liable for malicious actions.
 
 People with admin access to the GCP account:

From 868a79b9bfa298cc35e8880b664e18d4d2e28391 Mon Sep 17 00:00:00 2001
From: Marco Ieni <11428655+MarcoIeni@users.noreply.github.com>
Date: Wed, 25 Sep 2024 15:19:20 +0200
Subject: [PATCH 06/10] Update service-catalog/gcp-backup/README.md

Co-authored-by: Jan David
---
 service-catalog/gcp-backup/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/service-catalog/gcp-backup/README.md b/service-catalog/gcp-backup/README.md
index a08ddde..a889d42 100644
--- a/service-catalog/gcp-backup/README.md
+++ b/service-catalog/gcp-backup/README.md
@@ -83,7 +83,7 @@ For the objects:
 
 - Set the [storage class](https://cloud.google.com/storage/docs/storage-classes) to "archive" for all buckets.
   This is the cheapest class for infrequent access.
 - Enable [object-versioning](https://cloud.google.com/storage/docs/object-versioning) and [soft-delete](https://cloud.google.com/storage/docs/soft-delete),
-  so that we can recover updates and deletes so that we can recover updates and deletes.
+  so that we can recover updates and deletes.
 
 We use [Storage Transfer](https://cloud.google.com/storage-transfer/docs/overview) to automatically transfer the content of the S3 buckets into the Google Object Storage.
 This is a service managed by Google. We'll use it to download the S3 buckets from cloudfront to perform a daily incremental transfer. The transfers only move files that are new, updated, or deleted since the last transfer, minimizing the amount of data that needs to be transferred.

From 82e31351f76a41b7ae3b5646a36ca26f3105ee53 Mon Sep 17 00:00:00 2001
From: MarcoIeni <11428655+MarcoIeni@users.noreply.github.com>
Date: Thu, 26 Sep 2024 14:58:41 +0200
Subject: [PATCH 07/10] spell "Infrastructure"

---
 service-catalog/gcp-backup/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/service-catalog/gcp-backup/README.md b/service-catalog/gcp-backup/README.md
index a889d42..2c377a4 100644
--- a/service-catalog/gcp-backup/README.md
+++ b/service-catalog/gcp-backup/README.md
@@ -57,7 +57,7 @@ In this case we need to add a new admin to the GCP account.
 
 - In case our data in AWS is deleted, the `infra-admin` team can restore it by:
   - copying the data from GCP to AWS using the GCP read-only access.
   - restoring the `crates-io-index` bucket from the `db-dump` stored in the `crates-io` bucket. Use [this](https://github.com/rust-lang/crates.io/blob/e0bb0049daa12f5362def463b04febd6c036d315/src/worker/jobs/git.rs#L19-L129) code.
-- If the GCP synchronization mechanism breaks, the Infra team can raise a PR to fix the terraform configuration and a GCP admin can apply it.
+- If the GCP synchronization mechanism breaks, the Infrastructure team can raise a PR to fix the terraform configuration and a GCP admin can apply it.

From 1000b25fe80c7d3884704d9a339ad80538b3968f Mon Sep 17 00:00:00 2001
From: MarcoIeni <11428655+MarcoIeni@users.noreply.github.com>
Date: Thu, 26 Sep 2024 15:00:25 +0200
Subject: [PATCH 08/10] spell products correctly

---
 service-catalog/gcp-backup/README.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/service-catalog/gcp-backup/README.md b/service-catalog/gcp-backup/README.md
index 2c377a4..7dc2bad 100644
--- a/service-catalog/gcp-backup/README.md
+++ b/service-catalog/gcp-backup/README.md
@@ -57,7 +57,7 @@ In this case we need to add a new admin to the GCP account.
 
 - In case our data in AWS is deleted, the `infra-admin` team can restore it by:
   - copying the data from GCP to AWS using the GCP read-only access.
   - restoring the `crates-io-index` bucket from the `db-dump` stored in the `crates-io` bucket. Use [this](https://github.com/rust-lang/crates.io/blob/e0bb0049daa12f5362def463b04febd6c036d315/src/worker/jobs/git.rs#L19-L129) code.
-- If the GCP synchronization mechanism breaks, the Infrastructure team can raise a PR to fix the terraform configuration and a GCP admin can apply it.
+- If the GCP synchronization mechanism breaks, the Infrastructure team can raise a PR to fix the Terraform configuration and a GCP admin can apply it.
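To make the restore procedure above more concrete, here is a deliberately minimal sketch of the "copy the data from GCP back to AWS" step. It is illustrative only: the bucket names are hypothetical, GCP read-only and AWS write credentials are assumed to be configured in the environment, and a real restore of buckets this size would use parallel workers or a managed transfer service rather than streaming every object through one machine.

```python
# Hypothetical restore sketch: copy every object from the GCP backup bucket
# back into the original AWS S3 bucket. Bucket names are placeholders, and
# GCP read-only plus AWS write credentials are assumed to be configured.
import boto3
from google.cloud import storage


def restore_bucket(gcs_bucket_name: str, s3_bucket_name: str) -> None:
    gcs = storage.Client()
    s3 = boto3.client("s3")
    for blob in gcs.list_blobs(gcs_bucket_name):
        # Buffering whole objects in memory keeps the sketch short; a real
        # restore would stream to disk or use multipart uploads instead.
        data = blob.download_as_bytes()
        s3.put_object(Bucket=s3_bucket_name, Key=blob.name, Body=data)
        print(f"restored {blob.name}")


if __name__ == "__main__":
    restore_bucket("rust-backup-crates-io", "crates-io")  # hypothetical names
```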
### New threat model ๐Ÿฆน
 
@@ -75,8 +75,8 @@ The accidental account deletion is not a threat anymore because if either AWS or
 
 The account where we store the backup is called `rust-backup`. It contains two GCP projects: `backup-prod` and `backup-staging`.
 Here we have one Google [Object Storage](https://cloud.google.com/storage?hl=en) in the `europe-west1` (Belgium) region for the following AWS S3 buckets:
 
-- `crates-io`. Cloudfront URL: `cloudfront-static.crates.io`. It contains the crates published by the Rust community.
+- `crates-io`. CloudFront URL: `cloudfront-static.crates.io`. It contains the crates published by the Rust community.
-- `static-rust-lang-org`. Cloudfront Url: `cloudfront-static.rust-lang.org`. Among other things, it contains the Rust releases.
+- `static-rust-lang-org`. CloudFront URL: `cloudfront-static.rust-lang.org`. Among other things, it contains the Rust releases.
 
 For the objects:
 
@@ -86,7 +86,7 @@ For the objects:
   so that we can recover updates and deletes.
 
 We use [Storage Transfer](https://cloud.google.com/storage-transfer/docs/overview) to automatically transfer the content of the S3 buckets into the Google Object Storage.
-This is a service managed by Google. We'll use it to download the S3 buckets from cloudfront to perform a daily incremental transfer. The transfers only move files that are new, updated, or deleted since the last transfer, minimizing the amount of data that needs to be transferred.
+This is a service managed by Google. We'll use it to download the S3 buckets from CloudFront to perform a daily incremental transfer. The transfers only move files that are new, updated, or deleted since the last transfer, minimizing the amount of data that needs to be transferred.

From aebc9e65be81aaa44cbdc84811f4bea7de75cb52 Mon Sep 17 00:00:00 2001
From: MarcoIeni <11428655+MarcoIeni@users.noreply.github.com>
Date: Thu, 26 Sep 2024 15:02:40 +0200
Subject: [PATCH 09/10] rename

---
 service-catalog/{gcp-backup => rust-assets-backup}/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
 rename service-catalog/{gcp-backup => rust-assets-backup}/README.md (99%)

diff --git a/service-catalog/gcp-backup/README.md b/service-catalog/rust-assets-backup/README.md
similarity index 99%
rename from service-catalog/gcp-backup/README.md
rename to service-catalog/rust-assets-backup/README.md
index 7dc2bad..84c2a43 100644
--- a/service-catalog/gcp-backup/README.md
+++ b/service-catalog/rust-assets-backup/README.md
@@ -1,4 +1,4 @@
-# GCP backup
+# Rust assets backup
 
 ## Summary

From d882010692199d325a653d150fec82b6da4406c3 Mon Sep 17 00:00:00 2001
From: MarcoIeni <11428655+MarcoIeni@users.noreply.github.com>
Date: Thu, 26 Sep 2024 15:58:05 +0200
Subject: [PATCH 10/10] extract faq and maintenance

---
 service-catalog/rust-assets-backup/README.md | 44 +++---------------
 service-catalog/rust-assets-backup/faq.md    | 16 +++++++
 .../rust-assets-backup/maintenance.md        | 24 ++++++++++
 3 files changed, 46 insertions(+), 38 deletions(-)
 create mode 100644 service-catalog/rust-assets-backup/faq.md
 create mode 100644 service-catalog/rust-assets-backup/maintenance.md

diff --git a/service-catalog/rust-assets-backup/README.md b/service-catalog/rust-assets-backup/README.md
index 84c2a43..2a14e04 100644
--- a/service-catalog/rust-assets-backup/README.md
+++ b/service-catalog/rust-assets-backup/README.md
@@ -80,53 +80,21 @@ Here we have one Google [Object Storage](https://cloud.google.com/storage?hl=en)
 
 For the objects:
 
- The [storage class](https://cloud.google.com/storage/docs/storage-classes) is set to "archive" for all buckets.
   This is the cheapest class for infrequent access.
 - [object-versioning](https://cloud.google.com/storage/docs/object-versioning) and [soft-delete](https://cloud.google.com/storage/docs/soft-delete) are enabled,
   so that we can recover updates and deletes.
 
 We use [Storage Transfer](https://cloud.google.com/storage-transfer/docs/overview) to automatically transfer the content of the S3 buckets into the Google Object Storage.
 This is a service managed by Google. We'll use it to download the S3 buckets from CloudFront to perform a daily incremental transfer. The transfers only move files that are new, updated, or deleted since the last transfer, minimizing the amount of data that needs to be transferred.
 
 ## Explanations
 
 - [FAQ](./faq.md)
 
 ## How-to Guides
 
 - [Maintenance](./maintenance.md)
 
 [infra-admins]: https://github.com/rust-lang/team/blob/master/teams/infra-admins.toml
 [threat model]: https://docs.google.com/document/d/10Qlf8lk7VbpWhA0wHqJj4syYuUVr8rkGVM-k2qkb0QE

diff --git a/service-catalog/rust-assets-backup/faq.md b/service-catalog/rust-assets-backup/faq.md
new file mode 100644
index 0000000..5fb1589
--- /dev/null
+++ b/service-catalog/rust-assets-backup/faq.md
@@ -0,0 +1,16 @@
+# Rust Assets Backup: FAQ
+
+## Do we need a multi-region backup for the object storage?
+
+No. [Multi-region](https://cloud.google.com/storage/docs/availability-durability#cross-region-redundancy) only helps if we want to serve this data in real time and we want to have a fallback mechanism if a GCP region fails. We just need this object storage for backup purposes, so we don't need to pay more ๐Ÿ‘
+
+## Why did you choose the `europe-west1` GCP region?
+
+It's far from the `us-west-1` region where the AWS S3 buckets are located. This protects us from geographical disasters.
+The con is that the latency of the transfer job is higher than it would be for a US region.
+Also, the cost calculator indicates that this region has a "Low CO2" label and is among the cheapest regions.
+
+## Why GCP?
+
+Both the Rust Foundation and the Rust project have a good working relationship with Google, and it is where the Rust Foundation's Security Initiative hosts its infrastructure.
+Due to the good collaboration with Google, we expect that we can cover the costs of the backup with credits provided by Google.
+- [Maintenance](./maintenance.md) [infra-admins]: https://github.com/rust-lang/team/blob/master/teams/infra-admins.toml [threat model]: https://docs.google.com/document/d/10Qlf8lk7VbpWhA0wHqJj4syYuUVr8rkGVM-k2qkb0QE diff --git a/service-catalog/rust-assets-backup/faq.md b/service-catalog/rust-assets-backup/faq.md new file mode 100644 index 0000000..5fb1589 --- /dev/null +++ b/service-catalog/rust-assets-backup/faq.md @@ -0,0 +1,16 @@ +# Rust Assets Backup: FAQ + +## Do we need a multi-region backup for the object storage? + +No. [Multi-region](https://cloud.google.com/storage/docs/availability-durability#cross-region-redundancy) only helps if we want to serve this data real-time and we want to have a fallback mechanism if a GCP region fails. We just need this object storage for backup purposes, so we don't need to pay more ๐Ÿ‘ + +## Why did you choose the `europe-west1` GCP region? + +It's far from the `us-west-1` region where the AWS S3 buckets are located. This protects us from geographical disasters. +The con is that the latency of the transfer job is higher when compared to a region in the US. +Also, the cost calculator indicates that this regions has a "Low CO2" and it's among the cheapest regions. + +## Why GCP? + +Both the Rust Foundation and the Rust project have a good working relationship with Google, and it is where the Rust Foundation's Security Initiative hosts its infrastructure. +Due to the good collaboration with Google, we expect that we can cover the costs of the backup with credits provided by Google. diff --git a/service-catalog/rust-assets-backup/maintenance.md b/service-catalog/rust-assets-backup/maintenance.md new file mode 100644 index 0000000..4c49d87 --- /dev/null +++ b/service-catalog/rust-assets-backup/maintenance.md @@ -0,0 +1,24 @@ +# Rust Assets Backup: Maintenance + +## Monitoring ๐Ÿ•ต๏ธ + +To check that the backups are working: + +- Ensure the number of files and the size of the GCP buckets is the same as the respective AWS buckets by looking at the metrics +- Ensure that only the authorized people have access to the account + +You can also run the following test: + +- Upload a file in an AWS S3 bucket and check that it appears in GCP. +- Edit the file in AWS and check that you can recover the previous version from GCP. +- Delete the in AWS and check that you can recover all previous versions from GCP. + +In the future, we might want to create alerts in: + +- _Datadog_: to monitor if the transfer job fails. +- _Wiz_: to monitor if the access control changes. + +## Backup maintenance ๐Ÿงน + +If a crate version is deleted from the crates-io bucket (e.g. for GDPR reasons), an admin needs to delete it from the GCP bucket as well. +Even though the delete will propagate to GCP, the `soft-delete` feature will preserve the data, so we need to delete it manually.