
Use R2 for distillation training data #35

Merged
ravwojdyla merged 1 commit into main from rav-read-from-r2
Jul 15, 2025

@ravwojdyla
Contributor

Use the Cloudflare R2 object store to avoid GCS egress charges.

Some more context: the distillation training data lived in a GCS bucket in us-central1. It is about 150 GB, mostly logits. We have rented GPUs from Colab, Lambda, and Vast, and in most cases we can't control whether the VM runs in the GCP us-central1 region, which led to some unexpected egress charges. To avoid the GCS egress charges, we will use the R2 object store, which has no egress charges [1].
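For readers who want to try the same setup, here is a minimal sketch of pointing an S3-compatible client at R2. It is not taken from this PR's diff. It assumes the reader uses an fsspec/s3fs-style client that accepts `key`, `secret`, and `endpoint_url` options, and the helper names and environment variable names (`R2_ACCOUNT_ID`, `R2_ACCESS_KEY_ID`, `R2_SECRET_ACCESS_KEY`) are hypothetical.

```python
import os


def r2_endpoint_url(account_id: str) -> str:
    """Build the S3-compatible endpoint URL for a Cloudflare R2 account."""
    if not account_id:
        raise ValueError("account_id must be non-empty")
    return f"https://{account_id}.r2.cloudflarestorage.com"


def r2_storage_options(account_id: str, access_key_id: str, secret_access_key: str) -> dict:
    """Return options in the shape s3fs-style clients accept (assumed client)."""
    return {
        "key": access_key_id,
        "secret": secret_access_key,
        "endpoint_url": r2_endpoint_url(account_id),
    }


def r2_storage_options_from_env() -> dict:
    """Read credentials from hypothetical environment variables."""
    return r2_storage_options(
        os.environ["R2_ACCOUNT_ID"],
        os.environ["R2_ACCESS_KEY_ID"],
        os.environ["R2_SECRET_ACCESS_KEY"],
    )
```

The resulting dict can typically be passed as `storage_options` to libraries built on fsspec. Keeping the endpoint in one helper means training code only changes the bucket URL, not the reader logic.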

Footnotes

  1. This doesn't mean that reading from R2 is free: read operations are still counted as Class B operations and charged beyond the free tier (see the doc).
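The trade-off in the footnote can be sketched as simple arithmetic: egress is billed per GB transferred, while R2 reads are billed per operation beyond a free allowance. The function names are hypothetical and every price or count passed in is an assumption to be replaced with current figures from the providers' pricing pages.

```python
def egress_cost_usd(dataset_gb: float, full_reads: int, price_per_gb: float) -> float:
    """Cost of downloading the whole dataset `full_reads` times at a per-GB egress price."""
    return dataset_gb * full_reads * price_per_gb


def class_b_cost_usd(read_ops: int, free_ops: int, price_per_million: float) -> float:
    """Cost of read (Class B) operations after subtracting the free allowance."""
    billable_ops = max(0, read_ops - free_ops)
    return billable_ops / 1_000_000 * price_per_million


if __name__ == "__main__":
    # All numbers below are illustrative assumptions, not quoted prices.
    egress = egress_cost_usd(dataset_gb=150, full_reads=3, price_per_gb=0.12)
    reads = class_b_cost_usd(read_ops=12_000_000, free_ops=10_000_000, price_per_million=0.36)
    print(f"egress-billed store: ${egress:.2f}, per-operation reads: ${reads:.2f}")
```

The point of the sketch is that the egress cost scales with dataset size and number of re-downloads, whereas the per-operation cost scales with request count, so storing the data in fewer, larger objects keeps the read bill low.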

@ravwojdyla ravwojdyla requested a review from yonromai July 15, 2025 22:35
Contributor

@yonromai yonromai left a comment

Awesome!

@ravwojdyla ravwojdyla merged commit a437c6e into main Jul 15, 2025
2 checks passed
@ravwojdyla ravwojdyla deleted the rav-read-from-r2 branch July 15, 2025 22:50