feat(kafka-deduplicator): add checksum to checkpoint tracking, handle incremental retention #39917
8 files reviewed, no comments
11dc521 to 16581bb
Overall looks good to me, with a small number of nitpicks. The tests should be fixed before merging, as they have a small logic error. Approving preemptively, though.
aecfb8e to 0508c01
Problem
To track every file needed to reconstitute a complete RocksDB store for reimport, while minimizing uploads to remote storage across many incremental checkpoints, we need to include file checksums in checkpoint tracking and use them when deciding whether to upload each file present in an incremental checkpoint. The rules differ for SST and non-SST store file types.
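As a rough illustration of the idea (not the code in this PR), the upload decision could be sketched as below. The names `UploadedFiles`, `UploadDecision`, and `decide` are hypothetical, the `u64` checksum stands in for whatever digest the checkpoint tracker records, and the two rules are assumptions: SST files are immutable once written, so a file name already uploaded is skipped, while non-SST files such as `MANIFEST-*` or `CURRENT` can change under the same name, so they are uploaded again whenever the checksum differs.

```rust
use std::collections::HashMap;

/// Outcome of comparing one checkpoint file against earlier uploads.
#[derive(Debug, Clone, Copy, PartialEq)]
enum UploadDecision {
    Upload,
    Skip,
}

/// Hypothetical tracker: file name -> checksum of the copy already in remote storage.
#[derive(Default)]
struct UploadedFiles {
    checksums: HashMap<String, u64>,
}

impl UploadedFiles {
    /// Decide whether a file in the current incremental checkpoint needs uploading,
    /// and record its checksum when it does.
    fn decide(&mut self, name: &str, checksum: u64) -> UploadDecision {
        let previous = self.checksums.get(name).copied();
        let unchanged = match previous {
            // Assumed rule: SST files never change, so a known name is enough to skip.
            Some(_) if name.ends_with(".sst") => true,
            // Assumed rule: non-SST files are skipped only when the checksum matches.
            Some(prev) => prev == checksum,
            // Never seen before: always upload.
            None => false,
        };
        if unchanged {
            UploadDecision::Skip
        } else {
            self.checksums.insert(name.to_string(), checksum);
            UploadDecision::Upload
        }
    }
}
```

Keeping the checksum for SST files even though it is not used in the skip decision leaves room to verify a restored store against the recorded values.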
Changes
How did you test this code?
Locally and in CI
Changelog: (features only) Is this feature complete?
No update required