Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: cio optimization #2550

Open
wants to merge 49 commits into
base: main
Choose a base branch
from
Open

feat: cio optimization #2550

wants to merge 49 commits into from

Conversation

sshanzel
Copy link
Member

@sshanzel sshanzel commented Dec 17, 2024

CIO optimization by removing/registering users based on their current activity, along with downgrading their digest if we need to.

Once we've agreed on the processing code-wise, I will start working on the rollout plan, as it should basically use the same processing.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

@@ -9,7 +9,7 @@ debezium.source.database.password=%database_pass%
debezium.source.database.dbname=%database_dbname%
debezium.source.database.server.name=api
debezium.source.table.include.list=public.comment,public.user_comment,public.comment_mention,public.source_request,public.post,public.user,public.post_report,public.source_feed,public.settings,public.reputation_event,public.submission,public.user_state,public.notification_v2,public.source_member,public.feature,public.source,public.post_mention,public.content_image,public.comment_report,public.user_post,public.banner,public.post_relation,public.marketing_cta,public.squad_public_request,public.user_streak,public.bookmark,public.user_company,public.source_report,public.user_top_reader,public.source_post_moderation
debezium.source.column.exclude.list=public.post.tsv,public.post.placeholder,public.source.flags,public.user_top_reader.image
debezium.source.column.exclude.list=public.post.tsv,public.post.placeholder,public.source.flags,public.user_top_reader.image,public.user.cioRegistered,public.user.acceptedMarketing,public.user.followingEmail,public.user.notificationEmail
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need to track changes from any of these columns.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure though?
For example acceptedMarketing should trigger userUpdated no? to sync them to CIO (when they not in your flow)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, right. We have these props in CIO, they should definitely be synced. Let me check the worker.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I should not add these columns, but only when running the initial rollout.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also please note, this list doesn't exclude these columns from triggering cdc but from being sent in the debezium payload. To ignore them you will need to add a filter, it's more complex process but you can find examples in this file

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it does, didn't we fix that issue a while back by finally updating debezium?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not an issue just misunderstanding of what this config does from our side. But it's better to check with adhoc infra any change in debezium props

src/cio.ts Outdated
user: ChangeObject<User>,
) => {
const { id } = user;
const changed = JSON.parse(JSON.stringify(user));
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if we have a util function that can turn regular objects into ChangeObject type.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

structuredClone maybe?

Copy link
Member Author

@sshanzel sshanzel Dec 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. Yeah, I think that's better than json-parse-stringify.

Edit: introduced a function instead that makes the same parse and stringify of our objects to pubsub. Structured clone returns Date object still for which we need either a string or number, etc.

This comment has been minimized.

@sshanzel sshanzel marked this pull request as ready for review December 18, 2024 14:38
@sshanzel sshanzel requested review from capJavert and a team as code owners December 18, 2024 14:38

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

@@ -28,4 +28,5 @@ debezium.transforms.ReadOperationFilter.condition=!(valueSchema.field('op') && v
debezium.transforms.PostsFilter.type=io.debezium.transforms.Filter
debezium.transforms.PostsFilter.language=jsr223.groovy
debezium.transforms.PostsFilter.condition=!(valueSchema.field('op') && value.op == 'u' && value.source.table == 'post' && value.before.views != value.after.views)
# debezium.transforms.PostsFilter.condition=!(valueSchema.field('op') && value.op == 'u' && value.source.table == 'post' && value.before.views != value.after.views) && !(valueSchema.field('op') && value.op == 'u' && value.source.table == 'user' && value.before.cioRegistered != value.after.cioRegistered && !value.after.cioRegistered)
Copy link
Member

@idoshamun idoshamun Dec 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reactivation is still one-by-one based on this filter right?
We is it commented?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, we are running it also in batches to avoid the rate-limiting as the users can reach up to thousands.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we run in batches, we definitely need to uncomment this line, don't we?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yes for that part. This is in part of the preparation, so we can merge this PR once done, and when the initial rollout is about to start, we will just uncomment this line. I will mention this in the confluence page.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but we need this to be uncommented for production as well, not just initial rollout. Don't we? We don't rely on CDC anymore

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. Yes, moving forward, we also won't have to track the changes for the column. In that case, then yeah, we can always have it ignored.

src/common/mailing.ts Outdated Show resolved Hide resolved

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

Copy link
Member

@idoshamun idoshamun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can't commit the initial rollout csv yet. we need to do it only when we're ready

@sshanzel
Copy link
Member Author

sshanzel commented Jan 2, 2025

we can't commit the initial rollout csv yet. we need to do it only when we're ready

Yes, this was just a sample. Just 1k records to test it.

@sshanzel
Copy link
Member Author

sshanzel commented Jan 2, 2025

I will add more details in the rollout plan confluence page on how to proceed.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

@sshanzel sshanzel requested a review from idoshamun January 2, 2025 09:19
Copy link
Member

@idoshamun idoshamun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good but let's not merge until we meet to do the initial rollout

Copy link
Contributor

@rebelchris rebelchris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good to me, just monitor very closely I guess.

This comment has been minimized.

Copy link

pulumi bot commented Jan 3, 2025

🍹 The Update (preview) for dailydotdev/api/prod was successful.

Resource Changes

    Name                                            Type                           Operation
~   vpc-native-personalized-digest-deployment       kubernetes:apps/v1:Deployment  update
~   vpc-native-generate-search-invites-cron         kubernetes:batch/v1:CronJob    update
~   vpc-native-ws-deployment                        kubernetes:apps/v1:Deployment  update
-   vpc-native-api-migration-64eff7cd               kubernetes:batch/v1:Job        delete
+-  vpc-native-debezium-props                       kubernetes:core/v1:Secret      create-replacement
~   vpc-native-private-deployment                   kubernetes:apps/v1:Deployment  update
~   vpc-native-check-analytics-report-cron          kubernetes:batch/v1:CronJob    update
~   vpc-native-deployment                           kubernetes:apps/v1:Deployment  update
~   vpc-native-personalized-digest-cron             kubernetes:batch/v1:CronJob    update
~   vpc-native-update-trending-cron                 kubernetes:batch/v1:CronJob    update
~   vpc-native-update-highlighted-views-cron        kubernetes:batch/v1:CronJob    update
+-  vpc-native-debezium-deployment                  kubernetes:apps/v1:Deployment  create-replacement
~   vpc-native-update-tag-recommendations-cron      kubernetes:batch/v1:CronJob    update
~   vpc-native-temporal-deployment                  kubernetes:apps/v1:Deployment  update
~   vpc-native-generic-referral-reminder-cron       kubernetes:batch/v1:CronJob    update
~   vpc-native-bg-deployment                        kubernetes:apps/v1:Deployment  update
~   vpc-native-clean-zombie-users-cron              kubernetes:batch/v1:CronJob    update
~   vpc-native-clean-zombie-images-cron             kubernetes:batch/v1:CronJob    update
~   vpc-native-update-source-public-threshold-cron  kubernetes:batch/v1:CronJob    update
~   vpc-native-sync-subscription-with-cio-cron      kubernetes:batch/v1:CronJob    update
~   vpc-native-update-views-cron                    kubernetes:batch/v1:CronJob    update
~   vpc-native-hourly-notification-cron             kubernetes:batch/v1:CronJob    update
~   vpc-native-clean-zombie-user-companies-cron     kubernetes:batch/v1:CronJob    update
~   vpc-native-update-source-tag-view-cron          kubernetes:batch/v1:CronJob    update
~   vpc-native-update-current-streak-cron           kubernetes:batch/v1:CronJob    update
~   vpc-native-daily-digest-cron                    kubernetes:batch/v1:CronJob    update
~   vpc-native-update-tags-str-cron                 kubernetes:batch/v1:CronJob    update
~   vpc-native-calculate-top-readers-cron           kubernetes:batch/v1:CronJob    update
+   vpc-native-api-migration-94455505               kubernetes:batch/v1:Job        create

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants