Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FLINK-37120][pipeline-connector/mysql] Add ending split chunk first to avoid TaskManager oom #3856

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

beryllw
Copy link
Contributor

@beryllw beryllw commented Jan 14, 2025

Add ending split chunk first to avoid TaskManager oom

@beryllw beryllw changed the title Add ending split chunk first to avoid TaskManager oom [FLINK-37120][pipeline-connector/mysql] Add ending split chunk first to avoid TaskManager oom Jan 14, 2025
@github-actions github-actions bot added the base label Jan 15, 2025
@beryllw
Copy link
Contributor Author

beryllw commented Jan 15, 2025

CI:https://github.com/beryllw/flink-cdc/actions/runs/12779452404
@leonardBang Could you please help me retrigger the CI?

@beryllw
Copy link
Contributor Author

beryllw commented Jan 15, 2025

Adding the ending split chunk first changes the snapshot read result order. I will fix the unit test.

@leonardBang
Copy link
Contributor

THanks @beryllw for the contribution, the idea makes sense to me, but we need to consider the compatibility, a configuration is recommended

@beryllw
Copy link
Contributor Author

beryllw commented Jan 15, 2025

THanks @beryllw for the contribution, the idea makes sense to me, but we need to consider the compatibility, a configuration is recommended

If data consistency is not an issue, are there any compatibility concerns we need to address?

@lvyanquan
Copy link
Contributor

What‘s more, could you add this optimization to MongoDBChunkSplitter?

@beryllw
Copy link
Contributor Author

beryllw commented Jan 16, 2025

What‘s more, could you add this optimization to MongoDBChunkSplitter?

Sure, i will check MongoDBChunkSplitter.

@beryllw
Copy link
Contributor Author

beryllw commented Jan 17, 2025

What‘s more, could you add this optimization to MongoDBChunkSplitter?

#3704 (comment)

I overlooked this issue #3704, good idea. Maybe support AssignStrategy in base module is better? Do we implement this in a future PR or in this PR?

@lvyanquan
Copy link
Contributor

lvyanquan commented Jan 17, 2025

I missed this issue #3704, good idea. Maybe support AssignStrategy in base module is better? Do we implement this in a future PR or in this PR?

My previous PR provided both ascending_order and descending_order order configurations, but I think it would be better to place both start and end unbounded chunks at the beginning, so you can go ahead.

I think if there is no impact on restarting from the previous state, there is no compatibility issue, but adding a parameter to control it would be safer.

@github-actions github-actions bot added the docs Improvements or additions to documentation label Jan 17, 2025
@lvyanquan
Copy link
Contributor

I think adding a test in MySqlSourceITCase to test the case of restoring from failure will be better.

@beryllw
Copy link
Contributor Author

beryllw commented Jan 20, 2025

I think adding a test in MySqlSourceITCase to test the case of restoring from failure will be better.

agree.

@beryllw beryllw requested a review from lvyanquan January 20, 2025 08:57
@lvyanquan
Copy link
Contributor

lvyanquan commented Jan 22, 2025

LGTM.
I'm worried if this parameter is a bit complicated, what about naming it scan.incremental.assign-max-chunk-first.enabled to avoid using ending? As chunk is a concept in snapshot phase.
WDYT @beryllw @leonardBang?

@beryllw
Copy link
Contributor Author

beryllw commented Jan 23, 2025

LGTM. I'm worried if this parameter is a bit complicated, what about naming it scan.incremental.assign-max-chunk-first.enabled to avoid using ending? As chunk is a concept in snapshot phase. WDYT @beryllw @leonardBang?

What about scan.incremental.snapshot.assign-ending-chunk-first.enabled, when the write throughput is low, the ending chunk is not equal to the maximum chunk.

@beryllw
Copy link
Contributor Author

beryllw commented Feb 7, 2025

@leonardBang @lvyanquan PTAL

@lvyanquan
Copy link
Contributor

Thanks @beryllw for this update, since you've added this feature to jdbc common module, it seems like that this config option was not added in DynamicTableSourceFactory and was not passed to Source.

@beryllw
Copy link
Contributor Author

beryllw commented Feb 11, 2025

Thanks @beryllw for this update, since you've added this feature to jdbc common module, it seems like that this config option was not added in DynamicTableSourceFactory and was not passed to Source.

Thanks for pointing out the error, fixed. Awaiting CI.
https://github.com/beryllw/flink-cdc/actions/runs/13256583939?pr=7

@lvyanquan
Copy link
Contributor

Hi, @beryllw.
What I want to point out is that this option was not included in optionalOptions of PostgreSQLTableFactory, and was also not used in createDynamicTableSource, So it didn't take effect in SQL.

Do you want to support it all here, or do you want to split the modification of JDBC base into another PR.

@beryllw
Copy link
Contributor Author

beryllw commented Feb 11, 2025

Do you want to support it all here, or do you want to split the modification of JDBC base into another PR.

Support it all here is better.Thanks.

@beryllw
Copy link
Contributor Author

beryllw commented Feb 11, 2025

@beryllw
Copy link
Contributor Author

beryllw commented Feb 12, 2025

@beryllw beryllw requested a review from lvyanquan February 12, 2025 02:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants