[FLINK-32344][connectors/mongodb] Support unbounded streaming read via change stream feature. #11
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Change streams allow applications to access real-time data changes without the complexity and risk of tailing the oplog. Applications can use change streams to subscribe to all data changes on a single collection, a database, or an entire deployment, and immediately react to them. Because change streams use the aggregation framework, applications can also filter for specific changes or transform the notifications at will.
Change streams are available for replica sets and sharded clusters.
We can use MongoDB change streams feature to support unbounded streaming read for mongodb connector.
Startup Mode
We can determine whether the source runs in bounded or unbounded mode by setting the
scan.startup.mode
configuration.Changelog Mode
UPSERT
Before mongodb version 6.0, pre and post images were not saved in the oplog.
This means that we cannot directly obtain complete pre and post changed record to generate ALL mode changelog.
By default, change streams only return the delta of fields during the update operation. However, we can configure the change stream to return the most current majority-committed version of the updated document to generate UPSERT mode changelog by update lookup feature.
Before mongodb 6.0, we can set the 'change-stream.full-document.strategy' as 'update-lookup' , which is also the default.
However, update lookup will bring additional query time overhead. And the changelog in UPSERT mode will have an additional changelog normalize operator, which will continuously increase the state of the task. So we have to consider using an external state store to reduce memory pressure when using it, such as rockdb backend.
ALL
Starting in MongoDB 6.0, we can can use change stream events pre and post images feature to output the version of a document before and after changes to generate ALL mode changelog.
We can enable pre and post images of a collection by create or collMod commands:
Then we can set the 'change-stream.full-document.strategy' as 'pre-and-post-images' to generate ALL mode changelog.