Clickhouse/sink: Add Exactly-Once Semantics to ClickHouse Target #146
Comments
The transaction support mentioned for ClickHouse has been in an early beta state for more than a year, and it is not expected to be production-ready for clustered environments in the foreseeable future, so it should not be the basis for exactly-once functionality. The approach based on idempotent inserts and block deduplication is solid and is used for the same purpose in tools like the ClickHouse Kafka Sink Connector (Java). Moreover, if you use Keeper, it would be the best clustered transactional store for source table offsets, instead of a source DB table or an S3 bucket. It is also universal across different source connectors. For exactly-once delivery, it is better to store offsets in the destination DB, not the source DB. The KeeperMap engine lets you manage the data with standard SQL commands such as SELECT/INSERT/ALTER, without a separate Keeper connection. The setting keeper_map_strict_mode=1 makes the UPDATE process transactional.
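To illustrate the idempotent-insert idea from this comment, here is a minimal sketch, not the Transfer project's implementation. It derives a deterministic token from a batch's offset range and attaches it through ClickHouse's `insert_deduplication_token` setting, so a retried insert of the same batch can be recognized as a duplicate. The function names, the `topic` naming, and the token layout are assumptions made for illustration. The sketch also assumes that deduplication is enabled on the target table and that a retry reproduces exactly the same batch boundaries.

```python
import hashlib


def dedup_token(topic: str, partition: int, first_offset: int, last_offset: int) -> str:
    """Build a deterministic token: the same offset range always yields the same token."""
    raw = f"{topic}:{partition}:{first_offset}:{last_offset}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def insert_prefix(table: str, token: str) -> str:
    """Return the INSERT prefix carrying the token; row values would follow VALUES.

    The table name is interpolated as-is (hypothetical, trusted input).
    The token is a hex digest, so it needs no quoting or escaping.
    """
    return f"INSERT INTO {table} SETTINGS insert_deduplication_token = '{token}' VALUES"


if __name__ == "__main__":
    token = dedup_token("events", 0, 100, 199)
    print(insert_prefix("analytics.events", token))
```

Because the token depends only on the source position, a writer that crashes after the insert but before recording progress can resend the same batch without producing duplicate rows, as long as the batch is rebuilt with identical boundaries.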
Yes, storing the offset in the destination DB inside a Keeper table (see here) would allow us to use this feature without waiting for transaction support in ClickHouse itself.
Great! I still propose using keeper_map_strict_mode=1 for better atomicity of operations.
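The offset storage discussed in this thread could look roughly like the sketch below, which only builds the SQL text and does not talk to a server. The table name, the column names (`source_key`, `committed_offset`), and the Keeper path are hypothetical. Passing `keeper_map_strict_mode` in a per-query `SETTINGS` clause is also an assumption; it may instead need to be set at the session or profile level. KeeperMap additionally requires a `keeper_map_path_prefix` entry in the server configuration.

```python
def quote(value: str) -> str:
    """Escape backslashes and single quotes for a ClickHouse string literal."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def create_offsets_table(table: str, keeper_path: str) -> str:
    """DDL for a KeeperMap-backed offsets table (names are hypothetical)."""
    return (
        f"CREATE TABLE IF NOT EXISTS {table} "
        "(source_key String, committed_offset UInt64) "
        f"ENGINE = KeeperMap({quote(keeper_path)}) PRIMARY KEY source_key"
    )


def init_offset(table: str, source_key: str) -> str:
    """First-time registration of a source; strict mode rejects an existing key."""
    return (
        f"INSERT INTO {table} SETTINGS keeper_map_strict_mode = 1 "
        f"VALUES ({quote(source_key)}, 0)"
    )


def commit_offset(table: str, source_key: str, offset: int) -> str:
    """Advance the stored offset for one source with strict mode enabled."""
    return (
        f"ALTER TABLE {table} UPDATE committed_offset = {int(offset)} "
        f"WHERE source_key = {quote(source_key)} "
        "SETTINGS keeper_map_strict_mode = 1"
    )


if __name__ == "__main__":
    print(create_offsets_table("transfer_offsets", "/transfer/offsets"))
    print(init_offset("transfer_offsets", "pg:public.orders"))
    print(commit_offset("transfer_offsets", "pg:public.orders", 1234))
```

Keeping the offsets next to the data in the destination cluster means the sink can read the last committed position with an ordinary SELECT on restart, without a separate Keeper client.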
Add Exactly-Once Semantics to ClickHouse Target
Feature Request
Implement exactly-once delivery semantics for the ClickHouse target in the Transfer project. This ensures that data processed by the Transfer pipeline is delivered to ClickHouse without duplication or loss, even in the presence of failures or retries.
Motivation
ClickHouse is widely used for analytics workloads, where data consistency and accuracy are critical. Supporting exactly-once semantics in the ClickHouse target will:
Proposed Approach
Adopt techniques similar to the Kafka Connect Exactly-Once Delivery model:
_partition and _offset columns to track processed records.
Testing
Add test cases to verify exactly-once semantics:
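One such test case could be sketched as follows, using an in-memory stand-in for the sink rather than a real ClickHouse instance. `Record`, `FakeSink`, and the retry scenario are hypothetical. The sketch assumes offsets arrive in ascending order within each partition and that the sink remembers the highest offset it has written per partition.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Record:
    partition: int
    offset: int
    payload: str


@dataclass
class FakeSink:
    """In-memory stand-in for the target; skips records at or below the stored offset."""
    rows: list = field(default_factory=list)
    committed: dict = field(default_factory=dict)  # partition -> last written offset

    def write_batch(self, batch):
        for rec in batch:
            last = self.committed.get(rec.partition, -1)
            if rec.offset <= last:
                continue  # already delivered, so a retry must not duplicate it
            self.rows.append(rec)
            self.committed[rec.partition] = rec.offset


def run_retry_scenario():
    sink = FakeSink()
    batch = [Record(0, i, f"event-{i}") for i in range(5)]
    sink.write_batch(batch)
    sink.write_batch(batch)  # simulated full retry after a failure
    sink.write_batch(batch[3:] + [Record(0, 5, "event-5")])  # overlapping replay
    return [rec.offset for rec in sink.rows]


if __name__ == "__main__":
    offsets = run_retry_scenario()
    assert offsets == sorted(set(offsets)), "duplicates or reordering detected"
    print(offsets)
```

A real test would replace `FakeSink` with the actual target and inject the failure between the data write and the offset commit, since that window is where duplication or loss would occur.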
References
Additional Notes
KeeperMap table engine.