clickpipes: improve MySQL CDC documentation #4101

Merged: 8 commits, Jul 23, 2025
4 changes: 4 additions & 0 deletions docs/integrations/data-ingestion/clickpipes/mysql/faq.md
@@ -34,3 +34,7 @@ You have several options to resolve these issues:
3. **Configure server certificate** - Update your server's SSL certificate to include all connection hostnames and use a trusted Certificate Authority.

4. **Skip certificate verification** - For self-hosted MySQL or MariaDB, whose default configurations provision a self-signed certificate we can't validate ([MySQL](https://dev.mysql.com/doc/refman/8.4/en/creating-ssl-rsa-files-using-mysql.html#creating-ssl-rsa-files-using-mysql-automatic), [MariaDB](https://mariadb.com/kb/en/securing-connections-for-client-and-server/#enabling-tls-for-mariadb-server)). Relying on this certificate encrypts the data in transit but runs the risk of server impersonation. We recommend properly signed certificates for production environments, but this option is useful for testing on a one-off instance or connecting to legacy infrastructure.

### Do you support schema changes? {#do-you-support-schema-changes}

Please refer to the [ClickPipes for MySQL: Schema Changes Propagation Support](./schema-changes) page for more information.
35 changes: 16 additions & 19 deletions docs/integrations/data-ingestion/clickpipes/mysql/index.md
@@ -1,8 +1,8 @@
---
sidebar_label: 'ClickPipes for MySQL'
sidebar_label: 'Ingesting Data from MySQL to ClickHouse'
description: 'Describes how to seamlessly connect your MySQL to ClickHouse Cloud.'
slug: /integrations/clickpipes/mysql
title: 'Ingesting Data from MySQL to ClickHouse (using CDC)'
title: 'Ingesting data from MySQL to ClickHouse (using CDC)'
---

import BetaBadge from '@theme/badges/BetaBadge';
@@ -15,20 +15,15 @@
import ch_permissions from '@site/static/images/integrations/data-ingestion/clickpipes/postgres/ch-permissions.jpg'
import Image from '@theme/IdealImage';

# Ingesting data from MySQL to ClickHouse using CDC
# Ingesting data from MySQL to ClickHouse (using CDC)

<BetaBadge/>

:::info
Currently, ingesting data from MySQL to ClickHouse Cloud via ClickPipes is in Private Preview.
:::


You can use ClickPipes to ingest data from your source MySQL database into ClickHouse Cloud. The source MySQL database can be hosted on-premises or in the cloud.
You can use ClickPipes to ingest data from your source MySQL database into ClickHouse Cloud. The source MySQL database can be hosted on-premises or in the cloud using services like Amazon RDS, Google Cloud SQL, and others.

## Prerequisites {#prerequisites}

To get started, you first need to make sure that your MySQL database is set up correctly. Depending on your source MySQL instance, you may follow any of the following guides:
To get started, you first need to ensure that your MySQL database is correctly configured for binlog replication. The configuration steps depend on how you're deploying MySQL, so please follow the relevant guide below:

Check notice on line 26 in docs/integrations/data-ingestion/clickpipes/mysql/index.md (GitHub Actions / vale, ClickHouse.Wordy): Use 'please' only if we've inconvenienced the user.

1. [Amazon RDS MySQL](./mysql/source/rds)

@@ -44,7 +39,7 @@

Once your source MySQL database is set up, you can continue creating your ClickPipe.

## Create your ClickPipe {#creating-your-clickpipe}
## Create your ClickPipe {#create-your-clickpipe}

Make sure you are logged in to your ClickHouse Cloud account. If you don't have an account yet, you can sign up [here](https://cloud.clickhouse.com/).

@@ -61,20 +56,18 @@

<Image img={mysql_tile} alt="Select MySQL" size="lg" border/>

### Add your source MySQL database connection {#adding-your-source-mysql-database-connection}
### Add your source MySQL database connection {#add-your-source-mysql-database-connection}

4. Fill in the connection details for your source MySQL database which you configured in the prerequisites step.

:::info

Before you start adding your connection details make sure that you have whitelisted ClickPipes IP addresses in your firewall rules. On the following page you can find a [list of ClickPipes IP addresses](../index.md#list-of-static-ips).
For more information refer to the source MySQL setup guides linked at [the top of this page](#prerequisites).

:::

<Image img={mysql_connection_details} alt="Fill in connection details" size="lg" border/>

#### (Optional) Set up SSH tunneling {#optional-setting-up-ssh-tunneling}
#### (Optional) Set up SSH Tunneling {#optional-set-up-ssh-tunneling}

You can specify SSH tunneling details if your source MySQL database is not publicly accessible.

@@ -88,12 +81,10 @@
4. Click on "Verify Connection" to verify the connection.

:::note

Make sure to whitelist [ClickPipes IP addresses](../clickpipes#list-of-static-ips) in your firewall rules for the SSH bastion host so that ClickPipes can establish the SSH tunnel.

:::

Once the connection details are filled in, click on "Next".
Once the connection details are filled in, click `Next`.

#### Configure advanced settings {#advanced-settings}

@@ -106,7 +97,7 @@
- **Snapshot number of tables in parallel**: This is the number of tables that will be fetched in parallel during the initial snapshot. This is useful when you have a large number of tables and you want to control the number of tables fetched in parallel.


### Configure the tables {#configuring-the-tables}
### Configure the tables {#configure-the-tables}

5. Here you can select the destination database for your ClickPipe. You can either select an existing database or create a new one.

@@ -121,3 +112,9 @@
<Image img={ch_permissions} alt="Review permissions" size="lg" border/>

Finally, please refer to the ["ClickPipes for MySQL FAQ"](/integrations/clickpipes/mysql/faq) page for more information about common issues and how to resolve them.

## What's next? {#whats-next}

[//]: # "TODO Write a MySQL-specific migration guide and best practices similar to the existing one for PostgreSQL. The current migration guide points to the MySQL table engine, which is not ideal."

Once you've set up your ClickPipe to replicate data from MySQL to ClickHouse Cloud, you can focus on how to query and model your data for optimal performance. For common questions around MySQL CDC and troubleshooting, see the [MySQL FAQs page](/integrations/data-ingestion/clickpipes/mysql/faq.md).
@@ -0,0 +1,15 @@
---
title: 'Schema Changes Propagation Support'
slug: /integrations/clickpipes/mysql/schema-changes
description: 'Page describing schema change types detectable by ClickPipes in the source tables'
---

ClickPipes for MySQL can detect schema changes in the source tables and, in some cases, automatically propagate the changes to the destination tables. The way each DDL operation is handled is documented below:

[//]: # "TODO Extend this page with behavior on rename, data type changes, and truncate + guidance on how to handle incompatible schema changes."
Review comment (Member): We don't currently propagate any of these. To alter a column, users should add a new column with a new name and drop the old column.


| Schema Change Type | Behaviour |
| ----------------------------------------------------------------------------------- | ------------------------------------- |
| Adding a new column (`ALTER TABLE ADD COLUMN ...`) | Propagated automatically. The new column(s) will be populated for all rows replicated after the schema change |
| Adding a new column with a default value (`ALTER TABLE ADD COLUMN ... DEFAULT ...`) | Propagated automatically. The new column(s) will be populated for all rows replicated after the schema change, but existing rows will not show the default value without a full table refresh |
| Dropping an existing column (`ALTER TABLE DROP COLUMN ...`) | Detected, but **not** propagated. The dropped column(s) will be populated with `NULL` for all rows replicated after the schema change |
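The behaviour in the table can be illustrated with a small toy model. This is only a sketch of the documented outcomes, not how ClickPipes is implemented; the function name `apply_to_destination` and the sample column names are hypothetical.

```python
from typing import Any, Dict, List


def apply_to_destination(dest_columns: List[str], row: Dict[str, Any]) -> Dict[str, Any]:
    """Toy model of the documented behaviour for one replicated row.

    - Columns that appear in the source row but not in the destination are
      added to the destination schema (ADD COLUMN is propagated).
    - Columns that exist in the destination but are missing from the source
      row stay in the destination and are filled with None, standing in for
      NULL (DROP COLUMN is detected but not propagated).
    """
    for column in row:
        if column not in dest_columns:
            dest_columns.append(column)
    return {column: row.get(column) for column in dest_columns}


destination = ["id", "name"]

# Source ran ALTER TABLE ... ADD COLUMN email: the new column is propagated.
print(apply_to_destination(destination, {"id": 1, "name": "a", "email": "a@example.com"}))

# Source ran ALTER TABLE ... DROP COLUMN name: the destination keeps the column.
print(apply_to_destination(destination, {"id": 2, "email": "b@example.com"}))
```

In this model, rows replicated before the `ADD COLUMN` are never revisited, which mirrors the note that existing rows do not show a default value without a full table refresh.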
@@ -19,83 +19,91 @@

# Aurora MySQL source setup guide

This is a step-by-step guide on how to configure your Aurora MySQL instance for replicating its data via the MySQL ClickPipe.
<br/>
:::info
We also recommend going through the MySQL FAQs [here](/integrations/data-ingestion/clickpipes/mysql/faq.md). The FAQs page is being actively updated.
:::
This step-by-step guide shows you how to configure Amazon Aurora MySQL to replicate data into ClickHouse Cloud using the [MySQL ClickPipe](../index.md). For common questions around MySQL CDC, see the [MySQL FAQs page](/integrations/data-ingestion/clickpipes/mysql/faq.md).

## Enable binary log retention {#enable-binlog-retention-aurora}
The binary log is a set of log files that contain information about data modifications made to an MySQL server instance, and binary log files are required for replication. Both of the below steps must be followed:

### 1. Enable binary logging via automated backup {#enable-binlog-logging-aurora}
The automated backups feature determines whether binary logging is turned on or off for MySQL. It can be set in the AWS console:
The binary log is a set of log files that contain information about data modifications made to a MySQL server instance, and binary log files are required for replication. To configure binary log retention in Aurora MySQL, you must [enable binary logging](#enable-binlog-logging) and [increase the binlog retention interval](#binlog-retention-interval).

Check notice on line 26 in docs/integrations/data-ingestion/clickpipes/mysql/source/aurora.md (GitHub Actions / vale, ClickHouse.SentenceLength): Improve readability by using fewer than 25 words in this sentence.

### 1. Enable binary logging via automated backup {#enable-binlog-logging}

The automated backups feature determines whether binary logging is turned on or off for MySQL. Automated backups can be configured for your instance in the RDS Console by navigating to **Modify** > **Additional configuration** > **Backup** and selecting the **Enable automated backups** checkbox (if not selected already).

<Image img={rds_backups} alt="Enabling automated backups in Aurora" size="lg" border/>

Setting backup retention to a reasonably long value depending on the replication use-case is advisable.
We recommend setting the **Backup retention period** to a reasonably long value, depending on the replication use case.

### 2. Increase the binlog retention interval {#binlog-retention-interval}

:::warning
If ClickPipes tries to resume replication and the required binlog files have been purged due to the configured binlog retention value, the ClickPipe will enter an errored state and a resync is required.
:::

By default, Aurora MySQL purges the binary log as soon as possible (i.e., _lazy purging_). We recommend increasing the binlog retention interval to at least **72 hours** to ensure availability of binary log files for replication under failure scenarios. To set an interval for binary log retention ([`binlog retention hours`](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/mysql-stored-proc-configuring.html#mysql_rds_set_configuration-usage-notes.binlog-retention-hours)), use the [`mysql.rds_set_configuration`](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/mysql-stored-proc-configuring.html#mysql_rds_set_configuration) procedure:

### 2. Binlog retention hours {#binlog-retention-hours-aurora}
The procedure below must be called to ensure availability of binary logs for replication:
[//]: # "NOTE Most CDC providers recommend the maximum retention period for Aurora RDS (7 days/168 hours). Since this has an impact on disk usage, we conservatively recommend a minimum of 3 days/72 hours."

```text
mysql=> call mysql.rds_set_configuration('binlog retention hours', 24);
mysql=> call mysql.rds_set_configuration('binlog retention hours', 72);
```
If this configuration isn't set, Amazon RDS purges the binary logs as soon as possible, leading to gaps in the binary logs.

## Configure binlog settings in the parameter group {#binlog-parameter-group-aurora}
If this configuration isn't set or is set to a low interval, it can lead to gaps in the binary logs, compromising ClickPipes' ability to resume replication.
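As a rough sanity check, the retention recommendation can be expressed as a small helper. This is a minimal sketch: it assumes you have already read the current `binlog retention hours` value yourself (for example with `CALL mysql.rds_show_configuration;`), and it treats an unset value as `None`. The function name and constant are hypothetical.

```python
from typing import Optional

# Minimum retention recommended in this guide (3 days).
RECOMMENDED_MIN_HOURS = 72


def binlog_retention_ok(hours: Optional[int], minimum: int = RECOMMENDED_MIN_HOURS) -> bool:
    """Return True if the configured retention meets the recommended minimum.

    `hours` is the current `binlog retention hours` value. None models an
    unset value, in which case binlogs are purged as soon as possible.
    """
    return hours is not None and hours >= minimum


print(binlog_retention_ok(None))  # False: retention is not configured
print(binlog_retention_ok(24))    # False: below the recommended 72 hours
print(binlog_retention_ok(72))    # True
```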

Check warning on line 50 in docs/integrations/data-ingestion/clickpipes/mysql/source/aurora.md (GitHub Actions / vale, ClickHouse.EOLWhitespace): Remove whitespace characters from the end of the line.

Check notice on line 50 in docs/integrations/data-ingestion/clickpipes/mysql/source/aurora.md (GitHub Actions / vale, ClickHouse.Ability): Try to replace ('ability to') with more precise language, unless this content is about security. See the word list for details.

Check notice on line 50 in docs/integrations/data-ingestion/clickpipes/mysql/source/aurora.md (GitHub Actions / vale, ClickHouse.SentenceLength): Improve readability by using fewer than 25 words in this sentence.

## Configure binlog settings {#binlog-settings}

The parameter group can be found when you click on your MySQL instance in the RDS Console, and then heading over to the `Configurations` tab.
The parameter group can be found when you click on your MySQL instance in the RDS Console, and then navigate to the **Configuration** tab.

:::tip
If you have a MySQL cluster, the parameters below can be found in the [DB cluster](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/USER_WorkingWithParamGroups.CreatingCluster.html) parameter group instead of the DB instance group.
:::

<Image img={aurora_config} alt="Where to find parameter group in Aurora" size="lg" border/>

Upon clicking on the parameter group link, you will be taken to the page for it. You will see an Edit button in the top-right.
<br/>
Click the parameter group link, which will take you to its dedicated page. You should see an **Edit** button in the top right.

<Image img={edit_button} alt="Edit parameter group" size="lg" border/>

The following settings need to be set as follows:
<br/>
The following parameters need to be set as follows:

1. `binlog_format` to `ROW`.

<Image img={binlog_format} alt="Binlog format to ROW" size="lg" border/>

2. `binlog_row_metadata` to `FULL`
2. `binlog_row_metadata` to `FULL`.

<Image img={binlog_row_metadata} alt="Binlog row metadata" size="lg" border/>

3. `binlog_row_image` to `FULL`
3. `binlog_row_image` to `FULL`.

<Image img={binlog_row_image} alt="Binlog row image" size="lg" border/>

Then click on `Save Changes` in the top-right. You may need to reboot your instance for the changes to take effect - a way of knowing this is if you see `Pending reboot` next to the parameter group link in the Configurations tab of the RDS instance.
<br/>
Then, click on **Save Changes** in the top right corner. You may need to reboot your instance for the changes to take effect — a way of knowing this is if you see `Pending reboot` next to the parameter group link in the **Configuration** tab of the Aurora instance.
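After the reboot, you may want to confirm that the three parameters took effect. The sketch below assumes you have collected the current values into a dictionary yourself, for instance from the output of `SHOW VARIABLES LIKE 'binlog%';`. The helper name `find_binlog_mismatches` is hypothetical and the code does not connect to a database.

```python
from typing import Dict, Optional, Tuple

# Values required by this guide.
REQUIRED_BINLOG_SETTINGS = {
    "binlog_format": "ROW",
    "binlog_row_metadata": "FULL",
    "binlog_row_image": "FULL",
}


def find_binlog_mismatches(variables: Dict[str, str]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {name: (expected, actual)} for every setting that is wrong or missing."""
    mismatches = {}
    for name, expected in REQUIRED_BINLOG_SETTINGS.items():
        actual = variables.get(name)
        # Compare case-insensitively, since servers may report e.g. "row" or "ROW".
        if actual is None or actual.upper() != expected:
            mismatches[name] = (expected, actual)
    return mismatches


current = {
    "binlog_format": "ROW",
    "binlog_row_metadata": "MINIMAL",
    "binlog_row_image": "FULL",
}
print(find_binlog_mismatches(current))
```

An empty result means all three parameters match the values listed above.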

## Enable GTID mode (recommended) {#gtid-mode}

:::tip
If you have a MySQL cluster, the above parameters would be found in a [DB Cluster](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/USER_WorkingWithParamGroups.CreatingCluster.html) parameter group and not the DB instance group.
The MySQL ClickPipe also supports replication without GTID mode. However, enabling GTID mode is recommended for better performance and easier troubleshooting.

Check notice on line 88 in docs/integrations/data-ingestion/clickpipes/mysql/source/aurora.md (GitHub Actions / vale, ClickHouse.Uppercase): Instead of uppercase for 'GTID', use lowercase or backticks (`) if possible. Otherwise, ask a Technical Writer to add this word or acronym to the rule's exception list.
:::

## Enabling GTID mode {#gtid-mode-aurora}
Global Transaction Identifiers (GTIDs) are unique IDs assigned to each committed transaction in MySQL. They simplify binlog replication and make troubleshooting more straightforward.
[Global Transaction Identifiers (GTIDs)](https://dev.mysql.com/doc/refman/8.0/en/replication-gtids.html) are unique IDs assigned to each committed transaction in MySQL. They simplify binlog replication and make troubleshooting more straightforward. We **recommend** enabling GTID mode, so that the MySQL ClickPipe can use GTID-based replication.

Check notice on line 91 in docs/integrations/data-ingestion/clickpipes/mysql/source/aurora.md (GitHub Actions / vale, ClickHouse.Uppercase): Instead of uppercase for 'GTID', use lowercase or backticks (`) if possible. Otherwise, ask a Technical Writer to add this word or acronym to the rule's exception list.

If your MySQL instance is MySQL 5.7, 8.0 or 8.4, we recommend enabling GTID mode so that the MySQL ClickPipe can use GTID replication.
GTID-based replication is supported for Amazon Aurora MySQL v2 (MySQL 5.7) and v3 (MySQL 8.0), as well as Aurora Serverless v2. To enable GTID mode for your Aurora MySQL instance, follow these steps:

Check notice on line 93 in docs/integrations/data-ingestion/clickpipes/mysql/source/aurora.md (GitHub Actions / vale, ClickHouse.Uppercase): Instead of uppercase for 'GTID', use lowercase or backticks (`) if possible. Otherwise, ask a Technical Writer to add this word or acronym to the rule's exception list.

Check notice on line 93 in docs/integrations/data-ingestion/clickpipes/mysql/source/aurora.md (GitHub Actions / vale, ClickHouse.Wordy): Use 'and' instead of 'as well as'.

To enable GTID mode for your MySQL instance, follow the steps as follows:
1. In the RDS Console, click on your MySQL instance.
2. Click on the `Configurations` tab.
2. Click on the **Configuration** tab.
3. Click on the parameter group link.
4. Click on the `Edit` button in the top-right corner.
4. Click on the **Edit** button in the top right corner.
5. Set `enforce_gtid_consistency` to `ON`.
6. Set `gtid-mode` to `ON`.
7. Click on `Save Changes` in the top-right corner.
7. Click on **Save Changes** in the top right corner.
8. Reboot your instance for the changes to take effect.

<Image img={enable_gtid} alt="GTID enabled" size="lg" border/>
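Once the instance has rebooted, the GTID steps above can be verified in the same spirit. This sketch assumes the values were read from the server's `gtid_mode` and `enforce_gtid_consistency` system variables (note the underscore in the MySQL variable name, versus `gtid-mode` in the parameter group). The helper name is hypothetical.

```python
from typing import Dict


def gtid_replication_ready(variables: Dict[str, str]) -> bool:
    """Return True only if both GTID-related variables report ON."""
    gtid_mode = variables.get("gtid_mode", "").upper()
    consistency = variables.get("enforce_gtid_consistency", "").upper()
    return gtid_mode == "ON" and consistency == "ON"


print(gtid_replication_ready({"gtid_mode": "ON", "enforce_gtid_consistency": "ON"}))   # True
print(gtid_replication_ready({"gtid_mode": "OFF", "enforce_gtid_consistency": "ON"}))  # False
```

A `False` result does not block replication, since the MySQL ClickPipe also supports replication without GTID mode.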

<br/>
:::info
The MySQL ClickPipe also supports replication without GTID mode. However, enabling GTID mode is recommended for better performance and easier troubleshooting.
:::

## Configure a database user {#configure-database-user-aurora}
## Configure a database user {#configure-database-user}

Connect to your Aurora MySQL instance as an admin user and execute the following commands:

@@ -122,12 +130,16 @@

### IP-based access control {#ip-based-access-control}

If you want to restrict traffic to your Aurora instance, please add the [documented static NAT IPs](../../index.md#list-of-static-ips) to the `Inbound rules` of your Aurora security group as shown below:
To restrict traffic to your Aurora MySQL instance, add the [documented static NAT IPs](../../index.md#list-of-static-ips) to the **Inbound rules** of your Aurora security group.

<Image img={security_group_in_rds_mysql} alt="Where to find security group in Aurora MySQL?" size="lg" border/>

<Image img={edit_inbound_rules} alt="Edit inbound rules for the above security group" size="lg" border/>
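To reason about what an inbound rule allows, the following sketch checks whether a source address falls inside a list of allowed CIDR ranges. The addresses are placeholders from the documentation range `203.0.113.0/24`, not real ClickPipes IPs; substitute the documented static NAT IPs for your region. The function name is hypothetical.

```python
import ipaddress
from typing import List

# Placeholder CIDRs (documentation range), NOT the real ClickPipes static IPs.
ALLOWED_CIDRS = ["203.0.113.10/32", "203.0.113.11/32"]


def is_allowed(source_ip: str, allowed_cidrs: List[str]) -> bool:
    """Return True if source_ip is covered by at least one allowed CIDR range."""
    address = ipaddress.ip_address(source_ip)
    return any(address in ipaddress.ip_network(cidr) for cidr in allowed_cidrs)


print(is_allowed("203.0.113.10", ALLOWED_CIDRS))  # True
print(is_allowed("198.51.100.7", ALLOWED_CIDRS))  # False
```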

### Private access via AWS PrivateLink {#private-access-via-aws-privatelink}

To connect to your Aurora instance through a private network, you can use AWS PrivateLink. Follow our [AWS PrivateLink setup guide for ClickPipes](/knowledgebase/aws-privatelink-setup-for-clickpipes) to set up the connection.
To connect to your Aurora MySQL instance through a private network, you can use AWS PrivateLink. Follow the [AWS PrivateLink setup guide for ClickPipes](/knowledgebase/aws-privatelink-setup-for-clickpipes) to set up the connection.

## What's next? {#whats-next}

Now that your Amazon Aurora MySQL instance is configured for binlog replication and securely connecting to ClickHouse Cloud, you can [create your first MySQL ClickPipe](/integrations/clickpipes/mysql/#create-your-clickpipe). For common questions around MySQL CDC, see the [MySQL FAQs page](/integrations/data-ingestion/clickpipes/mysql/faq.md).