Mysql/source: binlog position/starttime logic #209

work-vv · 2025-02-07T18:39:50Z

The binlog start position and/or binlog start time is missing from the MySQL adapter config, although the pointer is used in the runtime logic. It looks like a minor change, but it would be a big improvement that can reduce replication time in some cases, such as quickly recovering replication on top of an existing snapshot.

laskoviymishka · 2025-02-07T23:07:44Z

I've thought about this, but I'm not sure about the end-user flow. Any ideas on how it would be used from the end user's perspective?

work-vv · 2025-02-08T05:14:29Z

A common use case: sync between MySQL and ClickHouse fails in production at midnight. The binlog is huge, and replicating it from the beginning is not an effective way to restore and sync state again. With the suggested functionality we could restart replication and the sync process from midnight.

laskoviymishka · 2025-02-08T09:26:32Z

Yeah, but how should this be operated at the ops level? As an extra option?

work-vv · 2025-02-08T10:18:16Z

Since incremental snapshots may differ or be unavailable from provider to provider, for now I would make the time tag or pointer an optional parameter of the MySQL connector and set it in the source connector options of the transfer config. It would also be nice to show the actual value in the logs when a data transfer finishes or fails.

Poltoruhin · 2025-02-12T15:27:42Z

Hello gentlemen!
Just want to say that this enhancement would be much appreciated. We are trying to use CH for OLAP alongside our main MySQL DB, and the case described by @work-vv seems very likely.

laskoviymishka · 2025-02-12T17:42:06Z

@work-vv @Poltoruhin it's still unclear to me how it would work.
If the idea is to let a transfer recover after a binlog rotation, then that's a case for an increment-only transfer, where the SyncBinlog procedure will recover the binlog on restart.

work-vv · 2025-02-12T18:52:59Z

Suppose we have MySQL-to-ClickHouse replication with real-time synchronization via binlog. If, for some reason, data desynchronization occurs (e.g., an error in the transfer process, or ClickHouse crashing after a version update), synchronization needs to be restored quickly. During testing, creating a snapshot in ClickHouse from the binlog for 4 tables with 5 million records each took 1.5 hours. However, if there is a way to synchronize data from a specific point, we can retain the existing data and specify synchronization from the moment of failure. Alternatively, we could create a dump file and import it directly into ClickHouse (another optimization suggestion is to enable snapshot creation from MySQL to ClickHouse via table dumps), specifying the synchronization starting point as the moment the dump was created.

laskoviymishka · 2025-02-12T19:00:54Z

Let me rephrase. We have the following config:

id: test
type: SNAPSHOT_AND_INCREMENT
src:
  type: mysql
  params:
    Host: mysql
    User: myuser
    Password: mypassword
    Database: mydb
    Port: 3306
dst:
  type: ch
  params:
    ....

Let's add a new field, StartPos, holding a binlog position or GTID:

id: test
type: INCREMENT_ONLY
src:
  type: mysql
  params:
    Host: mysql
    User: myuser
    Password: mypassword
    Database: mydb
    Port: 3306
    StartPos:
      type: binlog
      value: "binlog_0001/123"
dst:
  type: ch
  params:
    ....

So once the transfer is started, we start not from the head of the binlog but exactly from "binlog_0001/123".
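
A minimal sketch of how a value such as "binlog_0001/123" could be split into a file name and an offset, assuming the "file/offset" form shown in the config above. The `BinlogPosition` type and `parse_binlog_position` function are hypothetical names used only for illustration; they are not part of the project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BinlogPosition:
    """A binlog coordinate: file name plus offset (hypothetical type)."""
    file: str
    offset: int


def parse_binlog_position(value: str) -> BinlogPosition:
    """Parse a "<file>/<offset>" string, e.g. "binlog_0001/123"."""
    # Split on the last slash so only the trailing part is treated as the offset.
    file, sep, raw_offset = value.rpartition("/")
    if not sep or not file:
        raise ValueError(f"expected '<file>/<offset>', got {value!r}")
    if not raw_offset.isdigit():
        raise ValueError(f"offset must be a non-negative integer, got {raw_offset!r}")
    return BinlogPosition(file=file, offset=int(raw_offset))
```

Rejecting malformed values at config-load time keeps a typo from surfacing later as a confusing replication error.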

There is a question: what to do if there is no such binlog file? (i.e. it's got rotated).

work-vv · 2025-02-12T19:04:17Z

This is an optional value in the config, and it is the config creator's responsibility.

work-vv · 2025-02-12T19:18:33Z

> There is a question: what to do if there is no such binlog file? (i.e. it's got rotated).

I would suggest stopping with an error, the same as you do for any other incorrectly defined option.
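
A minimal sketch of that fail-fast check, assuming the list of binlog files still present on the server has already been fetched (for example with MySQL's `SHOW BINARY LOGS` statement). The function and exception names are hypothetical.

```python
from typing import Sequence


class BinlogNotFoundError(Exception):
    """Raised when the configured start file is no longer on the server."""


def check_start_file(start_file: str, available_files: Sequence[str]) -> None:
    """Fail fast if start_file is not among the server's binlog files."""
    if start_file not in available_files:
        listed = ", ".join(available_files) or "none"
        raise BinlogNotFoundError(
            f"binlog file {start_file!r} not found on source "
            f"(available: {listed}); it may have been rotated away"
        )
```

Listing the files that are still available in the error message gives the operator a hint about the earliest position that can still be used.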

laskoviymishka · 2025-02-12T19:24:58Z

Okay, now it's a lot clearer, and it seems fairly trivial to implement. It's enough to add a new property to the model here and use it inside SyncBinlogPosition: if the model field is present, take the value from it; otherwise keep the existing logic.
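
The "configured value wins, otherwise fall back" rule can be sketched as follows. This is only an illustration of the decision, not the actual SyncBinlogPosition code; `resolve_start_position` and its `load_existing` callback are hypothetical.

```python
from typing import Callable, Optional


def resolve_start_position(
    configured: Optional[str],
    load_existing: Callable[[], str],
) -> str:
    """Return the configured start position if set, else the existing one."""
    # An explicitly configured value wins; None or "" means "not set".
    if configured:
        return configured
    # Otherwise keep whatever the existing logic would have produced.
    return load_existing()
```

Passing the existing logic in as a callback means it only runs when no override is configured.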

BorisTyshkevich · 2025-02-14T20:28:57Z

I don't think it's a good idea to put the binlog position into the YAML config, mixing settings with data/metadata.

The CDC position is already stored somewhere, and we need access to that data through several operations:

trcli cdc-position set "string"
trcli cdc-position get
trcli cdc-position reset

And, of course, all connectors should follow the same rules for position management.
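
As a rough illustration of the three operations, here is an in-memory stand-in for wherever the position is actually kept. The `PositionStore` class and its keying by transfer id are assumptions made for this sketch; the real storage and the `trcli` subcommands above are proposals in this thread, not existing APIs.

```python
from typing import Dict, Optional


class PositionStore:
    """In-memory stand-in for the CDC position storage (hypothetical)."""

    def __init__(self) -> None:
        self._positions: Dict[str, str] = {}

    def set(self, transfer_id: str, position: str) -> None:
        """Overwrite the stored position for a transfer."""
        self._positions[transfer_id] = position

    def get(self, transfer_id: str) -> Optional[str]:
        """Return the stored position, or None if nothing is stored."""
        return self._positions.get(transfer_id)

    def reset(self, transfer_id: str) -> None:
        """Forget the stored position; a no-op if none exists."""
        self._positions.pop(transfer_id, None)
```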

laskoviymishka · 2025-02-14T20:57:21Z

That's also doable, but connectors have different rules regarding position: some store the position in the source itself (like Kafka with consumer groups), while others store it outside (like MySQL).

BorisTyshkevich · 2025-02-15T07:12:49Z

I expect that in the future all of them will use KeeperMap to support EoD. Until then, yes, some connectors won't allow offset manipulation.

laskoviymishka · 2025-02-15T07:48:15Z

That won't happen, since KeeperMap is only available with a ClickHouse target, and some DBs don't allow you to directly manipulate an offset position that is stored outside (for example PostgreSQL). As an alternative, we could add a KeeperMap coordinator implementation instead of S3, which would make managing transfer state a lot easier.
Offset management is a source-DB-specific concern; it's really hard to unify, and I don't see any real benefit from such unification.

BorisTyshkevich · 2025-02-15T08:01:01Z

Yes, coordinator=keepermap would be a great addition.

I agree that offsets are more related to the SRC, but probably also to the coordinator. Anyway, I see it as an operation, not a setting in YAML:

trcli mysql-position set "...."
trcli postgres-position set "...."
trcli s3-position set "...."
trcli keepermap-position set "...."

trcli set-position mysql "...."
trcli set-position postgres "...."

laskoviymishka · 2025-02-15T20:02:43Z

I would say that the command that sets the position can be simplified even further:

trcli state set-position "" --config file.yaml

We still need a config to verify that the position is valid, and the config already contains the source type.
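
A minimal sketch of config-driven validation, assuming the YAML config has already been parsed into a dict with the `src.type` key shown earlier in this thread. The validator registry and function names are hypothetical, and only a MySQL "file/offset" check is included.

```python
from typing import Any, Callable, Dict, Mapping


def _validate_mysql(position: str) -> None:
    """Accept only "<file>/<offset>" with a non-negative integer offset."""
    file, sep, offset = position.rpartition("/")
    if not (sep and file and offset.isdigit()):
        raise ValueError(f"invalid MySQL binlog position: {position!r}")


# Hypothetical registry: one validator per source type that supports set-position.
VALIDATORS: Dict[str, Callable[[str], None]] = {
    "mysql": _validate_mysql,
}


def validate_position(config: Mapping[str, Any], position: str) -> None:
    """Pick the validator from the config's source type and run it."""
    source_type = config["src"]["type"]
    validator = VALIDATORS.get(source_type)
    if validator is None:
        raise ValueError(
            f"source type {source_type!r} does not support setting a position"
        )
    validator(position)
```

Because the source type comes from the config, a single command can serve every connector, and connectors that cannot accept an external position fail with an explicit error.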

BorisTyshkevich · 2025-02-15T20:47:43Z

Of course, we need the config file and coordinator settings (like the bucket).

Commands for reset/get would also be very useful in real life.

laskoviymishka changed the title from "Add Mysql binlog position/starttime logic" to "Mysql/source: binlog position/starttime logic" · Feb 8, 2025

laskoviymishka added the "enhancement" label · Feb 8, 2025

laskoviymishka added the "good first issue" and "help wanted" labels · Feb 12, 2025
