Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: temporal partition rotation automation #17427

Open
shlomi-noach opened this issue Dec 24, 2024 · 0 comments
Open

RFC: temporal partition rotation automation #17427

shlomi-noach opened this issue Dec 24, 2024 · 0 comments
Labels
Component: schema management schemadiff and schema changes Type: RFC Request For Comment

Comments

@shlomi-noach
Copy link
Contributor

shlomi-noach commented Dec 24, 2024

I'm proposing that Vitess can automate temporal partition rotation based on user defined rules.

What are temporal partitioned tables?

Tables that are PARTITIONED BY RANGE and over temporal (time-based) values. This would be either:

  • PARTITION BY RANGE COLUMNS (col_name) over a single column, that is either DATE or DATETIME (technically MySQL also allows TIME but that is not so interesting to partition rotation).
  • PARTITION BY RANGE (func(col_name)) WHERE func is one of selected functions such as TO_DAYS, and the column is again either DATE or DATETIME.

Note: with MySQL 8.4 we can also support UNIX_TIMESTAMP(timestamp_column)

For practical reasons, we will have a predefined set of allowed functions/expressions. For example, we will support PARTITION BY RANGE (TO_DAYS(my_column)) or likewise YEAR(my_column) but we will not support ROUND(SQRT(TO_DAYS(my_column)+3.14)). We aim for practical operational scenarios. Most users will rotate hourly, daily, possibly weekly, monthly, yearly.

What is the proposal?

We will have a per-table rule:

  • name of table
  • rotation interval (hourly, daily, ...)
  • number of ahead-of-time partitions (prepare this many futuristic empty partitions)
  • retention (e.g. 30 days)
  • control flags such as enabled/disabled

Vitess will periodically look at all the rules, and check all referenced tables. For each table, if applicable, it will generate a sequence of Online DDL migrations, with internal UUID and in-order-execution, that ensure the table is in required state. Since it will do this periodically, most of the this will be a no-op since the table will already have all the required future partitions, and will have dropped expired partitions.

Where are rules to be stored?

I'm thinking as part of Keyspace record in topo, much like the throttler configuration. The config will be copied from Keyspace to SrvKeyspace as needed, again just like the throttler configuration.

What are expected problems?

I'm not sure how to handle errors. For example:

  • what if a table was dropped but the rule is still there?
  • what if the table is not partitioned?
  • what if the rule does not comply with the table definition? e.g. request to rotate hourly, but the table uses TO_DAYS expression, making the minimal interval 24h?

So where should these errors go? As this is a background operation, there's no occasion to respond to the user with the list of errors.

Alternative approaches

With rules still in place, maybe Vitess should not auto-rotate. Instead, maybe we should have vtctldclient RotatePartitionedTables command, that will:

  • analyze all the required changes
  • run them sequentially (either asynchronously via Online DDL or somehow directly)
  • report errors to the user

The partition analysis should still be made on a per-shard basis, to ensure independence of shards and eventual consistency, much like we delegate all Online DDL changes to shards.

Existing work

schemadiff PR to analyze temporal range partitioned tables and to generate required creation of ahead-of-time partitions, and purge expired partitions: #17426. This also validates intervals and other constraints.

Also related:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component: schema management schemadiff and schema changes Type: RFC Request For Comment
Projects
None yet
Development

No branches or pull requests

1 participant