Commit 07531bf

[FLINK-35152][pipeline-connector/doris] Support create doris auto partition table
1 parent 8815f2b commit 07531bf

11 files changed

+619
-6
lines changed

docs/content.zh/docs/connectors/pipeline-connectors/doris.md

Lines changed: 19 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -182,6 +182,25 @@ pipeline:
182182
查看更多关于 <a href="https://doris.apache.org/zh-CN/docs/dev/sql-manual/sql-statements/Data-Definition-Statements/Create/CREATE-TABLE/"> Doris Table 的属性</a></td>
183183
</td>
184184
</tr>
185+
<tr>
186+
<td>table.create.auto-partition.properties.*</td>
187+
<td>optional</td>
188+
<td style="word-wrap: break-word;">(none)</td>
189+
<td>String</td>
190+
<td>创建自动分区表的配置。<br/>
191+
当前仅支持DATE/DATETIME类型列的AUTO RANGE PARTITION,分区函数为<code>date_trunc</code>,且Doris版本必须大于2.1.6,查看更多关于 <a href="https://doris.apache.org/docs/table-design/data-partitioning/auto-partitioning">Doris自动分区</a><br/>
192+
支持的属性有:<br/>
193+
<code> table.create.auto-partition.properties.include</code>包含的经过route后的表集合,用逗号分隔,支持正则表达式;<br/>
194+
<code> table.create.auto-partition.properties.exclude</code>排除的经过route后的表集合,用逗号分隔,支持正则表达式;<br/>
195+
<code> table.create.auto-partition.properties.default-partition-key</code>默认分区键;<br/>
196+
<code> table.create.auto-partition.properties.default-partition-unit</code>默认分区单位;<br/>
197+
<code> table.create.auto-partition.properties.DB.TABLE.partition-key</code>特定表的分区键,如未配置取默认分区键;<br/>
198+
<code> table.create.auto-partition.properties.DB.TABLE.partition-unit</code>特定表的分区单位,如未配置取默认分区单位。<br/>
199+
注意:<br/>
200+
1: 如果分区键不为DATE/DATETIME类型,则不会创建分区表。<br/>
201+
2: Doris AUTO RANGE PARTITION不支持NULLABLE列作为分区列,如果您配置的分区键的值为空或者表创建完成后新增了NULLABLE分区列,系统将自动填充默认值(DATE类型为<code>1970-01-01</code>,DATETIME类型为<code>1970-01-01 00:00:00</code>),请选择合适的分区键。
202+
</td>
203+
</tr>
185204
</tbody>
186205
</table>
187206
</div>

docs/content/docs/connectors/pipeline-connectors/doris.md

Lines changed: 19 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -182,6 +182,25 @@ pipeline:
182182
See more about <a href="https://doris.apache.org/docs/dev/sql-manual/sql-statements/Data-Definition-Statements/Create/CREATE-TABLE/"> Doris Table Properties</a></td>
183183
</td>
184184
</tr>
185+
<tr>
186+
<td>table.create.auto-partition.properties.*</td>
187+
<td>optional</td>
188+
<td style="word-wrap: break-word;">(none)</td>
189+
<td>String</td>
190+
<td>Configuration for creating auto-partitioned tables.<br/>
191+
Currently, only AUTO RANGE PARTITION on DATE or DATETIME columns is supported, the only supported partition function is <code>date_trunc</code>, and the Doris version must be greater than 2.1.6. See more about <a href="https://doris.apache.org/docs/table-design/data-partitioning/auto-partitioning">Doris Auto Partitioning</a>.<br/>
192+
The following properties are supported:<br/>
193+
<code>table.create.auto-partition.properties.include</code>: comma-separated list of tables (after routing) to include; regular expressions are supported;<br/>
194+
<code>table.create.auto-partition.properties.exclude</code>: comma-separated list of tables (after routing) to exclude; regular expressions are supported;<br/>
195+
<code>table.create.auto-partition.properties.default-partition-key</code>: the default partition key;<br/>
196+
<code>table.create.auto-partition.properties.default-partition-unit</code>: the default partition unit;<br/>
197+
<code>table.create.auto-partition.properties.DB.TABLE.partition-key</code>: the partition key of a specific table. If not set, the default partition key is used;<br/>
198+
<code>table.create.auto-partition.properties.DB.TABLE.partition-unit</code>: the partition unit of a specific table. If not set, the default partition unit is used.<br/>
199+
Note:<br/>
200+
1: If the partition key is not of DATE/DATETIME type, no auto-partitioned table is created.<br/>
201+
2: Doris AUTO RANGE PARTITION does not support NULLABLE columns as the partition key. If Flink CDC receives a NULL value for the partition key, or a NULLABLE partition key column is added after the table is created, the value is automatically filled with a default (DATE: <code>1970-01-01</code>, DATETIME: <code>1970-01-01 00:00:00</code>), so choose a suitable partition key.
202+
</td>
203+
</tr>
185204
</tbody>
186205
</table>
187206
</div>

flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-doris/pom.xml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -27,7 +27,7 @@ limitations under the License.
2727
<name>flink-cdc-pipeline-connector-doris</name>
2828

2929
<properties>
30-
<doris.connector.version>24.0.1</doris.connector.version>
30+
<doris.connector.version>24.1.0</doris.connector.version>
3131
<mysql.connector.version>8.0.26</mysql.connector.version>
3232
</properties>
3333

flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-doris/src/main/java/org/apache/flink/cdc/connectors/doris/factory/DorisDataSinkFactory.java

Lines changed: 5 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -58,6 +58,7 @@
5858
import static org.apache.flink.cdc.connectors.doris.sink.DorisDataSinkOptions.SINK_MAX_RETRIES;
5959
import static org.apache.flink.cdc.connectors.doris.sink.DorisDataSinkOptions.SINK_USE_CACHE;
6060
import static org.apache.flink.cdc.connectors.doris.sink.DorisDataSinkOptions.STREAM_LOAD_PROP_PREFIX;
61+
import static org.apache.flink.cdc.connectors.doris.sink.DorisDataSinkOptions.TABLE_CREATE_AUTO_PARTITION_PROPERTIES_PREFIX;
6162
import static org.apache.flink.cdc.connectors.doris.sink.DorisDataSinkOptions.TABLE_CREATE_PROPERTIES_PREFIX;
6263
import static org.apache.flink.cdc.connectors.doris.sink.DorisDataSinkOptions.USERNAME;
6364

@@ -67,7 +68,10 @@ public class DorisDataSinkFactory implements DataSinkFactory {
6768
@Override
6869
public DataSink createDataSink(Context context) {
6970
FactoryHelper.createFactoryHelper(this, context)
70-
.validateExcept(TABLE_CREATE_PROPERTIES_PREFIX, STREAM_LOAD_PROP_PREFIX);
71+
.validateExcept(
72+
TABLE_CREATE_PROPERTIES_PREFIX,
73+
STREAM_LOAD_PROP_PREFIX,
74+
TABLE_CREATE_AUTO_PARTITION_PROPERTIES_PREFIX);
7175

7276
Configuration config = context.getFactoryConfiguration();
7377
DorisOptions.Builder optionsBuilder = DorisOptions.builder();
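
Note on the `validateExcept` change above: option validation of this kind usually rejects keys the factory does not declare, while keys under a free-form prefix are exempted. The following is a minimal Java sketch of that idea, not the actual `FactoryHelper` implementation. The class `PrefixValidationSketch`, its `validateExcept` method, and the sample declared-option set are hypothetical stand-ins introduced only for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of prefix-exempt option validation; not Flink CDC's FactoryHelper. */
final class PrefixValidationSketch {

    /** Throws if a key is neither declared nor covered by one of the exempt prefixes. */
    static void validateExcept(
            Map<String, String> options, Set<String> declaredKeys, String... exemptPrefixes) {
        for (String key : options.keySet()) {
            boolean exempt = false;
            for (String prefix : exemptPrefixes) {
                if (key.startsWith(prefix)) {
                    exempt = true;
                    break;
                }
            }
            if (!exempt && !declaredKeys.contains(key)) {
                throw new IllegalArgumentException("Unsupported option: " + key);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, String> options = new HashMap<>();
        options.put("fenodes", "127.0.0.1:8030");
        options.put(
                "table.create.auto-partition.properties.default-partition-key", "create_time");
        // Assumed declared option set, reduced to one key for the example.
        Set<String> declared = new HashSet<>(Arrays.asList("fenodes"));
        validateExcept(
                options,
                declared,
                "table.create.properties.",
                "sink.properties.",
                "table.create.auto-partition.properties.");
        System.out.println("options accepted");
    }
}
```

Under this sketch, leaving the auto-partition prefix out of the exempt list would make any `table.create.auto-partition.properties.*` key fail validation, which is the situation the added argument avoids.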

flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-doris/src/main/java/org/apache/flink/cdc/connectors/doris/sink/DorisDataSink.java

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -62,14 +62,14 @@ public EventSinkProvider getEventSinkProvider() {
6262
dorisOptions,
6363
readOptions,
6464
executionOptions,
65-
new DorisEventSerializer(zoneId)));
65+
new DorisEventSerializer(zoneId, configuration)));
6666
} else {
6767
return FlinkSinkProvider.of(
6868
new DorisBatchSink<>(
6969
dorisOptions,
7070
readOptions,
7171
executionOptions,
72-
new DorisEventSerializer(zoneId)));
72+
new DorisEventSerializer(zoneId, configuration)));
7373
}
7474
}
7575

flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-doris/src/main/java/org/apache/flink/cdc/connectors/doris/sink/DorisDataSinkOptions.java

Lines changed: 23 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -158,6 +158,29 @@ public class DorisDataSinkOptions {
158158
public static final String STREAM_LOAD_PROP_PREFIX = "sink.properties.";
159159
// Prefix for Doris Create table.
160160
public static final String TABLE_CREATE_PROPERTIES_PREFIX = "table.create.properties.";
161+
// Prefix for Doris Create auto partition table.
162+
public static final String TABLE_CREATE_AUTO_PARTITION_PROPERTIES_PREFIX =
163+
"table.create.auto-partition.properties.";
164+
public static final String TABLE_CREATE_PARTITION_KEY = "partition-key";
165+
public static final String TABLE_CREATE_PARTITION_UNIT = "partition-unit";
166+
167+
public static final String TABLE_CREATE_DEFAULT_PARTITION_KEY =
168+
"default-" + TABLE_CREATE_PARTITION_KEY;
169+
public static final String TABLE_CREATE_DEFAULT_PARTITION_UNIT =
170+
"default-" + TABLE_CREATE_PARTITION_UNIT;
171+
172+
public static final String TABLE_CREATE_AUTO_PARTITION_PROPERTIES_DEFAULT_PARTITION_KEY =
173+
TABLE_CREATE_AUTO_PARTITION_PROPERTIES_PREFIX + TABLE_CREATE_DEFAULT_PARTITION_KEY;
174+
public static final String TABLE_CREATE_AUTO_PARTITION_PROPERTIES_DEFAULT_PARTITION_UNIT =
175+
TABLE_CREATE_AUTO_PARTITION_PROPERTIES_PREFIX + TABLE_CREATE_DEFAULT_PARTITION_UNIT;
176+
177+
public static final String TABLE_CREATE_PARTITION_INCLUDE = "include";
178+
public static final String TABLE_CREATE_PARTITION_EXCLUDE = "exclude";
179+
180+
public static final String TABLE_CREATE_AUTO_PARTITION_PROPERTIES_INCLUDE =
181+
TABLE_CREATE_AUTO_PARTITION_PROPERTIES_PREFIX + TABLE_CREATE_PARTITION_INCLUDE;
182+
public static final String TABLE_CREATE_AUTO_PARTITION_PROPERTIES_EXCLUDE =
183+
TABLE_CREATE_AUTO_PARTITION_PROPERTIES_PREFIX + TABLE_CREATE_PARTITION_EXCLUDE;
161184

162185
public static Map<String, String> getPropertiesByPrefix(
163186
Configuration tableOptions, String prefix) {
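
Note on the new constants: the lookups in `DorisSchemaUtils` use the short keys (`include`, `default-partition-key`, and so on), which suggests that `getPropertiesByPrefix` returns entries with the prefix stripped. Its body is not part of this diff, so the Java sketch below is only an assumption about that behavior. `PrefixPropertiesSketch` and `propertiesByPrefix` are hypothetical names operating on a plain `Map` instead of the connector's `Configuration`.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of prefix-based option extraction on a plain map. */
final class PrefixPropertiesSketch {

    static final String AUTO_PARTITION_PREFIX = "table.create.auto-partition.properties.";

    /** Collects entries whose key starts with the prefix and strips the prefix from the key. */
    static Map<String, String> propertiesByPrefix(Map<String, String> options, String prefix) {
        Map<String, String> result = new HashMap<>();
        for (Map.Entry<String, String> entry : options.entrySet()) {
            if (entry.getKey().startsWith(prefix)) {
                result.put(entry.getKey().substring(prefix.length()), entry.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> options = new HashMap<>();
        options.put(AUTO_PARTITION_PREFIX + "default-partition-key", "create_time");
        options.put(AUTO_PARTITION_PREFIX + "app_db.orders.partition-unit", "month");
        options.put("fenodes", "127.0.0.1:8030");
        System.out.println(propertiesByPrefix(options, AUTO_PARTITION_PREFIX));
    }
}
```

Note that the constants are hyphenated (`partition-key`, `default-partition-unit`), so a key written with underscores would not be picked up by the lookups.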

flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-doris/src/main/java/org/apache/flink/cdc/connectors/doris/sink/DorisEventSerializer.java

Lines changed: 37 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -17,6 +17,8 @@
1717

1818
package org.apache.flink.cdc.connectors.doris.sink;
1919

20+
import org.apache.flink.api.java.tuple.Tuple2;
21+
import org.apache.flink.cdc.common.configuration.Configuration;
2022
import org.apache.flink.cdc.common.data.RecordData;
2123
import org.apache.flink.cdc.common.event.CreateTableEvent;
2224
import org.apache.flink.cdc.common.event.DataChangeEvent;
@@ -26,8 +28,14 @@
2628
import org.apache.flink.cdc.common.event.TableId;
2729
import org.apache.flink.cdc.common.schema.Column;
2830
import org.apache.flink.cdc.common.schema.Schema;
31+
import org.apache.flink.cdc.common.types.DataType;
32+
import org.apache.flink.cdc.common.types.DateType;
33+
import org.apache.flink.cdc.common.types.LocalZonedTimestampType;
34+
import org.apache.flink.cdc.common.types.TimestampType;
35+
import org.apache.flink.cdc.common.types.ZonedTimestampType;
2936
import org.apache.flink.cdc.common.utils.Preconditions;
3037
import org.apache.flink.cdc.common.utils.SchemaUtils;
38+
import org.apache.flink.cdc.connectors.doris.utils.DorisSchemaUtils;
3139

3240
import org.apache.flink.shaded.jackson2.com.fasterxml.jackson.core.JsonProcessingException;
3341
import org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.ObjectMapper;
@@ -42,6 +50,7 @@
4250
import java.util.HashMap;
4351
import java.util.List;
4452
import java.util.Map;
53+
import java.util.Objects;
4554

4655
import static org.apache.doris.flink.sink.util.DeleteOperation.addDeleteSign;
4756

@@ -61,8 +70,11 @@ public class DorisEventSerializer implements DorisRecordSerializer<Event> {
6170
/** ZoneId from pipeline config to support timestamp with local time zone. */
6271
public final ZoneId pipelineZoneId;
6372

64-
public DorisEventSerializer(ZoneId zoneId) {
73+
public final Configuration dorisConfig;
74+
75+
public DorisEventSerializer(ZoneId zoneId, Configuration config) {
6576
pipelineZoneId = zoneId;
77+
dorisConfig = config;
6678
}
6779

6880
@Override
@@ -108,6 +120,30 @@ private DorisRecord applyDataChangeEvent(DataChangeEvent event) throws JsonProce
108120
throw new UnsupportedOperationException("Unsupport Operation " + op);
109121
}
110122

123+
// get partition info from config
124+
Tuple2<String, String> partitionInfo =
125+
DorisSchemaUtils.getPartitionInfo(dorisConfig, schema, tableId);
126+
if (!Objects.isNull(partitionInfo)) {
127+
String partitionKey = partitionInfo.f0;
128+
Object partitionValue = valueMap.get(partitionKey);
129+
// fill partition column by default value if null
130+
if (Objects.isNull(partitionValue)) {
131+
schema.getColumn(partitionKey)
132+
.ifPresent(
133+
column -> {
134+
DataType dataType = column.getType();
135+
if (dataType instanceof DateType) {
136+
valueMap.put(partitionKey, DorisSchemaUtils.DEFAULT_DATE);
137+
} else if (dataType instanceof LocalZonedTimestampType
138+
|| dataType instanceof TimestampType
139+
|| dataType instanceof ZonedTimestampType) {
140+
valueMap.put(
141+
partitionKey, DorisSchemaUtils.DEFAULT_DATETIME);
142+
}
143+
});
144+
}
145+
}
146+
111147
return DorisRecord.of(
112148
tableId.getSchemaName(),
113149
tableId.getTableName(),
@@ -121,7 +157,6 @@ public Map<String, Object> serializerRecord(RecordData recordData, Schema schema
121157
Preconditions.checkState(
122158
columns.size() == recordData.getArity(),
123159
"Column size does not match the data size");
124-
125160
for (int i = 0; i < recordData.getArity(); i++) {
126161
DorisRowConverter.SerializationConverter converter =
127162
DorisRowConverter.createNullableExternalConverter(
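
Note on the serializer change: the added block replaces a NULL partition value with a fixed default before the record is written. The rule can be shown in isolation with the Java sketch below. The enum `ColumnKind` and the class `PartitionDefaultSketch` are hypothetical simplifications of the connector's `DataType` checks; the two default strings are the ones defined in `DorisSchemaUtils`.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of filling a NULL partition value with a type-specific default. */
final class PartitionDefaultSketch {

    /** Simplified stand-in for the DATE vs. timestamp-like DataType checks in the diff. */
    enum ColumnKind {
        DATE,
        DATETIME,
        OTHER
    }

    static final String DEFAULT_DATE = "1970-01-01";
    static final String DEFAULT_DATETIME = "1970-01-01 00:00:00";

    /** Writes the default into the row only when the partition value is missing or null. */
    static void fillPartitionDefault(Map<String, Object> row, String partitionKey, ColumnKind kind) {
        if (row.get(partitionKey) != null) {
            return;
        }
        switch (kind) {
            case DATE:
                row.put(partitionKey, DEFAULT_DATE);
                break;
            case DATETIME:
                row.put(partitionKey, DEFAULT_DATETIME);
                break;
            default:
                // Other types are left untouched, as in the diff.
                break;
        }
    }

    public static void main(String[] args) {
        Map<String, Object> row = new HashMap<>();
        row.put("id", 1);
        row.put("create_time", null);
        fillPartitionDefault(row, "create_time", ColumnKind.DATETIME);
        System.out.println(row.get("create_time"));
    }
}
```

One consequence worth keeping in mind: all rows with a NULL partition value end up in the 1970 partition, so a rarely-null column is the better partition key.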

flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-doris/src/main/java/org/apache/flink/cdc/connectors/doris/sink/DorisMetadataApplier.java

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -17,6 +17,7 @@
1717

1818
package org.apache.flink.cdc.connectors.doris.sink;
1919

20+
import org.apache.flink.api.java.tuple.Tuple2;
2021
import org.apache.flink.cdc.common.configuration.Configuration;
2122
import org.apache.flink.cdc.common.event.AddColumnEvent;
2223
import org.apache.flink.cdc.common.event.AlterColumnTypeEvent;
@@ -39,6 +40,7 @@
3940
import org.apache.flink.cdc.common.types.TimestampType;
4041
import org.apache.flink.cdc.common.types.ZonedTimestampType;
4142
import org.apache.flink.cdc.common.types.utils.DataTypeUtils;
43+
import org.apache.flink.cdc.connectors.doris.utils.DorisSchemaUtils;
4244
import org.apache.flink.util.CollectionUtil;
4345

4446
import org.apache.flink.shaded.guava31.com.google.common.collect.Sets;
@@ -162,6 +164,10 @@ private void applyCreateTableEvent(CreateTableEvent event) throws SchemaEvolveEx
162164
DorisDataSinkOptions.getPropertiesByPrefix(
163165
config, TABLE_CREATE_PROPERTIES_PREFIX);
164166
tableSchema.setProperties(tableProperties);
167+
168+
Tuple2<String, String> partitionInfo =
169+
DorisSchemaUtils.getPartitionInfo(config, schema, tableId);
170+
tableSchema.setPartitionInfo(partitionInfo);
165171
schemaChangeManager.createTable(tableSchema);
166172
} catch (Exception e) {
167173
throw new SchemaEvolveException(event, e.getMessage(), e);
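
Note on `setPartitionInfo`: the DDL itself is generated inside the Doris connector dependency, which is not part of this diff. As a rough illustration of what the (partition key, partition unit) pair is used for, the Java sketch below builds an auto-partition clause in the form described by the Doris auto-partitioning documentation for recent versions. The exact clause emitted by the connector is an assumption here, and `AutoPartitionClauseSketch` is a hypothetical helper.

```java
/** Hypothetical sketch: builds a Doris AUTO RANGE PARTITION clause from key and unit. */
final class AutoPartitionClauseSketch {

    /** Returns the clause text; assumes the key needs no backtick escaping. */
    static String autoPartitionClause(String partitionKey, String partitionUnit) {
        return "AUTO PARTITION BY RANGE (date_trunc(`"
                + partitionKey
                + "`, '"
                + partitionUnit
                + "')) ()";
    }

    public static void main(String[] args) {
        System.out.println(autoPartitionClause("create_time", "day"));
    }
}
```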
@@ -0,0 +1,107 @@
1+
/*
2+
* Licensed to the Apache Software Foundation (ASF) under one or more
3+
* contributor license agreements. See the NOTICE file distributed with
4+
* this work for additional information regarding copyright ownership.
5+
* The ASF licenses this file to You under the Apache License, Version 2.0
6+
* (the "License"); you may not use this file except in compliance with
7+
* the License. You may obtain a copy of the License at
8+
*
9+
* http://www.apache.org/licenses/LICENSE-2.0
10+
*
11+
* Unless required by applicable law or agreed to in writing, software
12+
* distributed under the License is distributed on an "AS IS" BASIS,
13+
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
14+
* See the License for the specific language governing permissions and
15+
* limitations under the License.
16+
*/
17+
18+
package org.apache.flink.cdc.connectors.doris.utils;
19+
20+
import org.apache.flink.api.java.tuple.Tuple2;
21+
import org.apache.flink.cdc.common.configuration.Configuration;
22+
import org.apache.flink.cdc.common.event.TableId;
23+
import org.apache.flink.cdc.common.schema.Schema;
24+
import org.apache.flink.cdc.common.schema.Selectors;
25+
import org.apache.flink.cdc.common.types.DataType;
26+
import org.apache.flink.cdc.common.types.DateType;
27+
import org.apache.flink.cdc.common.types.LocalZonedTimestampType;
28+
import org.apache.flink.cdc.common.types.TimestampType;
29+
import org.apache.flink.cdc.common.types.ZonedTimestampType;
30+
import org.apache.flink.cdc.common.utils.StringUtils;
31+
import org.apache.flink.cdc.connectors.doris.sink.DorisDataSinkOptions;
32+
33+
import java.util.Map;
34+
35+
import static org.apache.flink.cdc.connectors.doris.sink.DorisDataSinkOptions.TABLE_CREATE_AUTO_PARTITION_PROPERTIES_PREFIX;
36+
import static org.apache.flink.cdc.connectors.doris.sink.DorisDataSinkOptions.TABLE_CREATE_DEFAULT_PARTITION_KEY;
37+
import static org.apache.flink.cdc.connectors.doris.sink.DorisDataSinkOptions.TABLE_CREATE_DEFAULT_PARTITION_UNIT;
38+
import static org.apache.flink.cdc.connectors.doris.sink.DorisDataSinkOptions.TABLE_CREATE_PARTITION_EXCLUDE;
39+
import static org.apache.flink.cdc.connectors.doris.sink.DorisDataSinkOptions.TABLE_CREATE_PARTITION_INCLUDE;
40+
import static org.apache.flink.cdc.connectors.doris.sink.DorisDataSinkOptions.TABLE_CREATE_PARTITION_KEY;
41+
import static org.apache.flink.cdc.connectors.doris.sink.DorisDataSinkOptions.TABLE_CREATE_PARTITION_UNIT;
42+
43+
/** Utilities for doris schema. */
44+
public class DorisSchemaUtils {
45+
46+
public static final String DEFAULT_DATE = "1970-01-01";
47+
public static final String DEFAULT_DATETIME = "1970-01-01 00:00:00";
48+
49+
/**
50+
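* Gets the partition info from the config. Currently only DATE/TIMESTAMP AUTO RANGE PARTITION is
51+
* supported, and the Doris version must be greater than 2.1.6.
52+
*
53+
* @param config the sink configuration containing the auto-partition properties
54+
* @param schema the schema of the table
55+
* @param tableId the identifier of the table
56+
* @return a tuple of (partition key, partition unit), or null if no auto partition applies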
57+
*/
58+
public static Tuple2<String, String> getPartitionInfo(
59+
Configuration config, Schema schema, TableId tableId) {
60+
Map<String, String> autoPartitionProperties =
61+
DorisDataSinkOptions.getPropertiesByPrefix(
62+
config, TABLE_CREATE_AUTO_PARTITION_PROPERTIES_PREFIX);
63+
String excludes = autoPartitionProperties.get(TABLE_CREATE_PARTITION_EXCLUDE);
64+
if (!StringUtils.isNullOrWhitespaceOnly(excludes)) {
65+
Selectors selectExclude =
66+
new Selectors.SelectorsBuilder().includeTables(excludes).build();
67+
if (selectExclude.isMatch(tableId)) {
68+
return null;
69+
}
70+
}
71+
72+
String includes = autoPartitionProperties.get(TABLE_CREATE_PARTITION_INCLUDE);
73+
if (!StringUtils.isNullOrWhitespaceOnly(includes)) {
74+
Selectors selectInclude =
75+
new Selectors.SelectorsBuilder().includeTables(includes).build();
76+
if (!selectInclude.isMatch(tableId)) {
77+
return null;
78+
}
79+
}
80+
81+
String partitionKey =
82+
autoPartitionProperties.get(
83+
tableId.identifier() + "." + TABLE_CREATE_PARTITION_KEY);
84+
String partitionUnit =
85+
autoPartitionProperties.get(
86+
tableId.identifier() + "." + TABLE_CREATE_PARTITION_UNIT);
87+
if (StringUtils.isNullOrWhitespaceOnly(partitionKey)) {
88+
partitionKey = autoPartitionProperties.get(TABLE_CREATE_DEFAULT_PARTITION_KEY);
89+
}
90+
if (StringUtils.isNullOrWhitespaceOnly(partitionUnit)) {
91+
partitionUnit = autoPartitionProperties.get(TABLE_CREATE_DEFAULT_PARTITION_UNIT);
92+
}
93+
94+
if (!StringUtils.isNullOrWhitespaceOnly(partitionKey)
95+
&& schema.getColumn(partitionKey).isPresent()) {
96+
97+
DataType dataType = schema.getColumn(partitionKey).get().getType();
98+
if (dataType instanceof LocalZonedTimestampType
99+
|| dataType instanceof TimestampType
100+
|| dataType instanceof ZonedTimestampType
101+
|| dataType instanceof DateType) {
102+
return new Tuple2<>(partitionKey, partitionUnit);
103+
}
104+
}
105+
return null;
106+
}
107+
}
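
Note on `getPartitionInfo`: the resolution order is exclude first, then include, then the table-specific key and unit with a fallback to the defaults. The Java sketch below mirrors that order on a plain `Map`. It is a simplification under stated assumptions: `java.util.regex` full matching on `db.table` stands in for the connector's `Selectors`, the column type check is omitted, and `PartitionResolutionSketch` with its methods is hypothetical.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Pattern;

/** Hypothetical sketch of include/exclude filtering plus per-table fallback to defaults. */
final class PartitionResolutionSketch {

    /** True if the identifier fully matches any of the comma-separated regex patterns. */
    static boolean matchesAny(String patterns, String identifier) {
        for (String pattern : patterns.split(",")) {
            if (Pattern.matches(pattern.trim(), identifier)) {
                return true;
            }
        }
        return false;
    }

    /** Returns the table-specific value if non-blank, else the fallback if non-blank, else null. */
    static String firstNonBlank(String specific, String fallback) {
        if (specific != null && !specific.trim().isEmpty()) {
            return specific;
        }
        if (fallback != null && !fallback.trim().isEmpty()) {
            return fallback;
        }
        return null;
    }

    /** Resolves (partition key, partition unit) for db.table, or empty if none applies. */
    static Optional<Map.Entry<String, String>> resolve(
            Map<String, String> props, String db, String table) {
        String id = db + "." + table;
        String excludes = props.get("exclude");
        if (excludes != null && !excludes.trim().isEmpty() && matchesAny(excludes, id)) {
            return Optional.empty();
        }
        String includes = props.get("include");
        if (includes != null && !includes.trim().isEmpty() && !matchesAny(includes, id)) {
            return Optional.empty();
        }
        String key =
                firstNonBlank(props.get(id + ".partition-key"), props.get("default-partition-key"));
        String unit =
                firstNonBlank(
                        props.get(id + ".partition-unit"), props.get("default-partition-unit"));
        if (key == null) {
            return Optional.empty();
        }
        return Optional.of(new SimpleImmutableEntry<>(key, unit));
    }

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        props.put("include", "app_db\\..*");
        props.put("exclude", "app_db\\.tmp_.*");
        props.put("default-partition-key", "create_time");
        props.put("default-partition-unit", "day");
        props.put("app_db.orders.partition-unit", "month");
        System.out.println(resolve(props, "app_db", "orders"));
        System.out.println(resolve(props, "app_db", "tmp_scratch"));
    }
}
```

As in the diff, a key that resolves while the unit does not yields a pair with a null unit; the sketch keeps that behavior rather than guessing at a default unit.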
