
[Feature] Simplify migrate_table and migrate_iceberg_table into one procedure for easier use #5074

Open
liyubin117 opened this issue Feb 13, 2025 · 4 comments
Labels: enhancement (New feature or request)

@liyubin117 (Contributor)

Search before asking

  • I searched in the issues and found nothing similar.

Motivation

I found that #4639 introduces a new procedure, migrate_iceberg_table, which is similar to migrate_table. We could use a connector argument to distinguish the two scenarios in one procedure instead of introducing a new one:

CALL sys.migrate_table(connector => 'hive', source_table => 'default.hivetable', options => 'file.format=orc');
CALL sys.migrate_table(connector => 'iceberg', source_table => 'default.icebergtable', options => 'file.format=orc');

Solution

No response

Anything else?

No response

Are you willing to submit a PR?

  • I'm willing to submit a PR!
liyubin117 added the enhancement (New feature or request) label on Feb 13, 2025
@liyubin117 (Contributor, Author)

@LsomeYeah What do you think? Looking forward to your opinions.

@LsomeYeah (Contributor)

@liyubin117 Hi, thanks for the invite. In the first version of Iceberg migration, we did use a connector argument in migrate_table to distinguish the two scenarios. The reasons why I introduced a new procedure instead are as follows.

  1. The arguments of the two scenarios are different. When migrating a Hive table to Paimon, a Hive catalog in Paimon is needed to access both the origin Hive table and the target Paimon table; the catalog in CALL catalog.sys.migrate_table must be a Hive catalog in Paimon, so we only need to provide source_table in the procedure to locate the origin Hive table. But for Iceberg migration, we cannot use a Paimon catalog to access the source Iceberg table, so we have to provide some extra information about the origin Iceberg table (such as the type of the Iceberg catalog managing it, the warehouse, the Hive metastore URI, etc.); see the sketch after this list.
  2. Future scalability and usability. In the future, we may consider migrating tables from Delta or Hudi to Paimon, and their arguments may differ as well; too many arguments in the same procedure would increase the complexity of usage.
  3. The migration from Hive and from Delta is separated in Iceberg too: https://iceberg.apache.org/docs/1.6.0/table-migration/?h=migrati#migrating-from-different-table-formats. Migration from Hive in Iceberg also needs only an Iceberg catalog to access the origin and target tables, while migration from other data lakes needs special processing.
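
For concreteness, the contrast might look roughly like this (a sketch only; the iceberg_options keys are illustrative placeholders, not the exact names from the Paimon docs):

CALL sys.migrate_table(
    connector => 'hive',
    source_table => 'default.hivetable',  -- resolved via the enclosing Paimon Hive catalog
    options => 'file.format=orc');

CALL sys.migrate_iceberg_table(
    source_table => 'default.icebergtable',
    -- extra source-catalog information that a Paimon catalog cannot supply
    -- (illustrative keys: catalog type, warehouse path, Hive metastore URI)
    iceberg_options => 'catalog-type=hive,warehouse=/path/to/warehouse,uri=thrift://hms-host:9083',
    options => 'file.format=orc');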

@liyubin117 (Contributor, Author)

liyubin117 commented Feb 14, 2025

@LsomeYeah Thanks for your explanation. I have a minor doubt: the only difference in arguments between the two procedures is that migrate_iceberg_table has iceberg_options. Could we use sys.migrate_table('iceberg', 'icebergCatalog.db.t1') to reuse the options defined on the created catalog instead of declaring them in the procedure again?
I found that the migration procedure in Iceberg is CALL catalog_name.system.migrate('spark_catalog.db.sample', map('foo', 'bar'));, where the catalog is included in the table argument.

MigrateTableProcedure

@ProcedureHint(
            argument = {
                @ArgumentHint(name = "connector", type = @DataTypeHint("STRING")),
                @ArgumentHint(name = "source_table", type = @DataTypeHint("STRING")),
                @ArgumentHint(name = "options", type = @DataTypeHint("STRING"), isOptional = true),
                @ArgumentHint(
                        name = "parallelism",
                        type = @DataTypeHint("Integer"),
                        isOptional = true)
            })

MigrateIcebergTableProcedure

@ProcedureHint(
            argument = {
                @ArgumentHint(name = "source_table", type = @DataTypeHint("STRING")),
                @ArgumentHint(
                        name = "iceberg_options",
                        type = @DataTypeHint("STRING"),
                        isOptional = true),
                @ArgumentHint(name = "options", type = @DataTypeHint("STRING"), isOptional = true),
                @ArgumentHint(
                        name = "parallelism",
                        type = @DataTypeHint("Integer"),
                        isOptional = true)
            })
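
Under the proposal in this issue, the two hints above might be merged into a single signature along these lines (a hypothetical sketch, not actual Paimon code):

@ProcedureHint(
        argument = {
            @ArgumentHint(name = "connector", type = @DataTypeHint("STRING")),
            @ArgumentHint(name = "source_table", type = @DataTypeHint("STRING")),
            // only consulted when connector => 'iceberg'
            @ArgumentHint(
                    name = "iceberg_options",
                    type = @DataTypeHint("STRING"),
                    isOptional = true),
            @ArgumentHint(name = "options", type = @DataTypeHint("STRING"), isOptional = true),
            @ArgumentHint(
                    name = "parallelism",
                    type = @DataTypeHint("Integer"),
                    isOptional = true)
        })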

@LsomeYeah (Contributor)

LsomeYeah commented Feb 14, 2025

@liyubin117 Happy to discuss. Currently there is no catalog in Paimon that can access Iceberg tables, so the icebergCatalog in 'icebergCatalog.db.t1' would have to be an Iceberg catalog. In fact, I had used an Iceberg catalog to access Iceberg tables for migration before, but the migration code lives in the paimon-core module, and that would introduce Iceberg dependencies into paimon-core, which turned out to be undesirable after discussing with some Paimon committers.

As far as I know, Iceberg uses the catalog in catalog.database.tablename as the source catalog, and it must be a SparkCatalog or a SparkSessionCatalog. SparkCatalog is a wrapped Iceberg catalog that can only load Iceberg tables; SparkSessionCatalog wraps an Iceberg catalog plus a delegate catalog that implements some Spark catalog interfaces for loading non-Iceberg tables, which may be what lets Iceberg's migration handle migrating CSV, Parquet, etc. to Iceberg. The procedure introduced here is in Flink, and Paimon currently has no Flink catalog that can load both Paimon tables and non-Paimon tables.
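
For reference, the Iceberg/Spark wiring behind the migrate call mentioned above looks roughly like this (a sketch following the pattern in the Iceberg docs; the catalog name and table are illustrative):

-- Spark session configuration (e.g. in spark-defaults.conf or via --conf flags):
--   spark.sql.catalog.spark_catalog      = org.apache.iceberg.spark.SparkSessionCatalog
--   spark.sql.catalog.spark_catalog.type = hive

-- The source catalog is taken from the qualified table name:
CALL spark_catalog.system.migrate('spark_catalog.db.sample');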
