[clone]refactor the clone action as we introduced external path #4844
I feel that the current clone process needs to be refactored:
This hierarchical approach to copying is the correct solution.
Thanks for your advice, I will do my best.
Hi Jingsong, according to the original design [1] and the discussion above, I plan to refactor the clone action into the following Flink batch job.
Please help confirm whether this refactoring is appropriate. Thanks. [1] https://cwiki.apache.org/confluence/display/PAIMON/PIP-18%3A+Introduce+clone+Action+and+Procedure
Hi @neuyilan, thanks for your design! In the second stage, I think we can just pick manifests; we don't need to pick files there.
Hi @JingsongLi, the original design was to pick out all files first and then, at each step, copy the files of the corresponding type.
Yes, I think we can refactor it now.
Hi @JingsongLi, thanks again for the advice. I have refactored the clone action into the following Flink batch job; please review it again. Thanks.
@neuyilan
Hi @wwj6591812, thanks for the reminder; I had a misunderstanding before. After this modification, both batch jobs and streaming jobs will be affected. Is that right?
@JingsongLi @wwj6591812 PTAL, thanks.
paimon-core/src/main/java/org/apache/paimon/io/DataFileMeta.java
paimon-flink/paimon-flink-common/src/main/java/org/apache/paimon/flink/clone/CloneFileInfo.java
...link-common/src/main/java/org/apache/paimon/flink/clone/PickSchemaFilesForCloneOperator.java
...aimon-flink-common/src/main/java/org/apache/paimon/flink/clone/CopyManifestFileOperator.java
...link/paimon-flink-common/src/test/java/org/apache/paimon/flink/action/CloneActionITCase.java
Looks good to me!
Purpose
https://cwiki.apache.org/confluence/display/PAIMON/PIP-29%3A+Introduce+Table+Multi-Location++Management
Refactor the clone action now that the external path has been introduced.
Note that regardless of where the source table's data is stored (warehouse path or external path), we always copy the data to the warehouse path of the target table.
If we instead kept the source table's external path as the data path of the target table, the data files of the source table and the target table would end up mixed together in the same location.
What's your opinion?
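To make the copy rule above concrete, here is a minimal sketch of how a source data file could be mapped to its location in the target table. It is not the code in this PR: the class `ClonePathSketch`, the method `targetPath`, and the example URIs are all hypothetical, and it assumes a data file keeps the same relative layout (partition/bucket/file name) under either the warehouse path or the external path.

```java
public class ClonePathSketch {

    /** Returns the part of {@code file} below {@code root}, or null if it is not under root. */
    static String relativize(String file, String root) {
        String prefix = root.endsWith("/") ? root : root + "/";
        return file.startsWith(prefix) ? file.substring(prefix.length()) : null;
    }

    /**
     * Hypothetical helper: wherever the source file lives (warehouse path or
     * external path), the result is always under the target table's warehouse path.
     */
    static String targetPath(
            String sourceFile,
            String sourceWarehouseRoot,
            String sourceExternalRoot, // may be null when no external path is configured
            String targetWarehouseRoot) {
        String relative = relativize(sourceFile, sourceWarehouseRoot);
        if (relative == null && sourceExternalRoot != null) {
            relative = relativize(sourceFile, sourceExternalRoot);
        }
        if (relative == null) {
            throw new IllegalArgumentException("File is outside the source table: " + sourceFile);
        }
        String base = targetWarehouseRoot.endsWith("/")
                ? targetWarehouseRoot.substring(0, targetWarehouseRoot.length() - 1)
                : targetWarehouseRoot;
        return base + "/" + relative;
    }

    public static void main(String[] args) {
        // A file stored under the source table's external path (example URIs are made up).
        System.out.println(targetPath(
                "oss://ext/t/bucket-0/data-1.parquet",
                "hdfs://wh/db.db/t",
                "oss://ext/t",
                "hdfs://wh2/db.db/t_clone"));
    }
}
```

With this mapping the target table never references the source table's external path, so the two tables' data files stay separate.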
Tests
Add CloneActionITCase.testCloneTableWithSourceTableExternalPath
API and Format
no
Documentation