[core] Support upper bound in dynamic bucket mode #4974
@JingsongLi CI passed, PTAL, thanks!
paimon-core/src/main/java/org/apache/paimon/index/SimpleHashBucketAssigner.java
c7b7755 to df576e2
I'm not sure whether these modifications are effective, so here is my suggestion:
We only need to modify the inside of PartitionIndex.assign:
Old code:
    // 3. create a new bucket
    for (int i = 0; i < Short.MAX_VALUE; i++) {
        if (bucketFilter.test(i) && !totalBucket.contains(i)) {
            hash2Bucket.put(hash, (short) i);
            nonFullBucketInformation.put(i, 1L);
            totalBucket.add(i);
            return i;
        }
    }
    // 4. too many buckets, throw exception
    @SuppressWarnings("OptionalGetWithoutIsPresent")
    int maxBucket = totalBucket.stream().mapToInt(Integer::intValue).max().getAsInt();
    throw new RuntimeException(
            String.format(
                    "Too more bucket %s, you should increase target bucket row number %s.",
                    maxBucket, targetBucketRowNumber));
New code:
    // 3. create a new bucket
    for (int i = 0; i < max_buckets; i++) {
        if (bucketFilter.test(i) && !totalBucket.contains(i)) {
            hash2Bucket.put(hash, (short) i);
            nonFullBucketInformation.put(i, 1L);
            totalBucket.add(i);
            return i;
        }
    }
    // 4. max_buckets exceeded: just pick a bucket for the record,
    //    i.e. assign the record to the min bucket that belongs to this task.
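The suggestion above can be sketched as a small self-contained class. This is only an illustration under stated assumptions, not the code that was merged: the class name BoundedBucketSketch and the maxBuckets field are hypothetical, the "non-full bucket" bookkeeping is left out, and the fallback simply reuses the smallest bucket accepted by the task's bucketFilter.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;
import java.util.function.IntPredicate;

/** Hypothetical, simplified stand-in for the bucket-creation step discussed above. */
class BoundedBucketSketch {
    private final Map<Integer, Short> hash2Bucket = new HashMap<>();
    private final TreeSet<Integer> totalBucket = new TreeSet<>();
    private final int maxBuckets; // assumed upper bound: bucket ids stay below this value

    BoundedBucketSketch(int maxBuckets) {
        this.maxBuckets = maxBuckets;
    }

    int assign(int hash, IntPredicate bucketFilter) {
        // A hash that was already assigned keeps its bucket.
        Short known = hash2Bucket.get(hash);
        if (known != null) {
            return known;
        }
        // 3. create a new bucket, but only with an id below the bound
        for (int i = 0; i < maxBuckets; i++) {
            if (bucketFilter.test(i) && !totalBucket.contains(i)) {
                hash2Bucket.put(hash, (short) i);
                totalBucket.add(i);
                return i;
            }
        }
        // 4. bound reached: reuse the smallest bucket that belongs to this task
        for (int bucket : totalBucket) {
            if (bucketFilter.test(bucket)) {
                hash2Bucket.put(hash, (short) bucket);
                return bucket;
            }
        }
        throw new IllegalStateException("No bucket below the bound belongs to this task");
    }
}
```

With a bound of 4 and a filter that accepts even ids (one of two tasks), the first two new hashes get buckets 0 and 2, and every later new hash falls back to bucket 0 instead of raising an exception.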
After offline discussion with @JingsongLi, we have reached a consensus: We can't just update
df576e2 to 054f1df
@JingsongLi The feature has been completed as discussed. Looking forward to your review!
            nonFullBucketInformation.put(i, 1L);
            totalBucket.add(i);
            return i;
        if (-1 == maxBucketsNum || totalBucket.size() < maxBucketsNum) {
I think the condition here should be that the max bucket is smaller than maxBucketsNum.
If the job is restarted from scratch, every task keeps creating buckets on its own, so the previous check on totalBucket.size() may be problematic.
Wise consideration! done :)
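One way to read the concern above is the following sketch, which compares the two checks in a worst case. Everything in it is an assumption made for illustration: the class and method names are hypothetical, each task is assumed to own the bucket ids with id % numAssigners == task, each task starts with an empty totalBucket (a job started from scratch), and every key is assumed to need a new bucket.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical comparison of a per-task size check and a bucket-id check. */
class BoundCheckSketch {

    /** Total buckets created when each task only checks the size of its own bucket set. */
    static int bucketsWithSizeCheck(int maxBucketsNum, int numAssigners, int keysPerTask) {
        int created = 0;
        for (int task = 0; task < numAssigners; task++) {
            Set<Integer> totalBucket = new HashSet<>(); // local view of this task
            int next = task; // assumed ownership: ids task, task + n, task + 2n, ...
            for (int k = 0; k < keysPerTask; k++) {
                if (totalBucket.size() < maxBucketsNum) {
                    totalBucket.add(next);
                    next += numAssigners;
                }
            }
            created += totalBucket.size();
        }
        return created;
    }

    /** Total buckets created when a new bucket is allowed only if its id is below the bound. */
    static int bucketsWithIdCheck(int maxBucketsNum, int numAssigners, int keysPerTask) {
        int created = 0;
        for (int task = 0; task < numAssigners; task++) {
            Set<Integer> totalBucket = new HashSet<>();
            int next = task;
            for (int k = 0; k < keysPerTask; k++) {
                if (next < maxBucketsNum) {
                    totalBucket.add(next);
                    next += numAssigners;
                }
            }
            created += totalBucket.size();
        }
        return created;
    }
}
```

Under these assumptions, with maxBucketsNum = 4 and two assigners, the size check lets each task create four buckets (eight in total), while the id check keeps the total at four.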
paimon-core/src/main/java/org/apache/paimon/index/PartitionIndex.java
@@ -137,4 +153,42 @@ public static PartitionIndex loadIndex(
        }
        return new PartitionIndex(mapBuilder.build(), buckets, targetBucketRowNumber);
    }

    public static int[] getMaxBucketsPerAssigner(int maxBuckets, int assigners) {
I cannot understand getMaxBucketsPerAssigner and getSpecifiedMaxBuckets. Why not just use the same check, maxBucketId < maxBucketsNum - 1?
Good question. After offline discussion, we have reached a consensus. Done :)
1396cff to 41f3370
41f3370 to d419d2c
Purpose
In dynamic bucket mode, an unlimited number of buckets leads to an unpredictable number of small files, which causes stability problems. We should therefore support an upper bound on the number of buckets in dynamic bucket mode.
Linked issue: close #4942
Tests
HashBucketAssignerTest
SimpleHashBucketAssignerTest
API and Format
Documentation
docs/layouts/shortcodes/generated/core_configuration.html