-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Core: Fix divide by zero when adjust split size #12201
base: main
Are you sure you want to change the base?
Conversation
long splitCount = LongMath.divide(scanSize, splitSize, RoundingMode.CEILING); | ||
long adjustedSplitSize = Math.max(scanSize / parallelism, Math.min(MIN_SPLIT_SIZE, splitSize)); | ||
adjustedSplitSize = Math.max(adjustedSplitSize, Math.min(MIN_SPLIT_SIZE, splitSize)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
better to put all logic to compute adjustedSplitSize
in a single function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your review. I have refactored to make it more clear, please take a look when available, thanks.
66ac001
to
5d171d2
Compare
@@ -236,6 +236,9 @@ public static long adjustSplitSize(long scanSize, int parallelism, long splitSiz | |||
// use the configured split size if it produces at least one split per slot | |||
// otherwise, adjust the split size to target parallelism with a reasonable minimum | |||
// increasing the split size may cause expensive spills and is not done automatically | |||
if (splitSize <= 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we allowing this at all? Shouldn't we just throw an error on Split size less than or = to 0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the feedback @RussellSpitzer. I just saw that when trying to adjust the split size, negative values were adjusted to normal values, while the zero value threw a / by zero
exception due to being used as a divisor. So I tried to unify their behavior, whether it's uniformly adjusting to a normal split size or uniformly throwing an error.
If you think it's more appropriate to throw an error uniformly, I can adjust it. Please let me know your thoughts, thanks a lot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI, I see for some other property, like METADATA_PREVIOUS_VERSIONS_MAX
, will be adjust to a normal value when it's less than or equal to 0,
int maxSize =
Math.max(
1,
PropertyUtil.propertyAsInt(
properties,
TableProperties.METADATA_PREVIOUS_VERSIONS_MAX,
TableProperties.METADATA_PREVIOUS_VERSIONS_MAX_DEFAULT));
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The general rule of thumb has been that we generally try not to throw errors on reading things out of metadata.json to avoid bricking on a metadata.json written by bad writer. This shouldn't be the case here since the metadata.json will parse correctly, but the invalid parameters would throw an error at runtime.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already have this check immediately below,
private static void validatePlanningArguments(long splitSize, int lookback, long openFileCost) {
Preconditions.checkArgument(splitSize > 0, "Split size must be > 0: %s", splitSize);
Preconditions.checkArgument(lookback > 0, "Split planning lookback must be > 0: %s", lookback);
Preconditions.checkArgument(openFileCost >= 0, "File open cost must be >= 0: %s", openFileCost);
}
How is this being avoided?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When querying in Spark, the adjustSplitSize(...)
is invoked earlier than TableScanUtil.planTaskGroups(...)
as follows, so the illegal split size will be handle in adjustSplitSize(...)
first and meet this problem. I believe in other engine, if they have the logic to adjust the split size, this would be the case as well.
TableScanUtil.planTaskGroups(
CloseableIterable.withNoopClose(tasks()),
adjustSplitSize(tasks(), scan.targetSplitSize()),
scan.splitLookback(),
scan.splitOpenFileCost());
According to my understanding of what you mean, we should simply add a checkArgument
for splitSize in adjustSplitSize(...)
to throw an error for illegal value, is that right?
Currently,
TableScanUtil.adjustSplitSize(...)
can appropriately handle the situations where parametersplitSize
is negative, however, it doesn't consider the situation wheresplitSize
equals 0. That is to say, in a calculation engine like Spark, if we query a table after set it's propertyread.split.target-size
to a negative value like-1
, we will succeed and get the result, since the defaultread.split.adaptive-size.enabled
is true. Meanwhile, if we query a table after set it's propertyread.split.target-size
to 0, we will fail with the error message/ by zero
.This PR fix the inappropriate handling of the 0 value of
splitSize
, ensure that both negative and zero values can be adjusted to appropriate split size.