-
Notifications
You must be signed in to change notification settings - Fork 454
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[GLUTEN-8343][CH]Fix cast number to decimal and improve performance of it #8351
[GLUTEN-8343][CH]Fix cast number to decimal and improve performance of it #8351
Conversation
Run Gluten Clickhouse CI on x86 |
3 similar comments
Run Gluten Clickhouse CI on x86 |
Run Gluten Clickhouse CI on x86 |
Run Gluten Clickhouse CI on x86 |
9daf8db
to
d92a3f8
Compare
Run Gluten Clickhouse CI on x86 |
4 similar comments
Run Gluten Clickhouse CI on x86 |
Run Gluten Clickhouse CI on x86 |
Run Gluten Clickhouse CI on x86 |
Run Gluten Clickhouse CI on x86 |
@@ -146,17 +156,18 @@ class FunctionCheckDecimalOverflow : public IFunction | |||
} | |||
|
|||
private: | |||
template <typename T, typename ToDataType> | |||
template <typename FromDataType, typename ToDataType, typename ColVecType, typename T = FromDataType::FieldType> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not remove T and add "using T = typename FromDataType::FieldType" below
@@ -192,20 +203,61 @@ class FunctionCheckDecimalOverflow : public IFunction | |||
result_column = std::move(col_to); | |||
} | |||
|
|||
template <is_decimal FromFieldType, typename ToDataType> | |||
template <typename FromDataType, typename ToDataType, typename FromFieldType = FromDataType::FieldType> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove FromFieldType, keep template parameters simple.
cpp-ch/local-engine/Functions/SparkFunctionCheckDecimalOverflow.cpp
Outdated
Show resolved
Hide resolved
a7fc8fe
to
8d94e26
Compare
Run Gluten Clickhouse CI on x86 |
...c/test/scala/org/apache/gluten/execution/tpch/GlutenClickHouseTPCHSaltNullParquetSuite.scala
Outdated
Show resolved
Hide resolved
return true; | ||
} | ||
} | ||
else if constexpr (IsDataTypeNumber<FromDataType>) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
above if and else if should be merged. Remind: use using ColVecType = ColumnVectorOrDecimal<T>;
cpp-ch/local-engine/Functions/SparkFunctionCheckDecimalOverflow.cpp
Outdated
Show resolved
Hide resolved
cpp-ch/local-engine/Functions/SparkFunctionCheckDecimalOverflow.cpp
Outdated
Show resolved
Hide resolved
args.emplace_back(addConstColumn(actions_dag, std::make_shared<DataTypeInt32>(), substrait_type.decimal().scale())); | ||
result_node = toFunctionNode(actions_dag, "checkDecimalOverflowSparkOrNull", args); | ||
int decimal_precision = substrait_type.decimal().precision(); | ||
if (decimal_precision != 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (decimal_precision)
Run Gluten Clickhouse CI on x86 |
端到端性能测试
|
else | ||
return convertDecimalsImpl<DataTypeDecimal<Decimal256>, ToDataType>(decimal, precision_to, scale_from, scale_to, result); | ||
{ | ||
if constexpr (std::is_same_v<FromFieldType, BFloat16>) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove useless branch
{ | ||
if constexpr (exception_mode == CheckExceptionMode::Null) | ||
return false; | ||
else |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove useless branch here and any other places.
Run Gluten Clickhouse CI on x86 |
|
||
template <typename FromDataType, typename ToDataType> | ||
requires(IsDataTypeNumber<FromDataType> && IsDataTypeDecimal<ToDataType>) | ||
static bool convertNumberToDecimalImpl( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ALWAYS_INLINE
: static_cast<int>(std::log10(std::fabs(int_part))) + 1; | ||
/// If the integer part's digits of the number is greater than (precision - scale), e.g. cast(55 as decimal(2, 1)), | ||
/// then we should return NULL or throw exceptions. | ||
if (int_part_digits > precision - scale) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if and else could be merged return int_part_digits > precision - scale && tryConvertToDecimal<FromDataType, ToDataType>(value, scale, result);
|
||
int int_part_digits = int_part == 0 ? 1 : | ||
int_part > 0 ? static_cast<int>(std::log10(int_part)) + 1 | ||
: static_cast<int>(std::log10(std::fabs(int_part))) + 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess std::log10
and std::fabs
is too heavy for this function. Maybe it is better:
auto casted_int_part = static_cast<ToDataType::FieldType>(casted_int_part);
bool overflow = casted_int_part >= min_value && casted_int_value <= max_value;
min_value
/max_value
is the minimum/maximum value which could be represented in precision - scale
digits. They could be calculated outside for loop, which remove the cost of std::log10
and std::fabs
.
if constexpr (std::is_same_v<FromFieldType, BFloat16>) | ||
return tryConvertToDecimal<DataTypeFloat32, ToDataType>(static_cast<Float32>(value), scale, result); | ||
else | ||
return tryConvertToDecimal<FromDataType, ToDataType>(value, scale, result); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious if (int_part_digits > precision - scale)
is true, will tryConvertToDecimal
returns false?
ToFieldType result; | ||
bool success = convertToDecimalImpl<T, ToDataType>(datas[i], precision, scale_from, scale_to, result); | ||
bool success = convertToDecimalImpl<FromDataType, ToDataType>(datas[i], precision, scale_from, scale_to, result); | ||
|
||
if (success) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove if else in loops if possible
vec_to[i] = static_cast<ToFieldType>(result);
(*vec_null_map_to)[i] = success;
Run Gluten Clickhouse CI on x86 |
端到端性能测试
|
Run Gluten Clickhouse CI on x86 |
1 similar comment
Run Gluten Clickhouse CI on x86 |
UInt32 scale, | ||
Int64 decimal_int_part_max, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not enough to represent min/max value. Consider precision = 38 and scale = 0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use NativeTypeToDataType::FieldType
Run Gluten Clickhouse CI on x86 |
f70615e
to
db3a9e2
Compare
Run Gluten Clickhouse CI on x86 |
Run Gluten Clickhouse CI on x86 |
2 similar comments
Run Gluten Clickhouse CI on x86 |
Run Gluten Clickhouse CI on x86 |
0b64fdc
to
06ec1c3
Compare
Run Gluten Clickhouse CI on x86 |
What changes were proposed in this pull request?
Fixes: #8343 and improve performance(#8351 (comment))
How was this patch tested?
test by ut