Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[GLUTEN-8343][CH]Fix cast number to decimal and improve performance of it #8351

Merged
merged 3 commits into from
Jan 10, 2025

Conversation

KevinyhZou
Copy link
Contributor

@KevinyhZou KevinyhZou commented Dec 26, 2024

What changes were proposed in this pull request?

Fixes: #8343 and improve performance(#8351 (comment))

How was this patch tested?

test by ut

Copy link

#8343

Copy link

Run Gluten Clickhouse CI on x86

3 similar comments
Copy link

Run Gluten Clickhouse CI on x86

Copy link

Run Gluten Clickhouse CI on x86

Copy link

Run Gluten Clickhouse CI on x86

@KevinyhZou KevinyhZou force-pushed the fix_cast_number_to_decimal branch from 9daf8db to d92a3f8 Compare December 27, 2024 07:23
Copy link

Run Gluten Clickhouse CI on x86

4 similar comments
Copy link

Run Gluten Clickhouse CI on x86

Copy link

Run Gluten Clickhouse CI on x86

Copy link

Run Gluten Clickhouse CI on x86

Copy link

Run Gluten Clickhouse CI on x86

@@ -146,17 +156,18 @@ class FunctionCheckDecimalOverflow : public IFunction
}

private:
template <typename T, typename ToDataType>
template <typename FromDataType, typename ToDataType, typename ColVecType, typename T = FromDataType::FieldType>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not remove T and add "using T = typename FromDataType::FieldType" below

@@ -192,20 +203,61 @@ class FunctionCheckDecimalOverflow : public IFunction
result_column = std::move(col_to);
}

template <is_decimal FromFieldType, typename ToDataType>
template <typename FromDataType, typename ToDataType, typename FromFieldType = FromDataType::FieldType>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove FromFieldType, keep template parameters simple.

@KevinyhZou KevinyhZou force-pushed the fix_cast_number_to_decimal branch from a7fc8fe to 8d94e26 Compare January 6, 2025 09:48
Copy link

github-actions bot commented Jan 6, 2025

Run Gluten Clickhouse CI on x86

return true;
}
}
else if constexpr (IsDataTypeNumber<FromDataType>)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

above if and else if should be merged. Remind: use using ColVecType = ColumnVectorOrDecimal<T>;

args.emplace_back(addConstColumn(actions_dag, std::make_shared<DataTypeInt32>(), substrait_type.decimal().scale()));
result_node = toFunctionNode(actions_dag, "checkDecimalOverflowSparkOrNull", args);
int decimal_precision = substrait_type.decimal().precision();
if (decimal_precision != 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if (decimal_precision)

Copy link

github-actions bot commented Jan 7, 2025

Run Gluten Clickhouse CI on x86

@KevinyhZou
Copy link
Contributor Author

端到端性能测试

测试sql: select count(1) from test_tbl where cast(d as decimal(5,2)) > 1;
数据量:60000000
PR改动前:
2.18s 2.245s, 2.259s
PR改动后:
2.623s, 2.618s,2.659s;

valian耗时:
15.936s,15.011s,16.411s;

else
return convertDecimalsImpl<DataTypeDecimal<Decimal256>, ToDataType>(decimal, precision_to, scale_from, scale_to, result);
{
if constexpr (std::is_same_v<FromFieldType, BFloat16>)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove useless branch

{
if constexpr (exception_mode == CheckExceptionMode::Null)
return false;
else
Copy link
Contributor

@taiyang-li taiyang-li Jan 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove useless branch here and any other places.

Copy link

github-actions bot commented Jan 7, 2025

Run Gluten Clickhouse CI on x86


template <typename FromDataType, typename ToDataType>
requires(IsDataTypeNumber<FromDataType> && IsDataTypeDecimal<ToDataType>)
static bool convertNumberToDecimalImpl(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ALWAYS_INLINE

: static_cast<int>(std::log10(std::fabs(int_part))) + 1;
/// If the integer part's digits of the number is greater than (precision - scale), e.g. cast(55 as decimal(2, 1)),
/// then we should return NULL or throw exceptions.
if (int_part_digits > precision - scale)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if and else could be merged return int_part_digits > precision - scale && tryConvertToDecimal<FromDataType, ToDataType>(value, scale, result);


int int_part_digits = int_part == 0 ? 1 :
int_part > 0 ? static_cast<int>(std::log10(int_part)) + 1
: static_cast<int>(std::log10(std::fabs(int_part))) + 1;
Copy link
Contributor

@taiyang-li taiyang-li Jan 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess std::log10 and std::fabs is too heavy for this function. Maybe it is better:

auto casted_int_part = static_cast<ToDataType::FieldType>(casted_int_part); 
bool overflow = casted_int_part >= min_value && casted_int_value <= max_value;   

min_value/max_value is the minimum/maximum value which could be represented in precision - scale digits. They could be calculated outside for loop, which remove the cost of std::log10 and std::fabs.

if constexpr (std::is_same_v<FromFieldType, BFloat16>)
return tryConvertToDecimal<DataTypeFloat32, ToDataType>(static_cast<Float32>(value), scale, result);
else
return tryConvertToDecimal<FromDataType, ToDataType>(value, scale, result);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious if (int_part_digits > precision - scale) is true, will tryConvertToDecimal returns false?

ToFieldType result;
bool success = convertToDecimalImpl<T, ToDataType>(datas[i], precision, scale_from, scale_to, result);
bool success = convertToDecimalImpl<FromDataType, ToDataType>(datas[i], precision, scale_from, scale_to, result);

if (success)
Copy link
Contributor

@taiyang-li taiyang-li Jan 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove if else in loops if possible

vec_to[i] = static_cast<ToFieldType>(result);
(*vec_null_map_to)[i] = success;

Copy link

github-actions bot commented Jan 7, 2025

Run Gluten Clickhouse CI on x86

@KevinyhZou
Copy link
Contributor Author

端到端性能测试

测试sql: select count(1) from test_tbl where cast(d as decimal(5,2)) > 1;
数据量:60000000
PR改动前:
2.18s 2.245s, 2.259s
PR改动后:
1.988s, 1.891s, 1.933s

valian耗时:
15.936s,15.011s,16.411s;

Copy link

github-actions bot commented Jan 7, 2025

Run Gluten Clickhouse CI on x86

1 similar comment
Copy link

github-actions bot commented Jan 7, 2025

Run Gluten Clickhouse CI on x86

UInt32 scale,
Int64 decimal_int_part_max,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not enough to represent min/max value. Consider precision = 38 and scale = 0.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use NativeTypeToDataType::FieldType

Copy link

github-actions bot commented Jan 9, 2025

Run Gluten Clickhouse CI on x86

@KevinyhZou KevinyhZou force-pushed the fix_cast_number_to_decimal branch from f70615e to db3a9e2 Compare January 9, 2025 02:30
Copy link

github-actions bot commented Jan 9, 2025

Run Gluten Clickhouse CI on x86

Copy link

github-actions bot commented Jan 9, 2025

Run Gluten Clickhouse CI on x86

2 similar comments
Copy link

github-actions bot commented Jan 9, 2025

Run Gluten Clickhouse CI on x86

Copy link

github-actions bot commented Jan 9, 2025

Run Gluten Clickhouse CI on x86

@KevinyhZou KevinyhZou force-pushed the fix_cast_number_to_decimal branch from 0b64fdc to 06ec1c3 Compare January 9, 2025 10:05
Copy link

github-actions bot commented Jan 9, 2025

Run Gluten Clickhouse CI on x86

@taiyang-li taiyang-li merged commit 66e816f into apache:main Jan 10, 2025
8 checks passed
@taiyang-li taiyang-li changed the title [GLUTEN-8343][CH]Fix cast number to decimal [GLUTEN-8343][CH]Fix cast number to decimal and improve performance of it Jan 10, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[CH] Diff when cast Number to Decimal
2 participants